Description


Question 1. Assume we have a simplified version of the Animal Classification dataset [1], which
includes properties of animals as descriptive features and the animal species as target feature. In
our dataset, the animals are classified as being Mammals or Reptiles based on whether they are
toothed and have legs, as shown in Table 1. In this question, you are asked to develop a decision
tree based on this simplified dataset.
Table 1: Animal Classification dataset

Instance  Toothed  Legs  Species
1         T        T     Mammal
2         T        T     Mammal
3         T        F     Reptile
4         F        T     Mammal
5         T        T     Mammal
6         T        T     Mammal
7         T        F     Reptile
8         T        F     Reptile
9         T        T     Mammal
10        F        T     Reptile

(a) Calculate the resulting Gini index when splitting on the attributes “Toothed” and “Legs”,
respectively (i.e. $Gini_{split}(\text{Toothed})$ and $Gini_{split}(\text{Legs})$). Show your
calculation details. Which attribute would be chosen as the first splitting attribute? (10 pts)
(b) Based on the decision in Question 1.a, draw a two-level decision tree if needed using both
attributes for splitting. Mark the class label in each leaf node. In case of a tie on the “Mammal”
and “Reptile” instances in a leaf node, mark the node as “-”. (4 pts)
(c) WEKA Tool Practice. Use the WEKA tool to classify the data with a decision tree (J48) under
the test option “Use training set”. Copy the result from the “Classifier output” window into your
assignment. (6 pts)
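
As a way to check the hand calculation in part (a), the short Python sketch below computes the weighted Gini index for each candidate split of Table 1. It assumes the usual definition, Gini = 1 - sum of squared class proportions, with the split value taken as the size-weighted average over the partitions; if the course uses a different convention, adjust accordingly. The names DATA, gini and gini_split are illustrative only.

```python
from collections import Counter

# Rows of Table 1 as (toothed, legs, species).
DATA = [
    ("T", "T", "Mammal"),
    ("T", "T", "Mammal"),
    ("T", "F", "Reptile"),
    ("F", "T", "Mammal"),
    ("T", "T", "Mammal"),
    ("T", "T", "Mammal"),
    ("T", "F", "Reptile"),
    ("T", "F", "Reptile"),
    ("T", "T", "Mammal"),
    ("F", "T", "Reptile"),
]


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_split(rows, attr_index):
    """Size-weighted average Gini impurity after splitting on one attribute."""
    partitions = {}
    for row in rows:
        # Group the class labels (last column) by the attribute value.
        partitions.setdefault(row[attr_index], []).append(row[-1])
    n = len(rows)
    return sum(len(part) / n * gini(part) for part in partitions.values())


gini_toothed = gini_split(DATA, 0)
gini_legs = gini_split(DATA, 1)
print(f"Gini_split(Toothed) = {gini_toothed:.4f}")
print(f"Gini_split(Legs)    = {gini_legs:.4f}")
```

Under this definition, the attribute with the smaller split value gives the purer partitions and is the one preferred for the first split.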

[1] The UCI Zoo Dataset.
Question 2. In class, we learned how to solve the sparse recovery problem:
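$$
\begin{aligned}
\min_{x}\quad & |x_1| + |x_2| + |x_3| + |x_4| + |x_5| \\
\text{s.t.}\quad &
\underbrace{\begin{bmatrix}
 0 & -1 & 0 & -1 &  1 &  0 \\
-2 &  1 & 0 &  2 &  0 & -1 \\
 0 &  1 & 0 &  0 &  0 &  1 \\
 0 &  0 & 1 &  0 & -1 &  2
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} 2 \\ 1 \\ 1 \\ -3 \end{bmatrix}}_{b}
\end{aligned}
$$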
We find a solution 𝑥 = (0,1,0,0,3,0) with four entries being zero. Now, instead of finding a
solution 𝑥 to 𝐴𝑥 = 𝑏 with as many zero entries as possible, we want to find a solution to 𝐴𝑥 = 𝑏
that minimizes the first two entries. This motivates the following optimization formulation:
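$$
\begin{aligned}
\min_{x}\quad & |x_1| + |x_2| \\
\text{s.t.}\quad &
\begin{bmatrix}
 0 & -1 & 0 & -1 &  1 &  0 \\
-2 &  1 & 0 &  2 &  0 & -1 \\
 0 &  1 & 0 &  0 &  0 &  1 \\
 0 &  0 & 1 &  0 & -1 &  2
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
=
\begin{bmatrix} 2 \\ 1 \\ 1 \\ -3 \end{bmatrix}
\end{aligned}
$$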
Please use “cvxpy” to solve the above sparse recovery problem. (10 pts: 8pts for code, 2pts for
answer)
Question 3. In lecture we have learned two ideas to tackle the problem of background extraction.
These two ideas lead to an optimization formulation as follows:
(a) We have three figures extracted from one video, i.e., m = 3. The three figures are denoted
as M1, M2 and M3 and all have the same size of 130 × 160 pixels, i.e., n = 130 × 160. Part of the
cvxpy code is shown below. Can you modify the code to obtain an implementation of the above
formulation? You can find the figures in the attachment. (10 pts)
(b) Please run your code for (S1) with your own figure(s) and examine the result. How well is the
background extracted? Please attach the figures you use and the results you obtain. (bonus:
10 pts)
Cvxpy Code for Background-Extraction Formulation (S)
import numpy as np
import cv2
import cvxpy as cp
from cvxpy import *
import matplotlib.pyplot as plt

# Load the three video frames as grayscale images.
im1 = cv2.imread('/content/Figure1.png', cv2.IMREAD_GRAYSCALE)
im2 = cv2.imread('/content/Figure2.png', cv2.IMREAD_GRAYSCALE)
im3 = cv2.imread('/content/Figure3.png', cv2.IMREAD_GRAYSCALE)

# All frames share the same size; n is the number of pixels per frame.
M_size = im1.shape
size_a = M_size[0]
size_b = M_size[1]
n = size_a*size_b

# Flatten each frame into an n-by-1 column vector.
M1 = im1.reshape(n,-1)
M2 = im2.reshape(n,-1)
M3 = im3.reshape(n,-1)

# Decision variable with one entry per pixel.
w = cp.Variable((n,1))

# Please implement your code here:
###############################################

# Show each frame minus w, then w itself.
plt.figure(figsize=(6,6))
plt.imshow((M1 - w.value).reshape(size_a, size_b), cmap='gray')
plt.figure(figsize=(6,6))
plt.imshow((M2 - w.value).reshape(size_a, size_b), cmap='gray')
plt.figure(figsize=(6,6))
plt.imshow((M3 - w.value).reshape(size_a, size_b), cmap='gray')
plt.figure(figsize=(6,6))
plt.imshow((w.value).reshape(size_a, size_b), cmap='gray')
–END–

SEEM2460 Introduction to Data Science Lab Assignment
