SEEM2460 Introduction to Data Science Lab Assignment


Question 1. Assume we have a simplified version of the Animal Classification dataset¹, which
includes properties of animals as descriptive features and the animal species as the target feature. In
our dataset, animals are classified as Mammals or Reptiles based on whether they are
toothed and whether they have legs, as shown in Table 1. In this question, you are asked to develop a decision
tree from this simplified dataset.
Table 1: Animal Classification dataset
Instance Toothed Legs Species
1 T T Mammal
2 T T Mammal
3 T F Reptile
4 F T Mammal
5 T T Mammal
6 T T Mammal
7 T F Reptile
8 T F Reptile
9 T T Mammal
10 F T Reptile
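One way to check the Gini calculation that part (a) asks for is a short script over Table 1. The following is a minimal sketch in plain Python (standard library only). Table 1 is typed in by hand, and the names DATA, gini and gini_split are illustrative. Each child node's impurity is 1 minus the sum of its squared class proportions, and the split value weights each child by its share of the instances.

```python
from collections import Counter

# Table 1 typed in by hand as (Toothed, Legs, Species); instances 1-10 in order
DATA = [
    ("T", "T", "Mammal"), ("T", "T", "Mammal"), ("T", "F", "Reptile"),
    ("F", "T", "Mammal"), ("T", "T", "Mammal"), ("T", "T", "Mammal"),
    ("T", "F", "Reptile"), ("T", "F", "Reptile"), ("T", "T", "Mammal"),
    ("F", "T", "Reptile"),
]
ATTRS = {"Toothed": 0, "Legs": 1}  # column index of each descriptive feature


def gini(labels):
    """Gini impurity of one node: 1 - sum over classes of p_k^2."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def gini_split(rows, attr_index):
    """Weighted Gini impurity of the child nodes after splitting on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[-1])
    n = len(rows)
    return sum(len(labels) / n * gini(labels) for labels in groups.values())


for name, idx in ATTRS.items():
    print(f"Gini_split({name}) = {gini_split(DATA, idx):.4f}")
# expected: Gini_split(Toothed) = 0.4750, Gini_split(Legs) = 0.1714
```

A lower weighted impurity means purer children, so the attribute with the smaller Gini_split is the better first split.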
(a) Calculate the resulting Gini index when splitting on the attribute "Toothed" and on the attribute "Legs",
respectively (i.e., $Gini_{split}(\text{Toothed})$ and $Gini_{split}(\text{Legs})$). Show your calculation details. Which
attribute would be chosen as the first splitting attribute? (10 pts)
(b) Based on your decision in Question 1(a), draw a two-level decision tree, using both
attributes for splitting if needed. Mark the class label in each leaf node. In case of a tie between the "Mammal"
and "Reptile" instances in a leaf node, mark the node as "-". (4 pts)
(c) WEKA Tool Practice. Use the WEKA tool to classify the data with a decision tree (J48) under
the test option "Use training set". Copy the result in the "Classifier output" window into your
assignment. (6 pts)
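The leaf labels for part (b) can be derived mechanically once the root attribute is fixed. This sketch assumes the root splits on Legs, the attribute with the lower weighted Gini index on Table 1. A child is split again on Toothed only when it is impure, and the part (b) tie rule is applied. The helpers leaf_label and two_level_tree are hypothetical names used for illustration.

```python
from collections import Counter

# Table 1 as (Toothed, Legs, Species); instances 1-10 in order
DATA = [
    ("T", "T", "Mammal"), ("T", "T", "Mammal"), ("T", "F", "Reptile"),
    ("F", "T", "Mammal"), ("T", "T", "Mammal"), ("T", "T", "Mammal"),
    ("T", "F", "Reptile"), ("T", "F", "Reptile"), ("T", "T", "Mammal"),
    ("F", "T", "Reptile"),
]


def leaf_label(labels):
    """Majority class of a leaf, or '-' when the two classes are tied."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "-"
    return counts[0][0]


def two_level_tree(rows, root_idx, second_idx):
    """Split on the root attribute; split a child again only if it is impure."""
    tree = {}
    for value in sorted({r[root_idx] for r in rows}):
        child = [r for r in rows if r[root_idx] == value]
        labels = [r[-1] for r in child]
        if len(set(labels)) == 1:
            tree[value] = labels[0]  # pure node: it becomes a leaf
        else:
            tree[value] = {
                v: leaf_label([r[-1] for r in child if r[second_idx] == v])
                for v in sorted({r[second_idx] for r in child})
            }
    return tree


tree = two_level_tree(DATA, root_idx=1, second_idx=0)  # Legs at the root, then Toothed
print(tree)  # {'F': 'Reptile', 'T': {'F': '-', 'T': 'Mammal'}}
```

Outer keys are values of Legs and inner keys are values of Toothed. The result should match the tree you draw by hand.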

¹ The UCI Zoo Dataset.
Question 2. In class, we learn how to solve the sparse recovery problem:
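$$
\min \; |x_1| + |x_2| + |x_3| + |x_4| + |x_5|
\quad \text{s.t.} \quad
\underbrace{\begin{bmatrix}
0 & -1 & 0 & -1 & 1 & 0 \\
-2 & 1 & 0 & 2 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & -1 & 2
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} 2 \\ 1 \\ 1 \\ -3 \end{bmatrix}}_{b}
$$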
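As a quick consistency check, the vector x = (0, 1, 0, 0, 3, 0) stated in the problem can be multiplied by A to confirm that it satisfies the constraint. This is a minimal NumPy sketch. The entries of A and b are read row by row from the constraint above, and the variable names are illustrative.

```python
import numpy as np

# Constraint matrix A (4 x 6) and right-hand side b, read row by row
A = np.array([
    [0, -1, 0, -1, 1, 0],
    [-2, 1, 0, 2, 0, -1],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, -1, 2],
])
b = np.array([2, 1, 1, -3])

x = np.array([0, 1, 0, 0, 3, 0])  # the sparse solution stated in the text
print(A @ x)                     # [ 2  1  1 -3], i.e. equal to b
print(np.count_nonzero(x == 0))  # 4 zero entries
```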
We find a solution $x = (0, 1, 0, 0, 3, 0)$ with four zero entries. Now, instead of finding a
solution $x$ to $Ax = b$ with as many zero entries as possible, we want to find a solution to $Ax = b$
that minimizes the absolute values of its first two entries. This motivates the following optimization formulation:
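$$
\min \; |x_1| + |x_2|
\quad \text{s.t.} \quad
\underbrace{\begin{bmatrix}
0 & -1 & 0 & -1 & 1 & 0 \\
-2 & 1 & 0 & 2 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & -1 & 2
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} 2 \\ 1 \\ 1 \\ -3 \end{bmatrix}}_{b}
$$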
Please use "cvxpy" to solve the above sparse recovery problem. (10 pts: 8 pts for code, 2 pts for
answer)
Question 3. In the lecture, we learned two ideas for tackling the problem of background extraction.
These two ideas lead to the following optimization formulation:
(a) We have three figures extracted from one video, i.e., m = 3. The three figures are denoted
M1, M2 and M3, and all have the same size, 130 × 160, i.e., n = 130 × 160. Part of the cvxpy
code is shown below. Can you modify the code to obtain an implementation of the above
formulation? The figures are provided in the attachment. (10 pts)
(b) Please run your code for (S1) with your own figure(s) and examine the result. How well is the
background extracted? Please attach the figures you use and the results you obtain. (bonus:
10 pts)
Cvxpy Code for Background-Extraction Formulation (S∞)
import numpy as np
import cv2
import cvxpy as cp
import matplotlib.pyplot as plt

# Load the three frames as grayscale images (130 x 160 pixels each)
im1 = cv2.imread('/content/Figure1.png', cv2.IMREAD_GRAYSCALE)
im2 = cv2.imread('/content/Figure2.png', cv2.IMREAD_GRAYSCALE)
im3 = cv2.imread('/content/Figure3.png', cv2.IMREAD_GRAYSCALE)

M_size = im1.shape
size_a = M_size[0]
size_b = M_size[1]
n = size_a * size_b

# Flatten each frame into an n x 1 column vector
M1 = im1.reshape(n, -1)
M2 = im2.reshape(n, -1)
M3 = im3.reshape(n, -1)

# Optimization variable: the background image, as an n x 1 vector
w = cp.Variable((n, 1))

# Please implement your code here:
###############################################

# Display each frame minus the extracted background
plt.figure(figsize=(6, 6))
plt.imshow((M1 - w.value).reshape(size_a, size_b), cmap='gray')
plt.figure(figsize=(6, 6))
plt.imshow((M2 - w.value).reshape(size_a, size_b), cmap='gray')
plt.figure(figsize=(6, 6))
plt.imshow((M3 - w.value).reshape(size_a, size_b), cmap='gray')

# Display the extracted background itself
plt.figure(figsize=(6, 6))
plt.imshow(w.value.reshape(size_a, size_b), cmap='gray')
plt.show()