CSE552 – Machine Learning Homework #4

Part I: Apply PCA to MNIST Data
Use the function pca(X) from Homework 3 or from an existing library. Recall that this function takes an
𝑛 × 𝑑 matrix 𝑿 and returns mean, weights and vectors. Each row of 𝑿 holds the pixels of one input
image. The mean is the mean of the columns of 𝑿. The principal components of 𝑿 are in vectors.
The corresponding eigenvalues are in weights. Using only a subset of the principal components, project
the data to obtain a new data matrix 𝑿′. Use this new matrix 𝑿′ in the next part.
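The projection step can be sketched as follows. This is a minimal NumPy illustration, not the Homework 3 implementation: the layout of vectors (one eigenvector per column, sorted by decreasing eigenvalue), the helper name project, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def pca(X):
    """Return (mean, weights, vectors) for an n x d data matrix X.

    Assumed layout: weights holds the eigenvalues of the covariance
    matrix, largest first; vectors is d x d with the matching
    eigenvectors as columns.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    weights, vectors = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(weights)[::-1]       # reorder to descending
    return mean, weights[order], vectors[:, order]

def project(X, mean, vectors, n_components):
    """Hypothetical helper: center X and keep the first n_components."""
    return (X - mean) @ vectors[:, :n_components]

# Synthetic stand-in for the MNIST matrix (200 samples, 5 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

mean, weights, vectors = pca(X)
X_new = project(X, mean, vectors, n_components=2)
print(X_new.shape)
```

The variance of each column of X_new equals the corresponding eigenvalue in weights, which is a quick sanity check when choosing how many components to keep.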
Part II: Using PCA before Classification
Use an existing k-means algorithm with three different distance metrics: 1) L2 norm (Euclidean
distance), 2) L1 norm (Manhattan distance), and 3) cosine distance.
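For reference, the three metrics can be written as one point-to-centroid distance function. This is a sketch in NumPy only; the function name pairwise_distance and the tiny arrays are made up for illustration, and the cosine branch assumes no row has zero norm. Many library k-means implementations support only Euclidean distance, so check which metrics your chosen library accepts.

```python
import numpy as np

def pairwise_distance(X, C, metric="l2"):
    """Return an (n, k) matrix of distances from rows of X to rows of C."""
    if metric in ("l2", "l1"):
        # Shape (n, k, d); for large n this may need to be done in chunks.
        diff = X[:, None, :] - C[None, :, :]
        if metric == "l2":
            return np.sqrt((diff ** 2).sum(axis=2))
        return np.abs(diff).sum(axis=2)
    if metric == "cosine":
        # Cosine distance = 1 - cosine similarity (assumes nonzero rows).
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
        return 1.0 - Xn @ Cn.T
    raise ValueError(f"unknown metric: {metric}")

X = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
C = np.array([[1.0, 0.0], [0.0, 1.0]])  # two hypothetical centroids

for metric in ("l2", "l1", "cosine"):
    d = pairwise_distance(X, C, metric)
    print(metric, d.argmin(axis=1))  # index of the closest centroid per row
```

Taking the argmin along each row gives the assignment step of k-means under the chosen metric.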
Using the transformed data, apply the k-means algorithm (use k=10 for the ten digits) to cluster 80% of the
data and test the result on the remaining 20% of the data (repeat this 5 times for cross-validation).
Report the performance of the clustering using the following measurements.
• Labeling of clusters:
o Using the given labels for the training data form the following table:
          𝐶1     𝐶2     𝐶3     𝐶4     𝐶5     𝐶6     𝐶7     𝐶8     𝐶9     𝐶10
Label 0   𝑛1,0   𝑛2,0   𝑛3,0
Label 1
Label 2
Label 3
Label 4                               𝑛5,4
Label 5
Label 6
Label 7
Label 8
Label 9                                                                  𝑛10,9
where 𝑛𝑖,𝑗 indicates how many of the training data points with label 𝑗 fall into cluster 𝑖.
o Find the maximum 𝑛𝑖,𝑗 in the table and label cluster 𝑖 with label 𝑗. Find the next
maximum 𝑛𝑖,𝑗 and, if cluster 𝑖 is not already labeled and label 𝑗 is not yet assigned, label
cluster 𝑖 with 𝑗. Otherwise, skip it and move to the next maximum 𝑛𝑖,𝑗. Repeat this until
all the clusters are labeled.
For example, the following incomplete table of clustering results produces the labels
shown.
          𝐶1    𝐶2    𝐶3    𝐶4    𝐶5    𝐶6    𝐶7    𝐶8    𝐶9    𝐶10
                            L0                L2    L1          L3
Label 0   0     0     100   300   100   100   0     0     0     0
Label 1   0     100   0     0     0     0     0     400   100   0
Label 2   190   0     0     100   0     0     310   0     0     0
Label 3   100   100   100   0     140   0     0     0     0     160
The maximum, 400, labels cluster 8 as label 1. The next maximum, 310, labels
cluster 7 as label 2. The next maximum, 300, labels cluster 4 as label 0. The next
maximum, 190, does not label cluster 1 as label 2 since label 2 is already assigned. The
next maximum, 160, labels cluster 10 as label 3.
• Training error:
o Once the clusters are labeled, construct the confusion matrix over the training data
and calculate the accuracy.
• Test error:
o For the test data, use 1-NN to decide which cluster each data point falls into, then
construct the confusion matrix and calculate the accuracy.
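The measurements above can be sketched in NumPy as follows. All function names are hypothetical, indices are 0-based (so cluster 8 in the text is index 7), and tie-breaking between equal counts is left to the sort order because the assignment does not specify it. The nearest_cluster helper takes one reading of "1-NN", namely assigning each test point to the closest centroid under Euclidean distance; assigning it to the cluster of its nearest training point is another reading, so treat this as an assumption.

```python
import numpy as np

def count_table(cluster_ids, labels, n_clusters, n_labels):
    """table[j, i] = number of points with label j that fall in cluster i."""
    table = np.zeros((n_labels, n_clusters), dtype=int)
    np.add.at(table, (labels, cluster_ids), 1)
    return table

def greedy_cluster_labels(table):
    """Visit counts from largest to smallest; label cluster i with j only
    if cluster i is still unlabeled and label j is still unused."""
    n_labels, n_clusters = table.shape
    assigned, used = {}, set()
    for flat in np.argsort(table, axis=None)[::-1]:
        j, i = divmod(int(flat), n_clusters)
        if i not in assigned and j not in used:
            assigned[i] = j
            used.add(j)
    return assigned  # {cluster index: label}

def confusion_matrix(true_labels, pred_labels, n_labels):
    """cm[t, p] = number of points with true label t predicted as p."""
    cm = np.zeros((n_labels, n_labels), dtype=int)
    np.add.at(cm, (true_labels, pred_labels), 1)
    return cm

def accuracy(cm):
    return np.trace(cm) / cm.sum()

def nearest_cluster(X_test, centroids):
    """Assumed reading of 1-NN: index of the closest centroid (Euclidean)."""
    d = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# The incomplete example table from the text (rows: labels 0-3, columns: C1-C10).
example = np.array([
    [0,   0,   100, 300, 100, 100, 0,   0,   0,   0],
    [0,   100, 0,   0,   0,   0,   0,   400, 100, 0],
    [190, 0,   0,   100, 0,   0,   310, 0,   0,   0],
    [100, 100, 100, 0,   140, 0,   0,   0,   0,   160],
])
mapping_example = greedy_cluster_labels(example)
print(mapping_example)  # 0-based: clusters 7, 6, 3, 9 get labels 1, 2, 0, 3

# Tiny made-up training set: 6 points, 3 clusters, 3 labels.
labels_true = np.array([0, 0, 1, 1, 1, 2])
cluster_ids = np.array([1, 1, 0, 0, 2, 2])
table = count_table(cluster_ids, labels_true, n_clusters=3, n_labels=3)
mapping = greedy_cluster_labels(table)
cluster_to_label = np.array([mapping[c] for c in range(3)])
cm = confusion_matrix(labels_true, cluster_to_label[cluster_ids], n_labels=3)
acc = accuracy(cm)
print(cm, acc)
```

For the test split, the same confusion_matrix and accuracy calls apply after replacing cluster_ids with the output of nearest_cluster on the test points. With ten clusters and ten labels every cluster ends up labeled; in the four-label example only four clusters do, which is why that table is described as incomplete.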
What to hand in: You are expected to hand in one of the following:
HW4_lastname_firstname_studentnumber_code.ipynb. Your notebook should have:
Part I: Code
Results:
Conclusions:
Part II: Code
Results:
Conclusions: