Problem 1: PCA and Feature Selection (50pts)
In this problem, we will explore ways to use PCA to generate or select “good” features.
SVMs and PCA (25pts)
Consider the UCI Sonar data set (attached to this problem set).
• Perform PCA on the training data to reduce the dimensionality of the data set (ignoring the
class labels for the moment). What are the top six eigenvalues of the data covariance matrix?
• For each k ∈ {1, 2, 3, 4, 5, 6}, project the training data onto the best k-dimensional subspace
(with respect to the Frobenius norm) and use the SVM with slack formulation to learn a
classifier for each c ∈ {1, 10, 100, 1000}. Report the error of the learned classifier on the
validation set for each k and c pair (a Python sketch of this step follows this list).
• What is the error of the best k/c pair on the test data? How does it compare to the best
classifier (with the same possible c choices) without feature selection? Explain your observations.
• If you had to pick a value of k before evaluating the performance on the validation set (e.g.,
if this were not a supervised learning problem), how might you pick it?
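The following is a minimal Python sketch of the projection-and-SVM pipeline above, not a definitive implementation: it assumes a linear soft-margin SVM (sklearn.svm.SVC) as the “SVM with slack” formulation, and Xtr, ytr, Xval, yval are hypothetical names for the pre-split Sonar training/validation arrays (loading code omitted).

```python
import numpy as np
from sklearn.svm import SVC

# Xtr, ytr, Xval, yval: hypothetical pre-split Sonar data (loading omitted).

# Eigendecomposition of the training covariance matrix.
mu = Xtr.mean(axis=0)
cov = np.cov(Xtr - mu, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
order = np.argsort(eigvals)[::-1]                 # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print("top six eigenvalues:", eigvals[:6])

for k in range(1, 7):
    V = eigvecs[:, :k]                            # Frobenius-optimal k-dim subspace
    Ztr = (Xtr - mu) @ V
    Zval = (Xval - mu) @ V                        # project with the SAME basis
    for c in (1, 10, 100, 1000):
        clf = SVC(kernel="linear", C=c).fit(Ztr, ytr)
        err = 1.0 - clf.score(Zval, yval)         # validation error for this (k, c)
        print(f"k={k}, c={c}: validation error {err:.3f}")
```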
PCA for Feature Selection (25pts)
If we performed PCA directly on the training data as we did in the first part of this question, we
would generate new features that are linear combinations of our original features. If, instead, we
wanted to find a subset of our current features that are good for classification, we could still use
PCA, but we would need to be more clever about it. The primary idea of this approach is to select
features from the data that explain as much of the variance as possible.
To do this,
we can use the results of PCA as a guide. Implement the following algorithm for a given k and s:
1. Compute the top k eigenvalues and eigenvectors of the covariance matrix corresponding to
the data matrix omitting the labels (recall that the columns of the data matrix are the input
data points). Denote the top k eigenvectors as v^{(1)}, …, v^{(k)}.
2. Define π_j = (1/k) ∑_{i=1}^{k} (v_j^{(i)})².
3. Sample s columns independently from the probability distribution defined by π.
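A short Python sketch of steps 1–3 follows, under the assumption that the data is stored as an n×d matrix X with one point per row (the problem statement stores points as columns, so transpose if needed); pca_feature_sample is a hypothetical helper name.

```python
import numpy as np

def pca_feature_sample(X, k, s, seed=None):
    """Sample s feature (column) indices from the distribution pi
    induced by the top-k eigenvectors of the covariance matrix."""
    rng = np.random.default_rng(seed)
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    topk = eigvecs[:, -k:]                       # top-k eigenvectors (columns)
    pi = (topk ** 2).sum(axis=1) / k             # pi_j = (1/k) sum_i (v_j^(i))^2
    cols = rng.choice(X.shape[1], size=s, p=pi)  # step 3: s independent draws
    return np.unique(cols)                       # include each feature only once
```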
• Why does π define a probability distribution?
• Again, using the UCI Sonar data set, for each k ∈ {1, . . . , 10} and each s ∈ {1, . . . , 20},
report the average test error of the SVM with slack classifier over 100 experiments. For each
experiment use only the s selected features (note that there may be some duplicates, so only
include each feature once).
• Does this provide a reasonable alternative to the SVM with slack formulation without feature
selection on this data set? What are the pros and cons of this approach?
Problem 2: Spectral Clustering (50pts)
In this problem, we will take a look at a simple clustering algorithm based on the eigenvalues of
a matrix. This approach to clustering is typically referred to as spectral clustering. The basic
approach is as follows: given a collection of n points x_1, …, x_n ∈ ℝ^m, we construct a matrix
A ∈ ℝ^{n×n} of similarities between them. Here, A_{ij} = A_{ji} = exp(−‖x_i − x_j‖² / (2σ²))
is the similarity between x_i and x_j for some σ ∈ ℝ.
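As a concrete reference, this similarity matrix can be built in a few lines of Python (a sketch; X is assumed to hold one point per row, and similarity_matrix is a hypothetical name):

```python
import numpy as np

def similarity_matrix(X, sigma):
    """A[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for rows x_i of X."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```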
The Basic Algorithm (20pts)
Write a function in MATLAB or Python that, given the matrix of similarities, performs the following
operations.
1. Compute the “Laplacian matrix” L = D − A, where D is a diagonal matrix with D_{ii} = ∑_j A_{ij}
for all i. Argue that this matrix is positive semidefinite.
2. Compute the eigenvectors of the Laplacian using eig() in MATLAB (numpy.linalg.eig() in Python).
3. Construct a matrix V ∈ ℝ^{n×k} whose columns are the eigenvectors that correspond to the k
smallest eigenvalues of L.
4. Let y_1, …, y_n denote the rows of V. Use the kmeans() algorithm in MATLAB (scikit-learn
in Python) to cluster the rows of V into clusters S_1, …, S_k.
5. The final clusters C_1, …, C_k should be given by assigning vertex i of the input set to cluster
C_j if y_i ∈ S_j.
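A minimal Python sketch of steps 1–5, assuming numpy and scikit-learn; numpy.linalg.eigh() is used here in place of eig() since L is symmetric, and spectral_cluster is a hypothetical name:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster(A, k, seed=0):
    """Steps 1-5: Laplacian, k smallest eigenvectors, k-means on rows of V."""
    D = np.diag(A.sum(axis=1))              # D_ii = sum_j A_ij
    L = D - A                               # Laplacian matrix (step 1)
    eigvals, eigvecs = np.linalg.eigh(L)    # ascending eigenvalues (step 2)
    V = eigvecs[:, :k]                      # k smallest eigenvectors (step 3)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(V)
    return labels                           # labels[i] = j  <=>  vertex i in C_j (steps 4-5)
```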
A Simple Comparison (15pts)
1. Use the spectral clustering algorithm above to compute a clustering of the two-dimensional
points returned by the function circs() (attached to this problem set) with k = 2 and
different values of σ. Use the k-means algorithm in MATLAB/Python to compute an
alternative clustering.
2. Use the scatter() function in MATLAB to plot the points, colored by the cluster they
belong to, for both algorithms. Include your plot in your submission.
3. Find a choice of σ such that the spectral method outperforms k-means. How do you know
that there is no k-means solution (i.e., a choice of centers and clusters) that performs this
well? Include the output of your code in your submission.
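One possible driver for this comparison, reusing the similarity_matrix() and spectral_cluster() sketches above, and assuming circs() returns an n×2 array of points (its exact return format is not specified here); the σ value below is a placeholder to tune:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = circs()                                    # assumed: n x 2 array of points
spec = spectral_cluster(similarity_matrix(X, sigma=0.1), k=2)  # sigma: tune me
km = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Side-by-side scatter plots, colored by cluster assignment.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(X[:, 0], X[:, 1], c=spec); ax1.set_title("spectral, sigma=0.1")
ax2.scatter(X[:, 0], X[:, 1], c=km);   ax2.set_title("k-means")
plt.show()
```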
Partitioning Images (15pts)
1. We can use the same spectral technique to partition images. Here, we consider each pixel of
a grayscale image as a single intensity and construct a similarity matrix for pairs of pixels
just as before.
2. Perform the same comparison of spectral clustering and k-means as before using the image
bw.jpg that was attached as part of the homework. Again, set k = 2. You can use imread()
to read an image from a file in MATLAB.
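In Python, imread() is available via matplotlib; below is a sketch of the image variant, reusing the functions above. Note the caveat that A is quadratic in the number of pixels, so a large bw.jpg may need downsampling first (an assumption, since the image size is not stated), and the σ value is a placeholder for 0–255 intensities.

```python
import numpy as np
import matplotlib.pyplot as plt

img = plt.imread("bw.jpg").astype(float)   # the image attached to the homework
if img.ndim == 3:
    img = img.mean(axis=-1)                # collapse RGB channels to grayscale
pixels = img.reshape(-1, 1)                # one intensity value per pixel

A = similarity_matrix(pixels, sigma=10.0)  # sigma: placeholder, tune as needed
labels = spectral_cluster(A, k=2)
plt.imshow(labels.reshape(img.shape))      # view the 2-way partition as an image
plt.show()
```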