# CMSC25400 Homework 2

## Description

1. (10 points)

   (a) Let $v_1, v_2, \ldots, v_d$ be $d$ mutually orthogonal unit vectors in $\mathbb{R}^d$ (i.e., a basis for $\mathbb{R}^d$), let $\lambda_1, \ldots, \lambda_d$ be real numbers, and

   $$A = \sum_{i=1}^{d} \lambda_i v_i v_i^{\top}. \tag{1}$$

   Show that $v_1, \ldots, v_d$ are eigenvectors of $A$ with corresponding eigenvalues $\lambda_1, \ldots, \lambda_d$.

   (b) Conversely, show that if $v_1, \ldots, v_d$ is an orthonormal system of eigenvectors and $\lambda_1, \ldots, \lambda_d$ are the corresponding eigenvalues of a symmetric matrix $A \in \mathbb{R}^{d \times d}$, then $A$ is of the form (1).
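   Both directions can be sanity-checked numerically. The sketch below (dimension `d = 4` and the eigenvalues are illustrative choices, not part of the problem) builds $A$ as in (1) from a random orthonormal basis and verifies that each $v_i$ is an eigenvector with eigenvalue $\lambda_i$:

   ```python
   import numpy as np

   rng = np.random.default_rng(0)
   d = 4
   # QR factorization of a random matrix yields orthonormal columns v_1, ..., v_d.
   V, _ = np.linalg.qr(rng.standard_normal((d, d)))
   lam = np.array([1.0, 2.0, 3.0, 4.0])

   # A = sum_i lambda_i v_i v_i^T, built term by term as in equation (1).
   A = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(d))

   for i in range(d):
       # Each v_i should satisfy A v_i = lambda_i v_i.
       assert np.allclose(A @ V[:, i], lam[i] * V[:, i])
   ```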
2. (20 points) Let $A \in \mathbb{R}^{d \times d}$ be a symmetric matrix, and for simplicity assume that all its eigenvalues are positive and distinct, $0 < \lambda_1 < \lambda_2 < \ldots < \lambda_d$. Let $v_1, v_2, \ldots, v_d$ be the corresponding (normalized) eigenvectors.

   (a) Prove that any two of the eigenvectors $v_i$ and $v_j$ (assuming $i \neq j$) are orthogonal (you may wish to compare $v_i^{\top} A v_j$ and $v_j^{\top} A v_i$).

   (b) Explain why this implies that $v_1, v_2, \ldots, v_d$ form an orthonormal basis for $\mathbb{R}^d$.

   (c) The Rayleigh quotient of $A$ is

   $$R(w) = \frac{w^{\top} A w}{w^{\top} w}, \qquad w \in \mathbb{R}^d.$$

   Prove that the maximum of $R(w)$ is $\lambda_d$, and that the maximum is attained at $w = v_d$.
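   Before writing the proof for (c), it can be reassuring to check the claim numerically. A minimal sketch (the matrix `A` here is an arbitrary symmetric positive-definite example, not from the problem): the top eigenvector attains the largest eigenvalue, and random vectors never exceed it.

   ```python
   import numpy as np

   rng = np.random.default_rng(1)
   d = 5
   M = rng.standard_normal((d, d))
   A = M @ M.T + d * np.eye(d)  # symmetric with positive eigenvalues

   def rayleigh(w):
       """Rayleigh quotient R(w) = (w^T A w) / (w^T w)."""
       return (w @ A @ w) / (w @ w)

   lam, V = np.linalg.eigh(A)   # eigenvalues in ascending order
   v_top = V[:, -1]

   # R(v_d) equals the largest eigenvalue...
   assert np.isclose(rayleigh(v_top), lam[-1])
   # ...and no random direction does better.
   for _ in range(1000):
       w = rng.standard_normal(d)
       assert rayleigh(w) <= lam[-1] + 1e-9
   ```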
3. (20 points) Recall that the empirical covariance matrix (sample covariance matrix) of a dataset $\{x_1, x_2, \ldots, x_n\}$ with $x_i \in \mathbb{R}^d$ (assuming that $\frac{1}{n} \sum_{i=1}^{n} x_i = 0$) is

   $$\widehat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^{\top}.$$
   (a) Since $\widehat{\Sigma}$ is symmetric, it has an orthonormal basis of eigenvectors $v_1, v_2, \ldots, v_d$ with $\lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_d$, and it can be expressed as

   $$\widehat{\Sigma} = \sum_{i=1}^{d} \lambda_i v_i v_i^{\top}.$$

   Let $\widehat{\Sigma}^{(1)}$ be the reduced empirical covariance matrix

   $$\widehat{\Sigma}^{(1)} = \frac{1}{n} \sum_{i=1}^{n} \bigl(x_i - (x_i \cdot v_d)\, v_d\bigr) \bigl(x_i - (x_i \cdot v_d)\, v_d\bigr)^{\top}.$$

   Show that $\widehat{\Sigma}^{(1)} v_d = 0$, while $\widehat{\Sigma}^{(1)} v_i = \lambda_i v_i$ for all $i < d$. What are the eigenvalues and eigenvectors of $\widehat{\Sigma}^{(1)}$ then? Use this to show that the second principal component is $v_{d-1}$.

   (b) Use induction to show that the $k$'th principal component of the data is $v_{d-k+1}$.
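   The deflation step in (a) can also be observed numerically. A minimal sketch on synthetic data (the data matrix `X` and its column scaling are illustrative, not from the problem): projecting out $v_d$ from every point zeroes the top eigenvalue and leaves the other eigenpairs untouched.

   ```python
   import numpy as np

   rng = np.random.default_rng(2)
   n, d = 200, 4
   # Synthetic centered data with different variances per coordinate.
   X = rng.standard_normal((n, d)) * np.array([1.0, 2.0, 3.0, 4.0])
   X -= X.mean(axis=0)            # center so the covariance formula applies

   Sigma = X.T @ X / n
   lam, V = np.linalg.eigh(Sigma)  # ascending eigenvalues
   v_d = V[:, -1]

   # Remove the component of each x_i along v_d, then recompute the covariance.
   X1 = X - np.outer(X @ v_d, v_d)
   Sigma1 = X1.T @ X1 / n

   assert np.allclose(Sigma1 @ v_d, 0, atol=1e-10)             # Sigma^(1) v_d = 0
   for i in range(d - 1):
       assert np.allclose(Sigma1 @ V[:, i], lam[i] * V[:, i])  # other pairs preserved
   ```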
4. (25 points) The file 3Ddata.txt is a dataset of 500 points in $\mathbb{R}^3$ sampled from a manifold with some added noise. The last number in each line is just an index in $\{1, 2, 3, 4\}$ related to the position of the point on the manifold to make the visualization prettier (for example, you can plot those points with index 1 in green, index 2 in yellow, 3 in blue and 4 in red).

   Apply PCA and Isomap to map this data to $\mathbb{R}^2$. To construct the graph (mesh) for Isomap you can use $k = 10$ nearest neighbors. Plot the results and comment on the differences. For both methods you need to write your own code (it shouldn't be more than a few lines each) and submit it together with the write-up.
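   One possible shape for such code, as a hedged sketch rather than a reference solution: it assumes the data matrix `X` has one point per row (loading 3Ddata.txt and splitting off the index column, e.g. with `np.loadtxt`, is not shown), and it uses a vectorized Floyd–Warshall pass for the geodesic distances, which is fine at $n = 500$.

   ```python
   import numpy as np

   def pca_2d(X):
       """Project centered data onto its top two principal components."""
       Xc = X - X.mean(axis=0)
       lam, V = np.linalg.eigh(Xc.T @ Xc / len(Xc))  # ascending eigenvalues
       return Xc @ V[:, -2:][:, ::-1]                # top-2 directions first

   def isomap_2d(X, k=10):
       """Isomap: k-NN graph -> geodesic distances -> classical MDS."""
       n = len(X)
       D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
       G = np.full((n, n), np.inf)
       nbrs = np.argsort(D, axis=1)[:, 1:k + 1]             # k nearest neighbors
       for i in range(n):
           G[i, nbrs[i]] = D[i, nbrs[i]]
       G = np.minimum(G, G.T)                               # symmetrize the graph
       np.fill_diagonal(G, 0)
       for m in range(n):                                   # Floyd-Warshall
           G = np.minimum(G, G[:, m, None] + G[None, m, :])
       # Classical MDS on the squared geodesic distances.
       J = np.eye(n) - np.ones((n, n)) / n
       B = -0.5 * J @ (G ** 2) @ J
       lam, V = np.linalg.eigh(B)
       return V[:, -2:] * np.sqrt(np.maximum(lam[-2:], 0))
   ```

   With $k$ too small the neighborhood graph may be disconnected (infinite geodesic distances), so it is worth checking that the embedding is finite before plotting.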
5. (25 points) The file train35.digits contains 2000 images of 3's and 5's from the famous MNIST database of handwritten digits in text format. The size of each image is $28 \times 28$ pixels. Each row of the file is a representation of one image, with the $28 \times 28$ pixels flattened into a vector of size 784. A pixel value of 1 represents black, and a value of 0 represents white. The corresponding row of train35.labels is the class label: $+1$ for the digit 3, or $-1$ for the digit 5. The file test35.digits contains 200 testing images in the same format as train35.digits.

   Implement the perceptron algorithm and use it to label each test image in test35.digits. Submit the predicted labels in a file named test35.predictions. In the lectures, the perceptron was presented as an online algorithm. To use the perceptron as a batch algorithm, train it by simply feeding it the training set $M$ times. The value of $M$ can be expected to be less than 10, and should be set by cross validation. Naturally, in this context, the "mistakes" made during training are not really errors. Nonetheless, it is instructive to see how the frequency of mistakes decreases as the hypothesis improves. Include in your write-up a plot of the cumulative number of "mistakes" as a function of the number of examples seen. You may find that normalizing each example to unit norm improves performance.
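   The batch-training loop described above can be sketched as follows (a minimal outline, not the required solution; the function names are illustrative, file loading and the cross-validation over $M$ are left out):

   ```python
   import numpy as np

   def perceptron_train(X, y, M=5):
       """Sweep the training set M times; return weights and the cumulative
       mistake count after each example seen (for the required plot)."""
       w = np.zeros(X.shape[1])
       mistakes = []
       count = 0
       for _ in range(M):
           for x, label in zip(X, y):
               if label * (w @ x) <= 0:   # mistake: update the hypothesis
                   w += label * x
                   count += 1
               mistakes.append(count)
       return w, mistakes

   def perceptron_predict(X, w):
       """Label each row of X with the sign of w . x (ties go to +1)."""
       return np.where(X @ w >= 0, 1, -1)
   ```

   The `mistakes` list is exactly the cumulative curve to plot against the number of examples seen; on separable data it flattens out once the hypothesis stops changing.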