## Description

1. Suppose A is an n-by-n symmetric, rank-one matrix with right singular vectors vk, k =

1, 2, . . . , n. Show that the power method converges to v1 and determine the number of

iterations needed to converge within 1% of v1 if your initial vector b0 = √

1

n

1

1

.

.

.

1

.

2. Use the starter script available for download on the course page. The script loads

the 1000 three-dimensional data points in sdata.csv into a 1000-by-3 matrix X. The

second section of the code displays the data using a scatter plot format. We wish to

approximate the data using a subspace. Use the rotate tool to view the data cloud

from different perspectives.

a) Does the data appear to lie in a low-dimensional subspace? Why or why not?

Remember the definition of a subspace.

b) What could you do to the data so that it lies (approximately) in a low-dimensional

subspace?

c) The third section of the code removes the mean (average) value of the 1000 data

points. Use the rotate tool to inspect the scatterplot of the data with the mean

removed. Does the mean-removed data appear to lie in a low-dimensional subspace?

d) The fourth section of the code finds the SVD of the mean-removed data. If a is

a unit-norm vector representing the best one-dimensional subspace for the meanremoved data, complete the line of code to define a in terms of the SVD matrices.

The fifth section displays the one-dimensional subspace with the mean-removed

data. Note that the length of the vector representing the subspace is scaled by

the root-mean-squared value of the data for display purposes. Use the rotate tool

to inspect the relationship between the subspace and the data, and comment on

how well a one-dimensional subspace captures the data.

e) Let xzi, i = 1, 2, . . . , 1000 be the individual mean-removed data points and a the

unit-norm vector representing the best one-dimensional subspace for the data.

Thus, xzi ≈ awi

. Find wi

in terms of the SVD matrices U, S, and V .

f) Now write the original data xi

, i = 1, 2, . . . , 1000 as xi ≈ awi + b. What is b?

g) Let E be the difference between X and the rank-one approximation. Find a

mathematical expression for ||E||2

F

in terms of the singular values of the meanremoved data Xz.

h) Now try a rank-two approximation. Use the SVD to find an orthonormal basis

for the best plane containing the mean-removed data. Display the mean-removed

data and the bases for the plane.

i) Your rank-two approximation for the original data is xi ≈ a1w1i + a2w2i + b, i =

1, 2, . . . , 1000. Express w2i

, i = 1, 2, . . . , 1000 in terms of the SVD of the meanremoved data matrix Xz. Display a scatter plot of the original data (in red) and

the rank-two approximations in blue. Does the rank-two approximation lie in a

plane? Does that plane capture the dominant components of the data?

j) Let E be the difference between X and the rank-two approximation. Find a

mathematical expression for ||E||2

F

in terms of the singular values of the meanremoved data Xz.

k) Find and compare the numerical values for ||E||2

F using both the rank-1 and

rank-2 approximation.

3. Consider the face emotion classification problem studied previously. Design and compare the performances of the classifiers proposed in a and b, below. In each case,

divide the dataset into 8 equal sized subsets (e.g., examples 1 − 16, 17 − 32, etc).

Use

6 sets of the data to estimate w for each choice of the regularization parameter, then

select the best value for the regularization parameter by estimating the error on one

of the two sets of data held out from training, and finally use the w corresponding to

the best value of the regularization parameter to predict the labels of the remaining

set of data that was held out.

Compute the number of mistakes made on this hold-out

set and divide that number by 16 (the size of the set) to estimate the error rate. Note

there are 8 × 7 = 56 different choices of the two hold-out sets, so repeat this process

56 times and average the error rates across the 56 cases to obtain a final estimate.

a) Truncated SVD. Use the pseudo-inverse V Σ−1

r UT

, where Σ−1

r

is computed by

inverting the r largest singular values. Hence the regularization parameter r

takes values r = 1, 2, . . . , 9.

b) Ridge Regression. Let wbλ = arg minw ky − Xwk

2

2 + λkwk

2

2

, for the following

values of the regularization parameter λ = 0, 2

−1

, 2

0

, 2

1

, 2

2

, 2

3

, and 24

. Show that

wbλ can be computed using the SVD and use this fact in your code.