CS/ECE/ME532 Assignment 6


1. Suppose $A$ is an n-by-n symmetric, rank-one matrix with right singular vectors $v_k$, $k = 1, 2, \ldots, n$. Show that the power method converges to $v_1$ and determine the number of iterations needed to converge within 1% of $v_1$ if your initial vector is

   $b_0 = \frac{1}{\sqrt{n}} \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^T.$
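As a numerical sanity check (not a substitute for the requested derivation), here is a minimal MATLAB sketch of the power method on a symmetric rank-one matrix; the dimension n, the value of sigma1, and the random choice of v1 are illustrative assumptions, not part of the problem statement.

```matlab
% Build a random n-by-n symmetric rank-one matrix A = sigma1 * v1 * v1'.
n = 8;
v1 = randn(n, 1);  v1 = v1 / norm(v1);   % unit-norm singular vector (assumed)
sigma1 = 5;                              % arbitrary nonzero singular value
A = sigma1 * (v1 * v1');                 % symmetric, rank one

% Power method starting from b0 = (1/sqrt(n)) * ones(n, 1).
b = ones(n, 1) / sqrt(n);
for k = 1:5
    b = A * b;
    b = b / norm(b);                     % renormalize each iteration
    % Angle to v1 (abs() handles the sign ambiguity of singular vectors).
    fprintf('iteration %d: angle to v1 = %.2e rad\n', k, ...
            acos(min(1, abs(v1' * b))));
end
```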
2. Use the starter script available for download on the course page. The script loads the 1000 three-dimensional data points in sdata.csv into a 1000-by-3 matrix X. The second section of the code displays the data as a scatter plot. We wish to approximate the data using a subspace. Use the rotate tool to view the data cloud from different perspectives.
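For reference, a minimal sketch of what these first two sections might look like; the actual starter script may differ in function and variable names.

```matlab
% Section 1: load the 1000 three-dimensional points into a 1000-by-3 matrix.
X = readmatrix('sdata.csv');

% Section 2: display the data cloud as a 3-D scatter plot.
figure;
scatter3(X(:,1), X(:,2), X(:,3), 10, 'filled');
xlabel('x_1'); ylabel('x_2'); zlabel('x_3');
axis equal; grid on;
rotate3d on;   % enables the rotate tool for viewing from different angles
```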
a) Does the data appear to lie in a low-dimensional subspace? Why or why not?
Remember the definition of a subspace.
b) What could you do to the data so that it lies (approximately) in a low-dimensional
subspace?
c) The third section of the code removes the mean (average) value of the 1000 data points. Use the rotate tool to inspect the scatter plot of the mean-removed data. Does the mean-removed data appear to lie in a low-dimensional subspace?
d) The fourth section of the code finds the SVD of the mean-removed data. If $a$ is a unit-norm vector representing the best one-dimensional subspace for the mean-removed data, complete the line of code to define $a$ in terms of the SVD matrices. The fifth section displays the one-dimensional subspace with the mean-removed data. Note that the length of the vector representing the subspace is scaled by the root-mean-square value of the data for display purposes. Use the rotate tool to inspect the relationship between the subspace and the data, and comment on how well a one-dimensional subspace captures the data. (One possible completion appears in the sketch after part k.)
e) Let $x_{z,i}$, $i = 1, 2, \ldots, 1000$, be the individual mean-removed data points and $a$ the unit-norm vector representing the best one-dimensional subspace for the data, so that $x_{z,i} \approx a w_i$. Find $w_i$ in terms of the SVD matrices $U$, $S$, and $V$.
f) Now write the original data $x_i$, $i = 1, 2, \ldots, 1000$, as $x_i \approx a w_i + b$. What is $b$?
g) Let $E$ be the difference between $X$ and the rank-one approximation. Find a mathematical expression for $\|E\|_F^2$ in terms of the singular values of the mean-removed data $X_z$.
h) Now try a rank-two approximation. Use the SVD to find an orthonormal basis for the best plane containing the mean-removed data. Display the mean-removed data and the basis vectors of the plane.
i) Your rank-two approximation for the original data is $x_i \approx a_1 w_{1i} + a_2 w_{2i} + b$, $i = 1, 2, \ldots, 1000$. Express $w_{2i}$, $i = 1, 2, \ldots, 1000$, in terms of the SVD of the mean-removed data matrix $X_z$. Display a scatter plot of the original data (in red) and the rank-two approximation (in blue). Does the rank-two approximation lie in a plane? Does that plane capture the dominant components of the data?
j) Let $E$ be the difference between $X$ and the rank-two approximation. Find a mathematical expression for $\|E\|_F^2$ in terms of the singular values of the mean-removed data $X_z$.
k) Find and compare the numerical values of $\|E\|_F^2$ for both the rank-one and rank-two approximations.
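A minimal MATLAB sketch of parts c) through k), assuming X already holds the 1000-by-3 data; the variable names are my own and need not match the starter script.

```matlab
% Part c): remove the mean of the 1000 data points (implicit expansion, R2016b+).
m  = mean(X, 1);                   % 1-by-3 mean of the rows
Xz = X - m;                        % mean-removed data

% Part d): thin SVD of the mean-removed data, Xz = U*S*V'.
[U, S, V] = svd(Xz, 'econ');
a = V(:, 1);                       % best one-dimensional subspace direction

% Part e): weights so that each mean-removed point satisfies xz_i ~ a*w_i.
w = S(1,1) * U(:, 1);              % w_i = sigma_1 * U(i,1)

% Part f): rank-one approximation of the original data (b is the mean).
X1 = w * a' + m;

% Parts h)-i): rank-two approximation from the two leading singular triples.
X2 = U(:,1:2) * S(1:2,1:2) * V(:,1:2)' + m;
figure;
scatter3(X(:,1),  X(:,2),  X(:,3),  10, 'r', 'filled'); hold on;
scatter3(X2(:,1), X2(:,2), X2(:,3), 10, 'b', 'filled');
axis equal; grid on; rotate3d on;

% Parts g), j), k): Frobenius-norm errors versus the singular values of Xz.
s = diag(S);
err1 = norm(X - X1, 'fro')^2;      % should equal s(2)^2 + s(3)^2
err2 = norm(X - X2, 'fro')^2;      % should equal s(3)^2
fprintf('rank-one: %.4f vs %.4f\n', err1, s(2)^2 + s(3)^2);
fprintf('rank-two: %.4f vs %.4f\n', err2, s(3)^2);
```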
3. Consider the face emotion classification problem studied previously. Design and compare the performance of the classifiers proposed in parts a) and b) below. In each case, divide the dataset into 8 equal-sized subsets (e.g., examples 1–16, 17–32, etc.). Use 6 of the subsets to estimate $w$ for each choice of the regularization parameter, select the best value of the regularization parameter by estimating the error on one of the two subsets held out from training, and finally use the $w$ corresponding to that best value to predict the labels of the remaining held-out subset. Compute the number of mistakes made on this hold-out set and divide by 16 (the size of the set) to estimate the error rate. Note there are 8 × 7 = 56 different ordered choices of the two hold-out sets, so repeat this process 56 times and average the error rates across the 56 cases to obtain a final estimate. (A sketch of the full procedure appears after part b.)
a) Truncated SVD. Use the pseudo-inverse $V \Sigma_r^{-1} U^T$, where $\Sigma_r^{-1}$ is computed by inverting only the $r$ largest singular values (and setting the rest to zero). Hence the regularization parameter $r$ takes values $r = 1, 2, \ldots, 9$.
b) Ridge Regression. Let $\hat{w}_\lambda = \arg\min_w \|y - Xw\|_2^2 + \lambda \|w\|_2^2$ for the following values of the regularization parameter: $\lambda = 0, 2^{-1}, 2^0, 2^1, 2^2, 2^3$, and $2^4$. Show that $\hat{w}_\lambda$ can be computed using the SVD, and use this fact in your code.
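A minimal MATLAB sketch of the 56-trial hold-out procedure for both classifiers, assuming the data are already loaded as a matrix X with one example per row (128 rows and 9 feature columns, consistent with 8 subsets of 16 and $r \le 9$) and a label vector y with entries ±1; the variable names are my own. Ridge is computed via the SVD identity $\hat{w}_\lambda = V \,\mathrm{diag}\big(\sigma_j / (\sigma_j^2 + \lambda)\big)\, U^T y$, which is the fact part b) asks you to establish.

```matlab
% Assumes X (128-by-9, one example per row) and y (128-by-1, labels +/-1).
nsets   = 8;
setsz   = size(X, 1) / nsets;               % 16 examples per subset
rvals   = 1:9;                              % truncation ranks, part a)
lambdas = [0, 2.^(-1:4)];                   % ridge parameters, part b)

errA = zeros(nsets);  errB = zeros(nsets);  % test error for each ordered pair
for itest = 1:nsets
    for ival = [1:itest-1, itest+1:nsets]   % 8 x 7 = 56 ordered pairs
        test = (itest-1)*setsz + (1:setsz); % final hold-out (test) subset
        val  = (ival -1)*setsz + (1:setsz); % validation subset
        tr   = setdiff(1:size(X,1), [test, val]);   % the 6 training subsets

        [U, S, V] = svd(X(tr,:), 'econ');   % SVD of the training block
        s = diag(S);                        % assumes full column rank (s > 0)
        c = U' * y(tr);                     % projections of y onto columns of U

        % Part a): truncated-SVD pseudo-inverse, w = V_r * inv(S_r) * U_r' * y.
        bestE = inf;
        for r = rvals
            w = V(:,1:r) * (c(1:r) ./ s(1:r));
            e = mean(sign(X(val,:)*w) ~= y(val));
            if e < bestE, bestE = e; wA = w; end
        end
        errA(itest,ival) = mean(sign(X(test,:)*wA) ~= y(test));

        % Part b): ridge via the SVD, w = V * diag(s./(s.^2+lambda)) * U' * y.
        bestE = inf;
        for lam = lambdas
            w = V * ((s ./ (s.^2 + lam)) .* c);
            e = mean(sign(X(val,:)*w) ~= y(val));
            if e < bestE, bestE = e; wB = w; end
        end
        errB(itest,ival) = mean(sign(X(test,:)*wB) ~= y(test));
    end
end

% Average the 56 test-error estimates for each method.
fprintf('truncated SVD: %.3f   ridge: %.3f\n', ...
        sum(errA(:))/(nsets*(nsets-1)), sum(errB(:))/(nsets*(nsets-1)));
```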