Description
1. Let the n-by-p rank-r (n > p > r) matrix X have SVD X = UΣV^T, where U is
n-by-r, Σ is r-by-r, and V is p-by-r.
a) Find the SVD of Z = X^T in terms of U, Σ, and V.
b) Find the orthonormal basis for the best rank-1 subspace to approximate the rows
of Z in terms of U, V, and Σ.
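As a numerical sanity check (a Python sketch, not part of the assignment; the dimensions and random data are illustrative), transposing the SVD gives Z = X^T = VΣU^T, which is itself a valid SVD because U and V have orthonormal columns:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 8, 5, 3
# build a rank-r n-by-p matrix
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]   # keep the thin rank-r factors

# Z = X^T is reconstructed by V Sigma U^T, so (V, Sigma, U) is an SVD of Z
Z = X.T
assert np.allclose(Z, Vt.T @ np.diag(s) @ U.T)
```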
2. Uniqueness of solutions and Tikhonov regularization (ridge regression).
The least-squares problem is min_w ||y − Xw||_2^2. Assume X is n-by-p with p < n.
a) Under what conditions is the solution to the least-squares problem not unique?
b) The Tikhonov-regularized least-squares problem is

   min_w ||y − Xw||_2^2 + λ||w||_2^2.

Show that this can be written as an ordinary least-squares problem
min_w ||ŷ − X̂w||_2^2 and find ŷ and X̂.
c) Use the results from the previous part to determine the conditions for which the
Tikhonov-regularized least-squares problem has a unique solution.
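One standard construction for part (b) is to stack √λ·I below X and zeros below y; a hedged Python sketch (the data here is synthetic, and this is one possible augmentation, not necessarily the intended one) verifying that the augmented least-squares solution matches the closed-form ridge solution:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 20, 5, 0.5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# closed-form ridge solution: (X^T X + lam I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# augmented ordinary least squares: X_hat = [X; sqrt(lam) I], y_hat = [y; 0]
X_hat = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_hat = np.concatenate([y, np.zeros(p)])
w_ls, *_ = np.linalg.lstsq(X_hat, y_hat, rcond=None)

assert np.allclose(w_ridge, w_ls)
```

Note that X̂ has full column rank for any λ > 0 because of the √λ·I block, which is the key observation for part (c).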
3. Pseudoinverse and truncated SVD. The solution to the ridge regression problem

   min_w ||y − Xw||_2^2 + λ||w||_2^2

is given by w* = (X^T X + λI)^{−1} X^T y. The pseudoinverse of X, denoted X†, can be
defined by looking at the limit of the ridge regression solution as λ → 0 (from above):

   X† = lim_{λ↓0} (X^T X + λI)^{−1} X^T.
a) Let X ∈ R^{n×p}, p ≤ n, have SVD X = UΣV^T = sum_{i=1}^p σ_i u_i v_i^T. Show that

   (X^T X + λI)^{−1} X^T = sum_{i=1}^p (σ_i / (σ_i^2 + λ)) v_i u_i^T.

Hint: Note that X^T X = V Σ^2 V^T and λI = V(λI)V^T.
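A quick numerical check of the identity in part (a) (a Python sketch with illustrative dimensions, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 10, 4, 0.1
X = rng.standard_normal((n, p))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# left-hand side: direct matrix computation
lhs = np.linalg.inv(X.T @ X + lam * np.eye(p)) @ X.T
# right-hand side: sum_i (sigma_i / (sigma_i^2 + lam)) v_i u_i^T
rhs = sum((s[i] / (s[i] ** 2 + lam)) * np.outer(Vt[i], U[:, i]) for i in range(p))

assert np.allclose(lhs, rhs)
```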
b) Using the limit definition of the pseudoinverse above, show that when X^T X is
invertible, then X† = (X^T X)^{−1} X^T.
c) Argue that when X is square and invertible, then X† = X^{−1}.
d) Argue that if X is rank r < p, then for λ > 0,

   (X^T X + λI)^{−1} X^T = sum_{i=1}^r (σ_i / (σ_i^2 + λ)) v_i u_i^T.
e) Now argue that if X is rank r < p,

   X† = sum_{i=1}^r (1/σ_i) v_i u_i^T = V Σ_r^{−1} U^T,

where Σ_r^{−1} is a matrix with 1/σ_i on the diagonal for i = 1, . . . , r, and zeros
elsewhere.
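A numerical check of parts (d)-(e) (a Python sketch; dimensions and data are illustrative): for a rank-deficient X, the truncated-SVD sum over the r nonzero singular values satisfies the pseudoinverse property and matches NumPy's pinv.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r = 10, 6, 3
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))  # rank r < p

U, s, Vt = np.linalg.svd(X, full_matrices=False)
# truncated sum: only the r nonzero singular values contribute
X_pinv = sum((1.0 / s[i]) * np.outer(Vt[i], U[:, i]) for i in range(r))

# Moore-Penrose property: X X† X = X
assert np.allclose(X @ X_pinv @ X, X)
# agrees with NumPy's pinv (rcond set explicitly so the numerically-zero
# singular values of the rank-r matrix are discarded)
assert np.allclose(X_pinv, np.linalg.pinv(X, rcond=1e-10))
```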
4. The data file is available with a matrix X of 100 three-dimensional data points. A
script is available with code to assist you with visualizing and fitting this data. Use
the results of the SVD to find a basis for the best (minimum sum of squared distances)
one-dimensional subspace for the data.
a) Run the code to display the data in the first figure. Use the rotate tool
to inspect the scatter plot from different angles. Does the data appear to lie very
close to a one-dimensional subspace? Does the data appear to be zero mean?
b) Figure 2 depicts the centered data and the one-dimensional subspace that contains
the dominant feature you identified using the SVD. Use the rotate tool to inspect
the data and one-dimensional subspace from different angles. Is a one-dimensional
subspace a reasonable fit to the data? Comment on the error.
c) Now comment out (insert %) the line of code that subtracts the mean of the
data. Does the dominant feature identified by SVD continue to be a good fit to
the data? Comment on the importance of removing the mean before performing
PCA.
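The effect being probed in part (c) can be sketched in Python (the course script is MATLAB and the assignment's data file is not reproduced here, so the data below is synthetic): fit the best one-dimensional subspace via the SVD, with and without removing the mean.

```python
import numpy as np

rng = np.random.default_rng(4)
direction = np.array([1.0, 2.0, 2.0]) / 3.0   # unit-norm true direction
t = rng.standard_normal(100)
offset = np.array([5.0, 0.0, 0.0])            # nonzero mean
X = offset + np.outer(t, direction) + 0.05 * rng.standard_normal((100, 3))

# with centering: the leading right singular vector spans the best 1-D subspace
Xc = X - X.mean(axis=0)
v_centered = np.linalg.svd(Xc, full_matrices=False)[2][0]

# without centering: the leading direction is pulled toward the data's mean
v_raw = np.linalg.svd(X, full_matrices=False)[2][0]

print(abs(v_centered @ direction))  # near 1: centered PCA recovers the direction
print(abs(v_raw @ direction))       # much smaller: the mean offset dominates
```

This is the point of removing the mean before PCA: the SVD of uncentered data finds the best subspace through the origin, which is skewed toward the data's mean rather than its dominant direction of variation.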