Description
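1. Consider the system of linear equations $Xw = y$ where
$X = \begin{bmatrix} 1 & 1 \\ -2 & -2 \end{bmatrix}$,
$w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$, and
$y = \begin{bmatrix} 2 \\ -4 \end{bmatrix}$.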
a) Sketch the set of all $w$ that satisfy $Xw = y$ in the $w_1$-$w_2$ plane. Is the
solution unique? What is the value of the squared error $\min_w \|Xw - y\|_2^2$?
b) Use your sketch to find the $w$ of minimum norm that satisfies the system of
equations: $\min_w \|w\|_2^2$ subject to $Xw = y$. Is this solution unique? What
makes it unique? What is the value of the squared error $\|Xw - y\|_2^2$ at this
solution? What is the value of $\|w\|_2$? Hint: The equation $\|w\|_2^2 = c$
describes a circle in $\mathbb{R}^2$ with radius $\sqrt{c}$.
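c) Algebraically find the $\hat{w}$ that solves the Tikhonov-regularized (or ridge
regression) problem
$\hat{w} = \arg\min_w \{ \|Xw - y\|_2^2 + \lambda \|w\|_2^2 \}$
as a function of $\lambda$. Hint: Recall that
$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.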
d) Sketch the solution to the Tikhonov-regularized problem in the $w_1$-$w_2$ plane
as a function of $\lambda$ for $0 < \lambda < \infty$. (Consider the solution for
different values of $\lambda$ in that range.) Find the squared error
$\|Xw - y\|_2^2$ and the norm squared of the solution, $\|w\|_2^2$, for
$\lambda = 0$ and $\lambda = 5$. Compare the squared error and the norm
squared of the solution to those in part b).
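A numerical check of parts c) and d) can be sketched as follows. This is only an illustration, not part of the assignment: the helper names `ridge_2x2`, `squared_error`, and `norm_squared` are hypothetical, and the code assumes a matrix with exactly two columns so that the 2-by-2 inverse formula from the hint applies. Because $X^T X$ is singular for this $X$, the sketch does not evaluate $\lambda = 0$ directly and uses a very small positive $\lambda$ to approximate that limit.

```python
def ridge_2x2(X, y, lam):
    """Solve (X^T X + lam*I) w = X^T y for a two-column X (hypothetical helper)."""
    # Entries of the symmetric 2x2 matrix A = X^T X + lam*I = [[a, b], [b, d]].
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    # Right-hand side g = X^T y.
    g0 = sum(r[0] * yi for r, yi in zip(X, y))
    g1 = sum(r[1] * yi for r, yi in zip(X, y))
    # 2x2 inverse formula: A^{-1} = 1/(ad - b^2) * [[d, -b], [-b, a]].
    det = a * d - b * b  # strictly positive whenever lam > 0
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]


def squared_error(X, y, w):
    """Return ||Xw - y||_2^2."""
    return sum((r[0] * w[0] + r[1] * w[1] - yi) ** 2 for r, yi in zip(X, y))


def norm_squared(w):
    """Return ||w||_2^2."""
    return sum(wi * wi for wi in w)


X = [[1.0, 1.0], [-2.0, -2.0]]
y = [2.0, -4.0]

# A tiny lam stands in for the lam -> 0 limit, since lam = 0 gives det = 0 here.
for lam in (5.0, 1.0, 1e-6):
    w = ridge_2x2(X, y, lam)
    print(f"lam={lam:g}  w={w}  error={squared_error(X, y, w):.6f}  "
          f"norm^2={norm_squared(w):.6f}")
```

Printing the solution for several values of $\lambda$ gives points that can be compared against the hand sketch for part d).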
2. Let $X = \begin{bmatrix} 1 & \gamma \\ 1 & -\gamma \\ 1 & -\gamma \\ 1 & \gamma \end{bmatrix}$.
a) Show that the columns of $X$ are orthogonal to each other for any $\gamma$.
b) Express $X = U\Sigma$ where $U$ is a 4-by-2 matrix with orthonormal columns and
$\Sigma$ is a 2-by-2 diagonal matrix (the non-diagonal entries are zero).
c) Express the solution to the least-squares problem $\min_w \|Xw - y\|_2^2$ as a
function of $U$, $\Sigma$, and $y$.
d) Let $y = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}$. Find the weights $w$ as a
function of $\gamma$. What happens to $\|w\|_2^2$ as $\gamma \to 0$?
e) The ratio of the largest to the smallest diagonal values in $\Sigma$ is termed the
condition number of $X$. Find the condition number for $\gamma = 0.1$ and for
$\gamma = 10^{-8}$. Also find $\|w\|_2^2$ for these two values of $\gamma$.
f) A system of linear equations with a large condition number is said to be
"ill-conditioned". One consequence of an ill-conditioned system of equations is
solutions with large norms, as you found in the previous part of this problem.
A second consequence is that the solution is very sensitive to small errors in $y$,
such as may result from measurement error or numerical error. Suppose
$y = \begin{bmatrix} 1 + \epsilon \\ 0 \\ 0 \\ 1 \end{bmatrix}$.
Write $w = w_o + w_\epsilon$ where $w_o$ is the solution for arbitrary $\gamma$ when
$\epsilon = 0$ and $w_\epsilon$ is the perturbation in that solution due to some
error $\epsilon \neq 0$. How does the norm of the perturbation due to
$\epsilon \neq 0$, $\|w_\epsilon\|_2^2$, depend on the condition number? Find
$\|w_\epsilon\|_2^2$ for $\epsilon = 0.01$, with $\gamma = 0.1$ and with
$\gamma = 10^{-8}$.
g) Now apply ridge regression, i.e., Tikhonov regularization. Solve for $w_o$ and
$w_\epsilon$ as a function of $\lambda$. Find $\|w_o\|_2^2$ and $\|w_\epsilon\|_2^2$
for $\lambda = 0.1$ and $\epsilon = 0.01$, with $\gamma = 0.1$ and with
$\gamma = 10^{-8}$. Comment on the impact of regularization.
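The computations in parts d) through g) can be checked numerically with a short sketch like the one below. It is an illustration under stated assumptions rather than a reference solution: the function names `condition_number` and `ridge_weights` are hypothetical, the error added to the first entry of y is called `eps` here, and the code relies on the fact from part a) that the columns of X are orthogonal, so that $X^T X = \mathrm{diag}(4, 4\gamma^2)$ and the regularized normal equations split into two scalar equations. Setting `lam=0.0` gives ordinary least squares, which requires a nonzero gamma.

```python
def condition_number(gamma):
    """Ratio of largest to smallest diagonal entry of Sigma = diag(2, 2*|gamma|)."""
    s = (2.0, 2.0 * abs(gamma))
    return max(s) / min(s)


def ridge_weights(gamma, y, lam=0.0):
    """Ridge solution for X = [[1, g], [1, -g], [1, -g], [1, g]] (hypothetical helper).

    Uses X^T X = diag(4, 4*g^2), which holds because the columns are orthogonal.
    """
    col1 = [1.0, 1.0, 1.0, 1.0]
    col2 = [gamma, -gamma, -gamma, gamma]
    # Each weight solves its own scalar equation (||col||^2 + lam) * w = col^T y.
    w1 = sum(c * yi for c, yi in zip(col1, y)) / (4.0 + lam)
    w2 = sum(c * yi for c, yi in zip(col2, y)) / (4.0 * gamma ** 2 + lam)
    return [w1, w2]


def norm_squared(w):
    """Return ||w||_2^2."""
    return sum(wi * wi for wi in w)


eps = 0.01  # assumed name for the error added to the first entry of y
y_clean = [1.0, 0.0, 0.0, 1.0]
y_error = [eps, 0.0, 0.0, 0.0]  # by linearity, this part of y alone yields the perturbation

for gamma in (0.1, 1e-8):
    for lam in (0.0, 0.1):
        w_o = ridge_weights(gamma, y_clean, lam)
        w_eps = ridge_weights(gamma, y_error, lam)
        print(f"gamma={gamma:g} lam={lam:g} cond={condition_number(gamma):.3g} "
              f"||w_o||^2={norm_squared(w_o):.6g} "
              f"||w_eps||^2={norm_squared(w_eps):.6g}")
```

Splitting y into its clean part and its error part works because the solution is linear in y, so the two pieces can be solved separately and added.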