CS/ECE/ME532 Activity 16


1. The squared-error cost function $f(w) = \|y - Xw\|_2^2 = (y - Xw)^T (y - Xw)$ may be rewritten as a perfect square in the form
$$f(w) = (w - w_{LS})^T X^T X (w - w_{LS}) + c,$$
where $w_{LS} = (X^T X)^{-1} X^T y$ and $c = y^T y - y^T X (X^T X)^{-1} X^T y$. This assumes the $n$-by-$p$ ($p < n$) matrix $X$ is full rank. $f(w)$ is called a "quadratic form" in $w$ since it is a quadratic function of $w$.

a) Prove that the minimum value of $f(w)$ is $c$, attained at $w = w_{LS}$, and that all other $w$ result in higher values of $f(w)$.
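
The perfect-square identity above can also be sanity-checked numerically before working the proof. The sketch below is not the provided course script; the random $X$ and $y$ are arbitrary stand-ins chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 2
X = rng.standard_normal((n, p))   # arbitrary 4-by-2 example (full rank w.p. 1)
y = rng.standard_normal(n)

# Perfect-square ingredients: w_LS = (X^T X)^{-1} X^T y and the constant c
w_ls = np.linalg.solve(X.T @ X, X.T @ y)
c = y @ y - y @ X @ w_ls

def f(w):
    """Squared-error cost ||y - Xw||^2."""
    r = y - X @ w
    return r @ r

def f_square(w):
    """Perfect-square form (w - w_LS)^T X^T X (w - w_LS) + c."""
    d = w - w_ls
    return d @ (X.T @ X) @ d + c

w = rng.standard_normal(p)
assert np.isclose(f(w), f_square(w))   # the two forms agree at any w
assert np.isclose(f(w_ls), c)          # the minimum value is c
```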

b) Suppose $y = \begin{bmatrix} 1 \\ 1/2 \\ 1 \\ 0 \end{bmatrix}$ and the 4-by-2 $X = U\Sigma V^T$ has singular value decomposition
$$U = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1/2 \end{bmatrix}, \quad V = I.$$
Sketch a contour plot of $f(w)$ in the $w_1$-$w_2$ plane.
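
A minimal sketch for checking such a hand-drawn contour plot (this is a stand-in, not the provided course script; the grid range and contour levels are arbitrary choices, and editing `y`, `S`, and `V` below covers parts c) and d) as well):

```python
import numpy as np
import matplotlib.pyplot as plt

# SVD factors from part b); edit y, S, V for parts c) and d)
U = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]])
S = np.diag([1., 0.5])
V = np.eye(2)
X = U @ S @ V.T
y = np.array([1., 0.5, 1., 0.])

# Evaluate f(w) = ||y - Xw||^2 on a grid in the w1-w2 plane
w1, w2 = np.meshgrid(np.linspace(-2, 4, 200), np.linspace(-2, 4, 200))
W = np.stack([w1.ravel(), w2.ravel()])          # 2 x N grid of candidate w's
F = np.sum((y[:, None] - X @ W) ** 2, axis=0).reshape(w1.shape)

plt.contour(w1, w2, F, levels=20)
plt.xlabel('$w_1$'); plt.ylabel('$w_2$')
plt.gca().set_aspect('equal')
plt.show()
```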

c) Suppose $y = \begin{bmatrix} 1 \\ 1/5 \\ 1 \\ 0 \end{bmatrix}$, $U = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1/5 \end{bmatrix}$, and $V = I$. Sketch a contour plot of $f(w)$ in the $w_1$-$w_2$ plane. How do the singular values of $X$ affect the shape of the contours?

d) In this case assume $y = \begin{bmatrix} 2 \\ 0 \\ 1 \\ 0 \end{bmatrix}$, $U = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1/2 \end{bmatrix}$, and $V = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$. Sketch a contour plot of $f(w)$ in the $w_1$-$w_2$ plane. How do the right singular vectors of $X$ affect the contours?
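
One way to see the role of $V$: the Hessian of $f(w)$ is $2X^TX = 2V\Sigma^2V^T$, so the principal axes of the elliptical contours line up with the columns of $V$. A quick numerical check of this claim, using the part d) values:

```python
import numpy as np

U = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]])
S = np.diag([1., 0.5])
V = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
X = U @ S @ V.T

# Hessian of f(w) = ||y - Xw||^2 is 2 X^T X = 2 V S^2 V^T
eigvals, eigvecs = np.linalg.eigh(2 * X.T @ X)
print(eigvals)   # 2*sigma_i^2 in ascending order: [0.5, 2.0]
print(eigvecs)   # columns match the columns of V (up to sign and ordering)
```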

e) Sketch the gradient of $f(w)$ at $w = w_0$ on the contour plot for the previous case when
i. $w_0 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$
ii. $w_0 = \begin{bmatrix} 0 \\ 2 \end{bmatrix}$
iii. $w_0 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$

2. The provided script will compute a specified number of iterations of the gradient descent algorithm and help you display the path taken by the weights in the gradient descent iteration, superimposed on a contour plot. Assume $y = \begin{bmatrix} 2 \\ 0 \\ 1 \\ 0 \end{bmatrix}$ and the 4-by-2 $X = U\Sigma V^T$ has singular value decomposition
$$U = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1/2 \end{bmatrix}, \quad V = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$
Complete 20 iterations of gradient descent in each case specified below.

Include the plots you generate below with your submission.
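
The provided script itself is not reproduced here; a minimal stand-in for the iteration and the path plot, assuming the gradient $\nabla f(w) = 2X^T(Xw - y)$ and a plain constant-step update, might look like:

```python
import numpy as np
import matplotlib.pyplot as plt

U = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]])
S = np.diag([1., 0.5])
V = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
X = U @ S @ V.T
y = np.array([2., 0., 1., 0.])

tau = 0.5                        # step size; vary per part
w = np.array([1.5, -0.5])        # starting point; vary per part
path = [w]
for _ in range(20):              # 20 iterations as specified
    w = w - tau * 2 * X.T @ (X @ w - y)
    path.append(w)
path = np.array(path)

# Superimpose the weight trajectory on the cost contours
w1, w2 = np.meshgrid(np.linspace(-2, 4, 200), np.linspace(-2, 4, 200))
W = np.stack([w1.ravel(), w2.ravel()])
F = np.sum((y[:, None] - X @ W) ** 2, axis=0).reshape(w1.shape)
plt.contour(w1, w2, F, levels=20)
plt.plot(path[:, 0], path[:, 1], 'o-')
plt.xlabel('$w_1$'); plt.ylabel('$w_2$')
plt.show()
```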

a) What is the maximum value for the step size $\tau$ that will guarantee convergence?
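
For reference, the standard analysis of this bound, assuming the update $w_{k+1} = w_k - \tau \nabla f(w_k)$ with $\nabla f(w) = 2X^TX(w - w_{LS})$: subtracting $w_{LS}$ from both sides gives
$$w_{k+1} - w_{LS} = (I - 2\tau X^T X)(w_k - w_{LS}),$$
so the error along the $i$-th right singular vector is scaled by $(1 - 2\tau\sigma_i^2)$ each iteration, and convergence from an arbitrary starting point requires $|1 - 2\tau\sigma_i^2| < 1$ for every $i$, i.e., $0 < \tau < 1/\sigma_{\max}^2$. (If the provided script folds the factor of 2 into the step size, the numerical bound changes accordingly.)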
b) Start gradient descent from the point $w = \begin{bmatrix} 1.5 \\ -0.5 \end{bmatrix}$ and use a step size of $\tau = 0.5$. How do you explain the trajectory the weights take toward the optimum, e.g., why is it shaped this way? What would the trajectory be if you started from the point $w = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$? From $w = \begin{bmatrix} 0 \\ 2 \end{bmatrix}$?

c) Start gradient descent from the point $w = \begin{bmatrix} 1.5 \\ -0.5 \end{bmatrix}$ and use a step size of $\tau = 2.5$. Complete 20 iterations. What happens and why?

d) Now change $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1/4 \end{bmatrix}$, start from $w = \begin{bmatrix} 1.5 \\ -0.5 \end{bmatrix}$, and use $\tau = 0.5$. What happens to the cost function? How does the change in the cost function lead to a change in the trajectory of the gradient descent weights? What would happen if you further decreased the smaller singular value?

e) Discuss how changing the ratio of the singular values changes the shape of the cost
function and how that might affect the number of iterations it takes for gradient
descent to get close to the optimum.
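
As a starting point for this discussion, under the same factor-of-2 convention as in part a), the error component along each right singular vector contracts geometrically:
$$\left| v_i^T (w_k - w_{LS}) \right| = \left| 1 - 2\tau\sigma_i^2 \right|^k \left| v_i^T (w_0 - w_{LS}) \right|.$$
With $\tau$ limited by $\sigma_{\max}$, the slowest direction contracts by only about $(1 - 2\tau\sigma_{\min}^2)$ per iteration, so a larger ratio $\sigma_{\max}/\sigma_{\min}$ (more elongated contours) generally means more iterations to get close to $w_{LS}$.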