Question 1 (4 Points, implementation involved). Plot some level sets of the following three functions (when we say "plot something" you need to show the plots in the PDF file):
• f1(x1, x2) = (x1 − x2)^2
• f2(x1, x2) = x1^2 − 3x2^2
• f3(x1, x2, x3) = x1^2 + 5x2^2 + 3x3^2
for the range xi ∈ [−10, 10] for all i. You only need to plot the level sets where the function roughly takes integer values, i.e. f(x) = 1, −1, 0, etc. (small numerical errors are allowed; we only care about the shapes of the curves).
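A minimal sketch of how the 2-D level sets could be plotted with matplotlib (the grid resolution, chosen levels, and output file name are illustrative; f3 takes three inputs, so its level sets are surfaces in R^3 and would need a 3-D plot or 2-D slices):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs headless
import matplotlib.pyplot as plt

# Evaluate f1 and f2 on a grid over [-10, 10] x [-10, 10]
x1, x2 = np.meshgrid(np.linspace(-10, 10, 400), np.linspace(-10, 10, 400))
f1 = (x1 - x2) ** 2
f2 = x1 ** 2 - 3 * x2 ** 2

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, f, title in [(axes[0], f1, "f1"), (axes[1], f2, "f2")]:
    cs = ax.contour(x1, x2, f, levels=[-3, -1, 0, 1, 3])  # a few integer levels
    ax.clabel(cs, inline=True)                            # label each curve
    ax.set_title(title)
fig.savefig("level_sets.png")
```

Levels that the function never attains (e.g. negative levels of f1) simply produce no curve, which already hints at the different shapes of the three families of level sets.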
Question 2 (8 Points, implementation involved). Consider f(x1, x2) = x1^2 − x1x2 + 3x2^2 + 5 and let the initial point be x = (2, 2). Implement the following procedures for minimizing f:
1. (2 Points) Perform gradient descent using a fixed step size α = 0.3.
2. (3 Points) Perform gradient descent using the optimal step size in each step.
3. (3 Points) Perform Newton descent using step size α = 1.
For each case, plot two graphs. First, plot the value of f with respect to the number of iterations (i.e., let the x-axis be the number of iterations starting from 0, and the y-axis show the f value). Second, plot the sequence of points you get in each iteration together with the descent direction at each point, in the x1-x2 plane. If there are many iterations, you only need to show up to 20 steps in each plot.
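The three procedures can be sketched as follows for this particular quadratic (function names and iteration counts are illustrative choices; the closed forms for the exact line-search step and the Newton step hold because f is quadratic):

```python
import numpy as np

H = np.array([[2.0, -1.0], [-1.0, 6.0]])  # Hessian of f (constant, since f is quadratic)

def f(x):
    return x[0]**2 - x[0]*x[1] + 3*x[1]**2 + 5

def grad(x):
    return np.array([2*x[0] - x[1], -x[0] + 6*x[1]])

def gd_fixed(x0, alpha=0.3, iters=60):
    """Gradient descent with a fixed step size."""
    x = np.asarray(x0, dtype=float)
    history = [f(x)]
    for _ in range(iters):
        x = x - alpha * grad(x)
        history.append(f(x))
    return x, history

def gd_exact(x0, iters=60):
    """Gradient descent with the optimal step size per iteration.

    For a quadratic, minimizing f(x + alpha*p) over alpha along p = -g
    gives alpha = (g.g) / (g.H.g), where g is the current gradient."""
    x = np.asarray(x0, dtype=float)
    history = [f(x)]
    for _ in range(iters):
        g = grad(x)
        if g @ g < 1e-16:   # gradient vanished: stop
            break
        x = x - (g @ g) / (g @ H @ g) * g
        history.append(f(x))
    return x, history

def newton_step(x0):
    """One Newton step with alpha = 1; lands on the minimizer of a quadratic."""
    x = np.asarray(x0, dtype=float)
    return x - np.linalg.solve(H, grad(x))
```

From x = (2, 2), the Newton step reaches the minimizer in a single iteration, while both gradient-descent variants approach it gradually; plotting each `history` list gives the first requested graph, and collecting the iterates gives the second.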
Question 3 (3 Points). Prove that if a function is convex, then any of its local minima is also a global minimum.
Hint: suppose the function has two local minima that correspond to different function values (so not both are global minima), then prove that the function cannot be convex.
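The hint's contradiction can be set up with a single convexity inequality (a sketch of the key step, not a full proof): if x is a local minimum but f(y) < f(x) for some y, consider the points z_λ = λy + (1 − λ)x approaching x:

```latex
f(z_\lambda) = f\bigl(\lambda y + (1-\lambda)x\bigr)
\;\le\; \lambda f(y) + (1-\lambda) f(x)
\;<\; f(x)
\qquad \text{for all } \lambda \in (0, 1],
```

so every neighborhood of x contains points with strictly smaller function value, contradicting local minimality.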
Question 4 (7 Points + 5 Extra Points, coding involved). Consider the following regression problem. We aim to train a model fA,B(x) = Ax + B, where x ∈ R^3 and fA,B(x) ∈ R^3, and where the parameter matrices A and B are of the appropriate sizes, to fit the following dataset:
D = {((1, 2, 3)^T, (−1, −3, 1)^T), ((1, 2, 3)^T, (1, −3, 1)^T), ((−1, 2, −1)^T, (0, 1, 2)^T)}.
Namely, these are the three data points in D (let's label them (xi, yi) for i = 1, 2, 3, respectively); for each pair in D, the first component xi is the input vector, and the second component yi is the label vector that we hope to predict. We aim to find the parameter matrices A and B such that the mean squared error over D is minimized. That is, we need to minimize the loss function
L(A, B) = Σ_{i=1}^{3} ||fA,B(xi) − yi||^2,
where (xi, yi) ∈ D. Answer the following:
1. (3 Points) Prove that any local minimum of L(A, B) is also a global minimum of L(A, B). The proof can either involve numerical calculations or a general argument that only uses the symbolic formulas.
2. (3 Points) Perform gradient descent to find one solution A, B that minimizes L(A, B), and plot the value of L over the number of iterations until convergence (you can choose any initial point and any step size you like).
3. (1 Point) Can you use Newton directions for this minimization? Why?
4. (Extra 5 Points) Perform conjugate gradient descent from the same initial point as what you chose for gradient descent. You can give it more data points for the problem to be solvable with conjugate descent methods. Plot the change of loss over the number of iterations.
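A minimal sketch of part 2, assuming the dataset as written above (the zero initialization, step size, and iteration count are illustrative choices; note that x1 = x2 while y1 ≠ y2, so the loss plateaus at a positive value rather than reaching zero):

```python
import numpy as np

# Data points from D, one row per example
X = np.array([[ 1.0, 2.0,  3.0],
              [ 1.0, 2.0,  3.0],
              [-1.0, 2.0, -1.0]])
Y = np.array([[-1.0, -3.0, 1.0],
              [ 1.0, -3.0, 1.0],
              [ 0.0,  1.0, 2.0]])

def loss(A, B):
    R = X @ A.T + B - Y          # residuals f_{A,B}(x_i) - y_i, one row per example
    return float(np.sum(R * R))

def gradients(A, B):
    R = X @ A.T + B - Y
    return 2.0 * R.T @ X, 2.0 * R.sum(axis=0)   # dL/dA and dL/dB

A, B = np.zeros((3, 3)), np.zeros(3)
alpha, history = 0.01, []
for _ in range(2000):
    history.append(loss(A, B))
    gA, gB = gradients(A, B)
    A -= alpha * gA
    B -= alpha * gB
history.append(loss(A, B))       # plotting `history` gives the requested curve
```

The gradient formulas follow from expanding L term by term: dL/dA = 2 Σ_i (Axi + B − yi) xi^T and dL/dB = 2 Σ_i (Axi + B − yi).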
Question 5 (4 Points). Prove that gradient descent with exact line search always takes orthogonal steps. That is, suppose pk is the gradient descent direction in the kth iteration of gradient descent for minimizing an arbitrary continuously differentiable and lower-bounded function f(x): R^n → R. After we use a step size α that minimizes f(xk + αpk) along the direction of pk (i.e., α = argmin_{α≥0} f(xk + αpk)), the next direction pk+1 in gradient descent always satisfies pk^T pk+1 = 0.
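A quick numerical illustration of the claim, reusing the quadratic from Question 2 (the choice of function and iteration count are only for illustration; the closed-form exact line-search step below is specific to quadratics):

```python
import numpy as np

H = np.array([[2.0, -1.0], [-1.0, 6.0]])   # Hessian of the Question 2 quadratic
grad = lambda x: H @ x                      # its gradient (the +5 constant drops out)

x = np.array([2.0, 2.0])
p = -grad(x)
for _ in range(20):
    x = x + (p @ p) / (p @ H @ p) * p       # exact line-search step along p
    p_next = -grad(x)
    assert abs(p @ p_next) < 1e-10          # consecutive directions are orthogonal
    p = p_next
```

The resulting iterates zig-zag at right angles toward the minimizer, which is exactly the picture the proof should formalize via d/dα f(xk + αpk) = 0 at the optimal α.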
Question 6 (4 Points). Follow the proof sketches in the slides and rigorously prove the first-order and second-order necessary conditions for local minima.
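For reference, the two conditions to be proven are usually stated as follows, writing x* for a local minimum of f (the precise smoothness assumptions are those given in the slides):

```latex
\text{First-order: } \nabla f(x^*) = 0;
\qquad
\text{Second-order: } \nabla^2 f(x^*) \succeq 0 \ \text{(positive semidefinite)}.
```

Both proofs typically proceed by Taylor-expanding f(x* + αd) around x* for an arbitrary direction d and letting α → 0+.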