CS/ECE/ME532 Activity 20

$30.00

Category: You will Instantly receive a download link for .zip solution file upon Payment

Description

5/5 - (1 vote)

1. An exponential loss function f(w) is defined as
f(w) = (
e
−2(w−1), w < 1
e
w−1
, w ≥ 1

a) Is f(w) convex? Why? Hint: Graph the function.

b) Is f(w) differentiable everywhere? If not, where not?

c) The “differential set” ∂f(w) is the set of subgradients v ∈ ∂f(w) for which
f(u) ≥ f(w) + (u − w)
T v. Find the differential set for f(w) as a function of w.

2. We are trying to predict whether a certain chemical reaction will take place as a function
of our experimental conditions: temperature, pressure, concentration of catalyst, and
several other factors.

For each experiment i = 1, . . . , m we record the experimental
conditions in the vector xi ∈ R
n and the outcome in the scalar bi ∈ {−1, 1} (+1 if the
reaction occurred and −1 if it did not). We will train our linear classifier to minimize
hinge loss.

Namely, we solve:
minimize
w
Xm
i=1
(1 − bix
T
i w)+ where (u)+ = max(0, u) is the hinge loss operator

a) Derive a gradient descent method for solving this problem. Explicitly give the
computations required at each step. Note: you may ignore points where the
function is non-differentiable.

b) Explain what happens to the algorithm if you land at a wk
that classifies all the
points perfectly, and by a substantial margin.

3. You have four training samples y1 = 1, x1 =

1
−1
#
, y2 = 2, x2 =

1
−2
#
, y3 =
−1, x3 =

−1
0
#
, and y4 = −2, x4 =

−2
1
#
. Use cyclic stochastic gradient descent
to find the first two updates for the LASSO problem
min
w
||y − Xw||2
2 + 2||w||1
assuming a step size of τ = 1 and w(0) = 0. Also indicate the data used for the first
six updates.