Description
1. An exponential loss function f(w) is defined as
   f(w) = { e^{−2(w−1)},  w < 1
          { e^{w−1},      w ≥ 1
a) Is f(w) convex? Why? Hint: Graph the function; a plotting sketch is given after part (c).
b) Is f(w) differentiable everywhere? If not, at which points is it not differentiable?
c) The “differential set” ∂f(w) is the set of subgradients v ∈ ∂f(w) for which f(u) ≥ f(w) + (u − w)^T v holds for all u. Find the differential set for f(w) as a function of w.
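As a way to act on the hint in part (a), here is a minimal plotting sketch (Python with numpy and matplotlib; the variable names are my own) that evaluates the piecewise definition directly:

```python
import numpy as np
import matplotlib.pyplot as plt

# Piecewise exponential loss from Problem 1.
def f(w):
    return np.where(w < 1, np.exp(-2 * (w - 1)), np.exp(w - 1))

w = np.linspace(-1, 3, 400)
plt.plot(w, f(w))
plt.axvline(1, linestyle="--", label="w = 1")
plt.xlabel("w")
plt.ylabel("f(w)")
plt.legend()
plt.show()
```

The behavior of the curve at w = 1 is exactly what parts (b) and (c) ask you to examine.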
2. We are trying to predict whether a certain chemical reaction will take place as a function of our experimental conditions: temperature, pressure, concentration of catalyst, and several other factors. For each experiment i = 1, . . . , m we record the experimental conditions in the vector x_i ∈ R^n and the outcome in the scalar b_i ∈ {−1, 1} (+1 if the reaction occurred and −1 if it did not). We will train our linear classifier to minimize hinge loss.
Namely, we solve
   minimize_w  ∑_{i=1}^m (1 − b_i x_i^T w)_+
where (u)_+ = max(0, u) is the hinge loss operator.
a) Derive a gradient descent method for solving this problem. Explicitly give the computations required at each step; a sketch follows after part (b). Note: you may ignore points where the function is non-differentiable.
b) Explain what happens to the algorithm if you land at a w_k that classifies all the points perfectly, and by a substantial margin.
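For concreteness, here is a minimal (sub)gradient-descent sketch for the hinge-loss objective in part (a). The data, step size, and iteration count below are assumptions of my own, not part of the problem:

```python
import numpy as np

def hinge_subgradient_descent(X, b, step=0.1, iters=100):
    """Minimize sum_i (1 - b_i x_i^T w)_+ by (sub)gradient descent.

    X: (m, n) matrix of experimental conditions; b: (m,) labels in {-1, +1}.
    At non-differentiable points we take the zero branch, as the note allows.
    """
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(iters):
        margins = 1.0 - b * (X @ w)   # hinge arguments, shape (m,)
        active = margins > 0          # points contributing positive loss
        if not active.any():          # every point classified with margin >= 1:
            break                     # the subgradient is zero (see part (b))
        grad = -(b[active][:, None] * X[active]).sum(axis=0)
        w = w - step * grad
    return w

# Tiny synthetic example (data of my own choosing, not from the problem).
X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5], [-2.0, 0.5]])
b = np.array([1, 1, -1, -1])
print(hinge_subgradient_descent(X, b))
```

Each step only requires the residual margins 1 − b_i x_i^T w and a sum of −b_i x_i over the points whose margin is positive.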
3. You have four training samples y_1 = 1, x_1 = [1, −1]^T, y_2 = 2, x_2 = [1, −2]^T, y_3 = −1, x_3 = [−1, 0]^T, and y_4 = −2, x_4 = [−2, 1]^T. Use cyclic stochastic gradient descent
to find the first two updates for the LASSO problem
   min_w ||y − Xw||_2^2 + 2||w||_1
assuming a step size of τ = 1 and w^(0) = 0. Also indicate the data used for the first six updates; a sketch of the update mechanics follows below.
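To illustrate the mechanics only, here is a minimal cyclic-SGD sketch for this problem. It assumes the per-sample loss (y_i − x_i^T w)^2 + 2||w||_1 and handles the ℓ1 term with a plain subgradient under the convention sign(0) = 0; if your course uses a proximal (soft-thresholding) step for the ℓ1 term instead, the update rule differs.

```python
import numpy as np

# Training data from Problem 3 (rows of X are x_i^T).
X = np.array([[1.0, -1.0], [1.0, -2.0], [-1.0, 0.0], [-2.0, 1.0]])
y = np.array([1.0, 2.0, -1.0, -2.0])

tau = 1.0        # step size from the problem statement
w = np.zeros(2)  # w^(0) = 0

# Cyclic SGD: visit samples in the fixed order 1, 2, 3, 4, 1, 2, ...
# Assumed convention: plain subgradient of 2||w||_1, with sign(0) = 0.
for k in range(6):
    i = k % len(y)                                    # cyclic data selection
    residual = y[i] - X[i] @ w
    grad = -2.0 * residual * X[i] + 2.0 * np.sign(w)  # per-sample subgradient
    w = w - tau * grad
    print(f"update {k + 1}: sample {i + 1}, w = {w}")
```

The printed sample indices answer the "data used for the first six updates" part directly; the first two printed values of w correspond to the requested updates under the stated subgradient convention.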