Description
1. Regularized linear regression. For this problem, we will use the linear regression model
from lecture:
    y = \sum_{j=1}^{D} w_j x_j + b
In lecture, we saw that regression models with too much capacity can overfit the training
data and fail to generalize. One way to improve generalization, which we’ll cover properly
later in this course, is regularization: adding a term to the cost function which favors some
explanations over others. For instance, we might prefer that weights not grow too large in
magnitude. We can encourage them to stay small by adding a penalty
    R(\mathbf{w}) = \frac{\lambda}{2} \mathbf{w}^\top \mathbf{w} = \frac{\lambda}{2} \sum_{j=1}^{D} w_j^2
to the cost function, for some λ > 0. In other words,
    E_{\text{reg}} = \underbrace{\frac{1}{2N} \sum_{i=1}^{N} \left( y^{(i)} - t^{(i)} \right)^2}_{= E} \;+\; \underbrace{\frac{\lambda}{2} \sum_{j=1}^{D} w_j^2}_{= R},
where i indexes the data points and E is the same squared error cost function from lecture.
Note that in this formulation, there is no regularization penalty on the bias parameter.
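For concreteness, here is a minimal NumPy sketch of how E_reg could be evaluated for a given weight vector and bias; the names X, t, w, b, and lam are illustrative placeholders, not part of the assignment:

    import numpy as np

    def regularized_cost(X, t, w, b, lam):
        """E_reg for an (N, D) design matrix X, targets t, weights w, bias b.

        Note that the bias b is deliberately left out of the penalty term.
        """
        N = X.shape[0]
        y = X @ w + b                       # predictions y^(i)
        E = np.sum((y - t) ** 2) / (2 * N)  # squared error term
        R = 0.5 * lam * np.sum(w ** 2)      # L2 penalty on the weights only
        return E + R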
(a) [3 pts] Determine the gradient descent update rules for the regularized cost function
E_reg. Your answer should have the form:

    w_j ← · · ·
    b ← · · ·

This form of regularization is sometimes called “weight decay”. Based on this update
rule, why do you suppose that is?
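One way to sanity-check whatever update rule you derive is to compare your analytic partial derivatives against a finite-difference estimate; this sketch assumes the regularized_cost function above and is only a checking aid, not part of the required answer:

    def numerical_grad(cost_fn, w, j, eps=1e-6):
        """Central-difference estimate of the partial derivative of cost_fn at w[j]."""
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[j] += eps
        w_minus[j] -= eps
        return (cost_fn(w_plus) - cost_fn(w_minus)) / (2 * eps)

    # Example check (assumes X, t, w, b, lam are defined as above):
    # numerical_grad(lambda w: regularized_cost(X, t, w, b, lam), w, j)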
(b) [3 pts] It’s also possible to solve the regularized regression problem directly by setting
the partial derivatives equal to zero. In this part, for simplicity, we will drop the bias
term from the model, so our model is:
    y = \sum_{j=1}^{D} w_j x_j .
In tutorial, and in Section 3.1 of the Lecture 2 notes, we derived a system of linear
equations of the form
    \frac{\partial E}{\partial w_j} = \sum_{j'=1}^{D} A_{jj'} w_{j'} - c_j = 0.
It is possible to derive constraints of the same form for E_reg. Determine formulas for A_{jj'}
and c_j .
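As a side note, once you have expressions for A_{jj'} and c_j, the optimal weights can be obtained numerically by solving the resulting D × D linear system; a minimal sketch, assuming you have already assembled the matrix A and vector c (their construction is the point of the question and is left out here):

    import numpy as np

    def solve_weights(A, c):
        """Solve the linear system A w = c for the weight vector w."""
        return np.linalg.solve(A, c)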
2. Visualizing the cost function. In lecture, we visualized the linear regression cost function
in weight space and saw that the contours were ellipses. Let’s work through a simple example
of that. In particular, suppose we have a linear regression model with two weights and no
bias term:
    y = w_1 x_1 + w_2 x_2,

and the usual loss function L(y, t) = \frac{1}{2}(y - t)^2 and cost E(w_1, w_2) = \frac{1}{N} \sum_i L(y^{(i)}, t^{(i)}). Suppose
we have a training set consisting of N = 3 examples:
• x^{(1)} = (2, 0), t^{(1)} = 1
• x^{(2)} = (0, 1), t^{(2)} = 2
• x^{(3)} = (0, 1), t^{(3)} = 0.
Let’s sketch one of the contours.
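Before doing the algebra, it may help to have a direct numerical evaluation of this cost to check your work against; a small sketch using the three examples above (the function name cost is an arbitrary choice):

    import numpy as np

    # Training examples as rows of X, with targets t.
    X = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    t = np.array([1.0, 2.0, 0.0])

    def cost(w1, w2):
        """E(w1, w2) = (1/N) * sum_i 0.5 * (y^(i) - t^(i))^2 for this training set."""
        y = X @ np.array([w1, w2])
        return np.mean(0.5 * (y - t) ** 2)

Comparing cost(w1, w2) against the closed form you derive below at a few points is a quick way to catch algebra slips.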
(a) [2pts] Write the cost in the form
    E = c_1 (w_1 - d_1)^2 + c_2 (w_2 - d_2)^2 + E_0.
(b) [2pts] Since c_1, c_2 > 0, this corresponds to an axis-aligned ellipse. Sketch the ellipse by
hand for E = 1. Label the center and radii of the ellipse. If you’ve forgotten how to plot
axis-aligned ellipses, see Khan Academy.^3
^3 https://www.khanacademy.org/math/algebra-home/alg-conic-sections/alg-center-and-radii-of-an-ellipse/v/conic-sections-intro-to-ellipses
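If you want to double-check your hand sketch, one option is to evaluate E on a grid of weight values and draw the E = 1 level set with matplotlib; this is only a verification sketch, and the grid range used here is an arbitrary guess:

    import numpy as np
    import matplotlib.pyplot as plt

    # Training set from the problem statement.
    X = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    t = np.array([1.0, 2.0, 0.0])

    # Evaluate E(w1, w2) on a grid of weight values.
    w1, w2 = np.meshgrid(np.linspace(-2, 3, 400), np.linspace(-2, 3, 400))
    y = X[:, 0][:, None, None] * w1 + X[:, 1][:, None, None] * w2  # per-example predictions
    E = np.mean(0.5 * (y - t[:, None, None]) ** 2, axis=0)

    plt.contour(w1, w2, E, levels=[1.0])  # the E = 1 contour should be an axis-aligned ellipse
    plt.xlabel("w1")
    plt.ylabel("w2")
    plt.gca().set_aspect("equal")
    plt.show()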