Description
1 Separability
Given a set of points {x_n}, we define the convex hull to be the set of all points x given by

    x = Σ_n α_n x_n

where α_n ≥ 0 and Σ_n α_n = 1. Consider a second set of points {x'_m} together with their corresponding convex hull. By definition, the two sets of points will be linearly separable if there exists a vector w and a scalar w_0 such that wᵀx_n + w_0 > 0 for all x_n, and wᵀx'_m + w_0 < 0 for all x'_m.
Show that if the two sets of points are linearly separable, their convex hulls do not intersect.
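For intuition only, the short sketch below illustrates the two definitions numerically with made-up 2-D point sets and an assumed candidate (w, w_0): it samples a convex combination of {x_n} and checks the sign of wᵀx + w_0 for both sets. None of the specific values are part of the problem.

```python
import numpy as np

# Made-up 2-D point sets (not part of the problem statement).
X  = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]])   # {x_n}
Xp = np.array([[-1.0, -2.0], [-2.0, -1.0]])           # {x'_m}

# A candidate separating hyperplane w^T x + w_0, assumed for illustration.
w, w0 = np.array([1.0, 1.0]), 0.0

# A point in the convex hull of {x_n}: x = sum_n alpha_n x_n,
# with alpha_n >= 0 and sum_n alpha_n = 1.
alpha = np.random.dirichlet(np.ones(len(X)))
x = alpha @ X

print("valid convex combination:", np.isclose(alpha.sum(), 1.0) and np.all(alpha >= 0))
print("hull point on the positive side:", w @ x + w0 > 0)
print("all x_n on the positive side:", np.all(X @ w + w0 > 0))
print("all x'_m on the negative side:", np.all(Xp @ w + w0 < 0))
```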
2 Logistic regression and gradient descent
(a) Let σ(a) = 1/(1 + e^(−a)) be the logistic sigmoid function. Show that σ'(a) = σ(a)(1 − σ(a)).
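A quick numerical sanity check (not a substitute for the proof) compares the claimed derivative against a central finite difference; the grid of points and the step size below are arbitrary choices.

```python
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

a = np.linspace(-5.0, 5.0, 11)
eps = 1e-6

numeric = (sigma(a + eps) - sigma(a - eps)) / (2 * eps)  # finite-difference sigma'(a)
claimed = sigma(a) * (1 - sigma(a))                      # identity to be shown
print(np.allclose(numeric, claimed, atol=1e-8))
```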
(b) For a training set {(x^(i), y^(i))}_{i=1}^n (with y^(i) ∈ {1, 0}), the loss function (commonly referred to as the cross entropy loss) for logistic regression is

    L_w({(x^(i), y^(i))}_{i=1}^n) = Σ_{i=1}^n −y^(i) log[h_w(x^(i))] − (1 − y^(i)) log[1 − h_w(x^(i))]

where h_w(x) = σ(wᵀx). Here w parametrizes the model. As we have seen in lecture, this loss is the negated log-likelihood. Compute

    ∂L_w({(x^(i), y^(i))}_{i=1}^n) / ∂w_j .
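Whatever closed form you derive for this partial derivative can be sanity-checked against a finite-difference estimate of the loss. The sketch below implements only the loss from the definition above, on made-up data, and prints the numerical gradient to compare against; it deliberately does not give the analytic expression.

```python
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss(w, X, y):
    # L_w = sum_i -y_i log h_w(x_i) - (1 - y_i) log(1 - h_w(x_i)), with h_w(x) = sigma(w^T x).
    h = sigma(X @ w)
    return np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h))

# Made-up toy data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
w = rng.normal(size=3)

# Central finite-difference estimate of dL/dw_j for each j.
eps = 1e-6
num_grad = np.array([(loss(w + eps * e, X, y) - loss(w - eps * e, X, y)) / (2 * eps)
                     for e in np.eye(3)])
print("numerical gradient:", num_grad)
```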
(c) Show that the cross entropy loss of logistic regression is convex.
(d) As we saw in (c), the loss function of logistic regression is convex and thus can naturally be optimized using gradient descent. Next we implement gradient descent for logistic regression. Please go through the skeleton code in logistic-regression-2d.ipynb and fill in the blanks in the code (marked with “YOUR CODE HERE”). Make sure that you pass all checkpoints and answer the questions raised in the skeleton code. The skeleton code is in Python 2. If you are using another language, please make sure it has all the corresponding functions and can handle all tasks specified in the skeleton code. It is highly recommended to simply follow the provided notebook.
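If you want to prototype outside the notebook, a bare-bones gradient-descent loop for this loss might look like the sketch below (Python 3 / NumPy rather than the Python 2 skeleton). The data, learning rate, and iteration count are placeholders, and the placeholder finite-difference gradient should be replaced by the expression you derived in part (b).

```python
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss(w, X, y):
    h = np.clip(sigma(X @ w), 1e-12, 1 - 1e-12)   # clip to keep log() finite
    return np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h))

def grad(w, X, y, eps=1e-6):
    # Placeholder: finite-difference gradient. Replace this with your
    # analytic gradient from part (b) for a real implementation.
    return np.array([(loss(w + eps * e, X, y) - loss(w - eps * e, X, y)) / (2 * eps)
                     for e in np.eye(len(w))])

# Made-up 2-D data and arbitrary hyperparameters, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = (X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=50) > 0).astype(float)

w = np.zeros(2)
lr, n_iters = 0.1, 200
for _ in range(n_iters):
    w = w - lr * grad(w, X, y)

print("final loss:", loss(w, X, y))
print("training accuracy:", np.mean((sigma(X @ w) > 0.5) == (y == 1)))
```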
3 Boosting
(a) Assuming that the weak learning assumption holds (on the training set), show that the boosted model eventually classifies the training set perfectly (i.e., the training error equals zero).
Hint: look at the class notes.
(b) Suppose that the points are weighted differently in the training set. More specifically, the training set is {(x_i, y_i)}_{i=1}^n, where each point (x_i, y_i) in the training set has weight w_i and the objective is defined as

    R^train(λ) = Σ_{i=1}^n w_i e^(−(Mλ)_i)

Derive the weighted version of AdaBoost based on this new objective.
Hint: the change to AdaBoost is surprisingly small.
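To make the notation concrete without giving the derivation away, the snippet below simply evaluates this objective for made-up numbers. It assumes the class-notes convention that M is the n×T matrix with M_{it} = y_i h_t(x_i), so (Mλ)_i is the margin of point i; check that convention against your notes before relying on it.

```python
import numpy as np

# Hypothetical setup: 4 training points, 3 weak classifiers.
# Assumed convention: M[i, t] = y_i * h_t(x_i), entries in {-1, +1}.
M = np.array([[ 1,  1, -1],
              [ 1, -1,  1],
              [-1,  1,  1],
              [ 1,  1,  1]], dtype=float)

w_pts = np.array([0.1, 0.4, 0.4, 0.1])   # per-point weights w_i (made up)
lam   = np.array([0.5, 0.3, 0.2])        # coefficients lambda_t (made up)

# Weighted exponential objective: R^train(lambda) = sum_i w_i * exp(-(M lambda)_i)
R_train = np.sum(w_pts * np.exp(-(M @ lam)))
print(R_train)
```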
(c) In this problem, you will implement your own boosted decision stumps (trees with two leaves). Follow the skeleton code in adaboost-3c.ipynb and fill in the blanks in the code (marked with “YOUR CODE HERE”). Make sure that you pass all checkpoints and answer the questions raised in the skeleton code. The skeleton code is in Python 2. If you are using another language, please make sure it has all the corresponding functions and can handle all tasks specified in the skeleton code. It is highly recommended to simply follow the provided notebook.
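For orientation only (this is not the notebook's skeleton, which you should still follow), a generic, unweighted AdaBoost over exhaustively searched decision stumps might look like the Python 3 sketch below; the function names and toy data are our own.

```python
import numpy as np

def best_stump(X, y, d):
    """Pick the (feature, threshold, sign) stump with smallest weighted error under d."""
    best = (np.inf, None)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = d @ (pred != y)
                if err < best[0]:
                    best = (err, (j, thr, sign))
    return best

def adaboost(X, y, n_rounds=10):
    """Plain AdaBoost on labels y in {-1, +1}; returns the stumps and their coefficients."""
    n = len(y)
    d = np.full(n, 1.0 / n)              # point weights, start uniform
    stumps, alphas = [], []
    for _ in range(n_rounds):
        err, (j, thr, sign) = best_stump(X, y, d)
        err = np.clip(err, 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        d = d * np.exp(-alpha * y * pred)    # up-weight mistakes, down-weight correct points
        d = d / d.sum()
        stumps.append((j, thr, sign))
        alphas.append(alpha)
    return stumps, alphas

def predict(X, stumps, alphas):
    score = np.zeros(len(X))
    for (j, thr, sign), a in zip(stumps, alphas):
        score += a * sign * np.where(X[:, j] > thr, 1, -1)
    return np.sign(score)

# Tiny made-up dataset, just to show the intended usage.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
stumps, alphas = adaboost(X, y, n_rounds=5)
print("training error:", np.mean(predict(X, stumps, alphas) != y))
```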