There are three things you will learn in this homework:
(a) Understand some theoretical properties about logistic regression.
(b) Implement a logistic regression in CVX, and visualize the decision boundary.
(c) Apply kernel trick to logistic regression.
You will be asked some of these questions in Quiz 4.
Exercise 1. Logistic regression + gradient descent
We analyze the convergence behavior of the logistic regression when the data is linearly separable.
Recall that the logistic regression tries to minimize the cross-entropy loss:
θb = argmin
yn log hθ(xn) + (1 − yn) log(1 − hθ(xn))o
where hθ(x) = 1
is the sigmoid function. As usual, we assume that θ = [w, w0]. We consider the
gradient descent algorithm. We know that the objective function is convex, and so we consider the gradient
descent algorithm. The iterations are (if you take the derivative of J(θ) and rearrange the terms):
(k+1) = θ
(k) − αk
(k) (xn) − yn)xn
for some choices of the step size αk.
(i) Prove that if two classes of data in R
d are linearly separable, then the magnitude of the slope ∥w∥2
and the magnitude of the intercept |w0| would tend to ∞.
(ii) What happens if we restrict ∥w∥2 ≤ c1 and |w0| < c2 for some c1, c2 > 0? What other ways can you
come up with to counter the nonconvergence issue?
(iii) Does linear separability of data cause nonconvergence for the other linear classifiers that we have
Exercise 2. Implement logistic regression
Download hw4_data.zip from Brightspace. There are two classes with class labels yn = 1 and yn = 0.
(a) Show that the logistic regression loss is given by
J(θ) = −
1 + e
(b) Introduce a regularization term λ∥θ∥
2 and normalize the data fidelity term so that the loss becomes
J(θ) = −
1 + e
Use CVXPY to minimize this loss function for the dataset I provided. Your θ should be θ = [θ2, θ1, θ0]
Please use λ = 0.0001.
(c) Scatter plot the data points by marking the two classes in two colors. Then plot the decision boundary.
(d) Repeat (c), but this time using the Bayeisn decision rule. Note that since the covariance matrices are
not identical, the decision boundary is not a straight line. To plot the decision boundary, you can
create a grid of testing sites in the range of [−5, 10] × [−5, 10] (with 100 points along each dimension).
Evaluate the decision on these testing sites. And then plot the decision using plt.contour.
Exercise 3. Kernel trick
Let us continue to use the dataset in Exercise 2. Our goal here is to implement the kernel trick.
(a) In Python, construct the kernel matrix K, where
[K]m,n = exp
−∥xm − xn∥
where h = 1. Print K[47:52,47:52].
(b) Let us assume that that θ =
n=1 αnxn for some αn’s. Then
T x =
The kernel trick says that we can replace ⟨xn, x⟩ by a kernel K(xn, x). Apply the kernel trick to the
loss function in (3). Show that the new loss is
J(α) = −
TKα − 1
0 + e
(c) Implement the kernel logistic regression training in CVXPY. Report the first two elements of the
regression coefficients α.
(d) Scatter plot the data points and plot the decision boundary, just like what you did in Exercise 2(d).
Exercise 4. Project Check Point # 4
At this checkpoint, I expect you to have started reimplementing the assigned paper. Here are a couple
important reminders for you.
• Start simple. For the final project, we don’t need you to redo the experiment in the assigned paper.
The purpose of this project is for you to reimplement the method so that it can be used for your own
tasks. Therefore, start with as many assumptions as you need to keep your problem tractable. For
example, you can assume the data you work on is low dimensional, or you can use simple dataset at
first. The most important thing is to demonstrate something that works as soon as possible. Then,
you can gradually make your program more complicated.
• Leverage existing Python packages. You are allowed and encouraged to use many popular Python
packages, such as numpy, TensorFlow, PyTorch, opencv, etc. For many basic operations that you
want to achieve, it is likely that there are already implementations in these packages. So always check
or ask first before you implement something basic by yourself.
• Start early. Start early. Start early.
Here are some good programming habits I appreciate. I know it might be hard to follow at first, but if
you follow them for long enough time, they will gradually become your muscle memory. Furthermore, these
habits will benefit you A LOT in the long run if you plan to work for a coding company.
• Keep good naming conventions for your variables and functions, in which case you will not even need
to write comments. Good code explains themselves. For example, here is a sample Python naming
• Do not write functions that is too long, which will be difficult to debug. Keep a good structure for
your program. Do not copy and paste the same code snippet in your program. If you want to reuse
the same code snippet, write a function for it.
• Do not use a lot of for loops. It will make your Python program super slow. Try to vectorize your program. See this page for more details. https://www.geeksforgeeks.org/vectorization-in-python/.
• Train yourself to master debugging. It is normal that your code does not do what you want. The time
to debug your code is the most valuable in terms of improving your programming skills. Everyone
will have their own way to debug. My favorite debug tool is adding breakpoint in the program using
For this checkpoint, you need to create a new section in your overleaf document. In the section, write down
the following things:
• Write down the assumptions that you are making for your project.
• Draw a diagram that describes the structure of your code. If you don’t know how to do it, simply use a
box to represent each function in your program, specify what are the inputs and outputs, and connect
the boxes according to your code structure. This will help us understand how you design your code.
• Report any errors that you encounter when programming. If you have resolved the problem, what did
you do? If not, what have you tried and why do you suspect that you cannot solve the problem?