## Description

Objective

The objective of this homework are:

(a) Familiarizing yourself with basic tools in Python that can perform the following tasks: reading and

processing of data files (csv format), and numerically solving specified optimization problems;

(b) Review of important concepts from linear algebra and optimization theory, with emphasis on linear

least square estimation;

(c) Implement a linear classification algorithm, visualize its decision boundary, and test its accuracy.

You will be asked some of these questions in Quiz 2.

Get Started

In this homework, you need to numerically solve several convex optimization problems in Python. We

will use CVXPY. CVXPY is a Python-embedded modeling language with a user-friendly API for convex

optimization problems.

In Google Colab, you can call CVXPY. Here is a toy example taken from the software’s website:

import cvxpy as cp

import numpy as np

# Problem data.

m = 30

n = 20

np.random.seed(1)

A = np.random.randn(m, n)

b = np.random.randn(m)

# Construct the problem.

x = cp.Variable(n)

objective = cp.Minimize(cp.sum_squares(A@x – b))

constraints = [0 <= x, x <= 1]

prob = cp.Problem(objective, constraints)

# The optimal objective value is returned by `prob.solve()`.

result = prob.solve()

# The optimal value for x is stored in `x.value`.

1

print(x.value)

# The optimal Lagrange multiplier for a constraint is stored in

# `constraint.dual_value`.

print(constraints[0].dual_value)

For more examples, head over to https://www.cvxpy.org/examples/index.html.

Caution: In the newest version of CVXPY, matrix-matrix and matrix-vector multiplication uses @ (@),

e.g. if A is a matrix and x is a vector, then A@x is the matrix-vector multiplication of the pair. The basic

example on https://www.cvxpy.org/ still uses the deprecated A*x, which could cause instability in certain

scenarios.

Exercise 1: Loading Data via Python

The National Health and Nutrition Examination Survey (NHANES) is a program to assess the health and

nutritional status of adults and children in the United States 1

. The complete survey result contains over

4,000 samples of health-related data of individuals who participated in the survey between 2011 and 2014.

In this homework, we will focus on two categories of the data for each individual: the height (in mm) and

body mass index (BMI). The data is divided into two classes based on gender. Table 1 contains snippets of

the data.

index female bmi female stature mm

0 28.2 1563

1 22.2 1716

2 27.1 1484

3 28.1 1651

index male bmi male stature mm

0 30 1679

1 25.6 1586

2 24.2 1773

3 27.4 1816

Table 1: Male and Female Data Snippets

Use csv.reader to read the training data files for the two classes of data. Specifically, you may call the

commands below:

import numpy as np

import matplotlib.pyplot as plt

import cvxpy as cp

import csv

# Reading csv file for male data

with open(“male_train_data.csv”, “r”) as csv_file:

reader = csv.reader(csv_file, delimiter=’,’)

# Add your code here to process the data into usable form

csv_file.close()

# Reading csv file for female data

with open(“female_train_data.csv”, “r”) as csv_file:

reader = csv.reader(csv_file, delimiter=’,’)

# Add your code here to process the data into usable form

csv_file.close()

Important: Before proceeding to the problems, please be careful that numerical solvers can perform very

badly if the inputs to the problem are badly scaled, e.g. many coefficients being orders of magnitude apart;

this is known as numerical instability. It is exactly the case for our matrix X. To avoid CVXPY (by

default, the ECOS solver in it) returning bizarre solutions, please

1https://www.cdc.gov/nchs/nhanes/index.htm

2

• normalize the number in male_stature_mm and female_stature_mm by dividing them with 1000, and

• normalize that of male_bmi and female_bmi by dividing them with 10.

This will significantly reduce the numerical error.

Print the first 10 elements of each column of the dataset. That is, print

• The first 10 entries of female BMI;

• The first 10 entries of female stature;

• The first 10 entries of male BMI;

• The first 10 entries of male stature.

Please submit your code, and submit your results.

Exercise 2: Build a Linear Classifier via Optimization

Consider a linear model:

gθ = θ

T x, (1)

The regression problem we want to solve is

θb = argmin

θ∈Rd

X

N

n=1

(yn − gθ(xn))2

,

where D = {(xn, yn)}

N

n=1 is the training dataset. Putting the equation into the matrix form, we know that

the optimization is equivalent to

θb = argmin

θ∈Rd

∥y − Xθ∥

2

| {z }

Etrain(θ)

.

(a) Derive the solution θb. State the conditions under which the solution is the unique global minimum in

terms of the rank of X. Suggest two techniques that can be used when XT X is not invertible.

(b) For the NHANES dataset, assign yn = +1 if the n-th sample is a male, and yn = −1 if the n-th sample

is a female. Implement your answer in (a) with Python to solve the problem. Report your answer, and

submit your code.

(c) Repeat (b), but this time use CVXPY. Report your answer, and submit your code.

(d) Derive the gradient descent step for this problem. That is, find the gradient ∇Etrain(θ

k

) and substitute

it into the equation:

θ

k+1 = θ

k − α

k∇Etrain(θ

k

).

If you use the exact line search, what is the optimal step size α

k

(for each iteration k)?

(e) Implement the gradient descent algorithm in Python. Report your answer, and submit your code.

Initialize θ

0 = 0. Use 50000 iterations.

(f) For the gradient descent algorithm you implemented in (e), plot the training loss as a function of

iteration number using plt.semilogx. Use linewidth = 8.

(g) Implement the momentum method, with β = 0.9. Initialize θ

0 = 0. Use 50000 iterations.

θ

k+1 = θ

k − α

k

β∇Etrain(θ

k−1

) + (1 − β)∇Etrain(θ

k

)

.

Here, the step size α

k

can be determined through the exact line search.

(h) For the momentum method you implemented in (g), plot the training loss as a function of iteration

number using plt.semilogx. Use linewidth = 8.

Hint: If you do everything right, the answers found by the analytic expression, CVXPY, and gradient

descent should be the same.

3

Exercise 3: Visualization and Testing

We want to do a classification based on the linear model we found in Exercise 2. The classifier we will use is

predicted label = sign(gθ(x)), (2)

where x ∈ R

d

is the a test sample. Here, we label +1 for male and -1 for female. Because the dataset we

consider in this exercise has only two columns, the linear model is

gθ(x) = θ0 + θ1×1 + θ2×2,

where x = [1, x1, x2]

T

is the input data, and θ = [θ0, θ1, θ2]

T

is the parameter vector.

(a) First, we want to visualize the classifier.

(i) Plot the training data points of the male and female classes using matplotlib.pyplot.scatter.

Mark the male class with blue circles, and the female class as red dots.

(ii) Plot the decision boundary gθ(·), and overlay it with the data plotted in (a). Hint: gθ(·) is a

straight line in 2D. You can express x2 in terms of x1 and other parameter.

(b) Let us report the classification accuracy of male_test_data.csv and female_test_data.csv. To do

so, take a testing data x and compute the prediction according to (2).

(i) What is the Type 1 error (False Alarm) of classifying male? That is, what is the percentage of

testing samples that should be female but you predicted it as a male. You can check the definition

of Type 1 and Type 2 error on Wikipedia2

.

(ii) What is the Type 2 error (Miss) of classifying male? That is, what is the percentage of testing

samples that should be male but you predicted it as a female.

(iii) What is the precision and recall for this classifier? For definition of precision and recall, you can

refer to Prof. Stanley Chan’s book, Chapter 9.5.4, or Wikipedia.

Exercise 4: Regularization

Please be sure to read the tutorial on optimization on the course website before doing this problem. Consider

the following three optimization problems:

θbλ = argmin

θ∈Rd

∥Xθ − y∥

2

2 + λ∥θ∥

2

2

, (3)

θbα = argmin

θ∈Rd

∥Xθ − y∥

2

2

subject to ∥θ∥

2

2 ≤ α, (4)

θbϵ = argmin

θ∈Rd

∥θ∥

2

2

subject to ∥Xθ − y∥

2

2 ≤ ϵ. (5)

(a) Set lambd = np.arange(0.1,10,0.1). Plot

• ∥Xθbλ − y∥

2

2 as a function of ∥θbλ∥

2

2

• ∥Xθbλ − y∥

2

2 as a function of λ

• ∥θbλ∥

2

2 as a function of λ

(b) (i) Write down the Lagrangian for each of the three problems. Note that the first problem does not

have any Lagrange multiplier. For the second and the third problem, you may use the notations:

• γα = the Lagrange multiplier of (4), and

• γϵ = the Lagrange multiplier of (5).

2https://en.wikipedia.org/wiki/Type_I_and_type_II_errors

4

(ii) State the first order optimality conditions (the Karush-Kuhn-Tucker (KKT) conditions) for each

of the three problems. See Tutorial 4 on Optimization, posted on the course website. Express

your answers in terms of X, θ, y, λ, α, ϵ, and the two Lagrange multipliers γα, γϵ.

(iii) Fix λ > 0. We can solve (3) to obtain θbλ. Find α and the Lagrange multiplier γα in (4) such

that θbλ would satisfy the KKT conditions of (4).

(iv) Fix λ > 0. We can solve (3) to obtain θbλ. Find ϵ and the Lagrange multiplier γϵ in (5) such that

θbλ would satisfy the KKT conditions of (5).

(v) This part follows from (iii). Fix λ > 0. By using the α and γα you found in (iii), you can show

that θbλ would satisfy the KKT conditions of (4). Is it enough to claim that θbλ is the solution

of (4)? If yes, why? If no, what else do we need to show? Please elaborate through a proof, if

needed.

Exercise 5: Project Check Point 2

By now, you should hear from us about your project assignment. We ask you to summarize in your own

words the assigned paper in one or two paragraphs. Specifically, answer the following questions in your

paragraph:

• What is problem that the proposed method addresses?

• Why is the problem important?

• What are the innovations of this paper compared to previous methods?

• Are there any existing implementations of the method? Which one will you start playing with?

• Identify at least one key concept in the assigned paper that you are not familiar and you need to learn

to reimplment the paper.

Please keep these questions in your mind as you read through the paper. In this checkpoint, we want you

to have an understanding of the big picture of the project and identify existing implementations that you

can play with. You don’t have to answer them in dense detail. But you must reasonably address all

above points in your paragraph to receive full points of this homework.

5