## Description

1. When we use the empirical cost to approximate the expected cost,

$$\mathbb{E}_{x \sim \mathcal{X}}\left[ D(M^*(x), M, x) \right] \approx \frac{1}{N} \sum_{n=1}^{N} D(M^*(x_n), M, x_n),$$

is it okay to weigh each per-example cost equally? Given that we established that not every data point $x$ is equally likely, is taking the sum of all per-example costs and dividing by $N$ reasonable? Should we weigh each per-example cost differently, depending on how likely each $x$ is? Justify your answer.
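To make the two candidate estimators in this question concrete, here is a minimal numpy sketch; the arrays `costs` and `p` are hypothetical stand-ins for the per-example costs $D(M^*(x_n), M, x_n)$ and the probability of each $x_n$:

```python
import numpy as np

rng = np.random.default_rng(0)
costs = rng.uniform(size=100)    # hypothetical per-example costs D(M*(x_n), M, x_n)
p = rng.dirichlet(np.ones(100))  # hypothetical probability of each x_n (sums to 1)

# Estimator 1: equal weights, as in the equation above.
uniform_estimate = costs.mean()
# Estimator 2: each per-example cost weighted by how likely its example is.
weighted_estimate = float(p @ costs)

print(uniform_estimate, weighted_estimate)
```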

2. A perceptron is defined as follows:

$$M(x) = \operatorname{sign}(w^\top x + b),$$

where $w \in \mathbb{R}^d$, $x \in \mathbb{R}^d$, and $b \in \mathbb{R}$. Why is the bias $b$ necessary? Provide an example where it is necessary.
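As a reference point for this definition, a minimal sketch of the decision rule above (the specific weights, bias, and input are hypothetical values):

```python
import numpy as np

def perceptron_predict(w, b, x):
    """M(x) = sign(w^T x + b)."""
    return np.sign(w @ x + b)

# A hypothetical 2-d example.
w = np.array([1.0, -1.0])
b = 0.5
x = np.array([0.2, 0.3])
print(perceptron_predict(w, b, x))  # 1.0, since 0.2 - 0.3 + 0.5 = 0.4 > 0
```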

3. We used the following distance function for the perceptron in the lecture:

$$D(M^*(x), M, x) = -\left( M^*(x) - M(x) \right) \left( w^\top x + b \right).$$

This distance function has the problem of admitting a trivial solution. What is the trivial solution? Propose a way to fix it.
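If it helps to experiment, a minimal sketch that evaluates this distance function for the perceptron of Question 2 (all concrete values are hypothetical):

```python
import numpy as np

def perceptron_distance(y_true, w, b, x):
    """D(M*(x), M, x) = -(M*(x) - M(x)) (w^T x + b)."""
    y_pred = np.sign(w @ x + b)
    return -(y_true - y_pred) * (w @ x + b)

# A hypothetical misclassified example: true label -1, but w^T x + b = 3 > 0.
w = np.array([1.0, 1.0])
b = 0.0
x = np.array([1.0, 2.0])
print(perceptron_distance(-1.0, w, b, x))  # -(-1 - 1) * 3 = 6.0
```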


4. The distance function of logistic regression was defined as

$$D(y^*, w, x) = -\left( y^* \log M(x) + (1 - y^*) \log\left(1 - M(x)\right) \right).$$

Derive its gradient with respect to the weight vector $w$ step by step.
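A common way to check a derived gradient is to compare it against finite differences. Below is a minimal sketch of such a check, assuming the logistic model $M(x) = \sigma(w^\top x)$ with the bias folded into $w$ (an assumption; the problem statement does not spell out $M$ here):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def distance(y_true, w, x):
    """D(y*, w, x) = -(y* log M(x) + (1 - y*) log(1 - M(x))), with M(x) = sigmoid(w^T x) assumed."""
    m = sigmoid(w @ x)
    return -(y_true * np.log(m) + (1.0 - y_true) * np.log(1.0 - m))

def numerical_grad(y_true, w, x, eps=1e-6):
    """Central finite differences, one coordinate of w at a time."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (distance(y_true, w + e, x) - distance(y_true, w - e, x)) / (2 * eps)
    return g

# Hypothetical values; compare this output against your derived formula.
w = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
print(numerical_grad(1.0, w, x))
```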

5. (Programming Assignment) Complete the implementations of the perceptron and logistic regression using Python and scikit-learn. The completed notebooks must be submitted together with the answers to the questions above. When submitting Jupyter notebooks, make sure to save the printed outputs as well.

- Perceptron: https://github.com/nyu-dl/Intro_to_ML_Lecture_Note/blob/master/notebook/Perceptron1.ipynb
- Logistic Regression: https://github.com/nyu-dl/Intro_to_ML_Lecture_Note/blob/master/notebook/Logistic%20Regression%201.ipynb
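The notebooks linked above contain the actual starter code. As a rough orientation only, a minimal scikit-learn sketch that fits both models on synthetic data (the dataset and settings are hypothetical, not those used in the notebooks):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Perceptron

# A hypothetical 2-d binary classification dataset.
X, y = make_classification(n_samples=200, n_features=2,
                           n_informative=2, n_redundant=0, random_state=0)

for model in (Perceptron(), LogisticRegression()):
    model.fit(X, y)
    print(type(model).__name__, "training accuracy:", model.score(X, y))
```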
