## Description

## Question 1 [10 marks]

(a). [5 points] Consider using linear regression for binary classification with labels in $\{0, 1\}$. Here, we use a linear model $h_\theta(x) = \theta_1 x + \theta_0$ and the squared error loss $L = \frac{1}{2}(h_\theta(x) - y)^2$. The prediction threshold is set to 0.5, which means the prediction is 1 if $h_\theta(x) \ge 0.5$ and 0 if $h_\theta(x) < 0.5$.

However, this loss has the problem that it penalizes confident correct predictions, i.e., cases where $h_\theta(x)$ is larger than 1 (for $y = 1$) or less than 0 (for $y = 0$). Some students try to fix this problem by using an absolute error loss $L = |h_\theta(x) - y|$. The question is: will it fix the problem? Please answer the question and explain your reasoning.
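As a minimal sketch of the setup described above, the snippet below evaluates the linear model, the 0.5 threshold rule, and the squared error loss on a few points. The parameter values `theta0 = 0.0` and `theta1 = 1.0` and the sample inputs are illustrative assumptions, not part of the question; they are chosen only so that $h_\theta(x) = x$ is easy to read off.

```python
def h(x, theta0=0.0, theta1=1.0):
    """Linear model h_theta(x) = theta1 * x + theta0 (parameter values are assumed)."""
    return theta1 * x + theta0


def predict(x, threshold=0.5):
    """Threshold rule from the question: 1 if h_theta(x) >= 0.5, else 0."""
    return 1 if h(x) >= threshold else 0


def squared_error(x, y):
    """Squared error loss L = 1/2 * (h_theta(x) - y)^2."""
    return 0.5 * (h(x) - y) ** 2


# Hypothetical inputs, all with true label y = 1.
for x in (1.0, 2.0, 3.0):
    print(f"x={x}: prediction={predict(x)}, squared loss={squared_error(x, 1)}")
```

With these assumed values every prediction is correct, yet the squared loss grows as $h_\theta(x)$ moves further above 1, which is the behaviour the question describes.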

Furthermore, some other students try designing another loss function as follows:

$$
L = \begin{cases}
\max(0, h_\theta(x)), & y = 0 \\
\cdots, & y = 1
\end{cases}
$$

Although it is not complete yet, if it is correct in principle, please complete it and explain how it can fix the problem. Otherwise, please explain the reason.

(b). [5 points] Consider the logistic regression model $h_\theta(x) = g(\theta^T x)$, trained using the binary cross entropy loss function, where $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function. Some students try modifying the original sigmoid function into the following one:

$$
g(z) = \frac{e^{-z}}{1 + e^{-z}}.
$$

The model would still be trained using the binary cross entropy loss. How would the model prediction rule, as well as the learnt model parameters $\theta$, differ from conventional logistic regression? Please show your answer and explanation.
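For reference, the conventional setup that part (b) starts from can be sketched as follows. The binary cross entropy formula $L = -[y \log h + (1 - y) \log(1 - h)]$ for labels in $\{0, 1\}$ is the standard definition and is not spelled out in the question; the parameter vector and input are made-up values for illustration. The modified function is deliberately left out, since analysing it is the exercise.

```python
import math


def sigmoid(z):
    """Conventional sigmoid g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def h(theta, x):
    """Logistic regression model h_theta(x) = g(theta^T x)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


def binary_cross_entropy(p, y):
    """Standard BCE for y in {0, 1}: -[y log p + (1 - y) log(1 - p)]."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Hypothetical parameters and input (not from the question).
theta = [0.5, -1.0]
x = [1.0, 2.0]
p = h(theta, x)
print(f"h_theta(x) = {p:.3f}")
print(f"loss if y=1: {binary_cross_entropy(p, 1):.3f}")
print(f"loss if y=0: {binary_cross_entropy(p, 0):.3f}")
```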


## Question 2 [20 marks]

Consider using logistic regression for classification problems. Four 3-dimensional data points $(x_1, x_2, x_3)^i$ and the corresponding labels $y^i$ are given as follows.

| Data point | $x_1$ | $x_2$ | $x_3$ | $y$ |
|---|---|---|---|---|
| D1 | -0.120 | 0.300 | -0.010 | 1 |
| D2 | 0.200 | -0.030 | -0.350 | -1 |
| D3 | -0.370 | 0.250 | 0.070 | -1 |
| D4 | -0.100 | 0.140 | -0.520 | 1 |

The learning rate $\eta$ is set as 0.2 and the initial parameter $\theta[0]$ is set as $[-0.09, 0, -0.19, -0.21]$. Please answer the following questions.
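To make the setup concrete, the data and initial parameters can be written down as plain Python lists. This is only a sketch: the question does not say which entry of $\theta$ is the bias term, so treating the first entry as the bias (paired with a constant feature of 1) is an assumption, as is the helper name `linear_score`.

```python
# Data points D1..D4 from the table: (x1, x2, x3) and label y in {-1, 1}.
X = [
    [-0.120, 0.300, -0.010],  # D1
    [0.200, -0.030, -0.350],  # D2
    [-0.370, 0.250, 0.070],   # D3
    [-0.100, 0.140, -0.520],  # D4
]
y = [1, -1, -1, 1]

eta = 0.2  # learning rate
theta_init = [-0.09, 0.0, -0.19, -0.21]  # theta[0] from the question


def linear_score(theta, x):
    """theta^T [1, x1, x2, x3]; ASSUMES theta[0] is the bias term."""
    augmented = [1.0] + list(x)
    return sum(t * v for t, v in zip(theta, augmented))


# Sanity checks on shapes only; no results are computed here.
assert len(X) == len(y) == 4
assert all(len(row) + 1 == len(theta_init) for row in X)
```

If the intended convention places the bias last instead, only the line building `augmented` needs to change.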

a) [5 points] Calculate the initial predicted label for each data point.

b) [10 points] Calculate the parameters after the first and second iterations, i.e., $\theta[1]$ and $\theta[2]$, using the gradient descent algorithm.

c) [5 points] Implement the gradient descent algorithm in Python to update the parameters $\theta$. Plot how the loss function $J(\theta)$ changes over 50000 iterations and upload the source code file.

Note: For a) and b), the detailed calculation process is required, and the intermediate and final results should be rounded to 3 decimal places.