Assignment #2: COMP4434 Big Data Analytics





Question 1 [10 marks]

(a). [5 points] Consider using linear regression for binary classification on the labels {0, 1}.
Here, we use a linear model $h_\theta(x) = \theta_1 x + \theta_0$ and squared error loss
$L = (h_\theta(x) - y)^2$. The threshold of the prediction is set to 0.5, which means the
prediction result is 1 if $h_\theta(x) \ge 0.5$ and 0 if $h_\theta(x) < 0.5$.

However, this loss has the problem that it penalizes confident correct predictions, i.e.,
cases where $h_\theta(x)$ is larger than 1 or less than 0. Some students try to fix this
problem by using an absolute error loss $L = |h_\theta(x) - y|$. The question is: Will it
fix the problem? Please answer the question and explain your answer.
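
To make the setup concrete, the two losses can be evaluated side by side on a few model outputs. This is only an illustrative sketch: the function names and the sample outputs are made up for the example, and the squared loss is assumed to carry no scaling factor.

```python
def squared_loss(h, y):
    """Squared error (h - y)^2 for one model output h and one label y."""
    return (h - y) ** 2


def absolute_loss(h, y):
    """Absolute error |h - y| for one model output h and one label y."""
    return abs(h - y)


# Label y = 1. Every output below is >= 0.5, so each one is classified correctly
# under the 0.5 threshold; the outputs themselves are arbitrary sample values.
y = 1
for h in (0.6, 1.0, 1.5, 2.0):
    print(f"h={h:.1f}  squared={squared_loss(h, y):.2f}  "
          f"absolute={absolute_loss(h, y):.2f}")
```

Comparing the printed values for outputs above 1 with the value at exactly 1 is a quick way to check any claim about how each loss treats confident correct predictions.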

Furthermore, some other students try designing another loss function as follows:
$L = \begin{cases} \max(0, h_\theta(x)), & y = 0 \\ \cdots, & y = 1 \end{cases}$

Although it is not complete yet, if it is correct in principle, please complete it and explain
how it can fix the problem. Otherwise, please explain the reason.

(b). [5 points] Consider the logistic regression model $h_\theta(x) = g(\theta^T x)$, trained
using the binary cross entropy loss function, where $g(z) = \frac{1}{1 + e^{-z}}$ is the
sigmoid function.

Some students try modifying the original sigmoid function into the following one
𝑔(𝑧) =

The model would still be trained using the binary cross entropy loss. How would the
model prediction rule, as well as the learnt model parameters 𝜃 , differ from
conventional logistic regression? Please show your answer and explanation.
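
As a baseline for the comparison, conventional logistic regression can be sketched as follows. This covers only the standard sigmoid and the usual binary cross entropy for labels in {0, 1}; the function names are illustrative, and the modified function is not reproduced here.

```python
import math


def sigmoid(z):
    """Standard sigmoid g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(p, y):
    """Binary cross entropy for a predicted probability p in (0, 1), label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def predict(theta, x):
    """Conventional rule: predict 1 when g(theta^T x) >= 0.5, i.e. when theta^T x >= 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= 0.5 else 0
```

Swapping a different function in for `sigmoid` while keeping `binary_cross_entropy` unchanged is one way to experiment with how the prediction rule and the learnt parameters respond.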

Question 2 [20 marks]

Consider using logistic regression for classification problems. Four 3-dimensional data
points $(x_1, x_2, x_3)$ and the corresponding labels $y$ are given as follows.

Data point   $x_1$    $x_2$    $x_3$    $y$
D1          -0.120    0.300   -0.010     1
D2           0.200   -0.030   -0.350    -1
D3          -0.370    0.250    0.070    -1
D4          -0.100    0.140   -0.520     1

The learning rate $\eta$ is set to 0.2 and the initial parameter $\theta[0]$ is set to
[-0.09, 0, -0.19, -0.21]. Please answer the following questions.

a) [5 points] Calculate the initial predicted label for each data point.
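
A hand calculation for this part can be cross-checked with a short script. The sketch below rests on assumptions the question does not spell out: the four entries of $\theta[0]$ are taken in the order (bias, weight for $x_1$, weight for $x_2$, weight for $x_3$), a constant feature $x_0 = 1$ is prepended, and the predicted label is +1 when the sigmoid of $\theta^T x$ is at least 0.5 and -1 otherwise. The helper names `score` and `predict` are illustrative.

```python
import math

# Data points (x1, x2, x3) and the initial parameters from the question.
X = [
    (-0.120, 0.300, -0.010),
    (0.200, -0.030, -0.350),
    (-0.370, 0.250, 0.070),
    (-0.100, 0.140, -0.520),
]
theta0 = [-0.09, 0.0, -0.19, -0.21]  # assumed order: bias, then weights for x1..x3


def score(theta, x):
    """theta^T x with a constant feature x0 = 1 prepended (assumption)."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def predict(theta, x):
    """Assumed rule: +1 if sigmoid(theta^T x) >= 0.5, otherwise -1."""
    p = 1.0 / (1.0 + math.exp(-score(theta, x)))
    return 1 if p >= 0.5 else -1


for name, x in zip(("D1", "D2", "D3", "D4"), X):
    print(name, round(score(theta0, x), 3), predict(theta0, x))
```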

b) [10 points] Calculate the parameters in the first and second iterations, i.e., $\theta[1]$
and $\theta[2]$, using the gradient descent algorithm.
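
The update can be sketched in code as a check on the manual calculation. The question does not state the loss, so this sketch assumes the common form for labels in {-1, +1}, $J(\theta) = \frac{1}{N} \sum_n \ln(1 + e^{-y_n \theta^T x_n})$, with gradient $\nabla J(\theta) = -\frac{1}{N} \sum_n \frac{y_n x_n}{1 + e^{y_n \theta^T x_n}}$, full-batch updates $\theta[t+1] = \theta[t] - \eta \nabla J(\theta[t])$, and a bias feature $x_0 = 1$. If the course uses a different loss or a stochastic update, the numbers will differ.

```python
import math

X = [
    (-0.120, 0.300, -0.010),
    (0.200, -0.030, -0.350),
    (-0.370, 0.250, 0.070),
    (-0.100, 0.140, -0.520),
]
y = [1, -1, -1, 1]
eta = 0.2
theta = [-0.09, 0.0, -0.19, -0.21]  # assumed order: bias, then weights for x1..x3


def gradient(theta, X, y):
    """Gradient of the assumed loss J(theta) averaged over all data points."""
    n = len(X)
    grad = [0.0] * len(theta)
    for x, label in zip(X, y):
        xa = (1.0,) + tuple(x)  # prepend the assumed bias feature x0 = 1
        z = sum(t * v for t, v in zip(theta, xa))
        coeff = -label / (1.0 + math.exp(label * z))
        for j, v in enumerate(xa):
            grad[j] += coeff * v / n
    return grad


def step(theta, X, y, eta):
    """One full-batch gradient descent update."""
    g = gradient(theta, X, y)
    return [t - eta * gj for t, gj in zip(theta, g)]


theta1 = step(theta, X, y, eta)
theta2 = step(theta1, X, y, eta)
print([round(t, 3) for t in theta1])
print([round(t, 3) for t in theta2])
```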

c) [5 points] Implement the gradient descent algorithm to update the parameters $\theta$ in
Python. Please plot the trend of the loss function $J(\theta)$ over 50000 iterations and
upload the source code file.
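
One possible shape for such a script is sketched below in pure Python. It assumes the loss $J(\theta) = \frac{1}{N} \sum_n \ln(1 + e^{-y_n \theta^T x_n})$ for labels in {-1, +1}, full-batch updates, and a bias feature $x_0 = 1$; none of these are stated in the question. The names `loss`, `train`, `plot_loss` and the output file name are illustrative, and the plotting helper assumes matplotlib is installed.

```python
import math

X = [
    (-0.120, 0.300, -0.010),
    (0.200, -0.030, -0.350),
    (-0.370, 0.250, 0.070),
    (-0.100, 0.140, -0.520),
]
y = [1, -1, -1, 1]


def loss(theta, X, y):
    """Assumed J(theta) = (1/N) * sum of ln(1 + exp(-y * theta^T x))."""
    total = 0.0
    for x, label in zip(X, y):
        z = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
        total += math.log(1.0 + math.exp(-label * z))
    return total / len(X)


def train(X, y, theta, eta, n_iters):
    """Run full-batch gradient descent and record J(theta) after every update."""
    theta = list(theta)
    history = [loss(theta, X, y)]
    n = len(X)
    for _ in range(n_iters):
        grad = [0.0] * len(theta)
        for x, label in zip(X, y):
            xa = (1.0,) + tuple(x)  # prepend the assumed bias feature x0 = 1
            z = sum(t * v for t, v in zip(theta, xa))
            coeff = -label / (1.0 + math.exp(label * z))
            for j, v in enumerate(xa):
                grad[j] += coeff * v / n
        theta = [t - eta * g for t, g in zip(theta, grad)]
        history.append(loss(theta, X, y))
    return theta, history


def plot_loss(history, path="loss_curve.png"):
    """Save the loss curve to a file; requires matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt
    plt.plot(range(len(history)), history)
    plt.xlabel("iteration")
    plt.ylabel("J(theta)")
    plt.title("Training loss")
    plt.savefig(path)


theta_final, history = train(X, y, [-0.09, 0.0, -0.19, -0.21], eta=0.2, n_iters=50000)
print(history[0], history[-1])
```

Calling `plot_loss(history)` afterwards writes the curve to `loss_curve.png`. Recording the loss after every update keeps the plot faithful to all 50000 rounds; thinning the history before plotting would be a reasonable alternative for longer runs.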

Note: For a) and b), the detailed calculation process is required, and the intermediate and
final results should be rounded to 3 decimal places.