CPE/EE/AAI 695 Applied Machine Learning Homework 2

1. [10 points] In Module 2, we gave the normal equation (i.e., the closed-form solution) for linear regression using MSE as the cost function.

Prove that the closed-form solution for Ridge Regression is $\boldsymbol{w} = (\lambda I + X^T X)^{-1} X^T \boldsymbol{y}$, where $I$ is the identity matrix, $X = (x^{(1)}, x^{(2)}, \dots, x^{(m)})^T$ is the input data matrix, $x^{(i)} = (1, x_1, x_2, \dots, x_n)$ is the $i$-th data sample, and $\boldsymbol{y} = (y^{(1)}, y^{(2)}, \dots, y^{(m)})$.

Assume the hypothesis function $h_w(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$, and $y^{(j)}$ is the measurement of $h_w(x)$ for the $j$-th training sample.

The cost function of the Ridge Regression is

$$E(\boldsymbol{w}) = \sum_{i=1}^{m} \left(\boldsymbol{w}^T x^{(i)} - y^{(i)}\right)^2 + \lambda \sum_{i=1}^{n} w_i^2.$$
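
A proof can be sanity-checked numerically before it is written up. The following is a minimal sketch, not a substitute for the derivation: it evaluates the closed-form expression on synthetic data and confirms that the gradient of the ridge cost vanishes there. It assumes the penalty is applied to every component of $\boldsymbol{w}$, including $w_0$, since that is the case a full $\lambda I$ corresponds to. The function names, the data, and the value of `lam` are hypothetical.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (lam*I + X^T X) w = X^T y without forming an explicit inverse."""
    n_params = X.shape[1]
    A = lam * np.eye(n_params) + X.T @ X
    return np.linalg.solve(A, X.T @ y)

def ridge_gradient(X, y, w, lam):
    """Gradient of E(w) = ||Xw - y||^2 + lam * ||w||^2.

    Assumption: the penalty covers all components of w, including w_0.
    """
    return 2.0 * X.T @ (X @ w - y) + 2.0 * lam * w

# Synthetic data (hypothetical): m samples, n features plus a bias column of ones.
rng = np.random.default_rng(0)
m, n = 50, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = rng.normal(size=m)
lam = 0.5

w = ridge_closed_form(X, y, lam)
grad_norm = np.linalg.norm(ridge_gradient(X, y, w, lam))
print("gradient norm at closed-form solution:", grad_norm)
```

A gradient norm near machine precision is consistent with the closed form being a stationary point of the cost. Because the cost is strictly convex for $\lambda > 0$, that stationary point is the unique minimizer.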

2. [10 points] Assume we have $K$ different classes in a multi-class Softmax Regression model. The posterior probability is

$$\hat{p}_k = \delta(s_k(x))_k = \frac{\exp(s_k(x))}{\sum_{j=1}^{K} \exp(s_j(x))} \quad \text{for } k = 1, 2, \dots, K,$$

where $s_k(x) = \theta_k^T x$, the input $x$ is an $n$-dimensional vector, and $K$ is the total number of classes.

1) To learn this Softmax Regression model, how many parameters do we need to estimate? What are these parameters?

2) Consider the cross-entropy cost function $J(\Theta)$ of $m$ training samples $\{(x_i, y_i)\}_{i=1,2,\dots,m}$ given below. Derive the gradient of $J(\Theta)$ with respect to $\theta_k$.

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log\left(\hat{p}_k^{(i)}\right)$$

where $y_k^{(i)} = 1$ if the $i$-th instance belongs to class $k$, and 0 otherwise.
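
The softmax probabilities and the cross-entropy cost above can be sketched in a few lines of NumPy. A finite-difference gradient is included so that a hand-derived gradient can be compared against it. This is an illustrative sketch under stated assumptions: the parameters are stored as an n-by-K matrix `Theta` whose k-th column is $\theta_k$, labels are one-hot encoded, and all names and data are hypothetical.

```python
import numpy as np

def softmax_probs(Theta, X):
    """Return an m-by-K matrix of class probabilities.

    Rows of X are samples; column k of Theta is theta_k (assumed layout).
    """
    scores = X @ Theta                                   # s_k(x) = theta_k^T x
    scores = scores - scores.max(axis=1, keepdims=True)  # stability shift, result unchanged
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def cross_entropy(Theta, X, Y):
    """J(Theta) = -(1/m) * sum_i sum_k y_k^(i) * log(p_k^(i)), with one-hot Y."""
    P = softmax_probs(Theta, X)
    return -np.mean(np.sum(Y * np.log(P), axis=1))

def numerical_gradient(Theta, X, Y, eps=1e-6):
    """Central-difference estimate of dJ/dTheta, one entry at a time."""
    grad = np.zeros_like(Theta)
    for idx in np.ndindex(*Theta.shape):
        step = np.zeros_like(Theta)
        step[idx] = eps
        grad[idx] = (cross_entropy(Theta + step, X, Y)
                     - cross_entropy(Theta - step, X, Y)) / (2.0 * eps)
    return grad

# Synthetic example (hypothetical sizes): m samples, n features, K classes.
rng = np.random.default_rng(1)
m, n, K = 6, 4, 3
X = rng.normal(size=(m, n))
labels = rng.integers(0, K, size=m)
Y = np.eye(K)[labels]                 # one-hot targets
Theta = rng.normal(size=(n, K))

P = softmax_probs(Theta, X)
J = cross_entropy(Theta, X, Y)
num_grad = numerical_gradient(Theta, X, Y)
print("cost:", J)
print("numerical gradient, column k is dJ/dtheta_k:\n", num_grad)
```

Column k of `num_grad` approximates the gradient with respect to $\theta_k$, so an analytic expression can be checked with `np.allclose` against that column.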

3. [44 points] Write a program to find the coefficients for a linear regression model for the dataset
provided (data2.txt). Assume a linear model: y = w0 + w1*x.

You need to
1) Plot the data (i.e., x-axis for the 1st column, y-axis for the 2nd column),
and use Python to implement the following methods to find the coefficients:
2) Normal equation, and
3) Gradient Descent using batch AND stochastic modes respectively:
a) Split the dataset into 80% for training and 20% for testing.
b) Plot MSE vs. iteration of each mode for both the training set and the testing set; compare
batch and stochastic modes (with discussion) in terms of accuracy (on the testing set) and
speed of convergence. (You need to determine an appropriate termination condition,
e.g., when the cost function is less than a threshold and/or after a given number of
iterations.)
c) Plot MSE of the testing set vs. learning rate (using 0.001, 0.002, 0.003, 0.004, 0.005,
0.006, 0.007, 0.008, 0.009, 0.01) and determine the best learning rate.

Please implement the algorithms by yourself and do NOT use the fit() function of the library.
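
The overall workflow can be sketched as follows. This is one possible structure under stated assumptions, not a reference solution: data2.txt is not reproduced here, so the sketch generates synthetic data with a known line as a stand-in. The function names, learning rates, iteration counts, and tolerance are hypothetical choices, and only the training MSE is recorded per iteration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for data2.txt (hypothetical): y = 2 + 3x plus Gaussian noise.
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=200)

# a) 80/20 train/test split on a random permutation of the indices.
perm = rng.permutation(len(x))
n_train = int(0.8 * len(x))
train, test = perm[:n_train], perm[n_train:]

def design(x_col):
    """Prepend a column of ones so that w = (w0, w1)."""
    return np.column_stack([np.ones_like(x_col), x_col])

X_train, y_train = design(x[train]), y[train]
X_test, y_test = design(x[test]), y[test]

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

# 2) Normal equation: solve X^T X w = X^T y.
w_normal = np.linalg.solve(X_train.T @ X_train, X_train.T @ y_train)

# 3) Batch gradient descent; stops when the MSE change falls below tol.
def batch_gd(X, y, lr=0.01, n_iters=5000, tol=1e-12):
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(n_iters):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        w = w - lr * grad
        history.append(mse(X, y, w))
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return w, history

# 3) Stochastic gradient descent: one sample per update, reshuffled each epoch.
def sgd(X, y, lr=0.001, n_epochs=50, seed=0):
    local_rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(n_epochs):
        for i in local_rng.permutation(len(y)):
            grad = 2.0 * (X[i] @ w - y[i]) * X[i]
            w = w - lr * grad
        history.append(mse(X, y, w))  # training MSE once per epoch
    return w, history

w_batch, hist_batch = batch_gd(X_train, y_train)
w_sgd, hist_sgd = sgd(X_train, y_train)

print("normal equation:", w_normal, "test MSE:", mse(X_test, y_test, w_normal))
print("batch GD:       ", w_batch, "test MSE:", mse(X_test, y_test, w_batch))
print("SGD:            ", w_sgd, "test MSE:", mse(X_test, y_test, w_sgd))
```

The `history` lists can be passed to a plotting library to produce the MSE-vs-iteration curves. Wrapping the `batch_gd` and `sgd` calls in a loop over the listed learning rates gives the data for the MSE-vs-learning-rate plot. Recording the testing MSE alongside the training MSE would require passing the test set into both functions.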