- Assignment 2 COMP3670/6670: Introduction to Machine Learning


Exercise 1: Inner Products Induce Norms (20 credits)

Let V be a vector space, and let ⟨·, ·⟩ : V × V → R be an inner product on V. Define ||x|| := √⟨x, x⟩.

Prove that || · || is a norm.

(Hint: to prove that the triangle inequality holds, you may need the Cauchy–Schwarz inequality, ⟨x, y⟩ ≤ ||x|| ||y||.)
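Before writing the proof, it can help to see the axioms hold numerically. The sketch below checks the norm axioms and Cauchy–Schwarz on a random sample, for the inner product ⟨x, y⟩ = x^T A y induced by an arbitrary symmetric positive definite A (the matrix, dimension, and seed are illustrative assumptions, and a passing check is of course not a proof):

```python
import numpy as np

# Sanity check (NOT a proof): the norm induced by <x, y> = x^T A y,
# with A symmetric positive definite, satisfies the norm axioms
# and Cauchy-Schwarz on randomly sampled vectors.
rng = np.random.default_rng(2)
n = 4
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite by construction
norm = lambda v: np.sqrt(v @ A @ v)  # ||v|| := sqrt(<v, v>)

x, y = rng.normal(size=n), rng.normal(size=n)
assert norm(x) >= 0 and np.isclose(norm(np.zeros(n)), 0)  # non-negativity, ||0|| = 0
assert np.isclose(norm(-3.0 * x), 3.0 * norm(x))          # absolute homogeneity
assert norm(x + y) <= norm(x) + norm(y) + 1e-12           # triangle inequality
assert abs(x @ A @ y) <= norm(x) * norm(y) + 1e-12        # Cauchy-Schwarz
print("norm axioms hold on this sample")
```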

Exercise 2: Vector Calculus Identities (10+10 credits)

1. Let x, a, b ∈ R^n. Prove that ∇_x(x^T a b^T x) = a^T x b^T + b^T x a^T.

2. Let B ∈ R^{n×n}, x ∈ R^n. Prove that ∇_x(x^T B x) = x^T (B + B^T).
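A useful habit with identities like these is to confirm them against finite differences before (or after) deriving them. The sketch below does so for both identities, writing each gradient as a vector; the dimension and seed are arbitrary, and the check is a sanity test, not the requested proof:

```python
import numpy as np

# Numerical sanity check (NOT a proof) of both gradient identities,
# via central finite differences. Dimensions and seed are arbitrary.
rng = np.random.default_rng(0)
n = 4
x, a, b = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
B = rng.normal(size=(n, n))

def num_grad(f, v, eps=1e-6):
    # central-difference approximation of the gradient of the scalar f at v
    return np.array([(f(v + eps * e) - f(v - eps * e)) / (2 * eps)
                     for e in np.eye(len(v))])

# Identity 1: grad_x(x^T a b^T x) = a^T x b^T + b^T x a^T (as a vector)
f1 = lambda v: (v @ a) * (b @ v)
g1 = (a @ x) * b + (b @ x) * a
assert np.allclose(num_grad(f1, x), g1, atol=1e-6)

# Identity 2: grad_x(x^T B x) = x^T (B + B^T) (as a vector)
f2 = lambda v: v @ B @ v
g2 = (B + B.T) @ x
assert np.allclose(num_grad(f2, x), g2, atol=1e-6)
print("both identities verified on a random sample")
```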

Exercise 3: Properties of Symmetric Positive Definiteness (10 credits)

Let A, B be symmetric positive definite matrices.¹ Prove that for any p, q > 0, pA + qB is also symmetric and positive definite.
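The claim can be sanity-checked numerically before proving it: build two random symmetric positive definite matrices, take a positive combination, and inspect symmetry and the eigenvalues (all constants below are illustrative; a passing check does not replace the proof):

```python
import numpy as np

# Sanity check (NOT a proof): for random SPD A, B and p, q > 0,
# pA + qB is symmetric with strictly positive eigenvalues.
rng = np.random.default_rng(1)
n = 5
M1, M2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
A = M1 @ M1.T + n * np.eye(n)   # SPD by construction
B = M2 @ M2.T + n * np.eye(n)   # SPD by construction
p, q = 0.3, 2.5
C = p * A + q * B
assert np.allclose(C, C.T)              # symmetric
assert np.linalg.eigvalsh(C).min() > 0  # positive definite
print("pA + qB is SPD for this sample")
```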

Exercise 4: General Linear Regression with Regularisation (10+10+10+10+10 credits)

Let A ∈ R^{N×N}, B ∈ R^{D×D} be symmetric positive definite matrices. From the lectures, we can use symmetric positive definite matrices to define a corresponding inner product, as shown below. From the previous question, we can also define a norm using these inner products.

⟨x, y⟩_A := x^T A y,   ||x||_A^2 := ⟨x, x⟩_A

⟨x, y⟩_B := x^T B y,   ||x||_B^2 := ⟨x, x⟩_B

Suppose we are performing linear regression with a training set {(x_1, y_1), . . . , (x_N, y_N)}, where for each i, x_i ∈ R^D and y_i ∈ R. We can define the matrix

X = [x_1, . . . , x_N]^T ∈ R^{N×D}

and the vector

y = [y_1, . . . , y_N]^T ∈ R^N.

We would like to find θ ∈ R^D and c ∈ R^N such that y ≈ Xθ + c, where the error is measured using || · ||_A. We avoid overfitting by adding a weighted regularisation term, measured using || · ||_B. We define the loss function with regulariser:

L_{A,B,y,X}(θ, c) = ||y − Xθ − c||_A^2 + ||θ||_B^2 + ||c||_A^2

For the sake of brevity, we write L(θ, c) for L_{A,B,y,X}(θ, c).
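The loss transcribes directly into code, which is a good way to check you have parsed the notation correctly (the dimensions, data, and SPD matrices below are illustrative assumptions, not part of the assignment):

```python
import numpy as np

# Direct numerical transcription of the loss L(theta, c); all data
# below is randomly generated for illustration only.
rng = np.random.default_rng(4)
N, D = 6, 3
MA, MB = rng.normal(size=(N, N)), rng.normal(size=(D, D))
A = MA @ MA.T + N * np.eye(N)   # SPD, defines ||.||_A on R^N
B = MB @ MB.T + D * np.eye(D)   # SPD, defines ||.||_B on R^D
X = rng.normal(size=(N, D))
y = rng.normal(size=N)

def sq_norm(v, M):
    return v @ M @ v            # ||v||_M^2 = <v, v>_M

def loss(theta, c):
    r = y - X @ theta - c       # residual of the fit
    return sq_norm(r, A) + sq_norm(theta, B) + sq_norm(c, A)

theta0, c0 = np.zeros(D), np.zeros(N)
# with theta = 0 and c = 0, only the data term ||y||_A^2 remains
assert loss(theta0, c0) == sq_norm(y, A)
```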

For this question:

• You may use (without proof) the property that a symmetric positive definite matrix is invertible.

• We assume that there are sufficiently many non-redundant data points for X to be full rank. In particular, you may assume that the null space of X is trivial (that is, the only solution to Xz = 0 is the trivial solution, z = 0).

¹ A matrix is symmetric positive definite if it is both symmetric and positive definite.

1. Find the gradient ∇_θ L(θ, c).

2. Let ∇_θ L(θ, c) = 0 and solve for θ. If you need to invert a matrix to solve for θ, you should prove the inverse exists.

3. Find the gradient ∇_c L(θ, c).

4. Let ∇_c L(θ, c) = 0 and solve for c. If you need to invert a matrix to solve for c, you should prove the inverse exists.

5. Show that if we set A = I, c = 0, B = λI, where λ ∈ R, your answer for part 2 agrees with the analytic solution of the standard least-squares regression problem with L2 regularisation, given by

θ = (X^T X + λI)^{−1} X^T y.
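As a check on that closed form, one can verify numerically that θ = (X^T X + λI)^{−1} X^T y zeroes the gradient of the L2-regularised least-squares loss ||y − Xθ||^2 + λ||θ||^2 (the data, dimensions, and λ below are illustrative; this checks the formula on a sample, it is not the requested derivation):

```python
import numpy as np

# Sanity check: theta = (X^T X + lambda*I)^{-1} X^T y zeroes the gradient
# of ||y - X theta||^2 + lambda * ||theta||^2. Data is illustrative only.
rng = np.random.default_rng(3)
N, D, lam = 20, 3, 0.7
X = rng.normal(size=(N, D))
y = rng.normal(size=N)

theta = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)
grad = 2 * X.T @ (X @ theta - y) + 2 * lam * theta  # gradient of the regularised loss
assert np.allclose(grad, 0, atol=1e-8)
print("closed-form ridge solution zeroes the gradient")
```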
