Neural Networks & Deep Learning
1. (25 points) Linear algebra refresher.
(a) (12 points) Let $A$ be a square matrix, and further let $AA^T = I$.
i. (3 points) Construct a 2 × 2 example of A and derive the eigenvalues and eigenvectors of this example. Show all work (i.e., do not use a computer’s eigenvalue
decomposition capabilities). You may not use a diagonal matrix as your 2 × 2
example. What do you notice about the eigenvalues and eigenvectors?
ii. (3 points) Show that the eigenvalues of $A$ all have norm 1.
iii. (3 points) Show that the eigenvectors of A corresponding to distinct eigenvalues
are orthogonal.
iv. (3 points) In words, describe what may happen to a vector x under the transformation Ax.
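Optional sanity check: after deriving your answers by hand, you can verify the unit-norm property numerically. The sketch below assumes numpy; the rotation matrix is just one illustrative choice of $A$ satisfying $AA^T = I$.

    import numpy as np

    # One illustrative A with A A^T = I (a 2-D rotation); any orthogonal A works.
    theta = 0.3
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    eigvals, _ = np.linalg.eig(A)
    print(np.abs(eigvals))   # eigenvalue norms: both should be 1
    print(A @ A.T)           # should be (numerically) the identity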
(b) (8 points) Let A be a matrix.
i. (4 points) What is the relationship between the singular vectors of $A$ and the eigenvectors of $AA^T$? What about $A^T A$?
ii. (4 points) What is the relationship between the singular values of $A$ and the eigenvalues of $AA^T$? What about $A^T A$?
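Optional sanity check: the sketch below (assuming numpy; the matrix shape is an arbitrary choice) puts the singular values of $A$ next to the eigenvalues of $AA^T$ and $A^T A$, so you can test the relationship you state.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3))           # arbitrary illustrative matrix

    U, s, Vt = np.linalg.svd(A)
    eig_AAT = np.linalg.eigvalsh(A @ A.T)     # eigenvalues of A A^T, ascending
    eig_ATA = np.linalg.eigvalsh(A.T @ A)     # eigenvalues of A^T A, ascending

    print(np.sort(s))        # singular values of A
    print(eig_AAT)           # compare against your claimed relationship
    print(eig_ATA)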
(c) (5 points) True or False. Partial credit on an incorrect solution may be awarded if you
justify your answer.
i. Every linear operator in an n-dimensional vector space has n distinct eigenvalues.
ii. A non-zero sum of two eigenvectors of a matrix A is an eigenvector.
iii. If a matrix A has the positive semidefinite property, i.e., $x^T A x \geq 0$ for all $x$, then its eigenvalues must be non-negative.
iv. The rank of a matrix can exceed the number of non-zero eigenvalues.
v. A non-zero sum of two eigenvectors of a matrix A corresponding to the same
eigenvalue λ is always an eigenvector.
2. (22 points) Probability refresher.
(a) (9 points) A jar of coins is equally populated with two types of coins. One is type “H50”
and comes up heads with probability 0.5. The other is type “H60” and comes up heads
with probability 0.6.
i. (3 points) You take one coin from the jar and flip it. It lands tails. What is the
posterior probability that this is an H50 coin?
ii. (3 points) You put the coin back, take another, and flip it 4 times. It lands T, H,
H, H. How likely is the coin to be type H50?
iii. (3 points) A new jar is now equally populated with coins of type H50, H55, and
H60 (with probabilities of coming up heads 0.5, 0.55, and 0.6, respectively). You
take one coin and flip it 10 times. It lands heads 9 times. How likely is the coin to
be of each possible type?
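Optional sanity check: all three sub-parts are the same Bayes'-rule computation with different counts. The helper below is a hypothetical sketch (assuming numpy) for checking your hand calculations; priors and heads_prob hold one entry per coin type.

    import numpy as np

    def posterior(priors, heads_prob, n_heads, n_tails):
        # Bayes' rule with a binomial likelihood (the binomial coefficient
        # cancels in the normalization, so it is omitted).
        likelihood = heads_prob**n_heads * (1 - heads_prob)**n_tails
        unnormalized = priors * likelihood
        return unnormalized / unnormalized.sum()

    # Example call for part i: two equally likely types, one tail observed.
    print(posterior(np.array([0.5, 0.5]), np.array([0.5, 0.6]),
                    n_heads=0, n_tails=1))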
(b) (3 points) Consider a pregnancy test with the following statistics.
• If the woman is pregnant, the test returns “positive” (or 1, indicating the woman
is pregnant) 99% of the time.
• If the woman is not pregnant, the test returns “positive” 10% of the time.
• At any given point in time, 99% of the female population is not pregnant.
What is the probability that a woman is pregnant given she received a positive test?
The answer should make intuitive sense; give an explanation of the result that you
find.
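Hint (optional): the computation is a single application of Bayes' rule. The template below only restates the setup in the variables of this part; plugging in the numbers is left to you.
$$P(\text{pregnant} \mid +) = \frac{P(+ \mid \text{pregnant})\,P(\text{pregnant})}{P(+ \mid \text{pregnant})\,P(\text{pregnant}) + P(+ \mid \text{not pregnant})\,P(\text{not pregnant})}$$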
(c) (5 points) Let $x_1, x_2, \ldots, x_n$ be identically distributed random variables. A random
vector, $x$, is defined as
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
What is $E(Ax + b)$ in terms of $E(x)$, given that $A$ and $b$ are deterministic?
(d) (5 points) Let
$$\operatorname{cov}(x) = E\left[(x - Ex)(x - Ex)^T\right]$$
What is $\operatorname{cov}(Ax + b)$ in terms of $\operatorname{cov}(x)$, given that $A$ and $b$ are deterministic?
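Optional sanity check for (c) and (d): a Monte Carlo sketch (assuming numpy; the shapes and the distribution of $x$ are arbitrary choices) that estimates the mean and covariance of $Ax + b$ empirically, so you can compare them against your derived expressions.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((2, 3))           # deterministic A
    b = rng.standard_normal(2)                # deterministic b
    x = rng.standard_normal((100_000, 3))     # each row is one draw of x
    y = x @ A.T + b                           # corresponding draws of Ax + b

    print("E(x)        =", x.mean(axis=0))
    print("E(Ax + b)   =", y.mean(axis=0))
    print("cov(x)      =", np.cov(x, rowvar=False))
    print("cov(Ax + b) =", np.cov(y, rowvar=False))
    # Plug E(x) and cov(x) into your derived expressions and compare.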
3. (13 points) Multivariate derivatives.
(a) (2 points) Let $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, and $A \in \mathbb{R}^{n \times m}$. What is $\nabla_x \, x^T A y$?
(b) (2 points) What is $\nabla_y \, x^T A y$?
(c) (3 points) What is $\nabla_A \, x^T A y$?
(d) (3 points) Let $f = x^T A x + b^T x$. What is $\nabla_x f$?
(e) (3 points) Let $f = \operatorname{tr}(AB)$. What is $\nabla_A f$?
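Optional sanity check: each of these gradients can be verified with a central-difference approximation. A minimal sketch, assuming numpy; numgrad is a hypothetical helper, and the part-(a) check at the bottom is one example usage.

    import numpy as np

    def numgrad(f, x, eps=1e-6):
        # Central-difference estimate of the gradient of a scalar function f at x.
        g = np.zeros_like(x)
        for idx in np.ndindex(x.shape):
            orig = x[idx]
            x[idx] = orig + eps; fp = f(x)
            x[idx] = orig - eps; fm = f(x)
            x[idx] = orig
            g[idx] = (fp - fm) / (2 * eps)
        return g

    rng = np.random.default_rng(2)
    n, m = 3, 4
    x, y = rng.standard_normal(n), rng.standard_normal(m)
    A = rng.standard_normal((n, m))

    # Part (a): compare your hand-derived gradient of x^T A y w.r.t. x to:
    print(numgrad(lambda v: v @ A @ y, x))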
4. (10 points) Deriving least-squares with matrix derivatives.
In least-squares, we seek to estimate some multivariate output $y$ via the model
$$\hat{y} = Wx$$
In the training set we're given paired data examples $(x^{(i)}, y^{(i)})$ for $i = 1, \ldots, n$. Least-squares is the following quadratic optimization problem:
$$\min_W \; \frac{1}{2} \sum_{i=1}^{n} \left\| y^{(i)} - W x^{(i)} \right\|^2$$
Derive the optimal $W$.
Hint: you may find the following derivatives useful:
$$\frac{\partial \operatorname{tr}(WA)}{\partial W} = A^T, \qquad \frac{\partial \operatorname{tr}(WAW^T)}{\partial W} = WA^T + WA$$
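Optional sanity check: once you have a closed-form $W$, you can compare it on synthetic data against numpy's least-squares solver. A minimal sketch, assuming numpy; the shapes and noise level are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(3)
    n, d_in, d_out = 50, 4, 2
    X = rng.standard_normal((n, d_in))                # row i is x^(i)
    W_true = rng.standard_normal((d_out, d_in))
    Y = X @ W_true.T + 0.01 * rng.standard_normal((n, d_out))   # row i is y^(i)

    # Reference: np.linalg.lstsq solves min ||X B - Y||^2, where B = W^T.
    W_ref = np.linalg.lstsq(X, Y, rcond=None)[0].T
    print(W_ref)   # your derived W, evaluated on (X, Y), should match this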
5. (30 points) Hello World in Jupyter.
Complete the Jupyter notebook linear regression.ipynb. Print out the Jupyter notebook
and submit it to Gradescope.