CS/ECE/ME532 Assignment 10


1. Neural net functions.
a) Sketch the function generated by the following 3-neuron ReLU neural network.
f(x) = 2(x − 0.5)+ − 2(2x − 1)+ + 4(0.5x − 2)+
where x ∈ R and where (z)+ = max(0, z) for any z ∈ R. Note that this is a
single-input, single-output function. Plot f(x) vs x by hand.
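
Although the problem asks for a hand sketch, a few lines of NumPy/matplotlib can confirm the kink locations and slopes afterward (a sanity check only, not a substitute for the sketch):

```python
import numpy as np
import matplotlib.pyplot as plt

def relu(z):
    return np.maximum(0, z)

# f(x) = 2(x - 0.5)+ - 2(2x - 1)+ + 4(0.5x - 2)+
x = np.linspace(-2, 6, 500)
f = 2 * relu(x - 0.5) - 2 * relu(2 * x - 1) + 4 * relu(0.5 * x - 2)

plt.plot(x, f)
plt.xlabel("x")
plt.ylabel("f(x)")
plt.show()
```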

b) Consider the continuous function depicted below. Approximate this function with a
ReLU neural network with 2 neurons. The function should be of the form

f(x) = Σ_{j=1}^2 vj (wj x + bj)+

Indicate the weights and biases of each neuron and sketch the neural network
function.
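
As with part (a), you can check your answer by plotting the resulting two-neuron network. The values below are placeholders, to be replaced by the weights and biases you read off the figure; they are not the answer:

```python
import numpy as np
import matplotlib.pyplot as plt

def relu(z):
    return np.maximum(0, z)

# Placeholder values only -- substitute the weights/biases read off the figure.
v = [1.0, -2.0]
w = [1.0, 1.0]
b = [0.0, -1.0]

x = np.linspace(-3, 3, 400)
f = sum(vj * relu(wj * x + bj) for vj, wj, bj in zip(v, w, b))

plt.plot(x, f)
plt.xlabel("x")
plt.ylabel("f(x)")
plt.show()
```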

c) A neural network fw can be used for binary classification by predicting the label as
ŷ = sign(fw(x)). Consider a setting where x ∈ R² and the desired classifier outputs −1
if both elements of x are less than or equal to zero and +1 otherwise. Sketch the
desired classification regions in the two-dimensional plane, and provide a formula
for a ReLU network with 2 neurons that produces the desired classification.

For simplicity, assume in this question that sign(0) = −1.
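
As an illustration of the required form (one candidate, not necessarily the intended answer): the network f(x) = (x1)+ + (x2)+ is zero exactly when both coordinates are at most zero, so with the convention sign(0) = −1 it labels that quadrant −1 and everything else +1. A quick numerical check:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def f(x):
    # Candidate 2-neuron network: f(x) = (x1)+ + (x2)+
    return relu(x[0]) + relu(x[1])

def predict(x):
    return 1 if f(x) > 0 else -1   # problem's convention: sign(0) = -1

for point in [(-1.0, -2.0), (0.0, 0.0), (1.0, -1.0), (-3.0, 2.0)]:
    print(point, predict(np.array(point)))
```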

2. Gradients of a neural net. Consider a 2-layer neural network of the form

f(x) = Σ_{j=1}^J vj (wjᵀx)+.

Suppose we want to train our network on a dataset of N samples xi with corresponding
labels yi, using the least squares loss function

L = Σ_{i=1}^N (f(xi) − yi)².

Derive the gradient descent update steps for the input weights wj and the output
weights vj.
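
To check a derivation numerically, here is a minimal NumPy sketch of the resulting full-batch updates, under the common convention that the ReLU subgradient at 0 is taken to be 0. The variable names (W, v, X, y, alpha) are mine, not the assignment's:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def gradient_step(W, v, X, y, alpha):
    """One full-batch gradient descent step for f(x) = sum_j vj (wj^T x)+.

    W: J x d input weights, v: length-J output weights,
    X: N x d data matrix, y: length-N labels, alpha: step size.
    """
    Z = X @ W.T                # pre-activations wj^T xi, shape N x J
    A = relu(Z)                # hidden activations (wj^T xi)+
    r = A @ v - y              # residuals f(xi) - yi, shape N
    grad_v = 2 * A.T @ r       # dL/dvj = sum_i 2 (f(xi) - yi) (wj^T xi)+
    # dL/dwj = sum_i 2 (f(xi) - yi) vj 1{wj^T xi > 0} xi
    grad_W = 2 * ((Z > 0) * np.outer(r, v)).T @ X
    return W - alpha * grad_W, v - alpha * grad_v
```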

3. Compressing neural nets. Large neural network models can be approximated by
considering low-rank approximations to their weight matrices. The neural network
f(x) = Σ_{j=1}^J vj (wjᵀx)+ can be written as

f(x) = vᵀ(Wx)+,

where v is the J × 1 vector of output weights and W is the J × d matrix whose jth
row is wjᵀ. Let σ1, σ2, . . . denote the singular values of W and assume that σi ≤ ε
for i > r. Let fr denote the neural network obtained by replacing W with its best
rank-r approximation Ŵr.

Assuming that x has unit norm, find an upper bound on the difference
max_x |f(x) − fr(x)|. (Hint: for any pair of vectors a and b, the following
inequality holds: ‖a+ − b+‖2 ≤ ‖a − b‖2.)
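
The setup can also be explored numerically. Below is a minimal NumPy sketch with hypothetical sizes J, d, and r: it forms the best rank-r approximation via a truncated SVD (Eckart–Young) and measures the gap on a random unit-norm input. The quantity ‖v‖·σ_{r+1} printed alongside is one natural candidate to compare against, not the stated answer:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(0)
J, d, r = 64, 32, 8                 # hypothetical sizes
W = rng.standard_normal((J, d))
v = rng.standard_normal(J)

# Best rank-r approximation of W via truncated SVD.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_r = (U[:, :r] * s[:r]) @ Vt[:r]

x = rng.standard_normal(d)
x /= np.linalg.norm(x)              # unit-norm input, as in the problem
gap = abs(v @ relu(W @ x) - v @ relu(W_r @ x))
print(gap, np.linalg.norm(v) * s[r])  # empirical gap vs. ||v|| * sigma_{r+1}
```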

4. Face Emotion Classification with a three-layer neural network. In this problem
we return to the face emotion data studied previously. You may find it very helpful to
use code from an activity (or libraries such as Keras and TensorFlow).

a) Build a classifier using a fully connected three-layer neural network with logistic
activation functions. Your network should

• take a vector x ∈ R¹⁰ as input (nine features plus a constant offset),

• have a single, fully connected hidden layer with 32 neurons,
• output a scalar ŷ.

Note that since the logistic activation function is always positive, your decision
rule should be as follows: ŷ > 0.5 corresponds to a ‘happy’ face, while ŷ ≤ 0.5
corresponds to a face that is not happy.
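
As a starting point, a minimal Keras sketch of this architecture might look like the following (details beyond the problem statement, such as how the input is declared, are one possible choice):

```python
import tensorflow as tf

# 10 inputs -> 32 logistic (sigmoid) hidden units -> 1 logistic output
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="sigmoid"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```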

b) Train your classifier using stochastic gradient descent (start with a step size of
α = 0.05) and create a plot with the number of epochs on the horizontal axis and
training accuracy on the vertical axis. Does your classifier achieve 0% training
error? If so, how many epochs does it take for your classifier to achieve perfect
classification on the training set?
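
Continuing the sketch above, one possible Keras setup is shown below. It assumes X_train (N × 10) and y_train (N,) are already loaded with labels mapped to 0/1; batch_size=1 gives per-sample stochastic updates, and the cross-entropy loss is an assumption rather than something the problem mandates:

```python
import matplotlib.pyplot as plt
import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.05),
              loss="binary_crossentropy",
              metrics=["accuracy"])
history = model.fit(X_train, y_train, batch_size=1, epochs=100, verbose=0)

plt.plot(history.history["accuracy"])
plt.xlabel("epochs")
plt.ylabel("training accuracy")
plt.show()
```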

c) Find a more realistic estimate of the accuracy of your classifier by using 8-fold
cross-validation. Can you achieve perfect test accuracy?
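
One possible way to wire this up with scikit-learn's KFold, continuing from the sketches above (X and y assumed loaded; clone_model gives a fresh, untrained copy of the part (a) network for each fold):

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

scores = []
for train_idx, test_idx in KFold(n_splits=8, shuffle=True, random_state=0).split(X):
    fold_model = tf.keras.models.clone_model(model)   # fresh, untrained copy
    fold_model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.05),
                       loss="binary_crossentropy", metrics=["accuracy"])
    fold_model.fit(X[train_idx], y[train_idx], batch_size=1, epochs=100, verbose=0)
    scores.append(fold_model.evaluate(X[test_idx], y[test_idx], verbose=0)[1])

print("mean CV accuracy:", np.mean(scores))
```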