## Description

1. Neural net functions

a) Sketch the function generated by the following 3-neuron ReLU neural network:

$$f(x) = 2(x - 0.5)_+ - 2(2x - 1)_+ + 4(0.5x - 2)_+$$

where $x \in \mathbb{R}$ and $(z)_+ = \max(0, z)$ for any $z \in \mathbb{R}$. Note that this is a single-input, single-output function. Plot $f(x)$ vs. $x$ by hand.
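A hand sketch can be checked numerically. The following is a minimal sketch in Python with NumPy and is not part of the original assignment; the helper name `relu` and the evaluation grid are arbitrary choices made for illustration.

```python
import numpy as np

def relu(z):
    """Elementwise (z)_+ = max(0, z)."""
    return np.maximum(0.0, z)

def f(x):
    """The 3-neuron ReLU network from part (a)."""
    return 2 * relu(x - 0.5) - 2 * relu(2 * x - 1) + 4 * relu(0.5 * x - 2)

# Evaluate on a coarse grid (the range [-1, 6] is an illustrative assumption).
xs = np.linspace(-1.0, 6.0, 15)
for x, y in zip(xs, f(xs)):
    print(f"x = {x:5.2f}  f(x) = {y:6.2f}")
```

Comparing the printed values against the hand-drawn plot at a few points is usually enough to catch a wrong slope or a misplaced kink.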

b) Consider the continuous function depicted below. Approximate this function with a ReLU neural network with 2 neurons. The function should be in the form

$$f(x) = \sum_{j=1}^{2} v_j (w_j x + b_j)_+$$

Indicate the weights and biases of each neuron and sketch the neural network function.

c) A neural network $f_w$ can be used for binary classification by predicting the label as $\hat{y} = \operatorname{sign}(f_w(x))$. Consider a setting where $x \in \mathbb{R}^2$ and the desired classifier is $-1$ if both elements of $x$ are less than or equal to zero and $+1$ otherwise. Sketch the desired classification regions in the two-dimensional plane, and provide a formula for a ReLU network with 2 neurons that can produce the desired classification. For simplicity, assume in this question that $\operatorname{sign}(0) = -1$.
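A candidate network can be tested against the desired labels on a grid of points. The sketch below assumes the network has the form $f(x) = \sum_j v_j (w_j^T x + b_j)_+$; the function names `desired_label`, `predict`, and `agrees_on_grid`, as well as the grid limits, are hypothetical and chosen only for illustration.

```python
import numpy as np

def desired_label(x):
    """-1 if both coordinates are <= 0, +1 otherwise."""
    return -1 if (x[0] <= 0 and x[1] <= 0) else 1

def predict(x, v, W, b):
    """sign(f(x)) for f(x) = sum_j v_j (w_j^T x + b_j)_+, using sign(0) = -1."""
    fx = v @ np.maximum(0.0, W @ x + b)
    return 1 if fx > 0 else -1

def agrees_on_grid(v, W, b, lim=2.0, n=21):
    """True if the candidate matches the desired label on an n-by-n grid."""
    grid = np.linspace(-lim, lim, n)
    return all(
        predict(np.array([x1, x2]), v, W, b) == desired_label(np.array([x1, x2]))
        for x1 in grid
        for x2 in grid
    )
```

Passing candidate values of `v` (length 2), `W` (2 × 2), and `b` (length 2) to `agrees_on_grid` reports whether they reproduce the desired regions on the sampled points. Agreement on a grid is evidence, not a proof, so the boundary cases still deserve a check by hand.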

2. Gradients of a neural net.

Consider a 2-layer neural network of the form $f(x) = \sum_{j=1}^{J} v_j (w_j^T x)_+$. Suppose we want to train our network on a dataset of $N$ samples $x_i$ with corresponding labels $y_i$, using a least squares loss function $L = \sum_{i=1}^{N} (f(x_i) - y_i)^2$. Derive the gradient descent update steps for the input weights $w_j$ and output weights $v_j$.
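A derived gradient can be sanity-checked against a finite-difference estimate. The following is a minimal sketch, assuming the weights are stored as a vector `v` of length $J$ and a matrix `W` whose rows are $w_j^T$; the helper `numerical_grad`, the random data, and the problem sizes are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def loss(v, W, X, y):
    """L = sum_i (f(x_i) - y_i)^2 with f(x) = sum_j v_j (w_j^T x)_+.

    v: (J,), W: (J, d) with rows w_j^T, X: (N, d), y: (N,).
    """
    H = np.maximum(0.0, X @ W.T)  # (N, J) hidden-layer activations
    return np.sum((H @ v - y) ** 2)

def numerical_grad(fun, theta, eps=1e-6):
    """Central-difference estimate of the gradient of fun at theta."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        step = np.zeros_like(theta)
        step[idx] = eps
        grad[idx] = (fun(theta + step) - fun(theta - step)) / (2 * eps)
    return grad

# Small random problem (sizes are arbitrary).
rng = np.random.default_rng(0)
N, d, J = 5, 3, 4
X, y = rng.normal(size=(N, d)), rng.normal(size=N)
v, W = rng.normal(size=J), rng.normal(size=(J, d))

grad_v = numerical_grad(lambda v_: loss(v_, W, X, y), v)  # shape (J,)
grad_W = numerical_grad(lambda W_: loss(v, W_, X, y), W)  # shape (J, d)
```

If the analytic expressions are correct, they should match `grad_v` and `grad_W` to several decimal places, except at points where some $w_j^T x_i$ is almost exactly zero and the ReLU is not differentiable.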

3. Compressing neural nets. Large neural network models can be approximated by considering low rank approximations to weight matrices. The neural network $f(x) = \sum_{j=1}^{J} v_j (w_j^T x)_+$ can be written as

$$f(x) = v^T (Wx)_+,$$

where $v$ is a $J \times 1$ vector of the output weights and $W$ is a $J \times d$ matrix with $j$th row $w_j^T$. Let $\sigma_1, \sigma_2, \dots$ denote the singular values of $W$ and assume that $\sigma_i \le \epsilon$ for $i > r$. Let $f_r$ denote the neural network obtained by replacing $W$ with its best rank-$r$ approximation $\hat{W}_r$.

Assuming that $x$ has unit norm, find an upper bound on the difference $\max_x |f(x) - f_r(x)|$. (Hint: for any pair of vectors $a$ and $b$, the following inequality holds: $\|a_+ - b_+\|_2 \le \|a - b\|_2$.)
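Before deriving the bound, it can help to see the gap empirically. The sketch below builds a rank-$r$ approximation from the truncated SVD (which is the best rank-$r$ approximation by the Eckart-Young theorem) and samples unit-norm inputs; the matrix sizes, random weights, and sample count are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
J, d, r = 8, 6, 3  # illustrative sizes, not taken from the problem
W = rng.normal(size=(J, d))
v = rng.normal(size=J)

# Truncated SVD: keep the top r singular values and vectors.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_r = (U[:, :r] * s[:r]) @ Vt[:r, :]

def f(x, W_):
    """f(x) = v^T (W_ x)_+ for a given weight matrix W_."""
    return v @ np.maximum(0.0, W_ @ x)

# Sample random unit-norm inputs and record the largest observed gap.
xs = rng.normal(size=(1000, d))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
max_gap = max(abs(f(x, W) - f(x, W_r)) for x in xs)

print("largest discarded singular value:", s[r])
print("largest observed |f(x) - f_r(x)|:", max_gap)
```

Random sampling only gives a lower estimate of the true maximum over unit-norm $x$, so the observed gap should never exceed a correct upper bound.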

4. Face Emotion Classification with a three-layer neural network. In this problem we return to the face emotion data studied previously. You may find it very helpful to use code from an activity (or libraries such as Keras and TensorFlow).

a) Build a classifier using a fully connected three-layer neural network with logistic activation functions. Your network should

• take a vector $x \in \mathbb{R}^{10}$ as input (nine features plus a constant offset),

• have a single, fully connected hidden layer with 32 neurons,

• output a scalar $\hat{y}$.

Note that since the logistic activation function is always positive, your decision should be as follows: $\hat{y} > 0.5$ corresponds to a ‘happy’ face, while $\hat{y} \le 0.5$ is not happy.
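The architecture described above can be sketched as a forward pass in NumPy. This is a minimal illustration, not the course's reference code: the names `logistic`, `forward`, and `classify` are hypothetical, the weights are random placeholders rather than trained values, and the constant offset is assumed to be the last input entry.

```python
import numpy as np

def logistic(z):
    """Logistic activation 1 / (1 + exp(-z)), with values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, v):
    """x: (10,) input including the offset, W: (32, 10), v: (32,)."""
    h = logistic(W @ x)     # hidden layer with 32 neurons
    return logistic(v @ h)  # scalar output y_hat in (0, 1)

def classify(y_hat):
    """'happy' if y_hat > 0.5, otherwise 'not happy'."""
    return "happy" if y_hat > 0.5 else "not happy"

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(32, 10))  # placeholder weights, untrained
v = rng.normal(scale=0.1, size=32)
x = np.append(rng.normal(size=9), 1.0)    # nine features plus constant 1
y_hat = forward(x, W, v)
print(y_hat, classify(y_hat))
```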

b) Train your classifier using stochastic gradient descent (start with a step size of $\alpha = 0.05$) and create a plot with the number of epochs on the horizontal axis and training accuracy on the vertical axis. Does your classifier achieve 0% training error? If so, how many epochs does it take for your classifier to achieve perfect classification on the training set?
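One possible shape for the training loop is sketched below. The problem does not specify a loss, so this sketch assumes a squared-error loss $\tfrac{1}{2}(\hat{y} - y)^2$ with labels in $\{0, 1\}$; the function name `train_sgd`, the initialization scale, and the default epoch count are likewise assumptions made for illustration.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sgd(X, y, hidden=32, alpha=0.05, epochs=50, seed=0):
    """SGD on squared error for a one-hidden-layer logistic network.

    X: (N, p) inputs with the constant offset included, y: (N,) labels in {0, 1}.
    Returns the weights and the training accuracy after each epoch.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(hidden, X.shape[1]))
    v = rng.normal(scale=0.5, size=hidden)
    history = []
    for _ in range(epochs):
        for i in rng.permutation(len(y)):  # visit samples in random order
            h = logistic(W @ X[i])
            y_hat = logistic(v @ h)
            # Backpropagate the error through the output and hidden logistics.
            delta_out = (y_hat - y[i]) * y_hat * (1 - y_hat)
            delta_hid = delta_out * v * h * (1 - h)
            v -= alpha * delta_out * h
            W -= alpha * np.outer(delta_hid, X[i])
        preds = logistic(logistic(X @ W.T) @ v) > 0.5
        history.append(np.mean(preds == (y > 0.5)))
    return W, v, history
```

The returned `history` list holds one training accuracy per epoch, so it can be passed directly to a plotting call such as matplotlib's `plt.plot(history)` to produce the requested figure.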

c) Find a more realistic estimate of the accuracy of your classifier by using 8-fold cross validation. Can you achieve perfect test accuracy?
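The fold bookkeeping for cross validation can be sketched as follows. The helper names `kfold_indices` and `cross_val_accuracy` are hypothetical, and `fit` and `predict` stand in for whatever training and prediction routines are used; shuffling before splitting is an assumption, not something the problem requires.

```python
import numpy as np

def kfold_indices(n_samples, k=8, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

def cross_val_accuracy(X, y, fit, predict, k=8):
    """Average held-out accuracy; fit and predict are user-supplied callables."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(y), k):
        model = fit(X[train_idx], y[train_idx])
        scores.append(np.mean(predict(model, X[test_idx]) == y[test_idx]))
    return float(np.mean(scores))
```

Each sample lands in exactly one held-out fold, so the averaged score reflects performance on data the model did not see during training.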