## Description

1. Consider the neural network with three input nodes, two hidden nodes, and two output nodes shown below. The numbers by the edges are the weights applied to the output of the corresponding nodes. Use the ReLU activation σ(z) = max{0, z}. Suppose the values at the input nodes are u1 = 4, u2 = −2. Find the values of the hidden nodes and the output nodes.
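Since the figure (and hence its edge weights) is not reproduced in this text version, the forward pass can still be sketched with placeholder values. In the sketch below, W1, W2, and the third input u3 are illustrative assumptions, not the values from the figure:

```python
import numpy as np

# Hypothetical weights (the figure is not reproduced here):
# W1 maps the 3 inputs to the 2 hidden nodes,
# W2 maps the 2 hidden nodes to the 2 output nodes.
W1 = np.array([[1.0, -2.0, 0.5],
               [0.0,  1.0, 3.0]])
W2 = np.array([[2.0, -1.0],
               [1.0,  1.0]])

def relu(z):
    # ReLU activation: sigma(z) = max{0, z}, applied elementwise
    return np.maximum(0.0, z)

u = np.array([4.0, -2.0, 1.0])   # u1 = 4, u2 = -2; u3 = 1 assumed for illustration
h = relu(W1 @ u)                 # hidden-node values
y = relu(W2 @ h)                 # output-node values
print(h, y)                      # -> [8.5 1.] [16.  9.5]
```

With the weights from the actual figure substituted into W1 and W2, the same two lines compute the requested hidden and output values.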

2. We use the single neuron shown in the figure for classification. Here x^i_j is the j-th feature in the i-th training sample and the output is ŷ^i = σ(∑_{j=0}^P w_j x^i_j).

The stochastic gradient descent update for the weights at step t is

   w^(t+1) = w^(t) − α_t ∇f_{i_t}(w^(t)),

   where α_t is the step size and f_{i_t}(w^(t)) is the loss function associated with training sample i_t.

   a) Write the expression for ŷ^i in terms of the weights w_j and the inputs x^i_j for the ReLU activation function σ(z) = max{0, z}.

   b) Suppose squared-error loss is used to find the weights, that is, the loss function is f(w) = (1/2) ∑_{i=1}^n (ŷ^i − y^i)^2, where y^i is the label for feature sample x^i.

      i. Find the gradient of f(w) with respect to w_j, assuming the ReLU activation function.
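A finite-difference comparison is a useful way to validate the gradient once derived. The sketch below uses illustrative random data and encodes the chain-rule fact that the ReLU derivative is the indicator 1[z > 0] (away from z = 0, where ReLU is not differentiable):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def loss(w, X, y):
    # f(w) = 1/2 * sum_i (yhat^i - y^i)^2  with  yhat^i = relu(w . x^i)
    yhat = relu(X @ w)
    return 0.5 * np.sum((yhat - y) ** 2)

def grad(w, X, y):
    # Analytic gradient: sum_i (yhat^i - y^i) * 1[w . x^i > 0] * x^i_j
    z = X @ w
    yhat = relu(z)
    return X.T @ ((yhat - y) * (z > 0))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 samples, 3 features (illustrative data)
y = rng.normal(size=5)
w = rng.normal(size=3)

# Central finite-difference check of the analytic gradient
eps = 1e-6
num = np.array([(loss(w + eps * e, X, y) - loss(w - eps * e, X, y)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(grad(w, X, y), num, atol=1e-5))
```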

      ii. Write out pseudo-code for implementing SGD to learn the weights w given n training samples (features and labels) {(x^i, y^i), i = 1, 2, …, n}, assuming the ReLU activation function.
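One possible shape for the requested pseudo-code, written as runnable Python with NumPy; the data, initialization, and hyperparameters below are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sgd_relu(X, y, alpha=0.05, epochs=200, seed=0):
    """SGD for a single ReLU neuron with squared-error loss (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.full(p, 0.5)                       # small positive init so the neuron is active
    for _ in range(epochs):
        for i in rng.permutation(n):          # visit samples in random order
            z = X[i] @ w
            yhat = relu(z)
            # gradient of f_i(w) = 1/2 (yhat - y^i)^2; ReLU derivative is 1[z > 0]
            g = (yhat - y[i]) * (1.0 if z > 0 else 0.0) * X[i]
            w -= alpha * g                    # w^(t+1) = w^(t) - alpha_t * grad f_{i_t}
    return w

# Usage: learn a nonnegative linear target, where ReLU is exact
rng = np.random.default_rng(1)
X = np.abs(rng.normal(size=(50, 3)))
w_true = np.array([1.0, 2.0, 0.5])
y = X @ w_true                # targets are nonnegative, so relu(w_true . x) = y
w_hat = sgd_relu(X, y)
print(w_hat)                  # should approach w_true
```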

   c) Now suppose we use ridge regression for the loss function

      f^i(w) = (1/2)(ŷ^i − y^i)^2 + λ ∑_{j=0}^P w_j^2,

      and we use the logistic activation function σ(z) = (1 + e^{−z})^{−1}. Derive the gradient for the update step, ∇f_{i_t}(w^(t)), and write the update equation for w^(t+1).
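For part (c), the chain rule with σ′(z) = σ(z)(1 − σ(z)) leads to ∇f^i(w) = (ŷ^i − y^i) ŷ^i (1 − ŷ^i) x^i + 2λw. A minimal numerical check of that expression and one update step, with an illustrative sample and hyperparameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_i(w, x, yi, lam):
    # f^i(w) = 1/2 (yhat - y^i)^2 + lam * sum_j w_j^2,  yhat = sigmoid(w . x)
    yhat = sigmoid(x @ w)
    return 0.5 * (yhat - yi) ** 2 + lam * np.sum(w ** 2)

def grad_i(w, x, yi, lam):
    # Chain rule: sigma'(z) = sigma(z) * (1 - sigma(z)), so
    # grad = (yhat - y^i) * yhat * (1 - yhat) * x + 2 * lam * w
    yhat = sigmoid(x @ w)
    return (yhat - yi) * yhat * (1.0 - yhat) * x + 2.0 * lam * w

rng = np.random.default_rng(2)
x = rng.normal(size=4)        # one illustrative training sample
w = rng.normal(size=4)
yi, lam, alpha = 1.0, 0.1, 0.5

# Central finite-difference check of the derived gradient
eps = 1e-6
num = np.array([(loss_i(w + eps * e, x, yi, lam) - loss_i(w - eps * e, x, yi, lam)) / (2 * eps)
                for e in np.eye(4)])
print(np.allclose(grad_i(w, x, yi, lam), num, atol=1e-6))

# One SGD update step: w^(t+1) = w^(t) - alpha_t * grad f^{i_t}(w^(t))
w_new = w - alpha * grad_i(w, x, yi, lam)
```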