CS/ECE/ME532 Activity 23

1. Consider the neural network with three input nodes, two hidden nodes, and two output
nodes shown below. The numbers by the edges are the weights that are applied to
the output of the corresponding nodes. Use the ReLU activation σ(z) = max{0, z}.

Suppose the values at the input nodes are u1 = 4, u2 = −2. Find the values of the
hidden nodes and the output nodes.
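
Since the figure (and therefore the actual edge weights) is not reproduced here, the following NumPy sketch only illustrates the forward-pass arithmetic the problem asks for. The weight matrices W1 and W2 are placeholder values, and treating the third input as a constant bias of 1 is an assumption, not part of the problem statement.

import numpy as np

def relu(z):
    # ReLU activation, applied elementwise
    return np.maximum(0, z)

# Placeholder weights -- the real values come from the assignment's figure.
W1 = np.array([[1.0, -1.0, 0.5],    # weights into hidden node h1 (assumed)
               [2.0,  0.5, -1.0]])  # weights into hidden node h2 (assumed)
W2 = np.array([[1.0, -2.0],         # weights into output node o1 (assumed)
               [0.5,  1.0]])        # weights into output node o2 (assumed)

u = np.array([4.0, -2.0, 1.0])  # u1 = 4, u2 = -2; third input assumed to be a bias of 1

h = relu(W1 @ u)  # hidden-node values
o = relu(W2 @ h)  # output-node values (assuming ReLU is also applied at the output)
print("hidden:", h, "output:", o)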

2. We use the single neuron shown in the figure for classification. Here x_j^i is the j-th feature in the i-th training sample and the output is

ŷ^i = σ( ∑_{j=0}^{P} w_j x_j^i ).

The stochastic gradient descent update for the weights at step t is

w^(t+1) = w^(t) − α_t ∇f_{i_t}(w^(t)),

where α_t is the step size and f_{i_t}(w^(t)) is the loss function associated with training sample i_t.

a) Write the expression for ŷ^i in terms of the weights w_j and the input x_j^i for the ReLU activation function σ(z) = max{0, z}.
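
For reference, substituting the ReLU into the neuron's output definition above gives the closed form directly (this is just a restatement of the two given formulas):

ŷ^i = max{ 0, ∑_{j=0}^{P} w_j x_j^i }.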

b) Suppose squared-error loss is used to find the weights, that is, the loss function is

f(w) = (1/2) ∑_{i=1}^{n} (ŷ^i − y^i)^2,

where y^i is the label for feature sample x^i.

i. Find the gradient of f(w) with respect to w_j assuming the ReLU activation function.
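
One way to sketch the chain-rule step this derivation needs: write z^i = ∑_{k=0}^{P} w_k x_k^i, so that ŷ^i = max{0, z^i} and dŷ^i/dz^i is 1 when z^i > 0 and 0 when z^i < 0 (taking the subgradient value 0 at z^i = 0). The chain rule then gives

∂f/∂w_j = ∑_{i=1}^{n} (ŷ^i − y^i) · 1{z^i > 0} · x_j^i,

where 1{·} denotes the indicator function.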

ii. Write out pseudo-code for implementing SGD to learn the weights w given n training samples (features and labels) {(x^i, y^i), i = 1, 2, . . . , n}, assuming the ReLU activation function.
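
As a concrete illustration, here is a minimal runnable NumPy sketch of such an SGD loop. The data layout (an n × (P+1) matrix X whose column 0 is a constant-1 feature carrying the bias weight w_0), the constant step size, and the epoch count are all assumptions, not part of the problem statement.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sgd_relu(X, y, alpha=0.01, epochs=100, seed=0):
    """SGD for a single ReLU neuron with squared-error loss.
    X: (n, P+1) features, column 0 = 1 for the bias weight w_0; y: (n,) labels."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):           # visit samples in random order
            z = X[i] @ w                       # pre-activation z^i
            y_hat = relu(z)                    # prediction yhat^i
            # gradient of (1/2)(yhat - y)^2; ReLU derivative is 1{z > 0}
            grad = (y_hat - y[i]) * (1.0 if z > 0 else 0.0) * X[i]
            w = w - alpha * grad               # SGD update
    return w

# Example use: X = np.hstack([np.ones((n, 1)), features]); w = sgd_relu(X, y)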

c) Now suppose we use ridge regression for the loss function

f^i(w) = (1/2)(ŷ^i − y^i)^2 + λ ∑_{j=0}^{P} w_j^2,

and we use the logistic activation function σ(z) = (1 + e^{−z})^{−1}. Derive the gradient for the update step, ∇f_{i_t}(w^(t)), and write the update equation for w^(t+1).
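
A sketch of this derivation, using the standard identity σ′(z) = σ(z)(1 − σ(z)) for the logistic function: with z^i = ∑_{j=0}^{P} w_j x_j^i and ŷ^i = σ(z^i), the chain rule gives

∂f^i/∂w_j = (ŷ^i − y^i) ŷ^i (1 − ŷ^i) x_j^i + 2λ w_j,

so the componentwise update, with everything evaluated at w^(t) and the sampled index i_t, is

w_j^(t+1) = w_j^(t) − α_t [ (ŷ^{i_t} − y^{i_t}) ŷ^{i_t} (1 − ŷ^{i_t}) x_j^{i_t} + 2λ w_j^(t) ].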