# CS/ECE/ME532 Activity 24



1. A script is available to train two neurons using stochastic gradient descent to solve
two different classification problems. The two classifier structures are shown below.

Here we use a logistic activation function $\sigma(z) = (1 + e^{-z})^{-1}$. The code generates training data and labels corresponding to two decision boundaries: $x_{i,2} = -2x_{i,1} + 0.2$ and $x_{i,2} = 5(x_{i,1})^3$.
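The provided script is not reproduced here, but a minimal sketch of how such training data might be generated follows (the sample count, feature range, and variable names are assumptions, not the script's actual choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of training points (assumed)

# Features drawn uniformly from the square [-1, 1]^2
X = rng.uniform(-1, 1, size=(n, 2))

# Case 1: label 1 above the linear boundary x2 = -2*x1 + 0.2
y1 = (X[:, 1] > -2 * X[:, 0] + 0.2).astype(int)

# Case 2: label 1 above the cubic boundary x2 = 5*x1^3
y2 = (X[:, 1] > 5 * X[:, 0] ** 3).astype(int)
```

Both label vectors are 0/1 indicators of which side of the boundary a point falls on; only the boundary shape differs between the two cases.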

a) Do you expect that a single neuron will be able to accurately classify data from
case 1? Why or why not? Explain the impact of the bias term associated with
$w_{1,0}$.

b) Do you expect that a single neuron will be able to accurately classify data from
case 2? Why or why not? Explain the impact of the bias term associated with
$w_{2,0}$.

c) Run SGD for one epoch. This means you cycle through all the training data one
time, in random order. Repeat this five times and find the average number of
errors in cases 1 and 2.

d) Run SGD over twenty epochs. This means you cycle through all the training
data twenty times, in random order. Repeat this five times and find the average
number of errors in cases 1 and 2.

e) Explain the differences in classification performance for the two cases that result
with both one and twenty epochs.

2. The remainder of this activity uses a three-layer neural network with three input nodes
and two output nodes to solve two classification problems. We will vary the number
of hidden nodes. The figure below depicts the structure when there are two hidden
nodes.

A second script is available that generates training data and trains the network using
SGD assuming a logistic activation function σ(z) = (1 + e
−z
)
−1
.
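A minimal sketch of such a network (one hidden layer of M logistic units) trained with SGD is below. The squared-error loss, learning rate, and weight initialization are assumptions; the second script may make different choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_net(X, Y, M=2, epochs=10, lr=0.5, seed=0):
    """Three-layer network: 3 input nodes (bias + 2 features), M hidden
    logistic units, 2 logistic output nodes. Y is one-hot with 2 columns."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(M, d + 1))   # hidden weights incl. bias
    W2 = rng.normal(scale=0.5, size=(2, M + 1))   # output weights incl. bias
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = np.append(1.0, X[i])              # bias input plus features
            h = sigmoid(W1 @ x)                   # hidden activations
            hb = np.append(1.0, h)                # bias for the output layer
            o = sigmoid(W2 @ hb)                  # two output activations
            # Backpropagate squared-error loss through the logistic units
            do = (o - Y[i]) * o * (1 - o)
            dh = (W2[:, 1:].T @ do) * h * (1 - h)
            W2 -= lr * np.outer(do, hb)
            W1 -= lr * np.outer(dh, x)
    return W1, W2

# Example run on the cubic-boundary data with M = 2 hidden nodes
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 2))
labels = (X[:, 1] > 5 * X[:, 0] ** 3).astype(int)
Y = np.eye(2)[labels]                # one-hot targets for the two output nodes
W1, W2 = train_net(X, Y, M=2, epochs=10)
```

Predictions are taken as the arg-max over the two output nodes; varying `M` and `epochs` reproduces parts (a)-(e) below.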

a) Use M = 2 hidden nodes and ten epochs in SGD. Run this four or five times and
comment on the performance of the two classifiers and whether it varies from run
to run.

b) Repeat M = 2 but use 100 epochs in SGD. (You may use fewer epochs if it takes
more than a minute or two per run.) Run this several times and comment on the
performance of the classifiers and whether it varies from run to run.

c) Recall the two-layer network results from the previous problem. How do the
possible decision boundaries change when you add a hidden layer?

d) Now use M = 3 hidden nodes and run 100 epochs of SGD (or as many as you
can compute). Does going from two to three hidden nodes affect classifier performance?

e) Repeat the previous part for M = 4 hidden nodes and comment on classifier
performance.