
## 1 True/False Questions (10 pts)

For each question, please provide a short explanation to support your judgment.

Problem 1.1 (2 pts) An overfitting model can perfectly fit the training data. We should increase the noise in the training data and the number of parameters to improve the NN's generalization ability.

Problem 1.2 (2 pts) Given a learning task that can be perfectly learned by a Madaline model, the same set of weight values will be achieved after training, no matter how the Madaline is initialized.

Problem 1.3 (2 pts) The error surface can be complicated. The direction of steepest descent is not always the direction towards the minimum. Full batch size can keep the direction of steepest descent perpendicular to the contour lines. Thus, we should increase the batch size when the error surface is complicated.

Problem 1.4 (2 pts) In the following code, the if-else splits the modified Adaline model into two parts. Each part is differentiable. The backpropagation algorithm can be applied to the training of the entire model.

Problem 1.5 (2 pts) According to the "convolution shape rule," for a convolution operation with a fixed input feature map, increasing the height and width of the kernel size will always lead to a larger output feature map size.

Algorithm 1: A modified Adaline with branches

```
Require: w1, w2, x1, x2, n
Ensure: n ≠ 0
Ensure: (x1·w1 + x2·w2) ≠ 0
if n > 0 then
    y ← Sign(x1·w1 + x2·w2)
else
    y ← Sign(x1·w1 + x2·w2) + 5
end if
```
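For reference when reasoning about Problem 1.4, Algorithm 1 can be sketched in plain Python (a minimal sketch; `np.sign` stands in for the Sign() activation, and the two `assert` lines mirror the Ensure conditions, which also rule out the ambiguous Sign(0) case):

```python
import numpy as np

def branched_adaline(w1, w2, x1, x2, n):
    """Modified Adaline from Algorithm 1: the if-else on n selects one of
    two branches, each of which is a simple expression of the feature s."""
    assert n != 0                      # Ensure: n != 0
    s = x1 * w1 + x2 * w2              # the weighted sum (feature)
    assert s != 0                      # Ensure: (x1*w1 + x2*w2) != 0
    if n > 0:
        y = np.sign(s)                 # branch 1: plain Sign()
    else:
        y = np.sign(s) + 5             # branch 2: Sign() shifted by 5
    return y

print(branched_adaline(1.0, 1.0, 1.0, 1.0, 1))   # n > 0 branch -> 1.0
print(branched_adaline(1.0, 1.0, 1.0, 1.0, -1))  # n < 0 branch -> 6.0
```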

## 2 Adalines (15 pts)

In the following problems, you will be asked to derive the output of a given Adaline, or propose proper weight values for the Adaline to mimic the functionality of some simple logic functions. For all problems, please consider +1 as True and −1 as False in the inputs and outputs.

Problem 2.1 (3 pts) Observe the Adaline shown in Figure 1, and fill in the feature s and output y for each pair of inputs given in the truth table. What logic function is this Adaline performing?

[Figure 1 shows an Adaline with inputs x1 and x2, a constant bias input +1, fixed weight values, a summation Σ, and a Sign() activation producing output y. The accompanying truth table has one row per input pair (−1, −1), (−1, +1), (+1, −1), (+1, +1), with columns for the feature s and output y to be filled in.]

Figure 1: Problem 2.1.
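Proposed weights throughout Section 2 can be checked mechanically. The helper below evaluates an Adaline over all ±1 input pairs; the weights in the example call are hypothetical placeholders for illustration (an OR gate), not the answer to any problem here:

```python
import itertools
import numpy as np

def adaline_truth_table(w0, w1, w2):
    """Print the feature s = w0 + w1*x1 + w2*x2 and output y = Sign(s)
    for every +/-1 input pair (the bias input is the constant +1)."""
    rows = []
    for x1, x2 in itertools.product([-1, 1], repeat=2):
        s = w0 + w1 * x1 + w2 * x2
        y = int(np.sign(s))
        rows.append((x1, x2, s, y))
        print(f"x1={x1:+d} x2={x2:+d}  s={s:+.1f}  y={y:+d}")
    return rows

# Hypothetical weights for illustration only: this set realizes OR.
rows = adaline_truth_table(w0=1, w1=1, w2=1)
```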

Problem 2.2 (4 pts) Propose proper values for weights w0, w1 and w2 in the Adaline shown in Figure 2 to perform the functionality of a logic NAND function. Fill in the feature s for each pair of inputs given in the truth table to prove the functionality is correct. [Hint: The truth table of the NAND function can be found here: https://en.wikipedia.org/wiki/NAND_logic]

[Figure 2 shows an Adaline with inputs x1 and x2, a constant bias input +1, weights w0, w1, w2 to be proposed, a summation Σ, and a Sign() activation. The accompanying NAND truth table lists the input pairs (−1, −1), (−1, +1), (+1, −1), (+1, +1) with their outputs y, and a feature column s to be filled in.]

Figure 2: Problem 2.2.

Problem 2.3 (4 pts) Propose proper values for weights w0, w1, w2 and w3 in the Adaline shown in Figure 3 to perform the functionality of a Majority Vote function. Fill in the feature s for each triplet of inputs given in the truth table to prove the functionality is correct. [Hint: The truth table of the Majority Vote function can be found here: https://en.wikichip.org/wiki/boolean_algebra/majority_function]

Figure 3: Problem 2.3.

Problem 2.4 (4 pts) As discussed in Lecture 2, the XOR function cannot be represented with a single Adaline, but can be represented with a 2-layer Madaline. Propose proper values for the second-layer weights w20, w21 and w22 in the Madaline shown in Figure 4 to perform the functionality of an XOR function. Fill in the feature s for each pair of inputs given in the truth table to prove the functionality is correct.

[Figure 4 shows a 2-layer Madaline: the first-layer weights are fixed (the values +0.5, +0.5, −1, −1, 1, 1 appear in the figure), and the second-layer weights w20, w21, w22 are to be proposed.]

Figure 4: Problem 2.4.

## 3 Back Propagation (10 pts)

Problem 3.1 (5 pts) Consider a 2-layer fully-connected NN, where we have input x1 ∈ R^(n×1), hidden feature x2 ∈ R^(m×1), output x3 ∈ R^(k×1), and weights and biases W1 ∈ R^(m×n), W2 ∈ R^(k×m), b1 ∈ R^(m×1), b2 ∈ R^(k×1) of the two layers.

The hidden features and outputs are computed as follows:

x2 = Sigmoid(W1 x1 + b1)    (1)
x3 = W2 x2 + b2    (2)

An MSE loss function L = (1/2)(t − x3)^T (t − x3) is applied in the end, where t ∈ R^(k×1) is the target value.

Following the chain rule, derive the gradients ∂L/∂W1, ∂L/∂W2, ∂L/∂b1, ∂L/∂b2 in a vectorized format.
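As a cross-check while deriving, the chain rule typically unrolls as below (a sketch, writing ⊙ for the elementwise product and using Sigmoid'(z) = x2 ⊙ (1 − x2); verify each step against Equations (1)-(2) before relying on it):

$$
\frac{\partial L}{\partial x_3} = x_3 - t,\qquad
\frac{\partial L}{\partial W_2} = (x_3 - t)\,x_2^{T},\qquad
\frac{\partial L}{\partial b_2} = x_3 - t
$$

$$
\delta_1 = \left(W_2^{T}(x_3 - t)\right)\odot x_2\odot(1 - x_2),\qquad
\frac{\partial L}{\partial W_1} = \delta_1\, x_1^{T},\qquad
\frac{\partial L}{\partial b_1} = \delta_1
$$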

Problem 3.2 (5 pts) Replace the Sigmoid function with the ReLU function. Given data x1 = [0, 1, 2]^T, target value t = [1, 2]^T, the weights and biases at this iteration are

```
W1 = [  3  −1   1 ]        b1 = [ 1 ]
     [ −5   2  −1 ],            [ 2 ]        (3)

W2 = [  1  −2 ]            b2 = [ 1 ]
     [ −3   1 ],                [ 1 ]        (4)
```

Following the results in Problem 3.1, calculate the values of L, ∂L/∂W1, ∂L/∂W2, ∂L/∂b1, ∂L/∂b2.
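A short NumPy sketch can be used to sanity-check the hand computation. It mirrors Equations (1)-(2) with Sigmoid replaced by ReLU and applies the chain rule from Problem 3.1; treat it as a numerical check, not the required derivation:

```python
import numpy as np

# Data, target, and parameters from Equations (3)-(4)
x1 = np.array([[0.0], [1.0], [2.0]])
t  = np.array([[1.0], [2.0]])
W1 = np.array([[3.0, -1.0, 1.0], [-5.0, 2.0, -1.0]]); b1 = np.array([[1.0], [2.0]])
W2 = np.array([[1.0, -2.0], [-3.0, 1.0]]);            b2 = np.array([[1.0], [1.0]])

# Forward pass with ReLU hidden activation
z1 = W1 @ x1 + b1
x2 = np.maximum(z1, 0.0)            # ReLU
x3 = W2 @ x2 + b2
L  = 0.5 * ((t - x3).T @ (t - x3)).item()

# Backward pass (chain rule, vectorized)
dx3 = x3 - t                        # dL/dx3
dW2 = dx3 @ x2.T                    # dL/dW2
db2 = dx3                           # dL/db2
d1  = (W2.T @ dx3) * (z1 > 0)       # ReLU' is the 0/1 indicator of z1 > 0
dW1 = d1 @ x1.T                     # dL/dW1
db1 = d1                            # dL/db1

print("L =", L)                     # 14.5 with these numbers
```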

## 4 2D Convolution (10 pts)

Problem 4.1 (5 pts) Derive the 2D convolution results of the following 5 × 9 input matrix and the 3 × 3 kernel. Consider 0s padded around the input and a stride of 1, so that the output also has shape 5 × 9.

```
 0   0  −1   0   0   0   1   0   0
 0  −1  −1  −1   0   1   1   1   0
−1  −1  −1  −1   0   1   1   1   1
 0  −1  −1  −1   0   1   1   1   0
 0   0  −1   0   0   0   1   0   0

          ∗

   0   −1/2    0
 −1/2    1   −1/2
   0   −1/2    0
```

Problem 4.2 (5 pts) Compare the output matrix and the input matrix in Problem 4.1, and briefly analyze the effect of this 3 × 3 kernel on the input. (Hint: apply this kernel to an image to see the outputs.)
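As the hint suggests, the hand-derived result can be checked numerically. A sketch using SciPy (this particular kernel is symmetric, so convolution and cross-correlation coincide here):

```python
import numpy as np
from scipy.signal import convolve2d

# The 5x9 input and 3x3 kernel from Problem 4.1
inp = np.array([
    [ 0,  0, -1,  0,  0,  0,  1,  0,  0],
    [ 0, -1, -1, -1,  0,  1,  1,  1,  0],
    [-1, -1, -1, -1,  0,  1,  1,  1,  1],
    [ 0, -1, -1, -1,  0,  1,  1,  1,  0],
    [ 0,  0, -1,  0,  0,  0,  1,  0,  0],
], dtype=float)

kernel = np.array([
    [ 0.0, -0.5,  0.0],
    [-0.5,  1.0, -0.5],
    [ 0.0, -0.5,  0.0],
])

# mode='same' keeps the 5x9 shape; the boundary is zero-padded as required
out = convolve2d(inp, kernel, mode="same", boundary="fill", fillvalue=0)
print(out)
```

Note the kernel's structure: it subtracts half the sum of the 4 neighbours from the centre pixel, i.e. a discrete Laplacian-style high-pass filter, which is a useful starting point for the analysis in Problem 4.2.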

## 5 Lab: LMS Algorithm (15 pts)

In this lab question, you will implement the LMS algorithm with NumPy to learn a linear regression model

for the provided dataset. You will also be directed to analyze how the choice of learning rate in the LMS

algorithm affect the final result. All the codes generating the results of this lab should be gathered in one

file and submit to Sakai.

To start with, please download the dataset.mat file from Sakai and load it into NumPy arraysa

.

There are two variables in the file: data X ∈ R^(100×3) and target D ∈ R^(100×1). Each individual pair of data and target is composed into X and D in the same way as discussed on Lecture 2 Page 8. Specifically, each row in X corresponds to the transpose of a data point, with the first element as the constant 1 and the other two as the two input features x1k and x2k. The goal of the learning task is to find the weight vector W ∈ R^(3×1) for the linear model that minimizes the MSE loss, which is also formulated on Lecture 2 Page 7.

(a) (3pt) Directly compute the least-squares (Wiener) solution with the provided dataset. What is the optimal weight W∗? What is the MSE loss of the whole dataset when the weight is set to W∗?

(b) (4pt) Now consider that you can only train with one pair of data point and target at a time. In such a case, the LMS algorithm should be used to find the optimal weight. Please initialize the weight vector as W0 = [0, 0, 0]^T, and update the weight with the LMS algorithm. After each epoch (every time you go through all the training data and loop back to the beginning), compute and record the MSE loss of the current weight on the whole dataset. Run LMS for 20 epochs with learning rate r = 0.01, report the weight you get in the end, and plot the MSE loss in log scale vs. epochs.

(c) (3pt) Scatter plot the points (x1k, x2k, dk) for all 100 data-target pairs in a 3D figure,^b and plot the lines corresponding to the linear models you got in (a) and (b) respectively in the same figure. Observe whether the linear models fit the data well.

(d) (5pt) The learning rate r is an important hyperparameter for the LMS algorithm, as well as for CNN optimization. Here, try repeating the process in (b) with r set to 0.005, 0.05 and 0.5 respectively. Together with the result you got in (b), plot the MSE losses of the 4 sets of experiments in log scale vs. epochs in one figure. Then further enlarge the learning rate to r = 1 and observe how the MSE changes. Based on these observations, comment on how the learning rate affects the speed and quality of the learning process. (Note: Learning rate tuning for CNN optimization will be introduced in Lecture 7.)

^a You may refer to https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html for loading matrices in a .mat file into NumPy arrays.

^b Please refer to https://jakevdp.github.io/PythonDataScienceHandbook/04.12-three-dimensional-plotting.html for plotting 3D plots with Matplotlib.
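A minimal NumPy sketch of parts (a) and (b). The variable names X and D follow the dataset description above, but the synthetic data at the bottom is an illustrative stand-in, not the provided dataset (load the real file with scipy.io.loadmat as footnote a describes):

```python
import numpy as np

def wiener_solution(X, D):
    """Least-squares (Wiener) weights: W* = (X^T X)^{-1} X^T D."""
    return np.linalg.solve(X.T @ X, X.T @ D)

def mse(X, D, W):
    e = D - X @ W
    return (e.T @ e).item() / len(D)

def lms(X, D, r=0.01, epochs=20):
    """Sample-by-sample LMS: W <- W + r * e_k * x_k after each data point."""
    W = np.zeros((X.shape[1], 1))
    losses = []
    for _ in range(epochs):
        for k in range(len(D)):
            xk = X[k:k+1].T                  # column vector, shape (3, 1)
            ek = D[k, 0] - (xk.T @ W).item() # per-sample error
            W = W + r * ek * xk
        losses.append(mse(X, D, W))          # record MSE once per epoch
    return W, losses

# Synthetic stand-in for dataset.mat: first column is the constant 1
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
D = X @ np.array([[1.0], [2.0], [-3.0]])

W_star = wiener_solution(X, D)
W_lms, losses = lms(X, D, r=0.01, epochs=20)
```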

## 6 Lab: Simple NN (40 pts)

For getting started with deep Neural Network model easily, we consider a simple Neural Network model

here and details of the model architecture is given in Table 1. This lab question focuses on building the

model in PyTorch and observing the shape of each layer’s input, weight and output. Please refer to the

NumPy/PyTorch Tutorial slides on Sakai and the official documentations if you are unfamiliar with

PyTorch syntax.

Please finish this lab by completing the SimpleNN.ipynb notebook file provided on Sakai. The completed notebook file should be submitted to Sakai.

| Name | Type | Kernel size | Depth/units | Activation | Strides |
|---|---|---|---|---|---|
| Conv 1 | Convolution | 5 | 16 | ReLU | 1 |
| MaxPool | MaxPool | 4 | N/A | N/A | 2 |
| Conv 2 | Convolution | 3 | 16 | ReLU | 1 |
| MaxPool | MaxPool | 3 | N/A | N/A | 2 |
| Conv 3 | Convolution | 7 | 32 | ReLU | 1 |
| MaxPool | MaxPool | 2 | N/A | N/A | 2 |
| FC1 | Fully-connected | N/A | 32 | ReLU | N/A |
| FC2 | Fully-connected | N/A | 10 | ReLU | N/A |

Table 1: The padding for all three convolution layers is 2. The padding for all three MaxPool layers is 0. A flatten layer is required before FC1 to reshape the feature.

In the notebook, first run through the first two code blocks, then follow the instructions in the following questions to complete each code block and acquire the answers.

(a) (10pt) Complete code block 3 for defining the adapted SimpleNN model. Note that customized CONV and FC classes are provided in code block 2 to replace the nn.Conv2d and nn.Linear classes in PyTorch respectively. The usage of the customized classes is exactly the same as their PyTorch counterparts; the only difference is that in the customized classes the input and output feature maps of the layer are stored in self.input and self.output respectively after the forward pass, which will be helpful in question (b). After the code is completed, run through the block and make sure the model forward pass at the end throws no errors. Please copy your code of the completed SimpleNN class into the report PDF.

(b) (30pt) Complete the for-loop in code block 4 to print the shapes of the input feature map, output feature map and weight tensor of the 5 convolutional and fully-connected layers when processing a single input. Then compute the number of parameters and the number of MACs in each layer with the shapes you get. In your report, use your results to fill in the blanks in Table 2.


| Layer | Input shape | Output shape | Weight shape | # Param | # MAC |
|---|---|---|---|---|---|
| Conv 1 | | | | | |
| Conv 2 | | | | | |
| Conv 3 | | | | | |
| FC1 | | | | | |
| FC2 | | | | | |

Table 2: Results of Lab 2(b).
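The entries in Table 2 can be sanity-checked with a little arithmetic. The sketch below assumes a 3-channel 32×32 input (a hypothetical choice; the real input shape comes from the notebook's dataset) and uses the standard formulas #Param = k·k·Cin·Cout + Cout and #MAC = k·k·Cin·Cout·Hout·Wout for a convolution layer:

```python
def conv_out(size, k, pad, stride):
    """Spatial output size of a conv/pool layer: floor((H + 2p - k)/s) + 1."""
    return (size + 2 * pad - k) // stride + 1

def conv_stats(h, c_in, c_out, k, pad=2, stride=1):
    h_out = conv_out(h, k, pad, stride)
    params = k * k * c_in * c_out + c_out          # weights + biases
    macs = k * k * c_in * c_out * h_out * h_out    # one MAC per weight per output pixel
    return h_out, params, macs

def fc_stats(n_in, n_out):
    return n_in * n_out + n_out, n_in * n_out      # (#param, #MAC)

# Walk through Table 1, assuming a square 32x32 input with 3 channels
h = 32
h, p1, m1 = conv_stats(h, 3, 16, k=5)              # Conv 1 (pad 2, stride 1)
h = conv_out(h, k=4, pad=0, stride=2)              # MaxPool
h, p2, m2 = conv_stats(h, 16, 16, k=3)             # Conv 2
h = conv_out(h, k=3, pad=0, stride=2)              # MaxPool
h, p3, m3 = conv_stats(h, 16, 32, k=7)             # Conv 3
h = conv_out(h, k=2, pad=0, stride=2)              # MaxPool
flat = h * h * 32                                  # flatten before FC1
pf1, mf1 = fc_stats(flat, 32)                      # FC1
pf2, mf2 = fc_stats(32, 10)                        # FC2

print("flatten size:", flat)
print("params:", p1, p2, p3, pf1, pf2)
```

Comparing these numbers against the shapes printed by code block 4 is a quick way to catch an off-by-one in padding or stride.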


## 7 Lab 3 (Bonus 10 points)

Please first finish all the required code in Lab 2, then proceed to code block 5 of the notebook file.

(a) (2pt) Complete the for-loop in code block 5 to plot the histogram of the weight elements in each of the 5 convolutional and fully-connected layers.

(b) (3pt) In code block 6, complete the code for the backward pass, then complete the for-loop to plot the histogram of the weight elements' gradients in each of the 5 convolutional and fully-connected layers.

(c) (5pt) In code block 7, finish the code to set all the weights to 0. Perform the forward and backward pass again to get the gradients, and plot the histogram of the weight elements' gradients in each of the 5 convolutional and fully-connected layers. Comparing with the histograms you got in (b), are there any differences? Briefly analyze the cause of the difference, and comment on how initializing a CNN model with zero weights will affect the training process. (Note: CNN initialization methods will be introduced in Lecture 6.)