ECE 661: Homework 1 Linear Model, Back Propagation and Building a CNN



1 True/False Questions (10 pts)

For each question, please provide a short explanation to support your judgment.
Problem 1.1 (2 pts) The overfitting models can perfectly fit the training data. We should increase the
noise in the training data and the number of parameters to improve NN’s generalization ability.

Problem 1.2 (2 pts) Given a learning task that can be perfectly learned by a Madaline model, the same
set of weight values will be achieved after training, no matter how the Madaline is initialized.

Problem 1.3 (2 pts) The error surface can be complicated. The direction of steepest descent is not
always the direction towards the minimum. Full batch size can keep the direction of steepest descent
perpendicular to the contour lines. Thus, we should increase the batch size when the error surface is

Problem 1.4 (2 pts) In the following code, “If-else” splits the modified Adalines model into two parts.
Each part is differentiable. The backpropagation algorithm can be applied to the training of the entire

Problem 1.5 (2 pts) According to the “convolution shape rule,” for a convolution operation with a fixed
input feature map, increasing the height and width of kernel size will always lead to a larger output feature
map size.
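
For reference, the output-size relation commonly used for convolution layers is written out below. That it matches the lecture's "convolution shape rule" exactly is an assumption; the symbols (input height and width H_in, W_in, kernel size K_h, K_w, padding P, stride S) are chosen here for illustration.

```latex
H_{out} = \left\lfloor \frac{H_{in} + 2P - K_h}{S} \right\rfloor + 1,
\qquad
W_{out} = \left\lfloor \frac{W_{in} + 2P - K_w}{S} \right\rfloor + 1
```

For instance, with H_in = 32, P = 2 and S = 1, a kernel height of 3 gives H_out = 34, while a kernel height of 5 gives H_out = 32.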

Algorithm 1 A modified Adalines with branches
Require: w1, w2, x1, x2, n
Ensure: n ≠ 0
Ensure: (x1w1 + x2w2) ≠ 0
if n > 0 then
    y ← Sign(x1w1 + x2w2)
else
    y ← Sign(x1w1 + x2w2) + 5
end if

2 Adalines (15 pts)

In the following problems, you will be asked to derive the output of a given Adaline, or propose proper
weight values for the Adaline to mimic the functionality of some simple logic functions. For all problems,
please consider +1 as True and −1 as False in the inputs and outputs.
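
As a minimal sketch of how an Adaline's feature s and output y can be tabulated, the snippet below evaluates a two-input Adaline over all ±1 input pairs. It assumes the bias weight w0 multiplies a constant input of +1 and that y = Sign(s). The helper name `adaline` and the weights (chosen here to realize a logical AND) are hypothetical and not taken from any of the figures.

```python
import numpy as np

def adaline(weights, inputs):
    """Return (s, y) for one Adaline: s = w0 + sum_i(w_i * x_i), y = sign(s)."""
    w = np.asarray(weights, dtype=float)
    # Prepend the constant +1 that multiplies the bias weight w0 (assumed convention).
    x = np.concatenate(([1.0], np.asarray(inputs, dtype=float)))
    s = float(w @ x)
    y = 1 if s > 0 else -1  # s == 0 is mapped to -1 here; an arbitrary choice
    return s, y

# Hypothetical weights (w0, w1, w2) that realize a logical AND.
and_weights = (-1.0, 1.0, 1.0)

truth_table = [(x1, x2, *adaline(and_weights, (x1, x2)))
               for x1 in (-1, 1) for x2 in (-1, 1)]

for x1, x2, s, y in truth_table:
    print(f"x1={x1:+d}  x2={x2:+d}  s={s:+.1f}  y={y:+d}")
```

The same helper can be reused with three inputs by passing four weights, since the feature is just a dot product.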

Problem 2.1 (3 pts) Observe the Adaline shown in Figure 1 and fill in the feature s and output y for each
pair of inputs given in the truth table. What logic function is this Adaline performing?
# ## = −2 ! = 2
#” = 1
x1 x2 s y
-1 -1
-1 +1
+1 -1
+1 +1
�! = 4
Figure 1: Problem 2.1.

Problem 2.2 (4 pts) Propose proper values for weight w0, w1 and w2 in the Adaline shown in Figure 2
to perform the functionality of a logic NAND function. Fill in the feature s for each pair of inputs given in
the truth table to prove the functionality is correct. [Hint: The truth table of NAND function can be found here.]

x1   x2   s    y
-1   -1        +1
-1   +1        -1
+1   -1        -1
+1   +1        -1
Figure 2: Problem 2.2.

Problem 2.3 (4 pts) Propose proper values for weight w0, w1, w2 and w3 in the Adaline shown in Figure 3
to perform the functionality of a Majority Vote function. Fill in the feature s for each triplet of inputs given
in the truth table to prove the functionality is correct. [Hint: The truth table of Majority Vote function can
be found here.]

Figure 3: Problem 2.3.

Problem 2.4 (4 pts) As discussed in Lecture 2, the XOR function cannot be represented with a single
Adaline, but can be represented with a 2-layer Madaline. Propose proper values for the second-layer weights
w20, w21 and w22 in the Madaline shown in Figure 4 to perform the functionality of an XOR function. Fill
in the feature s for each pair of inputs given in the truth table to prove the functionality is correct.
Figure 4: Problem 2.4.

3 Back Propagation (10 pts)

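Problem 3.1 (5 pts) Consider a 2-layer fully-connected NN, where we have input $x_1 \in \mathbb{R}^{n \times 1}$, hidden
feature $x_2 \in \mathbb{R}^{m \times 1}$, output $x_3 \in \mathbb{R}^{k \times 1}$, and weights and biases $W_1 \in \mathbb{R}^{m \times n}$, $W_2 \in \mathbb{R}^{k \times m}$,
$b_1 \in \mathbb{R}^{m \times 1}$, $b_2 \in \mathbb{R}^{k \times 1}$ of the two layers.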

The hidden features and outputs are computed as follows:
$x_2 = \mathrm{Sigmoid}(W_1 x_1 + b_1)$ (1)
$x_3 = W_2 x_2 + b_2$ (2)
An MSE loss function $L = (t - x_3)^T (t - x_3)$ is applied in the end, where $t \in \mathbb{R}^{k \times 1}$ is the target value.

Following the chain rule, derive the gradient ∂L
in a vectorized format.
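
One way to sanity-check a hand derivation is to compare it against finite differences. The sketch below assumes the loss exactly as written above, with no leading constant (if the original carries a factor such as 1/2, every gradient scales by that factor). The function names, the random test values and the sizes n = 3, m = 2, k = 2 are illustrative choices, not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x1, t):
    """Forward pass of Eqs. (1)-(2); returns hidden feature, output and loss."""
    W1, b1, W2, b2 = params
    x2 = sigmoid(W1 @ x1 + b1)
    x3 = W2 @ x2 + b2
    diff = t - x3
    loss = (diff.T @ diff).item()  # assumed: no leading constant in the loss
    return x2, x3, loss

def backward(params, x1, t):
    """Chain-rule gradients of the loss with respect to W1, b1, W2, b2."""
    W1, b1, W2, b2 = params
    x2, x3, _ = forward(params, x1, t)
    g_x3 = 2.0 * (x3 - t)             # k x 1
    g_W2 = g_x3 @ x2.T                # k x m
    g_b2 = g_x3                       # k x 1
    g_x2 = W2.T @ g_x3                # m x 1
    g_z1 = g_x2 * x2 * (1.0 - x2)     # sigmoid'(z) = x2 * (1 - x2), elementwise
    g_W1 = g_z1 @ x1.T                # m x n
    g_b1 = g_z1                       # m x 1
    return g_W1, g_b1, g_W2, g_b2

rng = np.random.default_rng(0)
n, m, k = 3, 2, 2                     # illustrative sizes
params = [rng.normal(size=(m, n)), rng.normal(size=(m, 1)),
          rng.normal(size=(k, m)), rng.normal(size=(k, 1))]
x1 = rng.normal(size=(n, 1))
t = rng.normal(size=(k, 1))

g_W1, g_b1, g_W2, g_b2 = backward(params, x1, t)

# Central finite difference on one entry of W1 as a numerical check.
eps = 1e-6
plus = [p.copy() for p in params]
minus = [p.copy() for p in params]
plus[0][0, 0] += eps
minus[0][0, 0] -= eps
numeric = (forward(plus, x1, t)[2] - forward(minus, x1, t)[2]) / (2 * eps)
print("analytic:", g_W1[0, 0], "numeric:", numeric)
```

Each gradient has the same shape as the quantity it differentiates with respect to, which is a quick consistency check on the vectorized form.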

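Problem 3.2 (5 pts) Replace the Sigmoid function with the ReLU function. Given data $x_1 = [0, 1, 2]^T$ and
target value $t = [1, 2]^T$, the weights and biases at this iteration are
$W_1 = \begin{bmatrix} 3 & -1 & 1 \\ -5 & 2 & -1 \end{bmatrix}$, $b_1 =$
$W_2 = \begin{bmatrix} 1 & -2 \\ -3 & 1 \end{bmatrix}$, $b_2 =$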



Following the results in Problem 3.1, calculate the values of L, ∂L

4 2D Convolution (10 pts)

Problem 4.1 (5 pts) Derive the 2D convolution result of the following 5 × 9 input matrix with the 3 × 3
kernel. Assume the input is zero-padded and the stride is 1, so that the output also has
shape 5 × 9.

0 0 −1 0 0 0 1 0 0
0 −1 −1 −1 0 1 1 1 0
−1 −1 −1 −1 0 1 1 1 1
0 −1 −1 −1 0 1 1 1 0
0 0 −1 0 0 0 1 0 0

0 −1/2 0
−1/2 1 −1/2
0 −1/2 0
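
A hand-derived result can be cross-checked numerically. The following is a minimal NumPy sketch of a stride-1, zero-padded 2D convolution applied to the matrices above; the function name `conv2d_same` is made up for this example. The kernel is flipped as in a true convolution, which makes no difference here because the kernel is symmetric.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Stride-1 2D convolution with zero padding; output has the input's shape."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

x = np.array([
    [ 0,  0, -1,  0, 0, 0, 1, 0, 0],
    [ 0, -1, -1, -1, 0, 1, 1, 1, 0],
    [-1, -1, -1, -1, 0, 1, 1, 1, 1],
    [ 0, -1, -1, -1, 0, 1, 1, 1, 0],
    [ 0,  0, -1,  0, 0, 0, 1, 0, 0],
], dtype=float)

k = np.array([
    [ 0.0, -0.5,  0.0],
    [-0.5,  1.0, -0.5],
    [ 0.0, -0.5,  0.0],
])

y = conv2d_same(x, k)
print(y)
```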

Problem 4.2 (5 pts) Compare the output matrix and the input matrix in Problem 4.1, briefly analyze
the effect of this 3 × 3 kernel on the input. (Hint: apply this kernel to an image to see the outputs)

5 Lab: LMS Algorithm (15 pts)

In this lab question, you will implement the LMS algorithm with NumPy to learn a linear regression model
for the provided dataset. You will also be directed to analyze how the choice of learning rate in the LMS
algorithm affects the final result. All the code generating the results of this lab should be gathered in one
file and submitted to Sakai.

To start with, please download the dataset.mat file from Sakai and load it into NumPy arrays (see footnote a).
There are two variables in the file: data $X \in \mathbb{R}^{100 \times 3}$ and target $D \in \mathbb{R}^{100 \times 1}$. Each individual pair of
data and target is composed into X and D following the same way as discussed on Lecture 2 Page

Specifically, each row in X corresponds to the transpose of a data point, with the first element as the
constant 1 and the other two as the two input features x1k and x2k. The goal of the learning task is
to find the weight vector $W \in \mathbb{R}^{3 \times 1}$ for the linear model that minimizes the MSE loss, which is
also formulated on Lecture 2 Page 7.

(a) (3pt) Directly compute the least square (Wiener) solution with the provided dataset. What is the
optimal weight W∗? What is the MSE loss of the whole dataset when the weight is set to W∗?
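
A minimal sketch of the least-squares computation is shown below. Because dataset.mat is not available here, it uses synthetic stand-in data with the same shapes; `true_w`, the noise level and the helper `mse` are hypothetical. The MSE is taken as the plain mean of squared errors, which is an assumption; the lecture's definition may include an extra constant factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for dataset.mat (hypothetical values, same shapes as described).
features = rng.uniform(-1.0, 1.0, size=(100, 2))
X = np.hstack([np.ones((100, 1)), features])      # 100 x 3, first column is the constant 1
true_w = np.array([[1.0], [2.0], [-3.0]])         # hypothetical ground-truth weights
D = X @ true_w + 0.1 * rng.normal(size=(100, 1))  # 100 x 1 targets with small noise

# Least-squares solution: the W that minimizes ||X W - D||^2.
W_star, *_ = np.linalg.lstsq(X, D, rcond=None)

def mse(W, X, D):
    """Mean squared error over the whole dataset (assumed definition)."""
    err = D - X @ W
    return float(np.mean(err ** 2))

print("W* =", W_star.ravel())
print("MSE at W* =", mse(W_star, X, D))
```

`np.linalg.lstsq` is used instead of explicitly inverting X^T X because it is numerically more robust; both give the same solution when X has full column rank.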

(b) (4pt) Now consider that you can only train with one pair of data point and target at a time. In this
case, the LMS algorithm should be used to find the optimal weight. Please initialize the weight
vector as W0 = [0, 0, 0]^T and update the weight with the LMS algorithm. After each epoch (every
time you go through all the training data and loop back to the beginning), compute and record
the MSE loss of the current weight on the whole dataset. Run LMS for 20 epochs with learning
rate r = 0.01, report the weight you get in the end, and plot the MSE loss in log scale vs. epochs.
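
The per-sample procedure described above might look like the sketch below. It assumes the update W ← W + r · e_k · x_k with e_k = d_k − Wᵀx_k; some formulations use 2r instead of r, so the lecture's exact convention should be checked. The data are a synthetic stand-in for dataset.mat, and the function name `lms` and `true_w` are hypothetical.

```python
import numpy as np

def lms(X, D, lr, epochs):
    """Per-sample LMS; returns the final weights and the MSE after each epoch."""
    W = np.zeros((X.shape[1], 1))                 # W0 = [0, 0, 0]^T
    losses = []
    for _ in range(epochs):
        for k in range(X.shape[0]):
            x_k = X[k:k + 1].T                    # column vector for sample k
            e_k = D[k, 0] - (W.T @ x_k).item()    # prediction error on sample k
            W = W + lr * e_k * x_k                # assumed update convention
        losses.append(float(np.mean((D - X @ W) ** 2)))  # MSE on the whole dataset
    return W, losses

# Synthetic stand-in for dataset.mat (hypothetical values).
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.uniform(-1.0, 1.0, size=(100, 2))])
true_w = np.array([[1.0], [2.0], [-3.0]])
D = X @ true_w + 0.1 * rng.normal(size=(100, 1))

W_lms, losses = lms(X, D, lr=0.01, epochs=20)
print("W after 20 epochs:", W_lms.ravel())
print("MSE per epoch:", losses)
```

The recorded `losses` list has one entry per epoch, so it can be passed directly to a log-scale plot against the epoch index.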

(c) (3pt) Scatter plot the points (x1k, x2k, dk) for all 100 data-target pairs in a 3D figure (see footnote b), and plot
the lines corresponding to the linear models you got in (a) and (b) respectively in the same figure.
Observe whether the linear models fit the data well.

(d) (5pt) Learning rate r is an important hyperparameter for the LMS algorithm, as well as for CNN
optimization. Here, try repeating the process in (b) with r set to 0.005, 0.05 and 0.5 respectively.
Together with the result you got in (b), plot the MSE losses of the 4 sets of experiments in log scale
vs. epochs in one figure. Then try further enlarging the learning rate to r = 1 and observe how the
MSE changes. Based on these observations, comment on how the learning rate affects the speed and
quality of the learning process. (Note: The learning rate tuning for the CNN optimization will be
introduced in Lecture 7.)

a. You may refer to for loading
matrices in .mat file into NumPy arrays.

b. Please refer to
html for plotting 3D plots with Matplotlib.

6 Lab: Simple NN (40 pts)

To get started with deep neural network models, we consider a simple neural network model
here; details of the model architecture are given in Table 1. This lab question focuses on building the
model in PyTorch and observing the shape of each layer’s input, weight and output. Please refer to the
NumPy/PyTorch Tutorial slides on Sakai and the official documentation if you are unfamiliar with
PyTorch syntax.

Please finish this lab by completing the SimpleNN.ipynb notebook file provided on Sakai. The completed notebook file should be submitted to Sakai.

Name     Type             Kernel size  Depth/units  Activation  Strides
Conv 1   Convolution      5            16           ReLU        1
MaxPool  MaxPool          4            N/A          N/A         2
Conv 2   Convolution      3            16           ReLU        1
MaxPool  MaxPool          3            N/A          N/A         2
Conv 3   Convolution      7            32           ReLU        1
MaxPool  MaxPool          2            N/A          N/A         2
FC1      Fully-connected  N/A          32           ReLU        N/A
FC2      Fully-connected  N/A          10           ReLU        N/A

Table 1: The padding for all three convolution layers is 2. The padding for all three MaxPool layers is 0.
A flatten layer is required before FC1 to reshape the feature.
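
To see why the flatten layer matters, the spatial sizes can be traced through Table 1 by hand or with a few lines of Python, as sketched below. The input shape is not stated in this text, so a 3 × 32 × 32 input is assumed purely for illustration. The sketch also assumes floor rounding for the pooling layers, and the helper name `out_size` is hypothetical.

```python
def out_size(size, kernel, stride, padding):
    """Output length along one spatial dimension (floor rounding assumed)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kind, kernel size, depth, stride, padding), taken from Table 1.
layers = [
    ("Conv 1",  "conv", 5, 16,   1, 2),
    ("MaxPool", "pool", 4, None, 2, 0),
    ("Conv 2",  "conv", 3, 16,   1, 2),
    ("MaxPool", "pool", 3, None, 2, 0),
    ("Conv 3",  "conv", 7, 32,   1, 2),
    ("MaxPool", "pool", 2, None, 2, 0),
]

channels, size = 3, 32  # ASSUMPTION: 3 x 32 x 32 input, not given in the text
shapes = []
for name, kind, kernel, depth, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    if kind == "conv":
        channels = depth  # pooling keeps the channel count unchanged
    shapes.append((name, (channels, size, size)))
    print(f"{name:8s} -> {channels} x {size} x {size}")

flat_features = channels * size * size  # number of inputs seen by FC1 after flattening
print("features after flatten:", flat_features)
```

Under a different input size the intermediate shapes change, but the same loop applies.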

In the notebook, first run through the first two code blocks, then follow the instructions in the following questions to complete each code block and acquire the answers.

(a) (10pt) Complete code block 3 for defining the adapted SimpleNN model. Note that customized
CONV and FC classes are provided in code block 2 to replace the nn.Conv2d and nn.Linear classes
in PyTorch respectively. The usage of the customized classes is exactly the same as that of their PyTorch
counterparts; the only difference is that in the customized classes the input and output feature maps
of the layer will be stored in self.input and self.output respectively after the forward pass,
which will be helpful in question (b). After the code is completed, run through the block and
make sure the model forward pass in the end throws no errors. Please copy your code of the
completed SimpleNN class into the report PDF.

(b) (30pt) Complete the for-loop in code block 4 to print the shape of the input feature map, output
feature map and the weight tensor of the 5 convolutional and fully-connected layers when processing a single input. Then compute the number of parameters and the number of MACs in each
layer with the shapes you get. In your report, use your results to fill in the blanks in Table 2.
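
Once the shapes are known, parameter and MAC counts follow from simple products. The sketch below shows one common way to count them; it counts weights only and ignores biases, which is an assumption about what the table expects. The helper names and the example layer sizes are hypothetical and deliberately not taken from Table 1.

```python
def conv_cost(in_channels, out_channels, kernel, out_h, out_w):
    """Weight count and MACs of a square-kernel convolution (bias ignored)."""
    params = out_channels * in_channels * kernel * kernel
    macs = params * out_h * out_w  # every output position reuses the full weight tensor
    return params, macs

def fc_cost(in_features, out_features):
    """Weight count and MACs of a fully-connected layer (bias ignored)."""
    params = in_features * out_features
    return params, params          # one MAC per weight for a single input

# Hypothetical layers for illustration only.
conv_params, conv_macs = conv_cost(in_channels=8, out_channels=4, kernel=3,
                                   out_h=10, out_w=10)
fc_params, fc_macs = fc_cost(in_features=20, out_features=5)

print("conv:", conv_params, "params,", conv_macs, "MACs")
print("fc:  ", fc_params, "params,", fc_macs, "MACs")
```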

Lab 2 (40 points)
Layer   Input shape  Output shape  Weight shape  # Param  # MAC
Conv 1
Conv 2
Conv 3
FC1
FC2
Table 2: Results of Lab 2(b).

Lab 3 (Bonus 10 points)
Please first finish all the required code in Lab 2, then proceed to code block 5 of the notebook file.
(a) (2pt) Complete the for-loop in code block 5 to plot the histogram of weight elements in each
of the 5 convolutional and fully-connected layers.

(b) (3pt) In code block 6, complete the code for the backward pass, then complete the for-loop to plot the
histogram of the weight elements’ gradients in each of the 5 convolutional and fully-connected layers.

(c) (5pt) In code block 7, finish the code to set all the weights to 0. Perform the forward and backward
pass again to get the gradients, and plot the histogram of the weight elements’ gradients in each
of the 5 convolutional and fully-connected layers. Compared with the histograms you got in
(b), are there any differences? Briefly analyze the cause of the difference, and comment on how
initializing a CNN model with zero weights will affect the training process. (Note: The CNN
initialization methods will be introduced in Lecture 6.)