CSCI 5561: Assignment 4

Convolutional Neural Network

2 Overview

Figure 1: You will implement (1) a multi-layer perceptron (neural network) and (2) a
convolutional neural network to recognize hand-written digits using the MNIST dataset.

The goal of this assignment is to implement neural networks to recognize hand-written
digits in the MNIST data.

MNIST Data You will use the MNIST hand-written digit dataset to perform the
first task (neural network). We have reduced the image size (28 × 28 → 14 × 14) and
subsampled the data. You can download the training and testing data from Canvas.
Description: The zip file includes two MAT files (mnist_train.mat and mnist_test.mat).

Each file includes im_* and label_* variables:
• im_* is a matrix (196 × n) storing vectorized image data (196 = 14 × 14)
• label_* is a 1 × n vector storing the label for each image.
n is the number of images. You can visualize the i-th image, e.g.,
plt.imshow(mnist_train['im_train'][:, 0].reshape((14, 14), order='F'), cmap='gray').
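A minimal loading/visualization sketch (assuming scipy and matplotlib are installed and the MAT files sit in the working directory):

import scipy.io
import matplotlib.pyplot as plt

# Load the provided MAT files.
mnist_train = scipy.io.loadmat('mnist_train.mat')
im_train = mnist_train['im_train']        # 196 x n vectorized images
label_train = mnist_train['label_train']  # 1 x n labels

# Show the first image; order='F' undoes the column-major vectorization.
plt.imshow(im_train[:, 0].reshape((14, 14), order='F'), cmap='gray')
plt.title('label: {}'.format(label_train[0, 0]))
plt.show()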

3 Single-layer Linear Perceptron

Figure 2: You will implement a single linear perceptron that produces accuracy near 30% on testing data (random chance is 10%). (a) Single linear perceptron; (b) training and testing loss vs. iterations; (c) confusion matrix (accuracy: 0.297905).

You will implement a single-layer linear perceptron (Figure 2(a)) with the stochastic gradient descent method. We provide main_slp_linear where you will implement get_mini_batch and train_slp_linear.
def get_mini_batch(im_train, label_train, batch_size)

return mini_batch_x, mini_batch_y

Input: im_train and label_train are a set of images and labels, and batch_size is
the size of the mini-batch for stochastic gradient descent.

Output: mini_batch_x and mini_batch_y are cells that contain a set of batches (images and labels, respectively). Each batch of images is a matrix with size 196×batch_size,
and each batch of labels is a matrix with size 10×batch_size (one-hot encoding). Note
that the number of images in the last batch may be smaller than batch_size.

Description: You should randomly permute the order of images when building
the batches, and the full set of mini_batch_* must span all the training data.
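One possible sketch of get_mini_batch (using Python lists for the batch containers and a fresh random permutation on each call; both are assumptions of this sketch, not requirements):

import numpy as np

def get_mini_batch(im_train, label_train, batch_size):
    # im_train: 196 x n, label_train: 1 x n (labels assumed to be integers 0-9)
    n = im_train.shape[1]
    perm = np.random.permutation(n)            # random order spanning all data
    mini_batch_x, mini_batch_y = [], []
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]   # the last batch may be smaller
        mini_batch_x.append(im_train[:, idx])
        batch_y = np.zeros((10, idx.size))     # one-hot encoding, 10 classes
        batch_y[label_train[0, idx], np.arange(idx.size)] = 1
        mini_batch_y.append(batch_y)
    return mini_batch_x, mini_batch_y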

def fc(x, w, b)

return y
Input: x ∈ R^{m×1} is the input to the fully connected layer, and w ∈ R^{n×m} and b ∈ R^{n×1} are the weights and bias.

Output: y ∈ R^{n×1} is the output of the linear transform (fully connected layer).

Description: FC is a linear transform of x, i.e., y = wx + b.
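A minimal sketch, assuming x and b are column vectors as specified:

import numpy as np

def fc(x, w, b):
    # x: m x 1, w: n x m, b: n x 1  ->  y = wx + b, shape n x 1
    return w @ x + b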
def fc_backward(dl_dy, x, w, b, y)

return dl_dx, dl_dw, dl_db
Input: dl_dy ∈ R^{1×n} is the loss derivative with respect to the output y.

Output: dl_dx ∈ R^{1×m} is the loss derivative with respect to the input x, dl_dw ∈ R^{1×(n×m)} is the loss derivative with respect to the weights, and dl_db ∈ R^{1×n} is the loss derivative with respect to the bias.

Description: The partial derivatives w.r.t. input, weights, and bias will be computed.
dl_dx will be back-propagated, and dl_dw and dl_db will be used to update the weights
and bias.
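A minimal sketch under the shape conventions above; the row-major flattening of dl_dw into 1 x (n*m) (reshape it back to n x m before updating w) is an assumption of this sketch:

import numpy as np

def fc_backward(dl_dy, x, w, b, y):
    # dl_dy: 1 x n row vector of dL/dy for y = wx + b
    dl_dx = dl_dy @ w                                     # 1 x m, since dy/dx = w
    dl_dw = (dl_dy.T @ x.reshape(1, -1)).reshape(1, -1)   # outer product dL/dy_i * x_j, flattened
    dl_db = dl_dy.copy()                                  # 1 x n, since dy/db = I
    return dl_dx, dl_dw, dl_db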
def loss_euclidean(y_tilde, y)

return l, dl_dy
Input: y_tilde ∈ R^m is the prediction, and y ∈ {0, 1}^m is the ground truth label.

Output: l ∈ R is the loss, and dl_dy is the loss derivative with respect to the prediction.

Description: loss_euclidean measures the Euclidean distance L = ∥y − ỹ∥².
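A minimal sketch of this loss and its gradient with respect to the prediction:

import numpy as np

def loss_euclidean(y_tilde, y):
    # l = ||y_tilde - y||^2, dl/dy_tilde = 2 (y_tilde - y)
    diff = y_tilde - y
    return np.sum(diff ** 2), 2 * diff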

def train_slp_linear(mini_batch_x, mini_batch_y)

return w, b
Input: mini_batch_x and mini_batch_y are cells where each cell is a batch of images
and labels.
Output: w ∈ R^{10×196} and b ∈ R^{10×1} are the trained weights and bias of a single-layer perceptron.

Description: You will use fc, fc_backward, and loss_euclidean to train a single-layer perceptron using a stochastic gradient descent method; pseudo-code can be found in Algorithm 1 below. Through training, you are expected to see a reduction of loss as shown in Figure 2(b). As a result of training, the network should produce more than 25% accuracy on the testing data (Figure 2(c)).

Algorithm 1 Stochastic Gradient Descent based Training
1: Set the learning rate γ
2: Set the decay rate λ ∈ (0, 1]
3: Initialize the weights with Gaussian noise, w ∼ N(0, 1)
4: k = 1
5: for iIter = 1 : nIters do
6:   At every 1000th iteration, γ ← λγ
7:   ∂L/∂w ← 0 and ∂L/∂b ← 0
8:   for each image x_i in the k-th mini-batch do
9:     Label prediction of x_i
10:    Loss computation l
11:    Gradient back-propagation of x_i, ∂l/∂w, using back-propagation
12:    ∂L/∂w ← ∂L/∂w + ∂l/∂w and ∂L/∂b ← ∂L/∂b + ∂l/∂b
13:  end for
14:  k++ (set k = 1 if k is greater than the number of mini-batches)
15:  Update the weights, w ← w − (γ/R) ∂L/∂w, and bias, b ← b − (γ/R) ∂L/∂b
16: end for
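A minimal sketch of train_slp_linear following Algorithm 1 (the learning rate, decay rate, iteration count, and the choice R = batch size are assumptions of this sketch; fc, fc_backward, and loss_euclidean are the functions specified above):

import numpy as np

def train_slp_linear(mini_batch_x, mini_batch_y):
    gamma, lam, n_iters = 0.1, 0.9, 10000       # assumed hyper-parameters
    w = np.random.normal(0, 1, (10, 196))       # Gaussian initialization
    b = np.random.normal(0, 1, (10, 1))
    k = 0
    for i_iter in range(1, n_iters + 1):
        if i_iter % 1000 == 0:
            gamma *= lam                        # learning-rate decay
        dL_dw, dL_db = np.zeros_like(w), np.zeros_like(b)
        batch_x, batch_y = mini_batch_x[k], mini_batch_y[k]
        R = batch_x.shape[1]
        for j in range(R):
            x, y = batch_x[:, [j]], batch_y[:, [j]]      # 196 x 1, 10 x 1
            y_tilde = fc(x, w, b)
            l, dl_dy = loss_euclidean(y_tilde, y)
            _, dl_dw, dl_db = fc_backward(dl_dy.reshape(1, -1), x, w, b, y_tilde)
            dL_dw += dl_dw.reshape(w.shape)
            dL_db += dl_db.reshape(b.shape)
        k = (k + 1) % len(mini_batch_x)
        w -= (gamma / R) * dL_dw
        b -= (gamma / R) * dL_db
    return w, b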

4 Single-layer Perceptron

Figure 3: You will implement a single-layer perceptron that produces accuracy near 90% on testing data. (a) Single-layer perceptron with soft-max; (b) training and testing loss vs. iterations; (c) confusion matrix (accuracy: 0.898720).

You will implement a single-layer perceptron with soft-max cross-entropy using a stochastic gradient descent method. We provide main_slp where you will implement train_slp.

Unlike the single-layer linear perceptron, it has a soft-max layer that approximates the max function by mapping the output to the [0, 1] range, as shown in Figure 3(a).
def loss_cross_entropy_softmax(x, y)

return l, dl_dy
Input: x ∈ R^{m×1} is the input to the soft-max, and y ∈ {0, 1}^m is the ground truth label.

Output: l ∈ R is the loss, and dl_dy is the loss derivative with respect to x.

Description: loss_cross_entropy_softmax measures the cross-entropy between the two distributions, L = −Σ_i y_i log ỹ_i, where ỹ_i is the soft-max output that approximates the max operation by mapping x to the [0, 1] range:

ỹ_i = e^{x_i} / Σ_j e^{x_j},

where x_i is the i-th element of x.
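A minimal sketch (the max-shift for numerical stability and the small epsilon inside the log are additions of this sketch; the gradient of the combined soft-max/cross-entropy with respect to x is ỹ − y):

import numpy as np

def loss_cross_entropy_softmax(x, y):
    x, y = x.reshape(-1), y.reshape(-1)
    e = np.exp(x - np.max(x))          # numerically stable soft-max
    y_tilde = e / np.sum(e)
    l = -np.sum(y * np.log(y_tilde + 1e-12))
    dl_dy = y_tilde - y                # derivative of the loss w.r.t. x
    return l, dl_dy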

def train_slp(mini_batch_x, mini_batch_y)

return w, b
Output: w ∈ R^{10×196} and b ∈ R^{10×1} are the trained weights and bias of a single-layer perceptron.

Description: You will use the following functions to train a single-layer perceptron using a stochastic gradient descent method: fc, fc_backward, loss_cross_entropy_softmax. Through training, you are expected to see a reduction of loss as shown in Figure 3(b).

As a result of training, the network should produce more than 85% accuracy on the testing data (Figure 3(c)).

5 Multi-layer Perceptron

Figure 4: You will implement a multi-layer perceptron that produces accuracy of more than 90% on testing data. (a) Multi-layer perceptron with one hidden layer and soft-max output; (b) confusion matrix (accuracy: 0.914553).

You will implement a multi-layer perceptron with a single hidden layer using a stochastic
gradient descent method. We provide main_mlp. The hidden layer is composed of 30
units as shown in Figure 4(a).
def relu(x)

return y

Input: x is a general tensor, matrix, or vector.
Output: y is the output of the Rectified Linear Unit (ReLU) with the same size as the input.
Description: ReLU is an activation unit (y_i = max(0, x_i)). In some cases, it is possible to use a Leaky ReLU instead (y_i = max(ϵx_i, x_i) where ϵ = 0.01).
def relu_backward(dl_dy, x, y)

return dl_dx
Input: dl_dy ∈ R^{1×z} is the loss derivative with respect to the output y ∈ R^z, where z is the size of the input (it can be a tensor, matrix, or vector).

Output: dl_dx ∈ R^{1×z} is the loss derivative with respect to the input x.
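A minimal sketch, assuming dl_dy has been reshaped to the same shape as x before the call (an assumption of this sketch):

import numpy as np

def relu_backward(dl_dy, x, y):
    # The gradient passes through only where the input was positive.
    return dl_dy * (x > 0)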

def train_mlp(mini_batch_x, mini_batch_y)

return w1, b1, w2, b2
Output: w1 ∈ R^{30×196}, b1 ∈ R^{30×1}, w2 ∈ R^{10×30}, b2 ∈ R^{10×1} are the trained weights and biases of a multi-layer perceptron.

Description: You will use the following functions to train a multi-layer perceptron using a stochastic gradient descent method: fc, fc_backward, relu, relu_backward, loss_cross_entropy_softmax. As a result of training, the network should produce more than 90% accuracy on the testing data (Figure 4(b)).
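For orientation, a minimal sketch of one forward/backward pass inside train_mlp; the helper name mlp_step and the reshapes between the 1 x k gradient convention and the k x 1 activations are assumptions of this sketch:

def mlp_step(x, y, w1, b1, w2, b2):
    # Forward: 196 -> 30 (ReLU) -> 10 (soft-max cross-entropy).
    a1 = fc(x, w1, b1)                      # 30 x 1
    f1 = relu(a1)
    a2 = fc(f1, w2, b2)                     # 10 x 1
    l, dl_da2 = loss_cross_entropy_softmax(a2, y)
    # Backward, mirroring the forward order.
    dl_df1, dl_dw2, dl_db2 = fc_backward(dl_da2.reshape(1, -1), f1, w2, b2, a2)
    dl_da1 = relu_backward(dl_df1.reshape(a1.shape), a1, f1)
    _, dl_dw1, dl_db1 = fc_backward(dl_da1.reshape(1, -1), x, w1, b1, a1)
    return l, dl_dw1, dl_db1, dl_dw2, dl_db2

The per-example gradients are then accumulated over the mini-batch and applied exactly as in Algorithm 1.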

6 Convolutional Neural Network

Figure 5: You will implement a convolutional neural network that produces accuracy of more than 92% on testing data. (a) CNN: Input → Conv (3) → ReLU → Pool (2×2) → Flatten → FC → Soft-max; (b) confusion matrix (accuracy: 0.947251).

You will implement a convolutional neural network (CNN) using a stochastic gradient descent method. We provide main_cnn. As shown in Figure 5(a), the network is composed of: a single-channel input (14×14×1) → Conv layer (3×3 convolution with 3 output channels and stride 1) → ReLU layer → Max-pooling layer (2×2 with stride 2) → Flattening layer (147 units) → FC layer (10 units) → Soft-max.

def conv(x, w_conv, b_conv)

return y
Input: x ∈ R^{H×W×C1} is the input to the convolutional operation, and w_conv ∈ R^{h×w×C1×C2} and b_conv ∈ R^{C2×1} are the weights and bias of the convolutional operation.

Output: y ∈ R^{H×W×C2} is the output of the convolutional operation. Note that to get the same size as the input, you may pad zeros at the boundary of the input image.

Description: You can use np.pad for padding zeros at the boundary. Optionally, you may use im2col¹ to simplify the convolutional operation.

¹ https://leonardoaraujosantos.gitbook.io/artificial-inteligence/machine_learning/deep_learning/convolution_layer/making_faster
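A minimal sketch using np.pad and plain loops (no im2col); it computes cross-correlation, as is conventional for CNN layers, and assumes an odd kernel size so that same-size padding is symmetric:

import numpy as np

def conv(x, w_conv, b_conv):
    # x: H x W x C1, w_conv: h x w x C1 x C2, b_conv: C2 (or C2 x 1)
    H, W, C1 = x.shape
    h, w, _, C2 = w_conv.shape
    ph, pw = h // 2, w // 2
    x_pad = np.pad(x, ((ph, ph), (pw, pw), (0, 0)), mode='constant')
    b = np.ravel(b_conv)
    y = np.zeros((H, W, C2))
    for c2 in range(C2):
        for i in range(H):
            for j in range(W):
                patch = x_pad[i:i + h, j:j + w, :]            # h x w x C1 window
                y[i, j, c2] = np.sum(patch * w_conv[:, :, :, c2]) + b[c2]
    return y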

def conv_backward(dl_dy, x, w_conv, b_conv, y)

return dl_dw, dl_db
Input: dl_dy is the loss derivative with respect to y.
Output: dl_dw and dl_db are the loss derivatives with respect to convolutional weights
and bias w and b, respectively.

Description: Note that for the single convolutional layer, ∂L/∂x is not needed. Optionally, you may use im2col to simplify the convolutional operation.
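A minimal sketch, assuming dl_dy is laid out like y (H x W x C2) and the same zero padding as in the forward pass:

import numpy as np

def conv_backward(dl_dy, x, w_conv, b_conv, y):
    H, W, C1 = x.shape
    h, w, _, C2 = w_conv.shape
    ph, pw = h // 2, w // 2
    x_pad = np.pad(x, ((ph, ph), (pw, pw), (0, 0)), mode='constant')
    dl_dw = np.zeros_like(w_conv)
    dl_db = dl_dy.sum(axis=(0, 1)).reshape(np.shape(b_conv))   # bias collects the summed gradient
    for c2 in range(C2):
        for p in range(h):
            for q in range(w):
                for c1 in range(C1):
                    # dL/dw[p,q,c1,c2] = sum_{i,j} dl_dy[i,j,c2] * x_pad[i+p, j+q, c1]
                    dl_dw[p, q, c1, c2] = np.sum(dl_dy[:, :, c2] * x_pad[p:p + H, q:q + W, c1])
    return dl_dw, dl_db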
def pool2x2(x)

return y
Input: x ∈ R^{H×W×C} is a general tensor or matrix.
Output: y ∈ R^{(H/2)×(W/2)×C} is the output of the 2 × 2 max-pooling operation with stride 2.
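A minimal sketch using a reshape trick (assumes H and W are even, as they are here):

import numpy as np

def pool2x2(x):
    # Group each 2x2 block into its own axes, then take the block maximum.
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))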
def pool2x2_backward(dl_dy, x, y)

return dl_dx

Input: dl_dy is the loss derivative with respect to the output y.
Output: dl_dx is the loss derivative with respect to the input x.
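A minimal sketch, assuming dl_dy is laid out like y; the gradient is routed to the position(s) that attained each block maximum (ties receive the gradient in every tied cell in this sketch):

import numpy as np

def pool2x2_backward(dl_dy, x, y):
    # Upsample y and dl_dy back to the input resolution, then mask by the arg-max positions.
    y_up = y.repeat(2, axis=0).repeat(2, axis=1)
    g_up = dl_dy.repeat(2, axis=0).repeat(2, axis=1)
    return g_up * (x == y_up)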

def flattening(x)

return y
Input: x ∈ R^{H×W×C} is a tensor.
Output: y ∈ R^{HWC} is the vectorized tensor (column-major).
def flattening_backward(dl_dy, x, y)

return dl_dx

Input: dl_dy is the loss derivative with respect to the output y.
Output: dl_dx is the loss derivative with respect to the input x.
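A minimal sketch of the pair; returning the flattened vector as a column so it can feed fc directly is an assumption of this sketch:

import numpy as np

def flattening(x):
    # Column-major (order='F') vectorization of an H x W x C tensor.
    return x.reshape((-1, 1), order='F')

def flattening_backward(dl_dy, x, y):
    # Undo the vectorization so the gradient matches the input layout.
    return dl_dy.reshape(x.shape, order='F')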
def train_cnn(mini_batch_x, mini_batch_y)

return w_conv, b_conv, w_fc, b_fc
Output: w_conv ∈ R^{3×3×1×3}, b_conv ∈ R^{3}, w_fc ∈ R^{10×147}, b_fc ∈ R^{10×1} are the trained weights and biases of the CNN.

Description: You will use the following functions to train a convolutional neural network using a stochastic gradient descent method: conv, conv_backward, pool2x2, pool2x2_backward, flattening, flattening_backward, fc, fc_backward, relu, relu_backward, loss_cross_entropy_softmax. As a result of training, the network should produce more than 92% accuracy on the testing data (Figure 5(b)).
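For orientation, a minimal sketch of the forward pass for one image inside train_cnn (the helper name cnn_forward and the reshape of the 196 x 1 column back into a 14 x 14 x 1 tensor are assumptions of this sketch); the backward pass mirrors it: fc_backward → flattening_backward → pool2x2_backward → relu_backward → conv_backward:

def cnn_forward(x_col, y, w_conv, b_conv, w_fc, b_fc):
    x = x_col.reshape((14, 14, 1), order='F')   # 196 x 1 column -> 14 x 14 x 1 image
    a = conv(x, w_conv, b_conv)                 # 14 x 14 x 3
    f = relu(a)
    p = pool2x2(f)                              # 7 x 7 x 3
    v = flattening(p)                           # 147 x 1
    pred = fc(v, w_fc, b_fc)                    # 10 x 1
    l, dl_dy = loss_cross_entropy_softmax(pred, y)
    return l, dl_dy, (x, a, f, p, v, pred)      # cache intermediates for the backward pass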