Assignment 2: Classification / Deep learning
1 Softmax for Multi-Class Classification
The softmax function is a multi-class generalization of the logistic sigmoid:
$$p(C_k \mid x) = \frac{\exp(a_k)}{\sum_j \exp(a_j)} \tag{1}$$
Consider a case where the activation functions $a_j$ are linear functions of the input. Assume there
are 3 classes ($C_1$, $C_2$, $C_3$), and the input is $x = (x_1, x_2) \in \mathbb{R}^2$:
• $a_1 = 3x_1 + x_2 + 1$
• $a_2 = x_1 + 3x_2 + 2$
• $a_3 = -3x_1 + 1.5x_2 + 2$
The image below shows the 3 decision regions induced by these activation functions, their common
intersection point (in green), and the decision boundaries (in red).
Answer the following questions. For 2 and 3, you may provide qualitative answers (i.e. no need to
analyze limits).
1. (3 marks) What are the probabilities p(Ck|x) at the green point?
2. (3 marks) What happens to the probabilities along each of the red lines? What happens as
we move along a red line (away from the green point)?
3. (4 marks) What happens to the probabilities as we move far away from the intersection point,
staying in the middle of one region?
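To experiment with equation (1) numerically, the following is a minimal sketch in plain Python. It assumes the three linear activations listed in the bullets above; the helper names `activations` and `softmax` and the test input are illustrative choices, not part of the assignment code.

```python
import math

# Linear activations a_k = w_k . x + b_k, copied from the three bullets above.
WEIGHTS = [(3.0, 1.0), (1.0, 3.0), (-3.0, 1.5)]
BIASES = [1.0, 2.0, 2.0]


def activations(x):
    """Return [a_1, a_2, a_3] for an input x = (x1, x2)."""
    return [w[0] * x[0] + w[1] * x[1] + b for w, b in zip(WEIGHTS, BIASES)]


def softmax(a):
    """Softmax of equation (1); subtracting max(a) avoids overflow in exp."""
    m = max(a)
    exps = [math.exp(a_k - m) for a_k in a]
    total = sum(exps)
    return [e / total for e in exps]


# Arbitrary illustrative input (an assumption, not one of the points asked about).
x = (1.0, 0.0)
probs = softmax(activations(x))
print(probs)
```

Evaluating `softmax(activations(x))` at a few points in each region, and along each boundary, is one way to check qualitative answers to questions 2 and 3.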
CMPT 419/726: Assignment 2
2 Error Backpropagation
We will derive error derivatives using back-propagation on the network below.
Notation: Please use notation following the examples of names for weights given in the figure.
For activations/outputs, the red node would have activation $a_2^{(1)} = w_{21}^{(1)} x_1 + w_{22}^{(1)} x_2 + w_{23}^{(1)} x_3$ and
output $z_2^{(1)} = h(a_2^{(1)})$.
Activation functions: Assume the activation functions h(·) for the hidden layers are logistic sigmoids. For
the final output node, assume the activation function is the identity function h(a) = a.
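For reference, the logistic sigmoid and its derivative are standard results that the hidden-layer terms of the derivation rely on; for the identity output, $h'(a) = 1$.

```latex
h(a) = \frac{1}{1 + e^{-a}}, \qquad
h'(a) = h(a)\,\bigl(1 - h(a)\bigr)
```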
Error function: Assume this network is doing regression, trained using the standard squared error
so that $E_n(w) = \frac{1}{2}\,(y(x_n, w) - t_n)^2$.
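[Figure: the network diagram, labelled "input" and "output", with inputs $x_1$, $x_2$, $x_3$ and example weight names $w_{11}^{(1)}$, $w_{11}^{(2)}$, $w_{11}^{(3)}$.]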
Consider the output layer.
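• Calculate $\frac{\partial E_n(w)}{\partial a_1^{(3)}}$. Note that $a_1^{(3)}$ is the activation of the output node, and that $\frac{\partial E_n(w)}{\partial a_1^{(3)}} \equiv \delta_1^{(3)}$.
• Use this result to calculate $\frac{\partial E_n(w)}{\partial w_{12}^{(3)}}$.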
Next, consider the penultimate layer of nodes.
• Write an expression for $\frac{\partial E_n(w)}{\partial a_1^{(2)}}$. Use $\delta_1^{(3)}$ in this expression.
• Use this result to calculate $\frac{\partial E_n(w)}{\partial w_{11}^{(2)}}$.
Finally, consider the weights connecting from the inputs.
• Write an expression for $\frac{\partial E_n(w)}{\partial a_1^{(1)}}$. Use the set of $\delta_k^{(2)}$ in this expression.
• Use this result to calculate $\frac{\partial E_n(w)}{\partial w_{11}^{(1)}}$.
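One way to sanity-check derived expressions is a finite-difference gradient check. The sketch below is a minimal version under stated assumptions: the layer sizes 3-3-3-1 are hypothetical (the actual sizes come from the figure), there are no bias terms (matching the activation formula in the notation paragraph), and all function names are illustrative.

```python
import math
import random


def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(weights, x):
    """weights[l][j][i] plays the role of w^(l+1)_{ji}; returns the output y."""
    z = x
    for layer_index, layer in enumerate(weights):
        a = [sum(w_ji * z_i for w_ji, z_i in zip(row, z)) for row in layer]
        is_output = layer_index == len(weights) - 1
        # Hidden layers use the logistic; the output node uses the identity.
        z = a if is_output else [logistic(a_j) for a_j in a]
    return z[0]


def error(weights, x, t):
    """Squared error E_n(w) = 1/2 (y - t)^2 for one training example."""
    return 0.5 * (forward(weights, x) - t) ** 2


def numerical_gradient(weights, x, t, layer, j, i, eps=1e-6):
    """Central-difference estimate of dE_n/dw for a single weight."""
    original = weights[layer][j][i]
    weights[layer][j][i] = original + eps
    e_plus = error(weights, x, t)
    weights[layer][j][i] = original - eps
    e_minus = error(weights, x, t)
    weights[layer][j][i] = original  # restore the weight
    return (e_plus - e_minus) / (2.0 * eps)


random.seed(0)
sizes = [3, 3, 3, 1]  # assumed sizes: 3 inputs, two hidden layers of 3, 1 output
weights = [
    [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
x, t = [0.5, -1.0, 2.0], 1.0

# Estimate for w^(3)_12: layer index 2, output node 0, incoming node 1.
print(numerical_gradient(weights, x, t, layer=2, j=0, i=1))
```

Comparing this estimate with the value of a hand-derived formula at the same weights should show agreement to several decimal places if the derivation is correct.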
3 Logistic Regression
In this question you will examine optimization for logistic regression.
1. Download the assignment 2 code and data from the website. Run the script logistic_regression.py
in the P3 directory. This code performs gradient descent to find the w that minimizes the negative
log-likelihood (i.e. maximizes the likelihood).
Include the final output of Figures 2 and 3 (plot of separator path in slope-intercept space;
plot of neg. log likelihood over epochs) in your report.
Why are these plots oscillating? Briefly explain why in your report.
2. Create a Python script logistic_regression_mod.py for the following.
Modify logistic_regression.py to run gradient descent with the learning rates η =
0.5, 0.3, 0.1, 0.05, 0.01.
Include in your report a single plot comparing negative log-likelihood versus epoch for these
different learning rates.
Compare these results. What are the relative advantages of the different rates?
3. Create a Python script logistic_regression_sgd.py for the following.
Modify this code to do stochastic gradient descent. Use the learning rates
η = 0.5, 0.3, 0.1, 0.05, 0.01.
Include in your report a new plot comparing negative log-likelihood versus iteration using
stochastic gradient descent.
Is stochastic gradient descent faster than gradient descent? Explain using your plots.
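The difference between batch and stochastic gradient descent can be sketched as follows. This is not the provided logistic_regression.py: the toy data, the use of the mean (rather than summed) negative log-likelihood, and every function name here are assumptions made for illustration.

```python
import math
import random


def sigmoid(a):
    # Numerically stable logistic sigmoid.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def nll(w, data):
    """Mean negative log-likelihood; eps guards against log(0)."""
    eps = 1e-12
    total = 0.0
    for x, t in data:
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total -= t * math.log(y + eps) + (1 - t) * math.log(1 - y + eps)
    return total / len(data)


def gradient(w, batch):
    """Mean gradient of the negative log-likelihood over a batch: (y - t) x."""
    g = [0.0] * len(w)
    for x, t in batch:
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for k, xk in enumerate(x):
            g[k] += (y - t) * xk / len(batch)
    return g


def train(data, eta, epochs, stochastic, seed=0):
    """Return final weights and the NLL recorded after every epoch."""
    shuffler = random.Random(seed)
    w = [0.0] * len(data[0][0])
    history = [nll(w, data)]
    for _ in range(epochs):
        if stochastic:
            order = list(data)
            shuffler.shuffle(order)
            batches = [[example] for example in order]  # one update per example
        else:
            batches = [data]  # one update per epoch using all examples
        for batch in batches:
            g = gradient(w, batch)
            w = [wi - eta * gi for wi, gi in zip(w, g)]
        history.append(nll(w, data))
    return w, history


# Toy two-class data (hypothetical): a constant bias feature plus two inputs.
rng = random.Random(1)
data = []
for _ in range(50):
    data.append(([1.0, rng.gauss(-1, 1), rng.gauss(-1, 1)], 0))
    data.append(([1.0, rng.gauss(1, 1), rng.gauss(1, 1)], 1))

for eta in (0.5, 0.1, 0.01):
    _, gd_history = train(data, eta, epochs=20, stochastic=False)
    _, sgd_history = train(data, eta, epochs=20, stochastic=True)
    print(eta, round(gd_history[-1], 4), round(sgd_history[-1], 4))
```

Note that one epoch of the stochastic variant performs as many weight updates as there are training examples, so a fair comparison should state whether the horizontal axis counts epochs or individual updates.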
4 Fine-Tuning a Pre-Trained Network
In this question you will experiment with fine-tuning a pre-trained network. This is a standard
workflow in adapting existing deep networks to a new task.
We will utilize PyTorch (https://pytorch.org), a machine learning library for Python.
The provided code builds upon ResNet 50, a state-of-the-art deep network for image classification.
ResNet 50 has been designed for ImageNet image classification with 1000 output classes.
The ResNet 50 model has been adapted to solve a different (simpler) task: classifying an image as
one of 10 classes on the CIFAR10 dataset.
The code imagenet_finetune.py does the following:
• Constructs a deep network. This network starts with ResNet 50 up to its average pooling
layer. Then, a small network with 32 hidden nodes then 10 output nodes (dense connections)
is added on top.
• Initializes the weights of the ResNet 50 portion with the parameters from training on ImageNet.
• Performs training on only the new layers using the CIFAR10 dataset; all other weights are fixed
to their values learned on ImageNet.
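The freeze-then-train pattern described in the bullets above can be sketched as follows. This is not the provided imagenet_finetune.py: a small linear layer stands in for the pre-trained ResNet 50 feature extractor so the example runs without downloading weights, the ReLU between the 32 hidden nodes and the 10 outputs is an assumption, and the optimizer settings are placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the pre-trained feature extractor. ResNet 50's
# average pooling layer outputs 2048 features, which this stand-in mimics.
backbone = nn.Sequential(nn.Linear(64, 2048), nn.ReLU())

# Freeze the backbone so its weights keep their pre-trained values.
for param in backbone.parameters():
    param.requires_grad = False

# New layers: 32 hidden nodes, then 10 output nodes (dense connections).
# The ReLU nonlinearity is an assumption; the handout does not name one.
head = nn.Sequential(nn.Linear(2048, 32), nn.ReLU(), nn.Linear(32, 10))
model = nn.Sequential(backbone, head)

# Only the new layers' parameters are handed to the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One training step on random placeholder data.
x = torch.randn(4, 64)
targets = torch.tensor([0, 1, 2, 3])
loss = criterion(model(x), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Setting `requires_grad = True` on the backbone parameters later, and adding them to the optimizer, is one way to switch from training only the new layers to fine-tuning the whole network.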
The code and data can be found on the course website. For convenience, Anaconda
(https://www.anaconda.com) environment config files with the latest stable release of PyTorch and
torchvision are provided for Python 2.7 and Python 3.6 for Linux and macOS users. You can use
one of the config files to create a virtual environment and test your code. To set up the virtual
environment, install Anaconda and run the following command:
conda env create -f CONFIG_FILE
Replace CONFIG_FILE with the path to the config file you downloaded. To activate the virtual
environment, run the following command:
source activate ENV_NAME
Replace ENV_NAME with cmpt419-pytorch-python27 or cmpt419-pytorch-python36,
depending on your Python version.
Windows users: please follow the instructions on the PyTorch website (https://pytorch.org)
to install manually. PyTorch only supports Python 3 on Windows!
If you wish to download and install PyTorch by yourself, you will need PyTorch (v 0.4.1), torchvision (v 0.2.1), and their dependencies.
What to do:
Start by running the code provided. It will be *very* slow to train since the code runs on a CPU.
You can try figuring out how to change the code to train on a GPU if you have a good GPU and
want to accelerate training. Try to do one of the following tasks:
• Write a Python function to be used at the end of training that generates HTML output showing each test image and its classification scores. You could produce an HTML table,
for example.
• Run validation of the model every few training epochs on the validation or test set of the dataset
and save the model with the best validation error.
• Try applying L2 regularization to the coefficients in the small networks we added.
• Try running this code on one of the datasets in torchvision.datasets
(https://pytorch.org/docs/stable/torchvision/datasets.html) except CIFAR100. You may
need to change some layers in the network. Try creating a custom dataloader that loads data
from your own dataset and run the code using your dataloader. (Hints: Your own dataset
should not come from torchvision.datasets. A standard approach is to implement your own
torch.utils.data.Dataset and wrap it with torch.utils.data.DataLoader.)
• Try modifying the structure of the new layers that were added on top of ResNet 50.
• Try adding data augmentation for the training data using torchvision.transforms, and then implement your own custom image transformation methods that are not available in torchvision.transforms,
such as Gaussian blur.
• The current code is inefficient because it recomputes the output of ResNet 50 every time a
training/validation example is seen, even though those layers aren’t being trained. Change
this by saving the output of ResNet 50 and using these as input rather than the dataloader
currently used.
• The current code does not train the layers in ResNet 50. After training the new layers for
a while (until good values have been obtained), turn on training for the ResNet 50 layers to
see if better performance can be achieved.
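For the first task in the list above (HTML output of test images and scores), a minimal sketch using only the Python 3 standard library might look like this. The function name `scores_to_html`, its arguments, and the file names are hypothetical; the sketch assumes the test images have already been saved to disk and the scores collected as plain lists of floats.

```python
import html


def scores_to_html(image_paths, scores, class_names):
    """Build an HTML table: one row per image, one column per class score."""
    header = "".join(
        "<th>{}</th>".format(html.escape(name)) for name in class_names
    )
    rows = []
    for path, row_scores in zip(image_paths, scores):
        cells = "".join("<td>{:.3f}</td>".format(s) for s in row_scores)
        rows.append(
            '<tr><td><img src="{}" width="64"></td>{}</tr>'.format(
                html.escape(path, quote=True), cells
            )
        )
    return (
        '<html><body><table border="1">'
        "<tr><th>image</th>{}</tr>{}"
        "</table></body></html>"
    ).format(header, "".join(rows))


# Placeholder inputs for illustration only.
page = scores_to_html(["img0.png"], [[0.9, 0.1]], ["cat", "dog"])
print(page)
```

Writing the returned string to a file such as results.html and opening it in a browser shows the table. Under Python 2.7 the `html` module is not available, so the escaping calls would need a replacement.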
Put your code and a readme file for Problem 4 under a separate directory named P4 in the code.zip
file you submit for this assignment. The readme file should describe what you implemented for
this problem and what each one of your code files does. It should also include the command to run
your code. If you have any figures or tables to show, put them in your report for this assignment
and mention them in your readme file.
Submitting Your Assignment
The assignment must be submitted online at https://courses.cs.sfu.ca. You must submit three files:
1. An assignment report in PDF format, called report.pdf. This report must contain the
solutions to questions 1 and 2 as well as the figures / explanations requested for 3 and 4.
(Please take screenshots of your entire screen for the figures requested for questions 3 and 4.)
2. A .zip file of all your code, called code.zip.