STA414/2104 HOMEWORK 3 – V1


1. Support Vector Machines Dual Problem – (40 pts). Assume that you are given a
data set $D = \{(t_i, x_i) : i = 1, \dots, N\}$ with $t_i \in \{\pm 1\}$.
1.1. Hard margin – (20 pts). Recall that the hard margin SVM problem can be written in the
following primal form

$$\min_{w,b} \; \frac{1}{2}\|w\|_2^2 \quad \text{s.t.} \quad t_i(w^\top x_i + b) \ge 1, \quad i = 1, \dots, N$$
(a) Write down the Lagrangian for this problem with Lagrangian parameters denoted with $\alpha_i$'s.
(b) Show that the equivalent dual problem can be written as

$$\max_{\alpha} \; W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} t_i t_j \alpha_i \alpha_j x_i^\top x_j$$

$$\text{s.t.} \quad 0 \le \alpha_i, \quad i = 1, 2, \dots, N, \qquad \sum_{i=1}^{N} \alpha_i t_i = 0.$$
(c) Assume that we solved the above dual formulation and obtained the optimal α. For a given
test data point x, how can we predict its class?
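The prediction rule asked for in part (c) can be sketched in NumPy. This is a minimal illustration, assuming the optimal dual variables, the training data, and a bias recovered from a support vector are already available; the function and argument names are illustrative, not part of any provided starter code:

```python
import numpy as np

def svm_predict(x, alpha, t, X, b):
    """Predict the class of a test point x from the optimal dual variables.

    alpha : (N,) optimal dual variables; t : (N,) labels in {+1, -1};
    X : (N, d) training inputs; b : bias recovered from a support vector.
    The primal weight vector is w = sum_i alpha_i t_i x_i, so the
    prediction is sign(w^T x + b).
    """
    w = (alpha * t) @ X  # w = sum_i alpha_i t_i x_i, shape (d,)
    return np.sign(w @ x + b)
```

Only support vectors (points with $\alpha_i > 0$) contribute to the sum, so in practice the prediction touches a small subset of the training set.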
1.2. Soft margin – (20 pts). Recall that the soft margin SVM problem can be written in the
following primal form

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|_2^2 + \gamma \sum_{i=1}^{N} \xi_i$$

$$\text{s.t.} \quad t_i(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N$$
(a) Use the Lagrangian provided in the lecture to show that the equivalent dual problem can
be written as

$$\max_{\alpha} \; W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} t_i t_j \alpha_i \alpha_j x_i^\top x_j$$

$$\text{s.t.} \quad 0 \le \alpha_i \le \gamma, \quad i = 1, \dots, N, \qquad \sum_{i=1}^{N} \alpha_i t_i = 0.$$
(b) Assume that we solved the above dual formulation and obtained the optimal α. For a given
test data point x, how can we predict its class?
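As a numerical sanity check of the soft-margin dual above (not the derivation the assignment asks for), the following sketch solves it for a tiny assumed two-point data set with SciPy's general-purpose constrained optimizer and recovers the prediction rule; all data and tolerances here are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy data: two 1-D points, one per class.
X = np.array([[-1.0], [1.0]])
t = np.array([-1.0, 1.0])
gamma = 1.0
K = X @ X.T  # Gram matrix of inner products x_i^T x_j

def neg_W(alpha):
    # Negative dual objective, since minimize() minimizes.
    v = alpha * t
    return -(alpha.sum() - 0.5 * v @ K @ v)

res = minimize(
    neg_W,
    x0=np.zeros(2),
    method="SLSQP",
    bounds=[(0.0, gamma)] * 2,                            # 0 <= alpha_i <= gamma
    constraints={"type": "eq", "fun": lambda a: a @ t},   # sum_i alpha_i t_i = 0
)
alpha = res.x
w = (alpha * t) @ X
# Recover b from a margin support vector (0 < alpha_i < gamma),
# where t_i (w^T x_i + b) = 1 holds with equality and xi_i = 0.
sv = np.where((alpha > 1e-6) & (alpha < gamma - 1e-6))[0][0]
b = t[sv] - w @ X[sv]
predict = lambda x: np.sign(w @ x + b)
```

The prediction rule is the same as in the hard-margin case; the box constraint $0 \le \alpha_i \le \gamma$ is the only difference in the dual.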
What to submit?
1.1-a) Lagrangian.
b) Your derivation of the equivalent optimization problem.
c) The prediction rule for a new test point.
1.2-a) Your derivation of the equivalent optimization problem.
b) The prediction rule for a new test point.
2. Neural Networks (60 points). In this problem, you will experiment on a subset of the
Toronto Faces Dataset (TFD). You will complete the starter code provided to you, and experiment
with the completed code. You should understand the code instead of using it as a black box.
We subsample 3374, 419 and 385 grayscale images from TFD as the training, validation, and
testing sets, respectively. Each image is of size 48 × 48 and contains a face that has been extracted
from a variety of sources. The faces have been rotated, scaled and aligned to make the task
easier. The faces have been labeled by experts and research assistants based on their expression.
These expressions fall into one of seven categories: 1-Anger, 2-Disgust, 3-Fear, 4-Happy, 5-Sad,
6-Surprise, 7-Neutral. We show one example face per class in Figure 1.
Fig 1: Example faces. From left to right, the corresponding class is from 1 to 7.
Code for training a neural network (fully connected) is partially provided in nn.py.
2.1. Complete the code [20 points]. Follow the instructions in nn.py to implement the missing
functions that perform the backward pass of the network.
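The missing functions live in nn.py; as a rough guide only, the backward pass of a fully connected layer with ReLU activations typically looks like the following sketch. The function names and shapes here are assumptions, not nn.py's actual API:

```python
import numpy as np

def affine_forward(x, W, b):
    """Forward pass: y = xW + b for a batch x of shape (N, d_in)."""
    return x @ W + b

def affine_backward(grad_y, x, W):
    """Given upstream gradient dL/dy of shape (N, d_out), return
    gradients with respect to the input, weights, and bias."""
    grad_x = grad_y @ W.T         # (N, d_in): route gradient to the input
    grad_W = x.T @ grad_y         # (d_in, d_out): outer products summed over batch
    grad_b = grad_y.sum(axis=0)   # (d_out,): bias sees every example
    return grad_x, grad_W, grad_b

def relu_backward(grad_y, x):
    """Gradient through ReLU: pass it where the input was positive."""
    return grad_y * (x > 0)
```

Chaining these per-layer backward functions from the loss back to the input is exactly the backward pass the assignment asks you to complete.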
2.2. Generalization [10 points]. Train the neural network with the default set of hyperparameters. Report training and validation errors and a plot of error curves (training and validation).
Examine the statistics and plots of training error and validation error (generalization). How does
the network’s performance differ on the training set vs. the validation set during learning?
2.3. Optimization [10 points]. Try different values of the learning rate (step size) η (“eta”),
η ∈ {0.001, 0.01, 0.5}. What happens to the convergence properties of the algorithm
(looking at both cross-entropy and percent correct)? Try 3 different mini-batch sizes from
{10, 100, 1000}. How does mini-batch size affect convergence? How would you choose the best
value of these parameters? In each of these experiments, hold the other parameters constant
while you vary the one you are studying.
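For reference, the kind of training loop these two hyperparameters control can be sketched as follows; the names are illustrative and nn.py's actual loop may differ:

```python
import numpy as np

def sgd_epochs(params, grad_fn, X, T, eta=0.01, batch_size=100, epochs=10, rng=None):
    """Mini-batch SGD: shuffle each epoch, split into batches of size
    batch_size, and take a step of size eta on every parameter.

    params  : dict of parameter arrays, updated in place
    grad_fn : callable (params, X_batch, T_batch) -> dict of gradients
    """
    rng = rng or np.random.default_rng(0)
    N = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(N)                 # fresh shuffle per epoch
        for start in range(0, N, batch_size):
            idx = order[start:start + batch_size]
            grads = grad_fn(params, X[idx], T[idx])
            for k in params:
                params[k] -= eta * grads[k]        # gradient step
    return params
```

Smaller batches give noisier but more frequent updates; a larger eta speeds early progress but can overshoot, which is exactly the trade-off the question asks you to observe.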
2.4. Model architecture [10 points]. Try 3 different values of the number of hidden units for
each layer of the fully connected network (from {2, 20, 80}). You might need to adjust the
learning rate and the number of epochs (iterations). Comment on the effect of this modification
on the convergence properties, and the generalization of the network.
2.5. Network Uncertainty [10 points]. Plot five examples where the neural network is not
confident of the classification output (the top score is below some threshold), and comment on
them. Will the classifier be correct if it outputs the top-scoring class anyway?
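One way to find such examples is to threshold the network's top softmax probability. A small sketch under assumed names (the threshold and function names are illustrative, not nn.py's API):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax over logits of shape (N, K)."""
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def low_confidence(logits, threshold=0.5):
    """Return indices of examples whose top class probability is below
    the threshold, i.e. where the network is unsure."""
    probs = softmax(logits)
    return np.where(probs.max(axis=1) < threshold)[0]
```

Plotting the images at the returned indices, together with their predicted and true labels, gives the five examples the question asks for.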
What to submit?
2.1) Completed code.
2.2) Final training and validation errors, and a plot of these errors across all iterations. Your
comments on the network's performance.
2.3) The curves you obtained in the previous part for the given step-size and mini-batch-size
choices. Your comments/answers to the two questions.
2.4) The curves you obtained in the previous part for the given number of hidden units. Your
comments on the convergence/generalization.
2.5) Five example images, your comments, and your answer to the question.