Description
1 Analytical Part (2 percent grade)
This part will be graded as a PASS or FAIL.
1. Answer the following questions with a yes or no along with proper justification.
a. Is the decision boundary of voted perceptron linear?
b. Is the decision boundary of averaged perceptron linear?
2. In class, we saw the Passive-Aggressive (PA) update, which tries to achieve a margin equal
to one after each update. Derive the PA weight update for achieving a margin of M.
3. Consider the following setting. You are provided with n training examples: (x_1, y_1, h_1), · · · , (x_n, y_n, h_n),
where x_i is the input example, y_i is the class label (+1 or -1), and h_i > 0 is the importance
weight of the example. That is, the teacher gave you some additional information by specifying the
importance of each training example.
a. How will you modify the perceptron algorithm to leverage this extra information? Please justify your answer.
b. How can you solve this learning problem using the standard perceptron algorithm? Please
justify your answer. I’m looking for a reduction-based solution.
4. Consider the following setting. You are provided with n training examples: (x_1, y_1), (x_2, y_2), · · · , (x_n, y_n),
where x_i is the input example and y_i is the class label (+1 or -1). However, the training data
is highly imbalanced (say 90% of the examples are negative and 10% are
positive), and we care more about the accuracy on positive examples.
a. How will you modify the perceptron algorithm to solve this learning problem? Please
justify your answer.
b. How can you solve this learning problem using the standard perceptron algorithm? Please
justify your answer. I’m looking for a reduction-based solution.
2 Programming and Empirical Analysis Part (6 percent grade)
1. Implement a binary classifier with both the perceptron and Passive-Aggressive (PA) weight
updates, as shown below.
Algorithm 1 Online Binary-Classifier Learning Algorithm
Input: D = training examples, T = maximum number of training iterations
Output: w, the final weight vector
1: Initialize the weights w = 0
2: for each training iteration itr ∈ {1, 2, · · · , T} do
3:   for each training example (x_t, y_t) ∈ D do
4:     ŷ_t = sign(w · x_t) // predict using the current weights
5:     if mistake then
6:       w = w + τ · y_t · x_t // update the weights
7:     end if
8:   end for
9: end for
10: return final weight vector w
For the standard perceptron, you will use τ = 1; for the Passive-Aggressive (PA) algorithm, you
will compute the learning rate τ as follows:

τ = (1 − y_t · (w · x_t)) / ‖x_t‖²    (1)
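For illustration, here is a minimal NumPy sketch of Algorithm 1 with both update rules; the function name train_binary and the variant flag are illustrative choices, not part of the assignment:

import numpy as np

def train_binary(X, y, T, variant="perceptron"):
    # X: (n, d) array of inputs; y: (n,) array of labels in {+1, -1}
    # variant: "perceptron" uses tau = 1; "pa" uses Eq. (1)
    n, d = X.shape
    w = np.zeros(d)                          # step 1: initialize weights
    for itr in range(T):                     # step 2: training iterations
        for x_t, y_t in zip(X, y):           # step 3: one pass over D
            y_hat = np.sign(w @ x_t)         # step 4: predict (sign(0) counts as a mistake)
            if y_hat != y_t:                 # step 5: mistake?
                if variant == "pa":          # Eq. (1): tau = (1 - y_t(w.x_t)) / ||x_t||^2
                    tau = (1.0 - y_t * (w @ x_t)) / (x_t @ x_t)
                else:
                    tau = 1.0
                w = w + tau * y_t * x_t      # step 6: update the weights
    return w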
Implement a multi-class online learning algorithm with both perceptron and Passive-Aggressive
(PA) weight update as shown below. Employ the single weight-vector representation (Representation II,
as discussed in class). This representation is defined as follows. Each training example
is of the form (x_t, y_t), where x_t ∈ ℝ^d is the input and y_t ∈ {1, 2, · · · , k} is the class (output)
label. In this representation, you will have a single weight vector w ∈ ℝ^(k·d), and the augmented
feature function F(x_t, y) ∈ ℝ^(k·d) will have k blocks of size d; it will be zero everywhere
except for the y-th block, which will contain x_t.
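As a concrete illustration, one way to build F(x_t, y) in code is sketched below (it assumes the labels have been shifted to 0, · · · , k−1 so that class y occupies block y):

import numpy as np

def F(x_t, y, k):
    # Augmented feature vector in R^(k*d): k blocks of size d,
    # all zeros except block y, which holds x_t.
    d = x_t.shape[0]
    out = np.zeros(k * d)
    out[y * d:(y + 1) * d] = x_t
    return out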
Algorithm 2 Online Multi-Class Classifier Learning Algorithm
Input: D = training examples, k = number of classes, T = maximum number of training iterations
Output: w, the final weight vector
1: Initialize the weights w = 0
2: for each training iteration itr ∈ {1, 2, · · · , T} do
3:   for each training example (x_t, y_t) ∈ D do
4:     ŷ_t = arg max_{y ∈ {1,2,··· ,k}} w · F(x_t, y) // predict using the current weights
5:     if mistake then
6:       w = w + τ · (F(x_t, y_t) − F(x_t, ŷ_t)) // update the weights
7:     end if
8:   end for
9: end for
10: return final weight vector w
For the standard perceptron, you will use τ = 1; for the Passive-Aggressive (PA) algorithm, you
will compute the learning rate τ as follows:

τ = (1 − (w · F(x_t, y_t) − w · F(x_t, ŷ_t))) / ‖F(x_t, y_t) − F(x_t, ŷ_t)‖²    (2)
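A minimal sketch of Algorithm 2 with both update rules, reusing numpy and the F function sketched above (again assuming 0-indexed labels; train_multiclass is an illustrative name):

def train_multiclass(X, y, k, T, variant="perceptron"):
    # X: (n, d) inputs; y: (n,) labels in {0, ..., k-1}
    n, d = X.shape
    w = np.zeros(k * d)                                   # step 1: initialize weights
    for itr in range(T):                                  # step 2: training iterations
        for x_t, y_t in zip(X, y):                        # step 3: one pass over D
            scores = [w @ F(x_t, c, k) for c in range(k)]
            y_hat = int(np.argmax(scores))                # step 4: predict
            if y_hat != y_t:                              # step 5: mistake?
                diff = F(x_t, y_t, k) - F(x_t, y_hat, k)
                if variant == "pa":                       # Eq. (2): w.diff = w.F(x_t,y_t) - w.F(x_t,y_hat)
                    tau = (1.0 - w @ diff) / (diff @ diff)
                else:
                    tau = 1.0
                w = w + tau * diff                        # step 6: update the weights
    return w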
You will use the Fashion-MNIST data (https://github.com/zalandoresearch/fashion-mnist).
There is a fixed training and testing set.
Each example is a 28×28 grayscale image, associated with a label from 10 classes: 0 T-shirt/top,
1 Trouser, 2 Pullover, 3 Dress, 4 Coat, 5 Sandal, 6 Shirt, 7 Sneaker, 8 Bag, 9 Ankle boot.
You will use ONLY the training data for training and the testing data for evaluation.
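One possible way to read the data is sketched below, under the assumption that the four .gz files from the repository sit in the “data” folder (kind="t10k" loads the test set); the even/odd relabeling for the binary task is shown at the end:

import gzip
import numpy as np

def load_fashion_mnist(path, kind="train"):
    # Reads the gzipped IDX files distributed by the fashion-mnist repository,
    # e.g. train-images-idx3-ubyte.gz and train-labels-idx1-ubyte.gz.
    with gzip.open(f"{path}/{kind}-labels-idx1-ubyte.gz", "rb") as f:
        labels = np.frombuffer(f.read(), dtype=np.uint8, offset=8)
    with gzip.open(f"{path}/{kind}-images-idx3-ubyte.gz", "rb") as f:
        images = np.frombuffer(f.read(), dtype=np.uint8, offset=16)
    return images.reshape(len(labels), 784).astype(np.float64), labels

X_train, y_train = load_fashion_mnist("data", kind="train")
y_binary = np.where(y_train % 2 == 0, 1, -1)   # even labels -> +1, odd labels -> -1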
2.1 Binary Classification (40 points) Learn a binary classifier to separate even labels (0,
2, 4, 6, 8) from odd labels (1, 3, 5, 7, 9).
a. Compute the online learning curve for both the Perceptron and PA algorithms by plotting the
number of training iterations (1 to 50) on the x-axis and the number of mistakes on the y-axis.
Compare the two curves and list your observations.
b. Compute the accuracy of both the Perceptron and PA algorithms on the training data and
the testing data after each of 20 training iterations. You will have two accuracy curves for the Perceptron
and another two for the PA algorithm. Compare the four curves and list your
observations.
c. Repeat experiment (b) with the averaged perceptron (a sketch is given after this list). Compare the test accuracies of the plain
perceptron and the averaged perceptron. What did you observe?
d. Compute the general learning curve (vary the number of training examples, starting from
5000 and increasing in increments of 5000) for 20 training iterations. Plot the number of training examples on the x-axis and the testing accuracy on the y-axis. List your observations from this curve.
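For experiment (c), one common formulation of the averaged perceptron is the running-sum version sketched below (faster lazily-updated variants exist); note that the final scaling by T·n does not change the sign of any prediction:

def train_averaged_perceptron(X, y, T):
    # Same mistake-driven updates as Algorithm 1 with tau = 1, but the
    # returned vector is the average of w over all prediction steps.
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for itr in range(T):
        for x_t, y_t in zip(X, y):
            if np.sign(w @ x_t) != y_t:
                w = w + y_t * x_t
            w_sum += w                 # accumulate after every example
    return w_sum / (T * n)             # averaged weights used for prediction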
2.2 Multi-Class Classification (40 points) Learn a multi-class classifier to map images
to the corresponding fashion label.
a. Compute the online learning curve for both the Perceptron and PA algorithms by plotting the
number of training iterations (1 to 50) on the x-axis and the number of mistakes on the y-axis.
Compare the two curves and list your observations.
b. Compute the accuracy of both the Perceptron and PA algorithms on the training data and
the testing data after each of 20 training iterations (a small accuracy helper is sketched after this list). You will have two accuracy curves for the Perceptron
and another two for the PA algorithm. Compare the four curves and list your
observations.
c. Repeat experiment (b) with the averaged perceptron. Compare the test accuracies of the plain
perceptron and the averaged perceptron. What did you observe?
d. Compute the general learning curve (vary the number of training examples, starting from
5000 and increasing in increments of 5000) for 20 training iterations. Plot the number of training examples on the x-axis and the testing accuracy on the y-axis. List your observations from this curve.
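A small helper along these lines can produce the accuracy numbers for experiments (b)–(d), again reusing the 0-indexed F sketched earlier:

def multiclass_accuracy(w, X, y, k):
    # Fraction of examples whose argmax prediction matches the label.
    correct = 0
    for x_t, y_t in zip(X, y):
        scores = [w @ F(x_t, c, k) for c in range(k)]
        if int(np.argmax(scores)) == y_t:
            correct += 1
    return correct / len(y)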
3 Instructions for Submission
Please follow the instructions below. If you do not follow them, your homework will not be graded.
We will provide a dropbox folder link for the homework submission.
PDF submission. Submit one PDF file containing both the answers to the analytical questions (Part I) and the empirical-analysis
results and discussion (Part II). Please label the x-axis, the y-axis, and the title of each graph
appropriately. Please name this file WSUID-LASTNAME.pdf (e.g., 111222-Fern.pdf).
Code submission. You will submit one zip file for your code as per the instructions below. If
your script and/or code does not execute, we will try to give some partial credit by looking at the
overall code contents.
• Mention the programming language and version (e.g., Python 2.5) that you used.
• Submit one zip file named WSUID-LASTNAME.zip (e.g., 111222-Fern.zip) and include a
README file.
• Include a script to run the code, and refer to it in the README file. Please make
sure that your script runs on a standard Linux machine.
• Don’t submit the data folder. Assume there is a folder “data” with all the files.
• The output of your programs should be well formatted so that it can be used to answer the empirical-analysis
questions.
• Structure your code meaningfully and add comments to make it readable.
If you have collaborated on or discussed the homework with another student, please provide this information with all the relevant details. If we find that the code of two different students shows traces of
plagiarism, both students will be penalized, and we will report the academic dishonesty case to the graduate school (see https://communitystandards.wsu.edu/policies-and-reporting/academic-integritypolicy/). The bottom line: please DO NOT even think of going this route. It is very unpleasant
for the faculty, TAs, and students involved to deal with these cases.
4 Grading Rubric
Each question in the student's work will be assigned a letter grade of A, B, C, D, or F by the
instructor and TAs. This five-point (discrete) scale is described as follows:
• A) Exemplary (=100%).
Solution presented solves the problem stated correctly and meets all requirements of the problem.
Solution is clearly presented.
Assumptions made are reasonable and are explicitly stated in the solution.
Solution represents an elegant and effective way to solve the problem and is no more complicated than necessary.
• B) Capable (=75%).
Solution is mostly correct, satisfying most of the above criteria under the exemplary category,
but contains some minor pitfalls, errors/flaws or limitations.
• C) Needs Improvement (=50%).
Solution demonstrates a viable approach toward solving the problem but contains some major
pitfalls, errors/flaws or limitations.
• D) Unsatisfactory (=25%).
Critical elements of the solution are missing or significantly flawed.
Solution does not demonstrate sufficient understanding of the problem and/or any reasonable
directions to solve the problem.
• F) Not attempted (=0%).
No solution provided.
The points awarded on a given homework question will be equal to the percentage assigned (given by the
letter grades shown above) multiplied by the maximum number of points that
question is worth. For example, if a question is worth 6 points and the answer is awarded a B grade,
that implies 4.5 points out of 6.