ECE421 Introduction to ML Assignment 3

$30.00

Category: You will Instantly receive a download link for .zip solution file upon Payment

Description

5/5 - (8 votes)

Problem 1 The goal of this problem is to derive the parameters of a support vector
machine with a hard margin. We will consider the following toy dataset of 4 points: x(1) =
(1, 1)⊤, x(2) = (−1, −1)⊤, x(3) = (1, 0)⊤, and x(4) = (0, 1)⊤. x(1) is the only point from
the positive class, whereas x(2)
, x(3)
, and x(4) are from the negative class. Put another way,
y(1) = 1 and y(2) = y(3) = y(4) = −1.
1. (2 points) Plot the data and draw the maximum-margin hyperplane. By inspecting the
plot, give the equation of this hyperplane. You will use it to verify your answers at the
end of this problem.
2. (1 point) Which of the points are support vectors?
3. (1 point) Recall the optimization problem we defined for the SVM with a hard margin:
min w!,b
1
2
“w!”2 s.t. ∀j(w! · x(j) + b)y(j) ≥ 1 (1)
Write down the Lagrangian L(w!, b, α!) corresponding to this problem, where α! is the
vector of dual variables.
Fall 2022 – Introduction to ML Assignment 3 – Page 2 of 3 Oct 20
4. (2 points) We can therefore rewrite Equation 1 as follows:
min w!,b
max
α! L(w!, b, α!)
Because Slater’s condition holds, this is equivalent to solving:
max
α! min w!,b
L(w!, b, α!) (2)
Given a fixed α!, solve the inner (minimization) problem. You should obtain two relationships taking the form of
A! = !
j
BjCjD!j
!
j
EjFj = G
5. (2 points) Simplify Equation 2 with the result of question 4. Show that it takes the
form of
max
α!
!
j
Aj − 1
2
!
j
!
k
BjCkDjEkF! · G!
Note that the letters used in this question are not related to the letters used in the
previous question.
6. (1 point) We now come back to our toy dataset. Because it has 4 trainings points, we
have j ∈ 1..4, i.e., four dual variables. For points that are not support vectors, recall
from class that the corresponding constraints in Equation 1 can be removed. Deduce the
value of one of the dual variable αj corresponding to one value of j (i.e., one constraint)
which we will write j∗ to not reveal the answer. Follow indexes introduced in the problem
statement above when describing the dataset.
7. (1 point) Deduce from the second expression found in question 4 (i.e.,

j EjFj = G),
a relationship between all remaining dual variables αj for j ∕= j∗
8. (3 points) Solving the optimization problem from question 5, and using question 7 in
addition, find values for all dual variables αj for j ∈ 1..4.
9. (1 point) Deduce the numerical values of the two components of w!.
10. (2 points) Through an analysis of constraints in Equation 1 which are tight, deduce the
numerical value of b. Hint: recall the role of support vectors here.
11. (0 points) Check that the hyperplane you found analytically is the same than the one
you found geometrically in the first question.
Fall 2022 – Introduction to ML Assignment 3 – Page 3 of 3 Oct 20
Problem 2 – Binary linear classification vs. SVM For this problem, we will empirically compare the performance of a binary linear classifier to a SVM. Because the purpose
of this problem is to compare the performance of the two classifiers, you can use their implementation in sklearn. However, note that (a) you should understand how to implement
these classifiers from scratch and (b) you may be evaluated on this later in the course.
Consider the iris dataset included in sklearn. For all questions of this problem (except
the last question), we will only consider the first 100 entries of the dataset. With the help
of train test split from sklearn.model selection, split the dataset into a training set and a
test set. For now, you can do so by setting the argument test size to 0.8 in train test split.
To ensure results below are comparable and reproducible, set the random state argument
of your train test split calls to 0: this will control the shuffling applied to the data before
applying the split.
1. (2 points) Implement a binary linear classifier on the first two dimensions (sepal length
and width) of the iris dataset and plot its decision boundary. (Hint: sklearn refers to
the binary linear classifier as a LogisticRegression, we will see why later in the course.)
2. (1 point) Report the accuracy of your binary linear classifier on both the training and
test sets.
3. (2 points) Implement a linear SVM classifier on the first two dimensions (sepal length
and width). Plot the decision boundary of the classifier and its margins.
4. (1 point) Circle the support vectors. Please justify how to identify them through the
duality theorem. (hint: KKT condition)
5. (1 point) Report the accuracy of your linear SVM classifier on both the training and
test sets.
6. (1 point) What is the value of the margin? Justify your answer.
7. (1 point) Which vector is orthogonal to the decision boundary?
8. (3 points) Split the iris dataset again in a training and test set, this time setting test size
to 0.4 when calling train test split. Train the SVM classifier again. Does the decision
boundary change? How about the test accuracy? Please justify why (hint: think about
the support vectors), and illustrate your argument with a new plot.
9. (1 point) Do the binary linear classifier and SVM have the same decision boundaries?
10. (3 points) Now consider all 150 entries in the iris dataset, and retrain the SVM. You
should find that the data points are not linearly separable. How can you deal with it?
Justify your answer and plot the decision boundary of your new proposed classifier.

∗ ∗