CS5590 Assignment 2 Foundations of Machine Learning

Questions: Theory
1. Support Vector Machines: (4 marks) In the derivation for the Support Vector Machine,
we assumed that the margin boundaries are given by $w \cdot x + b = +1$ and $w \cdot x + b = -1$. Show
that, if the +1 and −1 on the right-hand side were replaced by some arbitrary constants +γ
and −γ where γ > 0, the solution for the maximum margin hyperplane is unchanged. (You
can show this for the hard-margin SVM without any slack variables.)
2. Support Vector Machines: (4 marks) Consider the half-margin of the maximum-margin
SVM, defined by $\rho = \frac{1}{\|w\|}$. Show that $\rho$ is given by:
$$\frac{1}{\rho^2} = \sum_{i=1}^{N} \alpha_i$$
where $\alpha_i$ are the Lagrange multipliers given by the SVM dual (as on Slide 30 of the SVM
lecture uploaded on Piazza). (Hint: The answer involves just 3-4 steps; if you are thinking
of something longer, re-think!)
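The identity in question 2 can be sanity-checked numerically before attempting the proof. The sketch below is not a derivation: it assumes scikit-learn's `SVC` with a very large `C` as a stand-in for the hard-margin SVM, and the six toy points are hypothetical, chosen only because they are linearly separable.

```python
import numpy as np
from sklearn.svm import SVC

def margin_identity_terms(X, y, C=1e6):
    """Return (||w||^2, sum of alpha_i) for a near-hard-margin linear SVM."""
    # A very large C approximates the hard-margin problem (an assumption of this sketch).
    clf = SVC(kernel="linear", C=C, tol=1e-6).fit(X, y)
    w = clf.coef_.ravel()
    # dual_coef_ stores y_i * alpha_i for the support vectors,
    # so the absolute values recover the alpha_i.
    alpha_sum = np.abs(clf.dual_coef_).sum()
    return float(w @ w), float(alpha_sum)

# Hypothetical, linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

w_sq, alpha_sum = margin_identity_terms(X, y)
print("||w||^2 = 1/rho^2:", w_sq)
print("sum of alpha_i:   ", alpha_sum)
```

If the two printed numbers agree up to solver tolerance, the claim holds on this example; the written answer still needs the argument from the dual and the KKT conditions.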
3. Kernels: (5 marks) Let k1 and k2 be valid kernel functions. Comment about the validity
of the following kernel functions, and justify your answer with proof or counter-examples as
required:
(a) k(x, z) = k1(x, z) + k2(x, z)
(b) k(x, z) = k1(x, z)k2(x, z)
(c) k(x, z) = h(k1(x, z)) where h is a polynomial function with positive coefficients
(d) k(x, z) = exp(k1(x, z))
(e) $k(x, z) = \exp\left(-\frac{\|x - z\|_2^2}{\sigma^2}\right)$
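A quick numerical experiment can suggest which of the candidates in question 3 deserve a proof and which deserve a counter-example. The sketch below builds Gram matrices on random points and inspects the smallest eigenvalue. The base kernels `k1` and `k2`, the polynomial `h`, the value of `SIGMA` and the sample points are all illustrative assumptions. A non-negative spectrum on one sample proves nothing; a clearly negative eigenvalue, however, is a valid counter-example.

```python
import numpy as np

def gram(kernel, X):
    """Gram matrix K[i, j] = kernel(X[i], X[j])."""
    return np.array([[kernel(a, b) for b in X] for a in X])

def min_eigenvalue(K):
    """Smallest eigenvalue of the symmetrised Gram matrix."""
    return float(np.linalg.eigvalsh((K + K.T) / 2.0).min())

def k1(x, z):
    # Assumed base kernel: linear.
    return float(x @ z)

def k2(x, z):
    # Assumed base kernel: inhomogeneous quadratic.
    return float((1.0 + x @ z) ** 2)

SIGMA = 1.5  # arbitrary bandwidth for candidate (e)

candidates = {
    "(a) sum": lambda x, z: k1(x, z) + k2(x, z),
    "(b) product": lambda x, z: k1(x, z) * k2(x, z),
    # h(t) = 2 t^2 + 3 t + 1 is one polynomial with positive coefficients.
    "(c) polynomial": lambda x, z: 2.0 * k1(x, z) ** 2 + 3.0 * k1(x, z) + 1.0,
    "(d) exp": lambda x, z: np.exp(k1(x, z)),
    "(e) gaussian": lambda x, z: np.exp(-np.sum((x - z) ** 2) / SIGMA ** 2),
}

rng = np.random.default_rng(0)
X = 0.5 * rng.normal(size=(30, 3))

min_eigs = {name: min_eigenvalue(gram(k, X)) for name, k in candidates.items()}
for name, value in min_eigs.items():
    print(f"{name}: smallest eigenvalue = {value:.3e}")

# For contrast, a function that is not a kernel: its Gram matrix has a zero
# diagonal (trace 0) and non-zero entries, so it must have a negative eigenvalue.
not_a_kernel_min_eig = min_eigenvalue(gram(lambda x, z: -np.sum((x - z) ** 2), X))
print("negative squared distance:", not_a_kernel_min_eig)
```

Eigenvalues that are negative only at the level of floating-point round-off (around 1e-12 here) should be read as zero.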

Questions: Programming
4. SVMs: (2 + 2 + 4 + 2 = 10 marks) In this question, you will be working on a
soft-margin SVM. You may find it helpful to review scikit-learn’s SVM documentation:
http://scikit-learn.org/stable/modules/svm.html.
We will apply soft-margin SVM to handwritten digits from the processed US Postal Service
Zip Code data set. The data (extracted features of intensity and symmetry) for training and
testing are available at:
• http://www.amlbook.com/data/zip/features.train
• http://www.amlbook.com/data/zip/features.test
In this dataset, the 1st column is the digit label and the 2nd and 3rd columns are the features. We
will train a one-versus-one (one digit is class +1 and another digit is class −1) classifier for
the digits ‘1’ (+1) and ‘5’ (−1). (From the original dataset, only consider data samples (rows)
with the label 1 or 5, for both the train and test settings. For training details,
you may find http://scikit-learn.org/stable/modules/svm.html helpful.)
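The filtering step described above might look like the following sketch. It assumes the two files have been downloaded locally and are whitespace-separated with three numeric columns; the function name `load_one_vs_one` and the in-memory sample rows are hypothetical.

```python
import io
import numpy as np

def load_one_vs_one(source, pos=1, neg=5):
    """Load rows of (label, feature 1, feature 2) and keep only two digits.

    `source` may be a file path or an open file-like object.
    Returns X with the two features and y with labels +1 (pos) / -1 (neg).
    """
    data = np.loadtxt(source, ndmin=2)
    labels = data[:, 0]
    mask = (labels == pos) | (labels == neg)
    X = data[mask, 1:3]
    y = np.where(labels[mask] == pos, 1, -1)
    return X, y

# Made-up rows in the assumed format, used here instead of the real files.
sample = io.StringIO(
    "1.0 0.10 -0.20\n"
    "5.0 0.30 -4.50\n"
    "7.0 0.20 -3.00\n"
    "1.0 0.40 -0.10\n"
)
X_demo, y_demo = load_one_vs_one(sample)
print(X_demo.shape, y_demo)
```

With the real data, the same call would take the local path of `features.train` or `features.test` in place of the `StringIO` object.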
(a) Consider the linear kernel $K(x_n, x_m) = x_n^T x_m$. Train using the provided training data
and test using the provided test data, and report your accuracy over the entire test set,
and the number of support vectors.
(b) In continuation, train only using the first {50, 100, 200, 800} points with the linear
kernel. Report the accuracy over the entire test set, and the number of support vectors
in each of these cases.
(c) Consider the polynomial kernel $K(x_n, x_m) = (1 + x_n^T x_m)^Q$, where Q is the degree of
the polynomial. Comparing Q = 2 with Q = 5, comment whether each of the following
statements is TRUE or FALSE.
i. When C = 0.0001, training error is higher at Q = 5.
ii. When C = 0.001, the number of support vectors is lower at Q = 5.
iii. When C = 0.01, training error is higher at Q = 5.
iv. When C = 1, test error is lower at Q = 5.
(d) Consider the radial basis function (RBF) kernel $K(x_n, x_m) = \exp(-\|x_n - x_m\|^2)$ in the
soft-margin SVM approach. Which value of $C \in \{0.01, 1, 100, 10^4, 10^6\}$ results in the
lowest training error? The lowest test error? Show the error values for all the C values.
Deliverables:
• Code, Brief report (PDF) with your solutions for the above questions
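One possible way to organise the experiments of question 4 is sketched below, assuming scikit-learn's `SVC`. The helper names are hypothetical. Several settings are assumptions rather than part of the assignment: `C=1.0` (the scikit-learn default) for the linear parts, and `gamma=1.0`, `coef0=1.0` so that scikit-learn's kernels match the formulas in (c) and (d). The demo runs on synthetic blobs, not on the zip-code features.

```python
import numpy as np
from sklearn.svm import SVC

def fit_and_report(X_train, y_train, X_test, y_test, **svc_params):
    """Train one SVC; return train error, test error and support-vector count."""
    clf = SVC(**svc_params).fit(X_train, y_train)
    return {
        "train_error": 1.0 - clf.score(X_train, y_train),
        "test_error": 1.0 - clf.score(X_test, y_test),
        "n_support": int(clf.n_support_.sum()),  # n_support_ holds per-class counts
    }

def run_experiments(X_train, y_train, X_test, y_test,
                    subset_sizes=(50, 100, 200, 800)):
    results = {}
    # (a) and (b): linear kernel on all points, then on the first n points.
    # C is not given for these parts; the default C=1.0 is assumed.
    for n in (len(X_train), *subset_sizes):
        results[("linear", n)] = fit_and_report(
            X_train[:n], y_train[:n], X_test, y_test, kernel="linear")
    # (c): scikit-learn's poly kernel is (gamma * <x, x'> + coef0) ** degree,
    # so gamma=1 and coef0=1 give (1 + x_n^T x_m)^Q.
    for C in (1e-4, 1e-3, 1e-2, 1.0):
        for Q in (2, 5):
            results[("poly", Q, C)] = fit_and_report(
                X_train, y_train, X_test, y_test,
                kernel="poly", degree=Q, gamma=1.0, coef0=1.0, C=C)
    # (d): gamma=1 gives exp(-||x_n - x_m||^2).
    for C in (0.01, 1.0, 100.0, 1e4, 1e6):
        results[("rbf", C)] = fit_and_report(
            X_train, y_train, X_test, y_test, kernel="rbf", gamma=1.0, C=C)
    return results

# Synthetic stand-in data: two well-separated blobs with alternating labels.
rng = np.random.default_rng(0)

def make_blobs(n):
    y = np.tile([1, -1], n // 2)
    X = rng.normal(scale=0.5, size=(n, 2)) + 1.5 * y[:, None]
    return X, y

X_tr, y_tr = make_blobs(80)
X_te, y_te = make_blobs(40)
results = run_experiments(X_tr, y_tr, X_te, y_te, subset_sizes=(20, 40))
for key, report in results.items():
    print(key, report)
```

In this sketch the training error for part (b) is measured on the subset that was used for fitting, while the test error always uses the full test set, as the question asks.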
5. SVMs (contd): (3 + 4 = 7 marks) GISETTE (https://archive.ics.uci.edu/ml/datasets/Gisette)
is a handwritten digit recognition problem. The problem is to separate
the highly confusible digits ‘4’ and ‘9’. This dataset is one of five datasets of the NIPS 2003
feature selection challenge. The dataset is large, so please budget your time
accordingly.
(a) Standard run: Use all the 6000 training samples from the training set to train the
model, and test over all test instances, using the linear kernel. Report the train error,
test error, and number of support vectors.
(b) Kernel variations: In addition to the basic linear kernel, investigate two other standard
kernels: RBF (a.k.a. Gaussian kernel; set γ = 0.001) and polynomial (set degree =
2, coef0 = 1; e.g., $(1 + x^T x)^2$). Which kernel yields the lowest training error? Report
the train error, test error, and number of support vectors for both these kernels.
Deliverables:
• Code, Brief report (PDF) with your solutions for the above questions
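The kernel settings of question 5 map onto scikit-learn's `SVC` parameters roughly as in the sketch below. `KERNEL_CONFIGS` and `compare_kernels` are hypothetical names, and `gamma=1.0` for the polynomial kernel is an assumption made so that the kernel has the form shown in (b); the assignment does not state it. The demo uses random data, not GISETTE.

```python
import numpy as np
from sklearn.svm import SVC

KERNEL_CONFIGS = {
    "linear": {"kernel": "linear"},
    "rbf": {"kernel": "rbf", "gamma": 0.001},
    # gamma=1.0 is assumed so that the kernel is (1 + <x, x'>)^2.
    "poly": {"kernel": "poly", "degree": 2, "coef0": 1.0, "gamma": 1.0},
}

def compare_kernels(X_train, y_train, X_test, y_test, configs=KERNEL_CONFIGS):
    """Return per-kernel (train error, test error, #SV) and the best kernel by train error."""
    report = {}
    for name, params in configs.items():
        clf = SVC(**params).fit(X_train, y_train)
        report[name] = (
            1.0 - clf.score(X_train, y_train),
            1.0 - clf.score(X_test, y_test),
            len(clf.support_),  # indices of the support vectors
        )
    best = min(report, key=lambda name: report[name][0])
    return report, best

# Random stand-in data with 20 features; labels depend on the first feature.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(90, 20))
y_all = np.where(X_all[:, 0] > 0, 1, -1)
report, best = compare_kernels(X_all[:60], y_all[:60], X_all[60:], y_all[60:])
for name, (train_err, test_err, n_sv) in report.items():
    print(f"{name}: train={train_err:.3f} test={test_err:.3f} n_sv={n_sv}")
print("lowest training error:", best)
```

On the real dataset each fit can take a while, so storing the fitted models or the report between runs may save time.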