Description
Problem 1: SPAM, SPAM, HAM (50 pts)
For this problem, you will use the spam data set provided with this problem set. The data has
been divided into three pieces: spam_train.data, spam_validation.data, and spam_test.data. These
data sets were generated using the UCI Spambase data set (follow the link for information about
the format of the data). Note that the class label is the last column in the data set.
1. Primal SVMs
(a) Using gradient descent or quadratic programming, apply the SVM with slack formulation
to train a classifier for each choice of
c ∈ {1, 10, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8} without using any feature maps (a
subgradient-descent sketch appears after part 4 below).
(b) What is the accuracy of the learned classifier on the training set for each value of c?
(c) Use the validation set to select the best value of c. What is the accuracy on the validation
set for each value of c?
(d) Report the accuracy on the test set for the selected classifier.
2. Dual SVMs with Gaussian Kernels
(a) Using quadratic programming, apply the dual of the SVM with slack formulation to
train a classifier for each choice of
c ∈ {1, 10, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8} using a Gaussian kernel with
σ^2 ∈ {0.1, 1, 10, 100, 1000} (a dual QP sketch appears after part 4 below).
(b) What is the accuracy of the learned classifier on the training set for each pair of c and σ^2?
(c) Use the validation set to select the best values of c and σ^2. What is the accuracy on the
validation set for each pair of c and σ^2?
(d) Report the accuracy on the test set for the selected classifier.
3. What is the accuracy of the k-nearest neighbor classifier for k = 1, 5, 11, 15, 21? You don't
need to implement the k-dimensional tree version (a brute-force k-NN sketch appears after part 4).
4. Which of these approaches (if any) should be preferred for this classification task? Explain.
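As a starting point for part 1, here is a minimal sketch (not the required implementation) of the primal
SVM with slack trained by subgradient descent on the hinge-loss objective
(1/2)||w||^2 + c Σ_i max(0, 1 - y_i(w·x_i + b)). The file names, the comma-separated format, and the {0,1}
class label in the last column are assumptions taken from the problem statement and the Spambase
description; the step size and epoch count are placeholders and will need tuning, especially for the
larger values of c.

import numpy as np

def load_data(path):
    # Assumed format: comma-separated features, {0,1} class label in the last column.
    data = np.loadtxt(path, delimiter=",")
    X, y = data[:, :-1], data[:, -1]
    return X, np.where(y > 0, 1.0, -1.0)  # remap labels to {-1, +1}

def train_primal_svm(X, y, c, epochs=200, lr=1e-6):
    # Subgradient descent on (1/2)||w||^2 + c * sum_i max(0, 1 - y_i (w.x_i + b)).
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                                    # points with nonzero hinge loss
        grad_w = w - c * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -c * y[viol].sum()
        w -= lr * grad_w                                      # fixed step size: tune per value of c
        b -= lr * grad_b
    return w, b

def accuracy(X, y, w, b):
    return np.mean(np.sign(X @ w + b) == y)

if __name__ == "__main__":
    X_tr, y_tr = load_data("spam_train.data")                 # file names assumed from the handout
    X_va, y_va = load_data("spam_validation.data")
    for c in [10.0 ** k for k in range(9)]:                   # c in {1, 10, ..., 10^8}
        w, b = train_primal_svm(X_tr, y_tr, c)
        print(c, accuracy(X_tr, y_tr, w, b), accuracy(X_va, y_va, w, b))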
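For part 2, a sketch of the dual SVM with slack solved as a quadratic program; cvxopt is used here only
as one convenient QP solver, and the Gaussian kernel is assumed to have the form
K(x, z) = exp(-||x - z||^2 / (2σ^2)).

import numpy as np
from cvxopt import matrix, solvers

def gaussian_kernel(X, Z, sigma2):
    # Pairwise squared distances via ||x||^2 - 2 x.z + ||z||^2 (avoids a 3-d intermediate array).
    sq = (X ** 2).sum(axis=1)[:, None] - 2.0 * (X @ Z.T) + (Z ** 2).sum(axis=1)[None, :]
    return np.exp(-sq / (2.0 * sigma2))

def train_dual_svm(X, y, c, sigma2):
    # Dual: max_a  sum_i a_i - (1/2) sum_ij a_i a_j y_i y_j K_ij
    #        s.t.  0 <= a_i <= c,  sum_i a_i y_i = 0   (posed as a minimization for cvxopt).
    n = len(y)
    K = gaussian_kernel(X, X, sigma2)
    P = matrix(np.outer(y, y) * K)
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))
    h = matrix(np.hstack([np.zeros(n), c * np.ones(n)]))
    A = matrix(y.reshape(1, -1))
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
    # Recover the bias from a margin support vector (0 < alpha_k < c); this sketch ignores the
    # degenerate case where none exists.
    k = np.argmax((alpha > 1e-6) & (alpha < c - 1e-6))
    bias = y[k] - np.sum(alpha * y * K[:, k])
    return alpha, bias

def predict(X_train, y_train, alpha, bias, X_new, sigma2):
    K = gaussian_kernel(X_train, X_new, sigma2)
    return np.sign((alpha * y_train) @ K + bias)

Looping this over the grid of c and σ^2 values from part (a), with the same load_data helper as above,
yields the training and validation accuracies asked for in parts (b) and (c).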
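For part 3, a brute-force k-nearest-neighbor sketch (no k-d tree, as the problem allows), assuming
{-1, +1} labels and odd k so that the sign of the vote sum is a majority vote.

import numpy as np

def knn_predict(X_train, y_train, X_new, k):
    # Pairwise squared Euclidean distances, shape (num_new, num_train).
    d2 = (X_new ** 2).sum(axis=1)[:, None] - 2.0 * (X_new @ X_train.T) \
         + (X_train ** 2).sum(axis=1)[None, :]
    idx = np.argsort(d2, axis=1)[:, :k]        # indices of the k nearest training points
    return np.sign(y_train[idx].sum(axis=1))   # majority vote over {-1, +1} labels (k is odd here)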
Problem 2: Method of Lagrange Multipliers (20 pts)
Suppose that we modified the objective function in the SVM with slack formulation to be a quadratic
penalty instead of a linear penalty, that is, minimize (1/2)||w||^2 + c Σ_i ξ_i^2 subject to the same
constraints as the standard SVM with slack. What is the dual of this new quadratically penalized SVM with slack
problem for a fixed c? Can the kernel trick still be applied?
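For reference, the standard linear-penalty SVM with slack and its well-known dual are sketched below in
LaTeX (our notation, not necessarily the course's); the quadratically penalized version asked for here can
be worked out the same way, by forming the Lagrangian, eliminating w, b, and ξ, and inspecting where the
training points enter.

\begin{align*}
\text{Primal:}\quad & \min_{w,b,\xi}\ \tfrac{1}{2}\lVert w\rVert^2 + c\sum_i \xi_i
  \quad\text{s.t.}\quad y_i(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\\
\text{Dual:}\quad & \max_{\alpha}\ \sum_i \alpha_i
  - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i^\top x_j
  \quad\text{s.t.}\quad 0 \le \alpha_i \le c,\ \ \sum_i \alpha_i y_i = 0.
\end{align*}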
Problem 3: Poisonous Mushrooms? (30 pts)
For this problem, you will use the mushroom data set provided with this problem set. The data has
been divided into two pieces: mush_train.data and mush_test.data. These data sets were generated
using the UCI Mushroom data set (follow the link for information about the format of the data).
Note that the class label is the first column in the data set.
1. Assuming you break ties using the attribute that occurs first (left to right) in the data, draw
the resulting decision tree and report the maximum information gain for each node that you
added to the tree (an information-gain sketch appears after part 3 below).
2. What is the accuracy of this decision tree on the test data?
3. Now consider arbitrary input data. Suppose that you decide to limit yourself to decision
trees of height one, i.e., only one split. Is the tree produced by the information gain heuristic
optimal on the training data (that is, no other decision tree has higher accuracy)?
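For part 1, a minimal sketch of the entropy and information-gain computation that the splitting heuristic
maximizes at each node; it assumes the class labels and attribute values are read as categorical values
from the comma-separated file (label in the first column, as noted above).

import numpy as np
from collections import Counter

def entropy(labels):
    # H(labels) = -sum_k p_k log2 p_k over the class proportions.
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, attribute_values):
    # Gain = H(labels) - sum_v P(attribute = v) * H(labels | attribute = v).
    n = len(labels)
    remainder = 0.0
    for v in set(attribute_values):
        subset = [l for l, a in zip(labels, attribute_values) if a == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder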