CS/ECE/ME532 Activity 19





1. You have two feature vectors x1 = [ ], x2 = [ ] (the entries are not reproduced here)
and corresponding labels d1 = −1, d2 = 1. The linear classifier is sign{x_i^T w},
where w = [ ] (also not reproduced).
a) Find the squared error loss for this classifier.

b) Find the hinge loss for this classifier.
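Since the assignment's vectors are not reproduced above, the two losses can only be sketched generically. The values of X, d, and w below are hypothetical stand-ins, not the assignment's data:

```python
import numpy as np

def squared_error_loss(X, d, w):
    # Sum of (x_i^T w - d_i)^2 over the data points (rows of X are x_i^T)
    return float(np.sum((X @ w - d) ** 2))

def hinge_loss(X, d, w):
    # Sum of max(0, 1 - d_i * x_i^T w) over the data points
    return float(np.sum(np.maximum(0.0, 1.0 - d * (X @ w))))

# Hypothetical stand-in values; substitute the assignment's actual x1, x2, w
X = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # rows are x1^T and x2^T
d = np.array([-1.0, 1.0])
w = np.array([1.0, 1.0])
print(squared_error_loss(X, d, w))  # → 4.0: (1 - (-1))^2 + (1 - 1)^2
print(hinge_loss(X, d, w))          # → 2.0: max(0, 1+1) + max(0, 1-1)
```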

2. You have four data points x1 = 2, x2 = 1.5, x3 = 1/2, x4 = −1/2 and corresponding
labels y1 = 1, y2 = 1, y3 = −1, y4 = −1.

a) Find a maximum margin linear classifier for this data. Hint: Graph the data.

b) Use squared-error loss to train the classifier (with the help of Python). Does this
classifier make any errors?

c) Find a classifier with zero hinge loss. Hint: Use what you’ve learned about hinge
loss, not computation. Does this classifier make any errors?

d) Now suppose x4 = −5. Use squared-error loss to find the classifier (with the help
of Python). Does this classifier make any errors?

e) Can you still find a classifier with zero hinge loss when x4 = −5? Does it make
any errors?
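Parts (b) and (d) can be checked with a short least-squares sketch. This assumes the scalar features are augmented with a constant 1 so the decision boundary need not pass through zero; the function and variable names are illustrative, not from the course code:

```python
import numpy as np

def train_ls(x, y):
    # Augment with a constant feature, then solve min_w ||Xw - y||^2
    X = np.column_stack([x, np.ones_like(x)])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def num_errors(x, y, w):
    # Count points where sign(w0*x + w1) disagrees with the label
    X = np.column_stack([x, np.ones_like(x)])
    return int(np.sum(np.sign(X @ w) != y))

y = np.array([1.0, 1.0, -1.0, -1.0])
x_b = np.array([2.0, 1.5, 0.5, -0.5])   # part (b)
x_d = np.array([2.0, 1.5, 0.5, -5.0])   # part (d): x4 moved far from the boundary

w_b = train_ls(x_b, y)
w_d = train_ls(x_d, y)
print(num_errors(x_b, y, w_b))  # → 0: all four points classified correctly
print(num_errors(x_d, y, w_d))  # → 1: x3 = 1/2 is now misclassified
```

Moving x4 far from the boundary drags the least-squares fit toward it, which is exactly the failure mode part (d) probes.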

3. Previously, we examined the performance of classifiers trained using the squared-error
loss function (i.e., trained using least squares). This problem uses an off-the-shelf linear
Support Vector Machine (SVM) to train a binary linear classifier.

The data set is divided into training and test sets. In order to represent a decision
boundary that may not pass through the origin, we can consider the augmented feature
vector x^T = [x1  x2  1].

a) Classifier using an off-the-shelf SVM. Code is provided to train a classifier using
an off-the-shelf SVM with hinge loss. Run the code to find the linear classifier
weights. Next, use the weights to predict the class of the test data. How many
classification errors occur?

b) Comment out the code that trains the classifier using the linear SVM, and uncomment
the code that trains the classifier using least squares (i.e., w_opt = (X^T X)^(−1) X^T y).
How many errors occur on the test set?

c) Training a classifier using the squared error as a loss function can fail when
correctly labeled data points lie far from the decision boundary. Linear SVMs
trained with hinge loss are not susceptible to this problem. A new dataset is
created consisting of the first dataset plus 1000 (correctly labeled) data points
at x1 = 0, x2 = 10. What happens to the decision boundary when these new
data points are included in training the linear SVM?

d) How does this compare with the error rate of the least-squares linear classifier
trained with the new data points included? Why is there such a difference in
performance?
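The provided SVM code is not reproduced here. As a stand-in, the sketch below trains a hinge-loss classifier by plain subgradient descent (not the off-the-shelf SVM the assignment uses) on the 1-D data from problem 2 with x4 = −5. It illustrates the mechanism behind part (c): points with margin at least 1 contribute zero subgradient, so correctly classified points far from the boundary exert no pull on the weights. The step size and iteration count are arbitrary choices:

```python
import numpy as np

def hinge_train(X, y, lr=0.05, epochs=5000):
    # Subgradient descent on the average hinge loss (1/n) sum max(0, 1 - y_i x_i^T w).
    # Only points with margin y_i x_i^T w < 1 contribute a subgradient -y_i x_i.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1
        grad = -(y[active, None] * X[active]).sum(axis=0) / len(y)
        w -= lr * grad
    return w

x = np.array([2.0, 1.5, 0.5, -5.0])        # problem 2(d) data
y = np.array([1.0, 1.0, -1.0, -1.0])
X = np.column_stack([x, np.ones_like(x)])  # constant feature for the intercept

w = hinge_train(X, y)
errors = int(np.sum(np.sign(X @ w) != y))
print(errors)  # → 0: hinge loss still yields a zero-error classifier
```

Unlike the least-squares fit in problem 2(d), the far-away point x4 = −5 stops influencing w as soon as its margin exceeds 1, which is why the hinge-trained boundary stays between the two classes.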