Description
1. Binary linear classifiers. Assume there are two possible labels, y = 1 or y = −1
associated with two features x1 and x2. We consider several different linear classifiers
ŷ = sign{x^T w} where x is derived from x1 and x2 and w are the classifier weights.
Define the decision boundary of the classifier as the set {x1, x2} for which x^T w = 0.
Let x2 be the vertical axis in your sketches and depict the interval 0 ≤ x1, x2 ≤ 1.
a) Classifier 1. Let x^T = [x1, x2] and assume w = [5, −2]^T.
i. Sketch the decision boundary in the x1-x2 plane.
ii. Does the decision boundary represent a subspace in R^2? Why or why not? If
it represents a subspace, then find an orthonormal basis for the subspace.
b) Classifier 2. Let x^T = [x1, x2, 1] and assume w = [5, −2, 1]^T.
i. Sketch the decision boundary in the x1-x2 plane.
ii. Does the decision boundary represent a subspace in R^2? Why or why not? If
it represents a subspace, then find an orthonormal basis for the subspace.
c) Classifier 3. Let x^T = [x1^2, x2, 1] and assume w = [1, −2, 1]^T.
i. Sketch the decision boundary in the x1-x2 plane.
ii. Does the decision boundary represent a subspace in R^2? Why or why not? If
it represents a subspace, then find an orthonormal basis for the subspace.
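If it helps to check a hand sketch numerically, the zero level set x^T w = 0 can be drawn with a contour plot. The MATLAB fragment below is only a sketch; it uses Classifier 2's feature map as an example, and the expression for F can be changed to match any of the feature maps above.

    % Sketch: draw the decision boundary x'*w = 0 on the unit square 0 <= x1, x2 <= 1.
    % Classifier 2's feature map x' = [x1 x2 1] is used here as an example.
    w = [5; -2; 1];
    [X1, X2] = meshgrid(linspace(0, 1, 201));
    F = w(1)*X1 + w(2)*X2 + w(3);      % x'*w evaluated at every grid point
    contour(X1, X2, F, [0 0], 'k');    % the boundary is the zero contour
    xlabel('x_1'); ylabel('x_2'); axis([0 1 0 1]); axis square;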
2. Linear Classifier. Download the script and the data file classifier_data.mat. This
code trains linear classifiers using least squares. The script provided steps through
the problems below.
Make sure to ‘publish’ your results and store them as a PDF file for submission.
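For reference, least-squares training picks the weights w that minimize the squared error between x^T w and the training labels, and the percent error is the fraction of evaluation points whose predicted sign disagrees with the true label. The fragment below is only a rough sketch of that computation, not the provided script; the variable names Xtrain, ytrain, Xeval, and yeval are hypothetical stand-ins for whatever classifier_data.mat actually contains.

    % Rough sketch of least-squares training and percent-error evaluation.
    % Xtrain, ytrain, Xeval, yeval are hypothetical names, not the script's.
    A = [Xtrain(:,1), Xtrain(:,2)];          % one row of features per training point
    w = A \ ytrain;                          % least-squares weights
    yhat = sign([Xeval(:,1), Xeval(:,2)] * w);
    pct_error = 100 * mean(yhat ~= yeval);   % misclassified / total, as a percent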
a) Classifier 1. Let x^T = [x1, x2]. Briefly comment on the fit of the classifier to
the decision boundary apparent in the evaluation data. Also identify the percent
error based on the ratio of misclassified evaluation data points to the total number
of evaluation data points.
b) Classifier 2. Consider squaring the original features, and also using them for
classification, so that x^T = [x1^2, x2^2, x1, x2, 1]. This will allow for a curved
decision boundary. Briefly comment on the fit of the classifier to the decision
boundary apparent in the evaluation data. Also identify the percent error based on
the ratio of misclassified evaluation data points to the total number of evaluation
data points.
c) Shortcoming of training using least squares as a loss function. Training a classifier
using the squared error as a loss function can fail when correctly labeled data
points lie far from the decision boundary. A new dataset, consisting of the first
dataset plus 1000 (correctly labeled) data points at x1 = 0, x2 = 3, is created (see the sketch after this problem).
What happens to the decision boundary when these new data points are included
in training? What happens to the error rate if you move the 1000 data points to
x1 = 0, x2 = 10? Why does this happen?
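As a rough illustration of parts (b) and (c) above (variable names again hypothetical): the squared features are appended as extra columns before solving the same least-squares problem, and the new points are appended as extra rows of the training set.

    % Part (b): expanded feature matrix with squared features.
    x1 = Xtrain(:,1);  x2 = Xtrain(:,2);
    A = [x1.^2, x2.^2, x1, x2, ones(size(x1))];
    w = A \ ytrain;                          % same least-squares fit, curved boundary

    % Part (c): append 1000 correctly labeled points at (x1, x2) = (0, 3).
    % The label +1 below is an assumption; use whichever label is correct
    % at that location in the dataset.
    Xaug = [Xtrain; repmat([0, 3], 1000, 1)];
    yaug = [ytrain; ones(1000, 1)];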
3. Overfitting. Download the dataset overfitting_data.mat. You may find it helpful
to adapt the code from the previous problem. The dataset has 50 data points for
training, and 10,000 data points to be used for evaluation of the classifier. Each data
point consists of a two-dimensional feature vector x and a label y ∈ {−1, 1}. The feature
vector is a “noisy” version of the true underlying feature, which blurs the boundary
between classes.
a) Plot the training data using a scatter plot. Indicate the points with y = −1 using one
color, and the points with y = 1 using another.
b) Plot the evaluation data using a scatter plot. Indicate the points labeled −1 using
one color, and the points labeled +1 with another.
c) Classifier 1. As before, x = [x1, x2]^T, and ŷ = sign(x^T w).
i. Train the classifier using least squares to find the classifier weights w. Apply
the classifier to the evaluation data, and plot the data points using a scatter
plot with different colors for different predicted labels.
ii. Plot the correctly predicted evaluation data points using one color, and incorrectly predicted points using a second color. How many errors are there?
d) Classifier 2. Let x = [x1^2, x2^2, x1, x2, 1]^T, and ŷ = sign(x^T w).
i. Train the classifier using least squares to find the classifier weights w. Apply
the classifier to the evaluation data, and plot the data points using a scatter
plot with different colors for different predicted labels.
ii. Plot the correctly predicted evaluation data points using one color, and incorrectly predicted points using a second color. How many errors are there?
e) Classifier 3. Let x = [x1^6, x2^6, x1^5, x2^5, ..., x1, x2, 1]^T, and ŷ = sign(x^T w).
(A feature-construction sketch follows this problem.)
i. Train the classifier using least squares to find the classifier weights w. Apply
the classifier to the evaluation data, and plot the data points using a scatter
plot with different colors for different predicted labels.
ii. Plot the correctly predicted evaluation data points using one color, and incorrectly predicted points using a second color. How many errors are there?
f) Of the three classifiers, which one performs worst, and why?
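A rough sketch of how Classifier 3's degree-6 feature matrix and the correct/incorrect plots might be assembled is shown below; the variable names are hypothetical, and this is only one way to organize the computation.

    % Sketch for problem 3 (names hypothetical): degree-6 features, least-squares
    % training, and a plot of correct vs. incorrect evaluation predictions.
    poly_features = @(x1, x2) [x1.^6, x2.^6, x1.^5, x2.^5, x1.^4, x2.^4, ...
                               x1.^3, x2.^3, x1.^2, x2.^2, x1, x2, ones(size(x1))];
    w = poly_features(Xtrain(:,1), Xtrain(:,2)) \ ytrain;    % train on training data
    yhat = sign(poly_features(Xeval(:,1), Xeval(:,2)) * w);  % predicted labels
    correct = (yhat == yeval);
    scatter(Xeval(correct,1),  Xeval(correct,2),  10, 'b'); hold on;
    scatter(Xeval(~correct,1), Xeval(~correct,2), 10, 'r');  % misclassified points
    num_errors = sum(~correct)                               % number of errors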
4. A binary linear classifier based on three features x1, x2, and x3 is ŷ = sign{x^T w},
where x^T = [x1, x2, x3]. Hence the decision boundary is the set {x1, x2, x3} for
which x^T w = 0.
The decision boundary for a two-dimensional classifier is a line. What type of geometric
object is the decision boundary in three dimensions?
5. A decision boundary for a classification problem involving features x1, x2, and x3 is
defined as x^T w = 0, where x^T = [x1, x2, x3, 1]. Find w so that the decision
boundary is parallel to the x1-x2 plane and includes the point (x1, x2, x3) = (0, 0, 1).