Problem 1: Version Spaces [30 points]
In this problem we will investigate the geometry of version spaces for different hypotheses.
For all parts of this question, training instances are vectors in Z
2
, i.e. points on a 2D
lattice, with labels c ∈ {−1, 1}.
(a) Rectangular hypothesis Consider a hypothesis of the form (pT L, pBR), where pi =
{xi
, yi
|xi
, yi are integers and 0 ≤ xi
, yi ≤ 8} define the diagonally-opposite corners
of a rectangle (Top Left, Bottom Right, respectively). An instance (x, y) is classified
positive if it falls inside, or on the boundary of this rectangle, and negative otherwise.
Note, that the definition allows for degenerate cases, where two or four of the corners
overlap. In addition, allow for pT L and pBR to take on a special value ∅. If either
pT L = ∅ or pBR = ∅, all instances are classified as negative. This hypothesis definition
applies to problems 1a to 1d.
What is the size of this hypothesis space?
(b) Rectangular hypothesis A hypothesis h1 is considered more general than hypothesis
1-1
h2 (and h2 more specific than h1) if h2 implies h1 The most general (most specific)
hypothesis h
∗
is a hypothesis, such that no other hypothesis is more general (more
specific) than h
∗
.
Draw the most general hypothesis that satisfies the training data D1 in Figure 1.
Draw the most specific hypothesis.
(c) Rectangular hypothesis What is the size of the version space for the training data
D1?
(d) Rectangular hypothesis In a form of Active Learning, a learner can query the teacher
for more data. The goal of the learner would be to pick query instances, such that the
size of the version space is reduced the most (which means that the greatest number
of inconsitent hypotheses is pruned after observing the query instance). Consider
the following 3 candidates for a query instance: (3,7), (4,3), (4,6). Compute, for
each, the expected size of the version space after observing the training set D1 and
the label for that query instance (considering the two possible labels have an equal
chance of occuring). Between the three query candidates, is there a best choice?
x
y
Figure 1: Training set D1
(e) Decision Tree hypothesis Now consider a hypothesis space formed by all 1-level decision trees. Since all of the attributes for our data are integer-valued, consider
that splitting thresholds in our decision tree are also constrained to integers, and
each node in the tree splits on a single attribute, x or y, with a splitting criterion
attribute ≥ threshold. Give the size of the version space for a 1-level decision tree
and the training set D2. Note that in general multiple trees can represent the same
function, and we are interested in counting functions.
1-2
(f) Decision Tree hypothesis Now consider a hypothesis space formed by all decision
trees with 3 leaf nodes, with the same splitting criteria as above. What is the size of
the version-space now? Draw all hypotheses that belong to the version space in the
(x, y) space of Figure 2.
x
y
Figure 2: Training dataset D2
Problem 2: Regression with kNN [30 points]
In this problem, we will investigate the application of kNN to regression in a setting of
partial image completion. You have been hired by the NSA to help them reconstruct
corrupted surveillance photos of possible criminals. Due to a glitch in their cameras,
some images are partially blank. Your goal will be to provide the “best” completion of
the image using kNN.
Included with the assignment is a subset of the Olivetti face dataset. The dataset is a
collection of 64×64 grayscale images of peoples faces. This dataset was processed into the
SVMLight format for this assignment. This file format will be used for all programming
assignments in this course. In this file, each line corresponds to a training instance,
providing the label, and the feature-value pairs for that instance: