## Description

Problem 1: Version Spaces [30 points]

In this problem we will investigate the geometry of version spaces for different hypotheses.

For all parts of this question, training instances are vectors in Z

2

, i.e. points on a 2D

lattice, with labels c ∈ {−1, 1}.

(a) Rectangular hypothesis Consider a hypothesis of the form (pT L, pBR), where pi =

{xi

, yi

|xi

, yi are integers and 0 ≤ xi

, yi ≤ 8} define the diagonally-opposite corners

of a rectangle (Top Left, Bottom Right, respectively). An instance (x, y) is classified

positive if it falls inside, or on the boundary of this rectangle, and negative otherwise.

Note, that the definition allows for degenerate cases, where two or four of the corners

overlap. In addition, allow for pT L and pBR to take on a special value ∅. If either

pT L = ∅ or pBR = ∅, all instances are classified as negative. This hypothesis definition

applies to problems 1a to 1d.

What is the size of this hypothesis space?

(b) Rectangular hypothesis A hypothesis h1 is considered more general than hypothesis

1-1

h2 (and h2 more specific than h1) if h2 implies h1 The most general (most specific)

hypothesis h

∗

is a hypothesis, such that no other hypothesis is more general (more

specific) than h

∗

.

Draw the most general hypothesis that satisfies the training data D1 in Figure 1.

Draw the most specific hypothesis.

(c) Rectangular hypothesis What is the size of the version space for the training data

D1?

(d) Rectangular hypothesis In a form of Active Learning, a learner can query the teacher

for more data. The goal of the learner would be to pick query instances, such that the

size of the version space is reduced the most (which means that the greatest number

of inconsitent hypotheses is pruned after observing the query instance). Consider

the following 3 candidates for a query instance: (3,7), (4,3), (4,6). Compute, for

each, the expected size of the version space after observing the training set D1 and

the label for that query instance (considering the two possible labels have an equal

chance of occuring). Between the three query candidates, is there a best choice?

x

y

Figure 1: Training set D1

(e) Decision Tree hypothesis Now consider a hypothesis space formed by all 1-level decision trees. Since all of the attributes for our data are integer-valued, consider

that splitting thresholds in our decision tree are also constrained to integers, and

each node in the tree splits on a single attribute, x or y, with a splitting criterion

attribute ≥ threshold. Give the size of the version space for a 1-level decision tree

and the training set D2. Note that in general multiple trees can represent the same

function, and we are interested in counting functions.

1-2

(f) Decision Tree hypothesis Now consider a hypothesis space formed by all decision

trees with 3 leaf nodes, with the same splitting criteria as above. What is the size of

the version-space now? Draw all hypotheses that belong to the version space in the

(x, y) space of Figure 2.

x

y

Figure 2: Training dataset D2

Problem 2: Regression with kNN [30 points]

In this problem, we will investigate the application of kNN to regression in a setting of

partial image completion. You have been hired by the NSA to help them reconstruct

corrupted surveillance photos of possible criminals. Due to a glitch in their cameras,

some images are partially blank. Your goal will be to provide the “best” completion of

the image using kNN.

Included with the assignment is a subset of the Olivetti face dataset. The dataset is a

collection of 64×64 grayscale images of peoples faces. This dataset was processed into the

SVMLight format for this assignment. This file format will be used for all programming

assignments in this course. In this file, each line corresponds to a training instance,

providing the label, and the feature-value pairs for that instance: