Description
1 Warm up: Feature expansion
[10 points] Recall problem 2 in Homework 2, where we saw the concept class C consisting of functions f_r defined by an integer radius r (with 1 ≤ r ≤ 128) as follows:

f_r(x_1, x_2) = +1 if x_1^2 + x_2^2 ≤ r^2; −1 otherwise    (1)
Clearly the hypothesis class is not linearly separable in ℝ^2. Construct a function φ(x_1, x_2) that maps examples to a new space, such that the positive and negative examples are linearly separable in that space. (Note that φ should not depend on r.)

2 PAC Learning

1. [20 points total] A factory assembles a product that consists of different parts. Suppose a robot was invented to recognize whether a product contains all the right parts. The rules for making products are very simple: 1) you are free to combine any of the parts as they are; 2) you may also cut any of the parts into two distinct pieces before using them. You wonder how much effort a robot would need to figure out what parts are used in the product.

(a) [5 points] Suppose that a naive robot has to recognize products made using only rule 1. Given N available parts, each product made out of these constitutes a distinct hypothesis. How large would the hypothesis space be? Briefly explain your answer.

(b) [5 points] Suppose that an experienced worker follows both rules when making a product. How large is the hypothesis space now? Explain.

(c) [10 points] An experienced worker decides to train the naive robot to discern the makeup of a product by showing it the product samples he has assembled. There are 6 available parts. If the robot would like to learn any product at 0.01 error with probability 99%, how many examples would the robot have to see?

2. [20 points, from Tom Mitchell's book] We have learned an expression for the number of training examples sufficient to ensure that every hypothesis will have true error no worse than ε plus its observed training error error_D(h). In particular, we used Hoeffding bounds to derive

m ≥ (1/(2ε^2)) (ln(|H|) + ln(1/δ)).

Derive an alternative expression for the number of training examples sufficient to ensure that every hypothesis will have true error no worse than (1 + γ) error_D(h). You can use the general Chernoff bounds to derive such a result.

Chernoff bounds: Suppose X_1, ..., X_m are the outcomes of m independent coin flips (Bernoulli trials), where the probability of heads on any single trial is Pr[X_i = 1] = p and the probability of tails is Pr[X_i = 0] = 1 − p. Define S = X_1 + X_2 + ... + X_m to be the sum of these m trials. The expected value of S/m is E[S/m] = p. The Chernoff bounds govern the probability that S/m will differ from p by some factor 0 ≤ γ ≤ 1:

Pr[S/m > (1 + γ)p] ≤ e^(−mpγ^2/3)
Pr[S/m < (1 − γ)p] ≤ e^(−mpγ^2/2)    (2)
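For intuition about how the Hoeffding-based expression above scales, here is a minimal Python sketch that plugs values into m ≥ (1/(2ε^2))(ln(|H|) + ln(1/δ)); the values of |H|, ε, and δ below are illustrative assumptions, not the settings used in any of the problems above.

```python
import math

def hoeffding_sample_bound(h_size, eps, delta):
    # Smallest integer m satisfying m >= (1 / (2 * eps^2)) * (ln|H| + ln(1/delta)),
    # i.e. the Hoeffding-based sample-complexity expression quoted above.
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / (2.0 * eps ** 2))

# Illustrative assumed values (not the settings of problem 1(c) or problem 2):
print(hoeffding_sample_bound(h_size=1024, eps=0.1, delta=0.01))  # prints 577
```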
3 VC Dimension
In this problem, we investigate a few properties of the Vapnik-Chervonenkis dimension.
1. [Shattering, 10 points] Recall the definition of shattering from class: a concept class
shatters a set of points if for any labeling of those points, there is some function in the
class that produces exactly that labeling. Let C be the set of all conjunctions over n Boolean variables.
Find a set S ⊆ {0, 1}^n consisting of exactly n examples that can be shattered by C.
Prove the correctness of your answer.
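As an illustration of the shattering definition above, here is a minimal Python sketch that brute-forces whether a finite collection of hypotheses shatters a given point set; the threshold class in the example is an assumed toy class, not the conjunction class C of this problem.

```python
from itertools import product

def is_shattered(points, hypotheses):
    # A set of points is shattered if every +/-1 labeling of the points
    # is realized by at least one hypothesis in the collection.
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return all(lab in realized for lab in product((+1, -1), repeat=len(points)))

# Assumed toy class: threshold functions "x >= t" on the real line.
thresholds = [lambda x, t=t: +1 if x >= t else -1 for t in (0.0, 0.5, 1.0)]
print(is_shattered([0.7], thresholds))       # True: the single point can get either label
print(is_shattered([0.2, 0.7], thresholds))  # False: no threshold labels 0.2 as + and 0.7 as -
```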
2. [10 points] Show that a finite concept class C has VC dimension at most log_2 |C|. Hint:
You can prove this by contradiction.
3. [15 points] We have a learning problem where each example is a point in ℝ^2. The concept class H is defined as follows: a function h ∈ H is specified by two parameters a and b. An example x = (x_1, x_2) in ℝ^2 is labeled + if and only if x_1 ≥ a and x_2 ≤ b, and is labeled − otherwise.
For example, if we set a = 1 and b = 4, the gray region in Figure 1 is the set of points x = (x_1, x_2) that are labeled +1.
What is the VC dimension of this class?
Figure 1: An example with a = 1, b = 4. The gray region (extending infinitely) shows the points that will be labeled as positive.
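To make the labeling rule of H concrete, here is a minimal Python sketch of the hypothesis specified by a and b; the test points are made-up values for illustration.

```python
def h_ab(x1, x2, a, b):
    # +1 iff x1 >= a and x2 <= b (the rule defining H above); -1 otherwise.
    return +1 if (x1 >= a and x2 <= b) else -1

# With a = 1 and b = 4 (the setting of Figure 1); the test points are assumed examples.
print(h_ab(2, 3, a=1, b=4))  # +1: inside the gray region
print(h_ab(0, 3, a=1, b=4))  # -1: fails x1 >= a
```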
4. [15 points] Determine the VC dimension of the concept class consisting of unions of 2
intervals on the real line (that is, points within either of the two intervals are labeled
as positive).
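For reference, here is a minimal Python sketch of the union-of-intervals labeling rule described above; the interval endpoints are assumed illustrative values.

```python
def union_label(x, intervals):
    # +1 if x lies inside any closed interval [l, r]; -1 otherwise.
    return +1 if any(l <= x <= r for l, r in intervals) else -1

h = [(0.0, 1.0), (2.5, 3.0)]  # two assumed illustrative intervals
print([union_label(x, h) for x in (0.5, 2.0, 2.7)])  # [1, -1, 1]
```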
5. [For 6350 Students, 10 points] Generalize the result of the above question. Find
the VC dimension of the union of k intervals on the real line.
6. [For 6350 Students, 15 points] Let two hypothesis classes H1 and H2 satisfy H1 ⊆
H2. Prove: V C(H1) ≤ V C(H2).
What To Submit
1. Your assignment should be no more than 10 pages. X points will be deducted if
your submission is X pages beyond the page limit.
2. Your report should be in the form of a PDF file; LaTeX is recommended.
3. Please look up the late policy on the course website.