## Description

Problem 3.1 (Nearest neighbor predictors and Voronoi sets)

Nearest neighbor, k-nearest neighbor, and tree predictors are piecewise constant functions. This means that we can partition $\mathbf{R}^d$ into $N$ regions $R_1, \ldots, R_N$, and we have $g(x) = \hat{y}_k$ for all $x \in R_k$. The regions don't overlap, except at their boundaries, and every point in $\mathbf{R}^d$ lies in one of the regions. (We are not concerned with how these predictors behave when a point is right on the boundary between two regions; in any case, whichever value the predictor takes on these boundaries makes no difference at all in practice.) In this problem we explore this idea.

(a) The set of points closer to one given point than another. Suppose $u$ and $v$ are given vectors in $\mathbf{R}^d$, with $u \neq v$. We define the set of all points in $\mathbf{R}^d$ that are closer to $u$ than to $v$ as

$$S(u, v) = \{ x \in \mathbf{R}^d \mid \|x - u\|_2 \leq \|x - v\|_2 \}.$$

Show that $S(u, v)$ is a halfspace, which has the form

$$S(u, v) = \{ x \in \mathbf{R}^d \mid a^T x \leq b \}.$$

(You should say explicitly what the vector $a$ is, and what the scalar $b$ is.) The boundary of $S(u, v)$ is a hyperplane, i.e., the set of points that satisfy $a^T x = b$. Sketch this for the specific example with $u = (1, 1)$ and $v = (2, 4)$. Show the points $u$ and $v$, shade the set $S(u, v)$, and indicate its boundary, which is a line (since a hyperplane in $\mathbf{R}^2$ is a line).
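As a numerical sanity check (not a proof), one valid choice is $a = v - u$ and $b = (\|v\|_2^2 - \|u\|_2^2)/2$, obtained by expanding both squared norms; the sketch below verifies that the distance test and the linear inequality agree on random points. The specific normalization of $a$ and $b$ is an assumption — any positive scaling of both works.

```python
import numpy as np

# One possible halfspace parameterization for S(u, v), from expanding
# ||x - u||^2 <= ||x - v||^2 into (v - u)^T x <= (||v||^2 - ||u||^2) / 2.
u = np.array([1.0, 1.0])
v = np.array([2.0, 4.0])
a = v - u
b = (np.dot(v, v) - np.dot(u, u)) / 2

# Check on random points: the distance test agrees with the linear inequality.
rng = np.random.default_rng(0)
for x in rng.uniform(-5, 5, size=(1000, 2)):
    closer_to_u = np.linalg.norm(x - u) <= np.linalg.norm(x - v)
    in_halfspace = np.dot(a, x) <= b
    assert closer_to_u == in_halfspace
print("halfspace check passed")
```

For the example $u = (1, 1)$, $v = (2, 4)$ this gives $a = (1, 3)$ and $b = 9$, so the boundary line is $x_1 + 3x_2 = 9$.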

(b) Voronoi sets. Suppose $u^1, \ldots, u^m$ are given points in $\mathbf{R}^d$. Define $V_i$, the set of points in $\mathbf{R}^d$ closer to $u^i$ than to the others (i.e., $u^i$ is the nearest neighbor of all points in $V_i$), for $i = 1, \ldots, m$. We can express these sets as

$$V_i = \bigcap_{j \neq i} S(u^i, u^j),$$

i.e., $V_i$ is the intersection of the $m-1$ halfspaces $S(u^i, u^j)$ for $j \neq i$. An intersection of halfspaces is called a polyhedron. The specific polyhedra $V_i$ are called Voronoi sets or Voronoi regions associated with the collection of points $u^1, \ldots, u^m$. (Polyhedra is the plural of polyhedron.) They form a partition of $\mathbf{R}^d$ into a set of polyhedra, called the Voronoi partition of $\mathbf{R}^d$. Sketch the Voronoi regions for the collection of points $(1, 1)$, $(2, 4)$, $(5, 3)$.
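The following sketch is not the hand drawing the problem asks for, but it renders a crude text picture of the same partition: each grid point of a patch of $\mathbf{R}^2$ is labeled with the index of its nearest point among the three given ones (labels 0, 1, 2 are my own convention).

```python
import numpy as np

# The three points from the sketch exercise, labeled 0, 1, 2.
points = np.array([[1.0, 1.0], [2.0, 4.0], [5.0, 3.0]])

def voronoi_index(x, points):
    """Index i such that x lies in the Voronoi region V_i (ties -> lowest index)."""
    d = np.linalg.norm(points - x, axis=1)
    return int(np.argmin(d))

# A coarse grid view of the Voronoi partition on [0, 6] x [0, 5].
for yc in np.arange(5.0, -0.5, -1.0):
    row = "".join(str(voronoi_index(np.array([xc, yc]), points))
                  for xc in np.arange(0.0, 6.5, 1.0))
    print(row)
```

Each region in the printout is a connected block of one digit, consistent with $V_i$ being an intersection of halfspaces (hence convex).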

(c) Nearest neighbor predictor. Let $g$ be the 1-nearest neighbor predictor for the data set $x^1, \ldots, x^n$, $y^1, \ldots, y^n$. Explain why $g$ has the form $g(x) = y^k$ for $x \in V_k$, where $V_k$ is the Voronoi region associated with $x^k$. In other words, $g$ is piecewise constant, with the same value inside each of the Voronoi regions associated with the vectors $x^i$, $i = 1, \ldots, n$.
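A minimal sketch of the claim, on a made-up data set (the points and labels below are illustrative, not from the problem): the 1-nearest neighbor predictor returns the same value for any query inside a given Voronoi region.

```python
import numpy as np

# Tiny illustrative data set (made-up values).
X = np.array([[1.0, 1.0], [2.0, 4.0], [5.0, 3.0]])
y = np.array([0.3, -1.2, 2.0])

def g(x):
    """1-nearest neighbor predictor: return y^k for the nearest x^k."""
    k = int(np.argmin(np.linalg.norm(X - x, axis=1)))
    return y[k]

# Every query lands in some Voronoi region V_k and gets the constant value y^k,
# so perturbing x within the region does not change the prediction.
x0 = np.array([1.1, 0.9])                       # well inside the region of X[0]
assert g(x0) == g(x0 + np.array([0.05, 0.05])) == y[0]
```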

(d) k-nearest neighbor predictor. Let $g(x)$ be the k-nearest neighbor predictor for the data set $x^1, \ldots, x^n$, $y^1, \ldots, y^n$. Explain why $g$ is piecewise constant on regions that are polyhedra. You can do this for the case $k = 2$, to make things simpler.

Hint. The set of points $x$ for which $x^q$ and $x^l$ are its two nearest neighbors has the form

$$R_{ql} = \left\{ x \in \mathbf{R}^d \;\middle|\; \|x - x^q\|_2 \leq \|x - x^j\|_2,\ \|x - x^l\|_2 \leq \|x - x^j\|_2,\ j \neq q, l \right\}.$$

This is an intersection of $2(n-1)$ halfspaces. Hint. Drawing these sets in 2D is encouraged. You can use such drawings as part of your explanation.
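The hint can be checked numerically. The sketch below, on made-up random data, reads the hint's condition as $j \notin \{q, l\}$ (an assumption on the notation) and verifies that "the two nearest neighbors of $x$ are $\{x^q, x^l\}$" is equivalent to $x$ satisfying the system of inequalities defining $R_{ql}$.

```python
import numpy as np

rng = np.random.default_rng(1)
Xs = rng.normal(size=(6, 2))   # n = 6 made-up data points
n = len(Xs)

def two_nearest(x):
    """Indices of the two nearest data points (as an unordered set)."""
    d = np.linalg.norm(Xs - x, axis=1)
    return set(np.argsort(d)[:2])

def in_Rql(x, q, l):
    """Membership test via the hint's inequalities, taking j outside {q, l}."""
    d = np.linalg.norm(Xs - x, axis=1)
    others = [j for j in range(n) if j not in (q, l)]
    return all(d[q] <= d[j] for j in others) and all(d[l] <= d[j] for j in others)

# For random queries, the nearest-neighbor definition and the inequality
# system agree (ties have probability zero here).
for x in rng.normal(size=(200, 2)):
    q, l = sorted(two_nearest(x))
    assert in_Rql(x, q, l)
print("R_ql check passed")
```

Since each inequality $\|x - x^q\|_2 \leq \|x - x^j\|_2$ is exactly the halfspace $S(x^q, x^j)$ from part (a), $R_{ql}$ is an intersection of halfspaces, i.e., a polyhedron.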

Problem 3.2 (Multiclass exponential loss)

For a $K$-class classification problem, consider the coding $Y = (Y_1, \ldots, Y_K)^T$ with

$$Y_k = \begin{cases} 1, & \text{if } G = G_k \\ -\dfrac{1}{K-1}, & \text{otherwise.} \end{cases}$$

Let $f = (f_1, \ldots, f_K)^T$ with $\sum_{k=1}^{K} f_k = 0$, and define

$$L(Y, f) = \exp\left( -\frac{1}{K} \, Y^T f \right).$$
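A small sketch of the setup, to make the coding and the loss concrete for $K = 3$ (the score vector `f` below is made up; it just has to satisfy the zero-sum constraint):

```python
import numpy as np

K = 3

def code_y(k):
    """The coding Y for true class G = G_k: 1 in slot k, -1/(K-1) elsewhere."""
    Y = np.full(K, -1.0 / (K - 1))
    Y[k] = 1.0
    return Y

def loss(Y, f):
    """Multiclass exponential loss L(Y, f) = exp(-(1/K) Y^T f)."""
    return np.exp(-np.dot(Y, f) / K)

# f must satisfy the zero-sum constraint; this made-up example does.
f = np.array([2.0, -0.5, -1.5])
Y = code_y(0)
assert abs(f.sum()) < 1e-12
# Scores favoring the true class give a smaller loss than scores against it.
assert loss(Y, f) < loss(Y, -f)
print(loss(Y, f))
```

Note that for $K = 2$ the coding reduces to $Y \in \{(1, -1), (-1, 1)\}$ and the loss to the familiar binary exponential loss, which is the connection part (b) asks you to generalize.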

(a) Using Lagrange multipliers, derive the population minimizer $f^*$ of $\mathbb{E}[L(Y, f)]$, subject to the zero-sum constraint, and relate it to the class probabilities.

(b) Show that multiclass boosting using this loss function leads to a reweighting algorithm similar to AdaBoost.

Problem 3.3 (Back propagation for cross-entropy loss function)

Derive the forward and backward propagation equations for the cross-entropy loss function.
