Description
Problem 3.1 (Nearest neighbor predictors and Voronoi sets)
Nearest neighbor, $k$-nearest neighbor, and tree predictors are piecewise constant functions. This means that we can partition $\mathbf{R}^d$ into $N$ regions, $R_1, \ldots, R_N$, and we have $g(x) = \hat{y}_k$ for all $x \in R_k$. The regions don’t overlap, except for their boundaries, and every point in $\mathbf{R}^d$ is in one of the regions. (We are not concerned with how these predictors work when a point is right on the boundary between two regions; in any case, whichever value the predictor takes on these boundaries makes no difference at all in practice.) In this problem we explore this idea.
(a) The set of points closer to one given point than another. Suppose $u$ and $v$ are given vectors in $\mathbf{R}^d$, with $u \neq v$. We define the set of all points in $\mathbf{R}^d$ that are closer to $u$ than to $v$ as
\[
S(u, v) = \{ x \in \mathbf{R}^d \mid \|x - u\|_2 \leq \|x - v\|_2 \}.
\]
Show that $S(u, v)$ is a halfspace, which has the form
\[
S(u, v) = \{ x \in \mathbf{R}^d \mid a^T x \leq b \}.
\]
(You should say explicitly what the vector $a$ is, and what the scalar $b$ is.) The boundary of $S(u, v)$ is a hyperplane, i.e., the set of points that satisfy $a^T x = b$. Sketch this for the specific example with $u = (1, 1)$ and $v = (2, 4)$. Show the points $u$ and $v$, shade the set $S(u, v)$, and indicate its boundary, which is a line (since a hyperplane in $\mathbf{R}^2$ is a line).
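If you want to double-check your hand sketch, the region can be shaded numerically straight from the defining inequality. The following is a minimal matplotlib sketch (purely optional; the grid range is an arbitrary choice) for $u = (1, 1)$, $v = (2, 4)$.

```python
import numpy as np
import matplotlib.pyplot as plt

u = np.array([1.0, 1.0])
v = np.array([2.0, 4.0])

# Evaluate the defining inequality ||x - u||_2 <= ||x - v||_2 on a grid.
xs, ys = np.meshgrid(np.linspace(-4, 8, 400), np.linspace(-4, 8, 400))
pts = np.stack([xs, ys], axis=-1)
in_S = np.linalg.norm(pts - u, axis=-1) <= np.linalg.norm(pts - v, axis=-1)

plt.contourf(xs, ys, in_S, levels=[0.5, 1.5], alpha=0.3)  # shade S(u, v)
plt.contour(xs, ys, in_S, levels=[0.5])                   # its boundary line
plt.plot(*u, "ko"); plt.annotate("u", u)
plt.plot(*v, "ks"); plt.annotate("v", v)
plt.gca().set_aspect("equal")
plt.show()
```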
(b) Voronoi sets. Suppose $u^1, \ldots, u^m$ are given points in $\mathbf{R}^d$. Define $V_i$, the set of points in $\mathbf{R}^d$ closer to $u^i$ than to the other points (i.e., $u^i$ is the nearest neighbor of all points in $V_i$), for $i = 1, \ldots, m$. We can express these sets as
\[
V_i = \bigcap_{j \neq i} S(u^i, u^j),
\]
i.e., $V_i$ is the intersection of the $m-1$ halfspaces $S(u^i, u^j)$ for $j \neq i$. An intersection of halfspaces is called a polyhedron. The specific polyhedra $V_i$ are called Voronoi sets or Voronoi regions associated with the collection of points $u^1, \ldots, u^m$. (Polyhedra is the plural of polyhedron.) They form a partition of $\mathbf{R}^d$ into a set of polyhedra, called the Voronoi partition of $\mathbf{R}^d$. Sketch the Voronoi regions for the collection of points $(1, 1)$, $(2, 4)$, $(5, 3)$.
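A brute-force plot that colors each grid point by its nearest $u^i$ reproduces the Voronoi regions and can be used to check your drawing. This sketch uses nothing beyond the three given points; grid size and coloring are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# The three given points u^1, u^2, u^3.
U = np.array([[1.0, 1.0], [2.0, 4.0], [5.0, 3.0]])

# Color each grid point by the index of its nearest u^i; the resulting
# constant-color patches are the Voronoi regions V_1, V_2, V_3.
xs, ys = np.meshgrid(np.linspace(-2, 8, 500), np.linspace(-2, 8, 500))
pts = np.stack([xs, ys], axis=-1)                       # shape (500, 500, 2)
dists = np.linalg.norm(pts[..., None, :] - U, axis=-1)  # shape (500, 500, 3)
nearest = dists.argmin(axis=-1)

plt.pcolormesh(xs, ys, nearest, alpha=0.4, shading="auto")
plt.plot(U[:, 0], U[:, 1], "ko")
plt.gca().set_aspect("equal")
plt.show()
```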
(c) Nearest neighbor predictor. Let $g$ be the 1-nearest neighbor predictor for the data set $x^1, \ldots, x^n$, $y^1, \ldots, y^n$. Explain why $g$ has the form $g(x) = y^k$ for $x \in V_k$, where $V_k$ is the Voronoi region associated with $x^k$. In other words, $g$ is piecewise constant, with the same value inside each of the Voronoi regions associated with the vectors $x^i$, $i = 1, \ldots, n$.
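As a concrete illustration of this claim, here is a tiny brute-force 1-nearest neighbor predictor on a made-up data set (the points and labels are hypothetical, not part of the problem): queries anywhere in the same Voronoi region return the same label.

```python
import numpy as np

def nn_predict(x, X, y):
    """1-nearest neighbor prediction: return y^k for the x^k nearest to x."""
    k = np.argmin(np.linalg.norm(X - x, axis=1))  # index of the nearest data point
    return y[k]

# Hypothetical data set x^1, ..., x^4 with labels y^1, ..., y^4.
X = np.array([[1.0, 1.0], [2.0, 4.0], [5.0, 3.0], [0.0, 3.0]])
y = np.array([0.1, 0.7, -0.2, 1.3])

# Both query points lie in the Voronoi region of x^2, so both return y^2.
print(nn_predict(np.array([2.1, 4.2]), X, y))  # 0.7
print(nn_predict(np.array([1.9, 3.8]), X, y))  # 0.7
```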
(d) $k$-nearest neighbor predictor. Let $g(x)$ be the $k$-nearest neighbor predictor for the data set $x^1, \ldots, x^n$, $y^1, \ldots, y^n$. Explain why $g$ is piecewise constant on regions that are polyhedra. You can do this for the case $k = 2$, to make things simpler.
Hint. The set of points $x$ for which $x^q$ and $x^l$ are its two nearest neighbors has the form
\[
R_{ql} = \left\{ x \in \mathbf{R}^d \;\middle|\; \|x - x^q\|_2 \leq \|x - x^j\|_2, \; \|x - x^l\|_2 \leq \|x - x^j\|_2 \text{ for all } j \neq q, l \right\}.
\]
This is an intersection of $2(n-2)$ halfspaces. Hint. Drawing these sets in 2D is encouraged. You can use such drawings as part of your explanation.
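To build intuition for the hint, one can color each grid point by its unordered pair of two nearest neighbors; each constant-color patch is then one of the sets $R_{ql}$. The data points below are hypothetical, and this is only a visualization aid, not the requested explanation.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data points x^1, ..., x^4.
X = np.array([[1.0, 1.0], [2.0, 4.0], [5.0, 3.0], [0.0, 3.0]])

xs, ys = np.meshgrid(np.linspace(-2, 7, 500), np.linspace(-2, 7, 500))
pts = np.stack([xs, ys], axis=-1)
dists = np.linalg.norm(pts[..., None, :] - X, axis=-1)   # shape (500, 500, 4)

# Indices of the two nearest data points, as an unordered pair {q, l}.
two = np.sort(np.argsort(dists, axis=-1)[..., :2], axis=-1)
label = two[..., 0] * len(X) + two[..., 1]               # one integer per pair

plt.pcolormesh(xs, ys, label, alpha=0.4, shading="auto")  # patches are the R_ql
plt.plot(X[:, 0], X[:, 1], "ko")
plt.gca().set_aspect("equal")
plt.show()
```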
Problem 3.2 (Multiclass exponential loss)
For a $K$-class classification problem, consider the coding $Y = (Y_1, \ldots, Y_K)^T$ with
\[
Y_k = \begin{cases} 1, & \text{if } G = G_k, \\ -\dfrac{1}{K-1}, & \text{otherwise.} \end{cases}
\]
Let $f = (f_1, \ldots, f_K)^T$ with $\sum_{k=1}^K f_k = 0$, and define
\[
L(Y, f) = \exp\left( -\frac{1}{K} \, Y^T f \right).
\]
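As a quick sanity check (this remark is not part of the original problem), consider the case $K = 2$: the coding gives $Y = (1, -1)^T$ when $G = G_1$ and $Y = (-1, 1)^T$ when $G = G_2$, the zero-sum constraint forces $f = (f_1, -f_1)^T$, and so
\[
L(Y, f) = \exp\left(-\tfrac{1}{2} Y^T f\right) = \exp(-y f_1), \qquad y = \begin{cases} +1, & G = G_1, \\ -1, & G = G_2, \end{cases}
\]
which is the familiar two-class exponential loss used in AdaBoost.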
(a) Using Lagrange multipliers, derive the population minimizer $f^*$ of $E[L(Y, f)]$, subject to the zero-sum constraint, and relate these to the class probabilities.
(b) Show that multiclass boosting using this loss function leads to a reweighting algorithm similar to AdaBoost.
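If you want to experiment numerically with this loss (for example while working on part (b)), a small helper that implements the coding and the loss above may be useful; the function names and the 0-based class labels are my own choices, not part of the problem.

```python
import numpy as np

def encode(g, K):
    """Coding Y for class label g in {0, ..., K-1}: Y_k = 1 if g == k, else -1/(K-1)."""
    Y = np.full(K, -1.0 / (K - 1))
    Y[g] = 1.0
    return Y

def loss(Y, f):
    """Multiclass exponential loss L(Y, f) = exp(-(1/K) * Y^T f)."""
    K = len(Y)
    return np.exp(-Y @ f / K)

# Example with K = 3; f satisfies the zero-sum constraint.
f = np.array([0.8, -0.3, -0.5])
print(loss(encode(0, 3), f))  # small loss: f favors the true class
print(loss(encode(2, 3), f))  # larger loss: f argues against the true class
```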
Problem 3.3 (Back propagation for cross-entropy loss function)
Derive the forward and backward propagation equations for the cross-entropy loss function.
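The problem asks for a derivation, but a finite-difference check is a convenient way to verify whatever backward-pass formulas you obtain. The sketch below assumes a softmax output layer with a one-hot target; all names and values are illustrative, not part of the problem statement.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(z, y):
    """Cross-entropy loss -sum_k y_k log p_k with p = softmax(z)."""
    return -np.sum(y * np.log(softmax(z)))

# Numerically check a candidate gradient dL/dz at a random point.
rng = np.random.default_rng(0)
z = rng.normal(size=4)
y = np.array([0.0, 1.0, 0.0, 0.0])       # one-hot target

candidate = softmax(z) - y                # gradient you would derive by hand
numeric = np.array([
    (cross_entropy(z + h * e, y) - cross_entropy(z - h * e, y)) / (2 * h)
    for h in [1e-6] for e in np.eye(4)
])
print(np.max(np.abs(candidate - numeric)))  # near zero if the hand-derived gradient is right
```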