Description
IE531: Algorithms for Data Analytics
1. (25 points) Tightness of the Chebyshev Bound: This problem is about discovering distributions where the upper-bounds of the Chebyshev Inequality is tight.
First, you are going to show (by example) that there is a discrete RV where this
bound is tight. Then, you are going to present a cogent argument (no need to be
super formal here!) that there can be no continuous RV where the Chebyshev
Bound it tight.
(a) (5 points) Show that the Chebyshev Bound is tight for the discrete RV X ∈
{−1, 0, 1}, where Prob(X = −1) = Prob(X = 1) =
1
2k
2
. That is, compute
E{X} and var(X) and plug it into the Chebyshev Bound and arrive at the
conclusion that Prob(| X |≥ 1) =
1
k
2
(b) (20 points) Show that there can be no continuous distribution over the
whole real axis where the Chebyshev Bound is tight.
2. (25 points) Unit-Ball in High Dimensions: We will use the `4-norm to define
the unit-ball as
B(1, d, 4) = {(x1, x2, . . . , xd) ∈ Rd
| x
4
1 + x
4
2 + · · · + x
4
d ≤ 1}
(a) (12.5 points) Suppose we define
S := {(x1, x2, . . . , xd) ∈ Rd
| x
4
1 + x
4
2 + · · · + x
4
d ≤
1
2
},
what fraction of the volume of B(1, d, 4) does S occupy?
(b) (12.5 points) For any c > 0, prove that the fraction of the volume of
B(1, d, 4) outside the slab
| x1 |≤
c
d
1/4
is at most 1
c
3
e
−c
4
/4
.
1
3. (25 points) Overlap of Spheres in High-Dimensions: Let x be a random sample
from the (surface of the) unit sphere in d-dimensions with the origin as center.
(a) (5 points) What is the value of E{x}?
(b) (5 points) What is component-wise variance of x? That is, for i ∈ {1, 2, . . . , d}
what is E{(xi − E{xi})
2
}?
(c) (5 points) Show that for any unit length vector u, the variance of the realvalued random variable u
T x is Pd
i=1
u
2
i
E{x
2
i
}. Using this, compute the variance and standard deviation of u
T x.
(d) (5 points) Given two unit-radius spheres in d-dimensional space whose centers are separated by a distance of a, show that the volume of their intersection is at most
8e
−a
2
(d−1)/8
a
√
d − 1
times the volume of each sphere.
(e) (5 points) From your solution to problem 3d, present a verbal argument
that supports the conclusion that if the inter-center separation of the two
spheres of radius r (r is not necessarily unity) is Ω(r/
√
d), then they share
very small mass. From this, make a cogent case for the conclusion that
given randomly generated points from the two distributions, one inside
each sphere, we can tell “which sphere contains which point” (i.e. classify we have a clustering algorithm that separates randomly generated data
into two spherical-groups)
4. (25 points) A Counterpoint to the Johnson-Lindenstrauss Lemma: Prove that
for every fixed dimension reduction matrix A ∈ Rk×d with k < d, there is a pair
of vectors x, y ∈ Rd
such that the distances between their images Ax and Ay is
hugely distorted (compared to the distance between x and y).
2