Description
IE531: Algorithms for Data Analytics
1 Introduction
We have $n$-many, $d$-dimensional vectors, $\{x_1, x_2, \ldots, x_n\}$, where $\forall i,\ x_i \in \mathbb{R}^d$, and $d$ is very large. We wish to find $n$-many, $k$-dimensional vectors, $\{y_1, y_2, \ldots, y_n\}$, where $\forall i,\ y_i \in \mathbb{R}^k$, and $k \ll d$ (that is, $k$ is comparatively smaller than $d$).
Theorem 2.11 of the text has the following form:
Theorem 1.1 (Johnson-Lindenstrauss Lemma) For any $\epsilon \in (0, 1)$, and any integer $n$, let $k \geq \frac{3}{c\epsilon^2} \ln n$, with $c$ as defined in Theorem 2.9. For any set of $n$-many vectors $\{x_1, x_2, \ldots, x_n\}$ where $x_i \in \mathbb{R}^d$, the random projection $f : \mathbb{R}^d \to \mathbb{R}^k$ defined in Section 2.7 of the text has the property that for all pairs $x_i$ and $x_j$, with probability at least $1 - \frac{3}{2n}$,
$$(1 - \epsilon)\sqrt{k}\,\|x_i - x_j\| \;\leq\; \|f(x_i) - f(x_j)\| \;\leq\; (1 + \epsilon)\sqrt{k}\,\|x_i - x_j\|.$$
(Note, I have replaced the text's $|\cdot|$ with my $\|\cdot\|$ to denote the length of the vector-argument.)
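
To make the $\sqrt{k}$ scaling in Theorem 1.1 concrete, here is a minimal numerical sketch, assuming Python with NumPy (the assignment itself may prescribe a different language; all names here are illustrative): it projects $n$ random points through a Gaussian matrix and checks that every pairwise distance lands in the $(1 \pm \epsilon)\sqrt{k}$ band.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, eps = 10_000, 400, 20, 0.2   # k chosen comfortably large for this demo

X = rng.standard_normal((n, d))       # n points in R^d
U = rng.standard_normal((k, d))       # rows are i.i.d. N(0, I_d) Gaussian vectors
Y = X @ U.T                           # y_i = f(x_i) = (u_1.x_i, ..., u_k.x_i) in R^k

for i in range(n):
    for j in range(i + 1, n):
        orig = np.linalg.norm(X[i] - X[j])
        proj = np.linalg.norm(Y[i] - Y[j])
        ratio = proj / (np.sqrt(k) * orig)   # Theorem 1.1: should lie in [1 - eps, 1 + eps]
        assert 1 - eps <= ratio <= 1 + eps, (i, j, ratio)
print("all pairwise distances preserved within (1 +/- eps) * sqrt(k)")
```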
2 Discussion
For this programming assignment we will take a slightly different form/version of the JL-Lemma. As before, we would like a mapping $f : \mathbb{R}^d \to \mathbb{R}^k$ (where $k \ll d$) such that the distance between any two $d$-dimensional points $x_1, x_2 \in \mathbb{R}^d$ is more-or-less preserved when we look at $f(x_1), f(x_2) \in \mathbb{R}^k$. We are going to rewrite the requirement of Theorem 1.1 as equation (1) shown below, where for some (small) $\epsilon \in \mathbb{R}$, and $\forall x_1, x_2 \in \mathbb{R}^d$, we want
$$(1 - \epsilon)\|x_1 - x_2\| \;\leq\; \|f(x_1) - f(x_2)\| \;\leq\; (1 + \epsilon)\|x_1 - x_2\| \qquad (1)$$
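Equation (1) differs from Theorem 1.1 only by the factor of $\sqrt{k}$. A short reconciling step (my observation; the text above does not spell it out): if $f$ satisfies the inequality of Theorem 1.1, dividing through by $\sqrt{k}$ shows that the rescaled map $g(x) = f(x)/\sqrt{k}$ satisfies equation (1) directly:
$$(1 - \epsilon)\sqrt{k}\,\|x_1 - x_2\| \leq \|f(x_1) - f(x_2)\| \leq (1 + \epsilon)\sqrt{k}\,\|x_1 - x_2\| \;\Longleftrightarrow\; (1 - \epsilon)\|x_1 - x_2\| \leq \left\|\tfrac{f(x_1)}{\sqrt{k}} - \tfrac{f(x_2)}{\sqrt{k}}\right\| \leq (1 + \epsilon)\|x_1 - x_2\|.$$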
As suggested by the text, you make $k$-many calls to a routine that generates $d$-dimensional Gaussian RVs with zero-mean and an identity-covariance matrix. Let us suppose this process gets us the vectors $\{u_1, u_2, \ldots, u_k\}$, where each $u_i \in \mathbb{R}^d$.
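
A minimal sketch of this construction, again assuming NumPy (the helper names are mine, not a required API): each call to the generator yields one $u_i$, the $k$ vectors are stacked into a $k \times d$ matrix, and I fold in the $1/\sqrt{k}$ rescaling so that the resulting map targets equation (1) rather than the Theorem 1.1 form.

```python
import numpy as np

def gaussian_vector(d, rng):
    """One d-dimensional Gaussian RV with zero mean and identity covariance."""
    return rng.standard_normal(d)

def build_projection(d, k, rng):
    """k-many calls to the generator; row i of the result is u_i."""
    return np.vstack([gaussian_vector(d, rng) for _ in range(k)])

def f(x, U):
    """Map x in R^d to R^k; the 1/sqrt(k) factor matches equation (1)."""
    return (U @ x) / np.sqrt(U.shape[0])

rng = np.random.default_rng(42)
d, k = 5_000, 300
U = build_projection(d, k, rng)
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
print(np.linalg.norm(x1 - x2), np.linalg.norm(f(x1, U) - f(x2, U)))  # nearly equal
```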