Homework #0 CSE 446: Machine Learning

$30.00

Category: You will Instantly receive a download link for .zip solution file upon Payment

Description

5/5 - (2 votes)

Probability and Statistics

1. [2 points] (Bayes Rule, from Murphy exercise 2.4.) After your yearly checkup, the doctor has bad news and
good news. The bad news is that you tested positive for a serious disease, and that the test is 99% accurate
(i.e., the probability of testing positive given that you have the disease is 0.99, as is the probability of testing
negative given that you dont have the disease). The good news is that this is a rare disease, striking only one
in 10,000 people. What are the chances that you actually have the disease? (Show your calculations as well as
giving the final result.)

2. For any two random variables X, Y the covariance is defined as Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]. You
may assume X and Y take on a discrete values if you find that is easier to work with.

a. [1 points] If E[Y |X = x] = x show that Cov(X, Y ) = E[(X − E[X])2
].
b. [1 points] If X, Y are independent show that Cov(X, Y ) = 0.

3. Let X and Y be independent random variables with PDFs given by f and g, respectively. Let h be the PDF
of the random variable Z = X + Y .
a. [2 points] Show that h(z) = R ∞
−∞ f(x)g(z −x)dx. (If you are more comfortable with discrete probabilities,
you can instead derive an analogous expression for the discrete case, and then you should give a one
sentence explanation as to why your expression is analogous to the continuous case.).
b. [1 points] If X and Y are both independent and uniformly distributed on [0, 1] (i.e. f(x) = g(x) = 1 for
x ∈ [0, 1] and 0 otherwise) what is h, the PDF of Z = X + Y ?

4. [1 points] A random variable X ∼ N (µ, σ2
) is Gaussian distributed with mean µ and variance σ
2
. Given
that for any a, b ∈ R, we have that Y = aX + b is also Gaussian, find a, b such that Y ∼ N (0, 1).

5. [2 points] For a random variable Z, its mean and variance are defined as E[Z] and E[(Z−E[Z])2
], respectively.
Let X1, . . . , Xn be independent and identically distributed random variables, each with mean µ and variance
σ
2
. If we define µbn =
1
n
Pn
i=1 Xi
, what is the mean and variance of √
n(µbn − µ)?

6. If f(x) is a PDF, the cumulative distribution function (CDF) is defined as F(x) = R x
−∞ f(y)dy. For any
function g : R → R and random variable X with PDF f(x), recall that the expected value of g(X) is defined
as E[g(X)] = R ∞
−∞ g(y)f(y)dy. For a boolean event A, define 1{A} as 1 if A is true, and 0 otherwise. Thus,
1{x ≤ a} is 1 whenever x ≤ a and 0 whenever x > a. Note that F(x) = E[1{X ≤ x}]. Let X1, . . . , Xn be
independent and identically distributed random variables with CDF F(x). Define Fbn(x) = 1
n
Pn
i=1 1{Xi ≤ x}.

Note, for every x, that Fbn(x) is an empirical estimate of F(x). You may use your answers to the previous
problem.
a. [1 points] For any x, what is E[Fbn(x)]?
b. [1 points] For any x, the variance of Fbn(x) is E[(Fbn(x) − F(x))2
]. Show that Variance(Fbn(x)) =
F (x)(1−F (x))
n
.
1
c. [1 points] Using your answer to c, show that for all x ∈ R, we have E[(Fbn(x) − F(x))2
] ≤
1
4n
.
Linear Algebra and Vector Calculus
7. (Rank) Let A =


1 2 1
1 0 3
1 1 2

 and B =


1 2 3
1 0 1
1 1 2

. For each matrix A and B,
a. [2 points] what is its rank?
b. [2 points] what is a (minimal size) basis for its column span?

8. (Linear equations) Let A =


0 2 4
2 4 2
3 3 1

, b =

−2 −2 −4
T
, and c =

1 1 1T
.
a. [1 points] What is Ac?
b. [2 points] What is the solution to the linear system Ax = b? (Show your work).

9. (Hyperplanes) Assume w is an n-dimensional vector and b is a scalar. A hyperplane in R
n is the set
{x : x ∈ R
n, s.t. w
T x + b = 0}.
a. [1 points] (n = 2 example) Draw the hyperplane for w = [−1, 2]T
, b = 2? Label your axes.
b. [1 points] (n = 3 example) Draw the hyperplane for w = [1, 1, 1]T
, b = 0? Label your axes.
c. [2 points] Given some x0 ∈ R
n, find the squared distance to the hyperplane defined by w
T x + b = 0. In
other words, solve the following optimization problem:
min
x
kx0 − xk
2
s.t. w
T x + b = 0
(Hint: if xe0 is the minimizer of the above problem, note that kx0 − xe0k = |
w
T (x0−xe0)
kwk
|. What is w
T xe0?)

10. For possibly non-symmetric A, B ∈ R
n×n and c ∈ R, let f(x, y) = x
T Ax + y
T Bx + c. Define ∇zf(x, y) =
h
∂f(x,y)
∂z1
∂f(x,y)
∂z2
. . .
∂f(x,y)
∂zn
iT

.
a. [2 points] Explicitly write out the function f(x, y) in terms of the components Ai,j and Bi,j using appropriate summations over the indices.
b. [2 points] What is ∇xf(x, y) in terms of the summations over indices and vector notation?
c. [2 points] What is ∇yf(x, y) in terms of the summations over indices and vector notation?

Programming

11. For the A, b, c as defined in Problem 8, use NumPy to compute (take a screen shot of your answer):
a. [2 points] What is A−1
?
b. [1 points] What is A−1
b? What is Ac?

12. [4 points] Two random variables X and Y have equal distributions if their CDFs, FX and FY , respectively,
are equal, i.e. for all x, |FX(x) − FY (x)| = 0. The central limit theorem says that the sum of k independent,
zero-mean, variance-1/k random variables converges to a (standard) Normal distribution as k goes off to infinity.
We will study this phenomenon empirically (you will use the Python packages Numpy and Matplotlib). Define
Y
(k) = √
1
k
Pk
i=1 Bi where each Bi
is equal to −1 and 1 with equal probability. From your solution to problem
5, we know that √
1
k
Bi
is zero-mean and has variance 1/k.
2

a. For i = 1, . . . , n let Zi ∼ N (0, 1). If F(x) is the true CDF from which each Zi
is drawn (i.e., Gaussian)
and Fbn(x) = 1
n
Pn
i=1 1{Zi ≤ x), use the answer to problem 1.5 above to choose n large enough such that,
for all x ∈ R,
q
E[(Fbn(x) − F(x))2] ≤ 0.0025, and plot Fbn(x) from −3 to 3.
(Hint: use Z=numpy.random.randn(n) to generate the random variables, and import matplotlib.pyplot
as plt;
plt.step(sorted(Z), np.arange(1,n+1)/float(n)) to plot).
b. For each k ∈ {1, 8, 64, 512} generate n independent copies Y
(k) and plot their empirical CDF on the same
plot as part a.
(Hint: np.sum(np.sign(np.random.randn(n, k))*np.sqrt(1./k), axis=1) generates n of the Y
(k)
random variables.)
Be sure to always label your axes. Your plot should look something like the following (Tip: checkout seaborn
for instantly better looking plots.)
3