- Home
- Uncategorized
- Homework #0 CSE 446: Machine Learning

$30.00

Category: Uncategorized

Description

5/5 - (2 votes)

Probability and Statistics

1. [2 points] (Bayes Rule, from Murphy exercise 2.4.) After your yearly checkup, the doctor has bad news and

good news. The bad news is that you tested positive for a serious disease, and that the test is 99% accurate

(i.e., the probability of testing positive given that you have the disease is 0.99, as is the probability of testing

negative given that you dont have the disease). The good news is that this is a rare disease, striking only one

in 10,000 people. What are the chances that you actually have the disease? (Show your calculations as well as

giving the final result.)

2. For any two random variables X, Y the covariance is defined as Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])]. You

may assume X and Y take on a discrete values if you find that is easier to work with.

a. [1 points] If E[Y |X = x] = x show that Cov(X, Y ) = E[(X − E[X])2

].

b. [1 points] If X, Y are independent show that Cov(X, Y ) = 0.

3. Let X and Y be independent random variables with PDFs given by f and g, respectively. Let h be the PDF

of the random variable Z = X + Y .

a. [2 points] Show that h(z) = R ∞

−∞ f(x)g(z −x)dx. (If you are more comfortable with discrete probabilities,

you can instead derive an analogous expression for the discrete case, and then you should give a one

sentence explanation as to why your expression is analogous to the continuous case.).

b. [1 points] If X and Y are both independent and uniformly distributed on [0, 1] (i.e. f(x) = g(x) = 1 for

x ∈ [0, 1] and 0 otherwise) what is h, the PDF of Z = X + Y ?

4. [1 points] A random variable X ∼ N (µ, σ2

) is Gaussian distributed with mean µ and variance σ

2

. Given

that for any a, b ∈ R, we have that Y = aX + b is also Gaussian, find a, b such that Y ∼ N (0, 1).

5. [2 points] For a random variable Z, its mean and variance are defined as E[Z] and E[(Z−E[Z])2

], respectively.

Let X1, . . . , Xn be independent and identically distributed random variables, each with mean µ and variance

σ

2

. If we define µbn =

1

n

Pn

i=1 Xi

, what is the mean and variance of √

n(µbn − µ)?

6. If f(x) is a PDF, the cumulative distribution function (CDF) is defined as F(x) = R x

−∞ f(y)dy. For any

function g : R → R and random variable X with PDF f(x), recall that the expected value of g(X) is defined

as E[g(X)] = R ∞

−∞ g(y)f(y)dy. For a boolean event A, define 1{A} as 1 if A is true, and 0 otherwise. Thus,

1{x ≤ a} is 1 whenever x ≤ a and 0 whenever x > a. Note that F(x) = E[1{X ≤ x}]. Let X1, . . . , Xn be

independent and identically distributed random variables with CDF F(x). Define Fbn(x) = 1

n

Pn

i=1 1{Xi ≤ x}.

Note, for every x, that Fbn(x) is an empirical estimate of F(x). You may use your answers to the previous

problem.

a. [1 points] For any x, what is E[Fbn(x)]?

b. [1 points] For any x, the variance of Fbn(x) is E[(Fbn(x) − F(x))2

]. Show that Variance(Fbn(x)) =

F (x)(1−F (x))

n

.

1

c. [1 points] Using your answer to c, show that for all x ∈ R, we have E[(Fbn(x) − F(x))2

] ≤

1

4n

.

Linear Algebra and Vector Calculus

7. (Rank) Let A =

1 2 1

1 0 3

1 1 2

and B =

1 2 3

1 0 1

1 1 2

. For each matrix A and B,

a. [2 points] what is its rank?

b. [2 points] what is a (minimal size) basis for its column span?

8. (Linear equations) Let A =

0 2 4

2 4 2

3 3 1

, b =

−2 −2 −4

T

, and c =

1 1 1T

.

a. [1 points] What is Ac?

b. [2 points] What is the solution to the linear system Ax = b? (Show your work).

9. (Hyperplanes) Assume w is an n-dimensional vector and b is a scalar. A hyperplane in R

n is the set

{x : x ∈ R

n, s.t. w

T x + b = 0}.

a. [1 points] (n = 2 example) Draw the hyperplane for w = [−1, 2]T

, b = 2? Label your axes.

b. [1 points] (n = 3 example) Draw the hyperplane for w = [1, 1, 1]T

, b = 0? Label your axes.

c. [2 points] Given some x0 ∈ R

n, find the squared distance to the hyperplane defined by w

T x + b = 0. In

other words, solve the following optimization problem:

min

x

kx0 − xk

2

s.t. w

T x + b = 0

(Hint: if xe0 is the minimizer of the above problem, note that kx0 − xe0k = |

w

T (x0−xe0)

kwk

|. What is w

T xe0?)

10. For possibly non-symmetric A, B ∈ R

n×n and c ∈ R, let f(x, y) = x

T Ax + y

T Bx + c. Define ∇zf(x, y) =

h

∂f(x,y)

∂z1

∂f(x,y)

∂z2

. . .

∂f(x,y)

∂zn

iT

.

a. [2 points] Explicitly write out the function f(x, y) in terms of the components Ai,j and Bi,j using appropriate summations over the indices.

b. [2 points] What is ∇xf(x, y) in terms of the summations over indices and vector notation?

c. [2 points] What is ∇yf(x, y) in terms of the summations over indices and vector notation?

Programming

11. For the A, b, c as defined in Problem 8, use NumPy to compute (take a screen shot of your answer):

a. [2 points] What is A−1

?

b. [1 points] What is A−1

b? What is Ac?

12. [4 points] Two random variables X and Y have equal distributions if their CDFs, FX and FY , respectively,

are equal, i.e. for all x, |FX(x) − FY (x)| = 0. The central limit theorem says that the sum of k independent,

zero-mean, variance-1/k random variables converges to a (standard) Normal distribution as k goes off to infinity.

We will study this phenomenon empirically (you will use the Python packages Numpy and Matplotlib). Define

Y

(k) = √

1

k

Pk

i=1 Bi where each Bi

is equal to −1 and 1 with equal probability. From your solution to problem

5, we know that √

1

k

Bi

is zero-mean and has variance 1/k.

2

a. For i = 1, . . . , n let Zi ∼ N (0, 1). If F(x) is the true CDF from which each Zi

is drawn (i.e., Gaussian)

and Fbn(x) = 1

n

Pn

i=1 1{Zi ≤ x), use the answer to problem 1.5 above to choose n large enough such that,

for all x ∈ R,

q

E[(Fbn(x) − F(x))2] ≤ 0.0025, and plot Fbn(x) from −3 to 3.

(Hint: use Z=numpy.random.randn(n) to generate the random variables, and import matplotlib.pyplot

as plt;

plt.step(sorted(Z), np.arange(1,n+1)/float(n)) to plot).

b. For each k ∈ {1, 8, 64, 512} generate n independent copies Y

(k) and plot their empirical CDF on the same

plot as part a.

(Hint: np.sum(np.sign(np.random.randn(n, k))*np.sqrt(1./k), axis=1) generates n of the Y

(k)

random variables.)

Be sure to always label your axes. Your plot should look something like the following (Tip: checkout seaborn

for instantly better looking plots.)

3