ISTA 421/521 – Homework 3


Instructions
In this assignment you are required to modify/write two scripts in Python. Details of what you are to do are
specified in problems 2 and 7, below.
Included in the homework 3 release are the following sample scripts:
• approx_expected_value.py – This script demonstrates how to approximate an expected value through
sampling. You will modify this code and submit your solution for problem 2.
• gauss_surf.py – This is provided for fun – it is not required for any problem here. It generates a 2d
multivariate Gaussian and plots it as both a contour and surface plot.
• predictive_variance_example.py – This script demonstrates (a) generating and plotting error bars
(predictive variance) and (b) sampling of model parameters from the cov{ŵ} estimated from data.
You will run this script in problem 6, and then use it as the basis for a script in problem 7.
• w_variation_demo.py – This script is also provided for fun and is not required for the assignment. (It
also provides more example python code!) This implements the simulated experiment demonstrating
the theoretical and empirical bias in the estimate, σ̂^2, of the model variance, σ^2, as a
function of the sample size used for estimation.
All problems require that you provide some “written” answer (in some cases also figures), so you will also
submit a .pdf of your written answers. (You can use LaTeX or any other system (including handwritten;
plots, of course, must be program-generated) as long as the final version is in PDF.)
The final submission will include (minimally) the two scripts and a PDF version of your
written part of the assignment. You are required to create either a .zip or tarball (.tar.gz /
.tgz) archive of all of the files for your submission and submit your archive to the d2l dropbox
by the date/time deadline above.
NOTE: Problems 4 and 8 are required for Graduate students only; Undergraduates may complete them for
extra credit equal to the point value.
(FCMA refers to the course text: Rogers and Girolami (2012), A First Course in Machine Learning. For
general notes on using LaTeX to typeset math, see: http://en.wikibooks.org/wiki/LaTeX/Mathematics)
1. [2 points] Adapted from Exercise 2.3 of FCMA p.90:
Let Y be a random variable that can take any positive integer value. The likelihood of these outcomes
is given by the Poisson pmf (probability mass function):
p(y) = (λ^y / y!) e^{−λ}     (1)
By using the fact that for a discrete random variable the pmf gives the probabilities of the individual
events occurring and the probabilities are additive…
(a) Compute the probability that Y ≤ 6 for λ = 8, i.e., P(Y ≤ 6). Write a (very!) short python
script to compute this value, and include a listing of the code in your solution.
(b) Using the result of (a) and the fact that one outcome has to happen, compute the probability
that Y > 6.
Solution.
a)
Code Listing 1: poisson.py script
#!/usr/bin/python
import math

def poisson_probability(y, lamb):
    # Sum the Poisson pmf terms from 0 to y to get P(Y <= y)
    total = 0.0
    for i in range(y + 1):
        total += (math.pow(lamb, i) * math.exp(-lamb)) / math.factorial(i)
    return total

y = 6
lamb = 8
prob = poisson_probability(y, lamb)
print("The Poisson probability is", prob)
[emanuel@localhost submit]$ python poisson.py
The Poisson probability is 0.313374277536
As we saw above, P(Y ≤ 6) ≈ 0.3133743
b)
P(Y > 6) = 1 − P(Y ≤ 6) ≈ 1 − 0.3133743 = 0.6866257
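The complement identity used in part (b) is easy to cross-check numerically. A minimal sketch using only the standard library (the helper name poisson_cdf is ours, not from the assignment scripts):

```python
import math

def poisson_cdf(y, lamb):
    # P(Y <= y) for a Poisson(lamb) random variable
    return sum(lamb**i * math.exp(-lamb) / math.factorial(i)
               for i in range(y + 1))

p_le = poisson_cdf(6, 8)
p_gt = 1.0 - p_le
print(p_le)  # ~0.313374
print(p_gt)  # ~0.686626
```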
2. [3 points] Adapted from Exercise 2.4 of FCMA p.90:
Let X be a random variable with uniform density, p(x) = U(a, b). Derive E_{p(x)}{1 + 0.1x + 0.5x^2 + 0.05x^3}.
Work out analytically E_{p(x)}{1 + 0.1x + 0.5x^2 + 0.05x^3} for a = −10, b = 5 (show the steps).
The script approx_expected_value.py demonstrates how you use random samples to approximate an
expectation, as described in Section 2.5.1 of the book. The script estimates the expectation of the
function y^2 when Y ∼ U(0, 1) (that is, y is uniformly distributed between 0 and 1). This script shows
a plot of how the estimation improves as larger samples are considered, up to 100 samples.
Modify the script approx_expected_value.py to compute a sample-based approximation to the
expectation of the function 1 + 0.1x + 0.5x^2 + 0.05x^3 when X ∼ U(−10, 5) and observe how the approximation
improves with the number of samples drawn. Include a plot showing the evolution of the approximation,
relative to the true value, over 3,000 samples.
Solution.
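A minimal sketch of the core computation (assuming numpy; the plotting the problem asks for is omitted). Using the uniform moment formulas, the analytic expectation for a = −10, b = 5 works out to 5.4375, and the running sample average settles near it:

```python
import numpy as np

rng = np.random.RandomState(0)
a, b = -10.0, 5.0

def f(x):
    return 1 + 0.1*x + 0.5*x**2 + 0.05*x**3

samples = rng.uniform(a, b, 3000)
# Running estimate of E[f(X)] after 1, 2, ..., 3000 samples
running = np.cumsum(f(samples)) / np.arange(1, 3001)
print(running[-1])  # settles near the analytic value 5.4375
```

Plotting `running` against the sample count, with a horizontal line at the true value, shows the evolution of the approximation.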
3. [3 points] Adapted from Exercise 2.5 of FCMA p.91:
Assume that p(w) is the Gaussian pdf for a D-dimensional vector w given in
p(w) = (1 / ((2π)^{D/2} |Σ|^{1/2})) exp{−(1/2)(w − µ)^T Σ^{−1} (w − µ)}.
By expanding the vector notation and re-arranging, show that using Σ = σ^2 I as the covariance matrix
assumes independence of the D elements of w. You will need to be aware that the determinant of a
matrix that only has entries on the diagonal (|σ^2 I|) is the product of the diagonal values and that the
inverse of the same matrix is constructed by simply inverting each element on the diagonal. (Hint,
a product of exponentials can be expressed as an exponential of a sum. Also, just a reminder that
exp{x} is e^x.)
Solution.
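A sketch of the algebra (not a full solution): with Σ = σ^2 I we have |Σ| = (σ^2)^D and Σ^{−1} = σ^{−2} I, so the quadratic form decouples into a sum over dimensions, and the exponential of the sum factors into a product of one-dimensional Gaussians:

```latex
p(\mathbf{w}) = \frac{1}{(2\pi\sigma^2)^{D/2}}
  \exp\left\{-\frac{1}{2\sigma^2}\sum_{d=1}^{D}(w_d-\mu_d)^2\right\}
 = \prod_{d=1}^{D}\frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\left\{-\frac{(w_d-\mu_d)^2}{2\sigma^2}\right\}
```

A joint density that factors into a product of its marginals is exactly the definition of independence of the D elements.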
4. [2 points; Required only for Graduates] Adapted from Exercise 2.6 of FCMA p.91:
Using the same setup as in Problem 3, see what happens if we use a diagonal covariance matrix with
different elements on the diagonal, i.e.,
Σ = diag(σ_1^2, σ_2^2, …, σ_D^2)
(all off-diagonal entries are 0).
Solution.
5. [4 points] Adapted from Exercise 2.9 of FCMA p.91:
Assume that a dataset of N binary values, x_1, …, x_N, was sampled from a Bernoulli distribution, and
each sample x_i is independent of any other sample. Explain why this is not a Binomial distribution.
Derive the maximum likelihood estimate for the Bernoulli parameter.
Solution.
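A sketch of the derivation (writing q for the Bernoulli parameter): the likelihood of the N independent samples and its logarithm are

```latex
L(q) = \prod_{n=1}^{N} q^{x_n}(1-q)^{1-x_n}, \qquad
\log L(q) = \Bigl(\sum_{n=1}^{N} x_n\Bigr)\log q
          + \Bigl(N - \sum_{n=1}^{N} x_n\Bigr)\log(1-q)
```

Setting the derivative with respect to q to zero gives q̂ = (1/N) Σ_n x_n, the sample mean of the binary values.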
6. [3 points] Adapted from Exercise 2.12 of FCMA p.91:
Familiarize yourself with the provided script predictive_variance_example.py. When you run it, it
will generate a dataset and then remove all values for which −2 ≤ x ≤ 2. Observe the effect this has
on the predictive variance in this range. Plot (a) the data, (b) the error bar plots for model orders
1, 3, 5 and 9, and (c) the sampled functions for model orders 1, 3, 5 and 9. You will plot a total of
9 figures. Include a caption for each figure that qualitatively describes what the figure shows. Also,
clearly explain what removing the points has done in contrast to when they’re left in.
Solution.
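A minimal sketch of what the error bars in the script measure, assuming numpy (the sample count of 200 and noise standard deviation of 6.0 are our assumptions, not values from the provided script): the predictive variance at a new input is σ^2 x_new^T (X^T X)^{−1} x_new, which grows where the data have been removed.

```python
import numpy as np

rng = np.random.RandomState(0)

# Noisy data from the true cubic, with the region -2 <= x <= 2 removed,
# mimicking what predictive_variance_example.py does.
x = rng.uniform(-12.0, 5.0, 200)
x = x[(x < -2) | (x > 2)]
t = 1 + 0.1*x + 0.5*x**2 + 0.05*x**3 + rng.normal(0, 6.0, x.size)

order = 3
X = np.vander(x, order + 1, increasing=True)   # design matrix [1, x, x^2, x^3]
w_hat = np.linalg.solve(X.T @ X, X.T @ t)      # least-squares fit
sigma2 = np.mean((t - X @ w_hat)**2)           # ML estimate of noise variance

# Predictive variance sigma^2 * x_new^T (X^T X)^{-1} x_new over a grid
grid = np.linspace(-12.0, 5.0, 100)
G = np.vander(grid, order + 1, increasing=True)
pred_var = sigma2 * np.sum((G @ np.linalg.inv(X.T @ X)) * G, axis=1)
```

Plotting the fitted mean with error bars of width 2·sqrt(pred_var), as the provided script does, makes the widening over the emptied region visible.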
7. [5 points]
In this exercise, you will create a simple demonstration of how model bias impacts variance, similar to
the demonstration in class. Using the same true model as in the script predictive_variance_example.py,
that is t = 1 + 0.1x + 0.5x^2 + 0.05x^3, generate 20 data sets, each consisting of 25 samples from the true
function (using the same range of x ∈ [−12.0, 5.0]). Then, create a separate plot for each of the model
polynomial orders 1, 3, 5 and 9, in which you plot the true function in red and each of the best-fit
functions of that model order to the 20 data sets. You will therefore produce four plots. The first will
be for model order 1 and will include the true model plotted in red and then 20 curves, one for an
order 1 best-fit model for each of the 20 data sets. The second plot will repeat this for
model order 3, and so on. You can use any of the code in the script predictive_variance_example.py
as a guide. Describe what happens to the variance in the functions as the model order is changed.
(tips: plot the true function curve last, so it is plotted on top of the others; also, use linewidth=3 in
the plot function to increase the line width to make the curve stand out more.)
Solution.
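A sketch of the fitting loop (assuming numpy; the noise standard deviation of 6.0 is our assumption, since the problem does not state the noise level of the provided script). Higher-order fits vary far more across the 20 data sets:

```python
import numpy as np

rng = np.random.RandomState(0)

def true_t(x):
    # True model: t = 1 + 0.1x + 0.5x^2 + 0.05x^3
    return 1 + 0.1*x + 0.5*x**2 + 0.05*x**3

xs = np.linspace(-12.0, 5.0, 200)
orders = [1, 3, 5, 9]
fits = {}  # order -> (20, 200) array of fitted curves on the grid
for order in orders:
    curves = []
    for _ in range(20):
        x = rng.uniform(-12.0, 5.0, 25)
        t = true_t(x) + rng.normal(0, 6.0, 25)  # noise level is an assumption
        coeffs = np.polyfit(x, t, order)        # best-fit polynomial
        curves.append(np.polyval(coeffs, xs))
    fits[order] = np.array(curves)

# Spread across the 20 fits at each grid point, averaged over the grid
spread = {k: v.std(axis=0).mean() for k, v in fits.items()}
```

For the plots, draw the 20 curves of fits[order] first and the true curve last in red with linewidth=3, as the tips suggest, so it sits on top.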
8. [3 points; Required only for Graduates] Adapted from Exercise 2.13 of FCMA p.92:
Compute the Fisher Information Matrix for the parameter of a Bernoulli distribution.
Solution.
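A sketch (with q the Bernoulli parameter): the Fisher information is the negative expected second derivative of the log-likelihood of a single observation,

```latex
\log p(x\,|\,q) = x\log q + (1-x)\log(1-q), \qquad
\frac{\partial^2}{\partial q^2}\log p(x\,|\,q) = -\frac{x}{q^2} - \frac{1-x}{(1-q)^2}
```

Taking the expectation with E[x] = q gives I(q) = 1/q + 1/(1−q) = 1/(q(1−q)); for N independent samples this scales to N/(q(1−q)).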