Description
1. MSE in terms of bias (Total 5 points)
For some estimator θ̂, show that MSE = bias²(θ̂) + Var(θ̂). Show your steps clearly.
2. Programming fun with F̂ (Total 17 points)
For this question, we require some programming; you should only use Python. You may use the scripts
provided on the class website as templates. Do not use any libraries or functions to bypass the
programming effort. Please submit your code in the google form (will be announced) with sufficient
documentation so the code can be evaluated. Attach each plot as a separate sheet to your submission.
All plots must be neat, legible (large fonts), with appropriate legends, axis labels, titles, etc.
(a) Write a program to plot F̂ (empirical CDF or eCDF) given a list of samples as input. Your plot must
have y-limits from 0 to 1, and x-limits from 0 to the largest sample. Show the input points as crosses
on the x-axis. (3 points)
(b) Use an integer random number generator with range [1, 99] to draw n=10, 100, and 1000 samples.
Feed these as input to (a) to generate three plots. What do you observe? (2 points)
(c) Modify (a) above so that it takes as input a collection of lists of samples; that is, a 2-D array of sorts
where each row is a list of samples (as in (a)). The program should now compute the average F̂
across the rows and plot it. That is, first compute the F̂ for each row (student), then average them
all out across rows, and plot the average F̂. Show all input points as crosses on the x-axis. (3 points)
(d) Use the same integer random number generator from (b) to draw n=10 samples for m=10, 100, and
1000 rows. Feed these as input to (c) to generate three plots. What do you observe? (2 points)
(e) Modify the program from (a) to now also add 95% Normal-based CI lines for F̂, given a list of
samples as input. Draw a plot showing F̂ and the CI lines for the q2.dat data file (799 samples) on
the class website. Use x-limits of 0 to 2, and y-limits of 0 to 1. (3 points)
(f) Modify the program from (e) to also add 95% DKW-based CI lines for F̂. Draw a single plot showing
F̂ and both sets of CI lines (Normal and DKW) for the q2.dat data. Which CI is tighter? (4 points)
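The subparts above all build on one eCDF routine. The following is a minimal sketch only, not the required solution; the function names are my own, the Normal band uses F̂ ± 1.96·√(F̂(1−F̂)/n), and the DKW half-width √(ln(2/α)/(2n)) follows the standard DKW inequality:

```python
import numpy as np
import matplotlib.pyplot as plt

def ecdf(samples):
    """Sorted samples and the eCDF value F̂ just after each sample point."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

def plot_ecdf_with_cis(samples, alpha=0.05):
    """Plot F̂ with 95% Normal-based and DKW-based confidence bands."""
    x, y = ecdf(samples)
    n = len(x)
    se = np.sqrt(y * (1 - y) / n)               # pointwise se of F̂(x)
    eps = np.sqrt(np.log(2 / alpha) / (2 * n))  # DKW band half-width
    plt.step(x, y, where='post', label='eCDF')
    plt.step(x, np.clip(y - 1.96 * se, 0, 1), 'g--', where='post', label='Normal CI')
    plt.step(x, np.clip(y + 1.96 * se, 0, 1), 'g--', where='post')
    plt.step(x, np.clip(y - eps, 0, 1), 'r:', where='post', label='DKW CI')
    plt.step(x, np.clip(y + eps, 0, 1), 'r:', where='post')
    plt.plot(x, np.zeros_like(x), 'kx')         # input points as crosses on x-axis
    plt.xlim(0, x.max()); plt.ylim(0, 1)
    plt.xlabel('x'); plt.ylabel('F̂(x)'); plt.legend()
```

For (c), call `ecdf` on each row, average the resulting curves over a common grid, and plot the average.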
3. Plug-in estimates (Total 10 points)
(a) Show that the plug-in estimator of the variance of X is σ̂² = (1/n) Σ_{i=1}^n (X_i − X̄_n)², where X̄_n is the
sample mean, X̄_n = (1/n) Σ_{i=1}^n X_i. (3 points)
(b) Show that the bias of σ̂² is −σ²/n, where σ² is the true variance. (4 points)
(c) The kurtosis of a RV X with mean μ and variance σ² is defined as Kurt[X] = E[(X − μ)⁴] / σ⁴.
Derive the plug-in estimate of the kurtosis in terms of the sample data. (3 points)
4. Consistency of eCDF (Total 10 points)
Let D = {X1, X2, …, Xn} be a set of i.i.d. samples with true CDF F. Let F̂ be the eCDF for D, as defined in
class.
(a) Derive E(F̂) in terms of F. Start by writing the expression for F̂ at some α. (3 points)
(b) Show that bias(F̂) = 0. (2 points)
(c) Derive se(F̂) in terms of F and n. (3 points)
(d) Show that F̂ is a consistent estimator. (2 points)
5. Histogram estimator (Total 13 points)
A histogram is a representation of sample data grouped by bins. Consider a true distribution X with range
[0, 1). Let m ∈ Z⁺ and b < 1 be such that m·b = 1, where m is the number of bins and b is the bin size. Bin
i, denoted as B_i, where 1 ≤ i ≤ m, contains all data samples that lie in the range [(i−1)/m, i/m).
(a) Let p_i denote the probability that the true distribution lies in B_i. As in class, derive p̂_i in terms of
indicator RVs of i.i.d. data samples (the X_j) drawn from the true distribution, X. (3 points)
(b) The histogram estimator for some x ∈ [0, 1) is defined as ĥ(x) = p̂_i / b, where x ∈ B_i. Show that
E[ĥ(x)] = f(x) when b→0, where f(x) is the true pdf of X. (4 points)
(c) Use all of the weather.dat data on the class website and plot its histogram estimate (that is, plot
ĥ(x) = p̂_i / b = p̂_i ∀ x ∈ B_i) using python with a bin size of 1. Do not use any in-built libraries to
bypass the programming effort. Use the same instructions as in Q2 for legibility and format of plot
submissions. Submit your code via the google form, labeled as q5.py. (3 points)
(d) Now use the histogram estimator (ĥ(x) = p̂_i / b ∀ x ∈ B_i; b = 1) as an estimate of the pdf based on the
weather.dat dataset. Based on these pdf estimates, plot the CDF of the dataset using python. Attach
the plot to your hardcopy submission. (3 points)
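A minimal sketch of the estimator in (c) and (d), computing the bin proportions p̂_i by hand rather than with an in-built histogram routine; the argument names and the `lo`/`hi` range parameters are my own additions:

```python
import numpy as np

def histogram_estimate(samples, b=1.0, lo=0.0, hi=None):
    """Histogram density estimate ĥ(x) = p̂_i / b on bins [lo + (i-1)b, lo + i·b)."""
    samples = np.asarray(samples, dtype=float)
    hi = samples.max() if hi is None else hi
    m = int(np.ceil((hi - lo) / b))        # number of bins
    counts = np.zeros(m)
    for s in samples:                      # count by hand; no in-built histogram
        i = min(int((s - lo) // b), m - 1)
        counts[i] += 1
    p_hat = counts / len(samples)          # p̂_i = fraction of samples in bin B_i
    return p_hat / b                       # ĥ on each bin

def cdf_from_histogram(h_hat, b=1.0):
    """CDF at the right edge of each bin: running sum of p̂_i = ĥ_i · b."""
    return np.cumsum(h_hat * b)
```

With b = 1 (as in (c)), ĥ on each bin equals p̂_i, and the CDF in (d) is just the cumulative sum of the p̂_i.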
6. Properties of estimators (Total 5 points)
Find the bias, se, and MSE in terms of θ for θ̂ = (1/n) Σ_{i=1}^n X_i, where the X_i are i.i.d. ~ Poisson(θ). Show your
work. Hint: Follow the same steps as in class, assuming the true distribution is unknown. Only at the
very end use the fact that the unknown distribution is Poisson(θ) to get the final answers in terms of θ.
7. Kernel density estimation (Total 15 points)
As usual, submit all code for this Q on the google form. Histogram density estimation has several
drawbacks: the estimate has discontinuities, and it depends on where the bins start. To alleviate these
shortcomings to a certain extent, we will use another type of non-parametric density estimation
technique called kernel density estimation (KDE). The formal definition of KDE is: for a data sample
D = {X1, X2, …, Xn}, the KDE at any point x is given by:

f̂_KDE(x) = (1/(n·h)) Σ_{i=1}^n K((X_i − x)/h),

where K(·) is called the kernel function, which should be a smooth, symmetric, and valid density
function. The parameter h > 0 is called the smoothing bandwidth and controls the amount of smoothing.
(a) Density Estimation: Generate a sample of 800 data points D = {X1, X2, …, X800} which are i.i.d. and
sampled from a mixture of Normal distributions such that with prob. 0.25 it is Nor(0,1), with prob.
0.25 it is Nor(3,1), with prob. 0.25 it is Nor(6,1), and with the remaining prob. 0.25 it is Nor(9,1). Note
that this is the true distribution. A simple way to sample data from this distribution is to sample a
RV U ~ U[0, 1]; if U ≤ 0.25, sample from the 1st Normal (that is, Nor(0,1)); if U ∈ (0.25, 0.5], then
sample from the 2nd Normal (that is, Nor(3,1)); and so on. Now obtain the KDE estimate of the PDF
f̂_KDE(α) for α ∈ {−5, −4.9, −4.8, …, 10} (use np.arange(-5, 10, 0.1)) using the Parzen window
kernel, where the density estimate is defined by

f̂_KDE(α) = (1/(n·h)) Σ_{i=1}^n I{|α − X_i| ≤ h/2},

where I(·) is the indicator RV and n = 800.
Write a python function which takes as input (a) the data D and (b) the smoothing bandwidth h, and
returns a list of KDE estimates for all the points in np.arange(-5, 10, 0.1). Using this
function, generate plots (in the same figure) of the KDE estimate of the PDF for all the values of
α ∈ {−5, −4.9, −4.8, …, 10} for h ∈ {0.1, 1, 7}, along with the true PDF of each α; note that
the true distribution is the mixture of Normals stated above. To numerically get the pdf of a given
Normal in python, try scipy.stats.norm.pdf. The master plot should have the alpha values on the
x-axis, ranging from -5 to 10. You should have 4 lines: one for each h value and one for the true
distribution. Make sure to have a useful legend. What are your observations regarding the effect of
h on the KDE estimate f̂_KDE(α)? (8 points)
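The sampling scheme and the Parzen-window estimate above can be sketched as follows; this is a template only (the function names and vectorized counting are my own choices, not the required solution):

```python
import numpy as np

def parzen_kde(data, h, grid=None):
    """Parzen-window KDE: f̂_KDE(α) = (1/(n·h)) Σ_i I{|α − X_i| ≤ h/2}."""
    data = np.asarray(data, dtype=float)
    grid = np.arange(-5, 10, 0.1) if grid is None else np.asarray(grid, dtype=float)
    # For each α in the grid, count samples inside the window [α − h/2, α + h/2].
    inside = np.abs(grid[:, None] - data[None, :]) <= h / 2
    return inside.sum(axis=1) / (len(data) * h)

def sample_mixture(n, rng=None):
    """n i.i.d. draws from the equal-weight mixture Nor(0,1)/Nor(3,1)/Nor(6,1)/Nor(9,1)."""
    rng = np.random.default_rng() if rng is None else rng
    means = rng.choice([0.0, 3.0, 6.0, 9.0], size=n)  # pick a component w.p. 0.25 each
    return rng.normal(means, 1.0)
```

Plotting `parzen_kde(sample_mixture(800), h)` over the grid for each h, plus the true mixture pdf, gives the four required lines.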
(b) Bias and Variance: Now we will study the effect of the parameter h on the bias and variance of the KDE
estimates. Repeat the trial of generating 800 data points 150 times in the same way as above. Let
each row represent a trial (so you should have a matrix with 150 rows, and each row (trial) should
have 800 columns). Let f̂ᵗ_KDE(α) be the KDE estimate at α for the t-th trial and let f(α) be the true pdf
at α. Then the expectation, bias, Var, and MSE are given by:

E[f̂_KDE(α)] = (1/150) Σ_{t=1}^{150} f̂ᵗ_KDE(α),

Var(f̂_KDE(α)) = (1/150) Σ_{t=1}^{150} ( f̂ᵗ_KDE(α) − E[f̂_KDE(α)] )²,

Bias(f̂_KDE(α)) = ( (1/150) Σ_{t=1}^{150} f̂ᵗ_KDE(α) ) − f(α),

MSE(f̂_KDE(α)) = Var(f̂_KDE(α)) + Bias²(f̂_KDE(α)).
To observe the effect of h on the bias and variance, first calculate the total bias and variance (across
all points) as

Bias²_tot(h) = (1/|S|) Σ_{α∈S} Bias²(f̂_KDE(α))  and  Var_tot(h) = (1/|S|) Σ_{α∈S} Var(f̂_KDE(α)),

where S = {−5, −4.9, −4.8, …, 10} is the set of points for which you are estimating the density.
Write python code to solve the following questions:
(i) For each value of h ∈ {0.01, 0.1, 0.3, 0.6, 1, 3, 7}, calculate the bias and variance as defined above
and generate two plots, one of Bias²_tot(h) vs h and another of Var_tot(h) vs h. What do you
observe from these plots? (5 points)
(ii) If we use MSE as the measure to select the optimal h, i.e., h* = argmin_h (Var_tot(h) +
Bias²_tot(h)), what is the optimal value of h you should use? (2 points)
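The trial loop in (b) can be sketched as below. This is a self-contained template, not the required solution: the helper names and seed are my own, and the true mixture pdf is written out explicitly instead of via scipy.stats.norm.pdf.

```python
import numpy as np

def parzen_kde(data, h, grid):
    """f̂_KDE(α) = (1/(n·h)) · #{i : |α − X_i| ≤ h/2}, for each α in grid."""
    data = np.asarray(data, dtype=float)
    inside = np.abs(np.asarray(grid)[:, None] - data[None, :]) <= h / 2
    return inside.sum(axis=1) / (len(data) * h)

def true_pdf(grid):
    """True pdf: equal-weight mixture of Nor(0,1), Nor(3,1), Nor(6,1), Nor(9,1)."""
    comp = lambda mu: np.exp(-(np.asarray(grid) - mu) ** 2 / 2) / np.sqrt(2 * np.pi)
    return 0.25 * (comp(0) + comp(3) + comp(6) + comp(9))

def bias_var_tot(trials, h, grid):
    """Bias²_tot(h), Var_tot(h): grid averages of pointwise squared bias and variance."""
    ests = np.array([parzen_kde(row, h, grid) for row in trials])  # one curve per trial
    mean_est = ests.mean(axis=0)                                   # estimate of E[f̂_KDE(α)]
    bias2_tot = ((mean_est - true_pdf(grid)) ** 2).mean()
    var_tot = ests.var(axis=0).mean()
    return bias2_tot, var_tot

# Assumed setup: 150 trials × 800 mixture samples, seeded for reproducibility.
rng = np.random.default_rng(0)
grid = np.arange(-5, 10, 0.1)
trials = rng.normal(rng.choice([0.0, 3.0, 6.0, 9.0], size=(150, 800)), 1.0)
results = {h: bias_var_tot(trials, h, grid) for h in [0.01, 0.1, 0.3, 0.6, 1, 3, 7]}
# Plot Bias²_tot vs h and Var_tot vs h; h* minimizes their sum.
```

Plotting `results[h][0]` and `results[h][1]` against h gives the two required plots for (i), and the h minimizing their sum answers (ii).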