Description
1. MSE in terms of bias (Total 5 points)
For some estimator θ̂, show that MSE = bias²(θ̂) + Var(θ̂). Show your steps clearly.
2. Programming fun with F̂ (Total 17 points)
For this question, we require some programming; you should only use Python. You may use the scripts
provided on the class website as templates. Do not use any libraries or functions to bypass the
programming effort. Please submit your code in the google form (will be announced) with sufficient
documentation so the code can be evaluated. Attach each plot as a separate sheet to your submission.
All plots must be neat, legible (large fonts), with appropriate legends, axis labels, titles, etc.
(a) Write a program to plot F̂ (empirical CDF or eCDF) given a list of samples as input. Your plot must
have y-limits from 0 to 1, and x-limits from 0 to the largest sample. Show the input points as crosses
on the x-axis. (3 points)
(b) Use an integer random number generator with range [1, 99] to draw n=10, 100, and 1000 samples.
Feed these as input to (a) to generate three plots. What do you observe? (2 points)
(c) Modify (a) above so that it takes as input a collection of lists of samples; that is, a 2-D array of sorts
where each row is a list of samples (as in (a)). The program should now compute the average F̂
across the rows and plot it. That is, first compute the F̂ for each row (student), then average them
all out across rows, and plot the average F̂. Show all input points as crosses on the x-axis. (3 points)
(d) Use the same integer random number generator from (b) to draw n=10 samples for m=10, 100, and
1000 rows. Feed these as input to (c) to generate three plots. What do you observe? (2 points)
(e) Modify the program from (a) to now also add 95% Normal-based CI lines for F̂, given a list of
samples as input. Draw a plot showing F̂ and the CI lines for the q2.dat data file (799 samples) on
the class website. Use x-limits of 0 to 2, and y-limits of 0 to 1. (3 points)
(f) Modify the program from (e) to also add 95% DKW-based CI lines for F̂. Draw a single plot showing
F̂ and both sets of CI lines (Normal and DKW) for the q2.dat data. Which CI is tighter? (4 points)
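The subparts above all build on one eCDF routine. The following is a minimal sketch only, not the required solution; the function names are my own, the Normal band uses F̂ ± 1.96·√(F̂(1−F̂)/n), and the DKW half-width √(ln(2/α)/(2n)) follows the standard DKW inequality:

```python
import numpy as np
import matplotlib.pyplot as plt

def ecdf(samples):
    """Sorted samples and the eCDF value F̂ just after each sample point."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

def plot_ecdf_with_cis(samples, alpha=0.05):
    """Plot F̂ with 95% Normal-based and DKW-based confidence bands."""
    x, y = ecdf(samples)
    n = len(x)
    se = np.sqrt(y * (1 - y) / n)               # pointwise se of F̂(x)
    eps = np.sqrt(np.log(2 / alpha) / (2 * n))  # DKW band half-width
    plt.step(x, y, where='post', label='eCDF')
    plt.step(x, np.clip(y - 1.96 * se, 0, 1), 'g--', where='post', label='Normal CI')
    plt.step(x, np.clip(y + 1.96 * se, 0, 1), 'g--', where='post')
    plt.step(x, np.clip(y - eps, 0, 1), 'r:', where='post', label='DKW CI')
    plt.step(x, np.clip(y + eps, 0, 1), 'r:', where='post')
    plt.plot(x, np.zeros_like(x), 'kx')         # input points as crosses on x-axis
    plt.xlim(0, x.max()); plt.ylim(0, 1)
    plt.xlabel('x'); plt.ylabel('F̂(x)'); plt.legend()
```

For (c), call `ecdf` on each row, average the resulting curves over a common grid, and plot the average.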
3. Plug-in estimates (Total 10 points)
(a) Show that the plug-in estimator of the variance of X is σ̂² = (1/n) Σ_{i=1}^n (X_i − X̄_n)², where X̄_n is the
sample mean, X̄_n = (1/n) Σ_{i=1}^n X_i. (3 points)
(b) Show that the bias of σ̂² is −σ²/n, where σ² is the true variance. (4 points)
(c) The kurtosis of a RV X with mean μ and variance σ² is defined as Kurt[X] = E[(X − μ)⁴] / σ⁴.
Derive the plug-in estimate of the kurtosis in terms of the sample data. (3 points)
4. Consistency of eCDF (Total 10 points)
Let D = {X1, X2, …, Xn} be a set of i.i.d. samples with true CDF F. Let F̂ be the eCDF for D, as defined in
class.
(a) Derive E(F̂) in terms of F. Start by writing the expression for F̂ at some α. (3 points)
(b) Show that bias(F̂) = 0. (2 points)
(c) Derive se(F̂) in terms of F and n. (3 points)
(d) Show that F̂ is a consistent estimator. (2 points)
5. Histogram estimator (Total 13 points)
A histogram is a representation of sample data grouped by bins. Consider a true distribution X with range
[0, 1). Let m ∈ Z⁺ and b < 1 be such that m·b = 1, where m is the number of bins and b is the bin size. Bin
i, denoted as B_i, where 1 ≤ i ≤ m, contains all data samples that lie in the range [(i−1)/m, i/m).
(a) Let p_i denote the probability that the true distribution lies in B_i. As in class, derive p̂_i in terms of
indicator RVs of i.i.d. data samples (the X_j) drawn from the true distribution, X. (3 points)
(b) The histogram estimator for some x ∈ [0, 1) is defined as ĥ(x) = p̂_i / b, where x ∈ B_i. Show that
E[ĥ(x)] = f(x) when b→0, where f(x) is the true pdf of X. (4 points)
(c) Use all of the weather.dat data on the class website and plot its histogram estimate (that is, plot
ĥ(x) = p̂_i / b = p̂_i ∀ x ∈ B_i) using python with a bin size of 1. Do not use any in-built libraries to
bypass the programming effort. Use the same instructions as in Q2 for legibility and format of plot
submissions. Submit your code via the google form, labeled as q5.py. (3 points)
(d) Now use the histogram estimator (ĥ(x) = p̂_i / b ∀ x ∈ B_i; b = 1) as an estimate of the pdf based on the
weather.dat dataset. Based on these pdf estimates, plot the CDF of the dataset using python. Attach
the plot to your hardcopy submission. (3 points)
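A minimal sketch of the estimator in (c) and (d), computing the bin proportions p̂_i by hand rather than with an in-built histogram routine; the argument names and the `lo`/`hi` range parameters are my own additions:

```python
import numpy as np

def histogram_estimate(samples, b=1.0, lo=0.0, hi=None):
    """Histogram density estimate ĥ(x) = p̂_i / b on bins [lo + (i-1)b, lo + i·b)."""
    samples = np.asarray(samples, dtype=float)
    hi = samples.max() if hi is None else hi
    m = int(np.ceil((hi - lo) / b))        # number of bins
    counts = np.zeros(m)
    for s in samples:                      # count by hand; no in-built histogram
        i = min(int((s - lo) // b), m - 1)
        counts[i] += 1
    p_hat = counts / len(samples)          # p̂_i = fraction of samples in bin B_i
    return p_hat / b                       # ĥ on each bin

def cdf_from_histogram(h_hat, b=1.0):
    """CDF at the right edge of each bin: running sum of p̂_i = ĥ_i · b."""
    return np.cumsum(h_hat * b)
```

With b = 1 (as in (c)), ĥ on each bin equals p̂_i, and the CDF in (d) is just the cumulative sum of the p̂_i.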
6. Properties of estimators (Total 5 points)
Find the bias, se, and MSE in terms of θ for θ̂ = (1/n) Σ_{i=1}^n X_i, where the X_i are i.i.d. ~ Poisson(θ). Show your
work. Hint: Follow the same steps as in class, assuming the true distribution is unknown. Only at the
very end use the fact that the unknown distribution is Poisson(θ) to get the final answers in terms of θ.
7. Kernel density estimation (Total 15 points)
As usual, submit all code for this Q on the google form. Histogram density estimation has several
drawbacks: the estimate has discontinuities, and it depends on where the bins start. To alleviate these
shortcomings to a certain extent, we will use another type of non-parametric density estimation
technique called kernel density estimation (KDE). The formal definition of KDE is: for a data sample
D = {X1, X2, …, Xn}, the KDE at any point x is given by:

f̂_KDE(x) = (1/(n·h)) Σ_{i=1}^n K((X_i − x)/h),

where K(·) is called the kernel function, which should be a smooth, symmetric, and valid density
function. The parameter h > 0 is called the smoothing bandwidth and controls the amount of smoothing.
(a) Density Estimation: Generate a sample of 800 data points D = {X1, X2, …, X800} which are i.i.d. and
sampled from a mixture of Normal distributions such that with prob. 0.25 it is Nor(0,1), with prob.
0.25 it is Nor(3,1), with prob. 0.25 it is Nor(6,1), and with the remaining prob. 0.25 it is Nor(9,1). Note
that this is the true distribution. A simple way to sample data from this distribution is to sample a
RV U ~ U[0, 1]; if U ≤ 0.25, sample from the 1st Normal (that is, Nor(0,1)); if U ∈ (0.25, 0.5], then
sample from the 2nd Normal (that is, Nor(3,1)); and so on. Now obtain the KDE estimate of the PDF
f̂_KDE(α) for α ∈ {−5, −4.9, −4.8, …, 10} (use np.arange(-5, 10, 0.1)) using the Parzen window
kernel, where the density estimate is defined by

f̂_KDE(α) = (1/(n·h)) Σ_{i=1}^n I{|α − X_i| ≤ h/2},

where I(·) is the indicator RV and n = 800.
Write a python function which takes as input (a) the data D and (b) the smoothing bandwidth h, and
returns a list of KDE estimates for all the points in np.arange(-5, 10, 0.1). Using this
function, generate plots (in the same figure) of the KDE estimate of the PDF for all the values of
α ∈ {−5, −4.9, −4.8, …, 10} for h ∈ {0.1, 1, 7}, along with the true PDF of each α; note that
the true distribution is the mixture of Normals stated above. To numerically get the pdf of a given
Normal in python, try scipy.stats.norm.pdf. The master plot should have the alpha values on the
x-axis, ranging from -5 to 10. You should have 4 lines: one for each h value and one for the true
distribution. Make sure to have a useful legend. What are your observations regarding the effect of
h on the KDE estimate f̂_KDE(α)? (8 points)
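The sampling scheme and the Parzen-window estimate above can be sketched as follows; this is a template only (the function names and vectorized counting are my own choices, not the required solution):

```python
import numpy as np

def parzen_kde(data, h, grid=None):
    """Parzen-window KDE: f̂_KDE(α) = (1/(n·h)) Σ_i I{|α − X_i| ≤ h/2}."""
    data = np.asarray(data, dtype=float)
    grid = np.arange(-5, 10, 0.1) if grid is None else np.asarray(grid, dtype=float)
    # For each α in the grid, count samples inside the window [α − h/2, α + h/2].
    inside = np.abs(grid[:, None] - data[None, :]) <= h / 2
    return inside.sum(axis=1) / (len(data) * h)

def sample_mixture(n, rng=None):
    """n i.i.d. draws from the equal-weight mixture Nor(0,1)/Nor(3,1)/Nor(6,1)/Nor(9,1)."""
    rng = np.random.default_rng() if rng is None else rng
    means = rng.choice([0.0, 3.0, 6.0, 9.0], size=n)  # pick a component w.p. 0.25 each
    return rng.normal(means, 1.0)
```

Plotting `parzen_kde(sample_mixture(800), h)` over the grid for each h, plus the true mixture pdf, gives the four required lines.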
(b) Bias and Variance: Now we will study the effect of the parameter h on the bias and variance of the KDE
estimates. Repeat the trial of generating 800 data points 150 times in the same way as above. Let
each row represent a trial (so you should have a matrix with 150 rows, and each row (trial) should
have 800 columns). Let f̂ᵗ_KDE(α) be the KDE estimate at α for the t-th trial and let f(α) be the true pdf
at α. Then the expectation, bias, Var, and MSE are given by:

E[f̂_KDE(α)] = (1/150) Σ_{t=1}^{150} f̂ᵗ_KDE(α),

Var(f̂_KDE(α)) = (1/150) Σ_{t=1}^{150} ( f̂ᵗ_KDE(α) − E[f̂_KDE(α)] )²,

Bias(f̂_KDE(α)) = ( (1/150) Σ_{t=1}^{150} f̂ᵗ_KDE(α) ) − f(α),

MSE(f̂_KDE(α)) = Var(f̂_KDE(α)) + Bias²(f̂_KDE(α)).
To observe the effect of h on the bias and variance, first calculate the total bias and variance (across
all points) as

Bias²_tot(h) = (1/|S|) Σ_{α∈S} Bias²(f̂_KDE(α))  and  Var_tot(h) = (1/|S|) Σ_{α∈S} Var(f̂_KDE(α)),

where S = {−5, −4.9, −4.8, …, 10} is the set of points for which you are estimating the density.
Write python code to solve the following questions:
(i) For each value of h ∈ {0.01, 0.1, 0.3, 0.6, 1, 3, 7}, calculate the bias and variance as defined above
and generate two plots, one of Bias²_tot(h) vs h and another of Var_tot(h) vs h. What do you
observe from these plots? (5 points)
(ii) If we use MSE as the measure to select the optimal h, i.e., h* = argmin_h (Var_tot(h) +
Bias²_tot(h)), what is the optimal value of h you should use? (2 points)
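The trial loop in (b) can be sketched as below. This is a self-contained template, not the required solution: the helper names and seed are my own, and the true mixture pdf is written out explicitly instead of via scipy.stats.norm.pdf.

```python
import numpy as np

def parzen_kde(data, h, grid):
    """f̂_KDE(α) = (1/(n·h)) · #{i : |α − X_i| ≤ h/2}, for each α in grid."""
    data = np.asarray(data, dtype=float)
    inside = np.abs(np.asarray(grid)[:, None] - data[None, :]) <= h / 2
    return inside.sum(axis=1) / (len(data) * h)

def true_pdf(grid):
    """True pdf: equal-weight mixture of Nor(0,1), Nor(3,1), Nor(6,1), Nor(9,1)."""
    comp = lambda mu: np.exp(-(np.asarray(grid) - mu) ** 2 / 2) / np.sqrt(2 * np.pi)
    return 0.25 * (comp(0) + comp(3) + comp(6) + comp(9))

def bias_var_tot(trials, h, grid):
    """Bias²_tot(h), Var_tot(h): grid averages of pointwise squared bias and variance."""
    ests = np.array([parzen_kde(row, h, grid) for row in trials])  # one curve per trial
    mean_est = ests.mean(axis=0)                                   # estimate of E[f̂_KDE(α)]
    bias2_tot = ((mean_est - true_pdf(grid)) ** 2).mean()
    var_tot = ests.var(axis=0).mean()
    return bias2_tot, var_tot

# Assumed setup: 150 trials × 800 mixture samples, seeded for reproducibility.
rng = np.random.default_rng(0)
grid = np.arange(-5, 10, 0.1)
trials = rng.normal(rng.choice([0.0, 3.0, 6.0, 9.0], size=(150, 800)), 1.0)
results = {h: bias_var_tot(trials, h, grid) for h in [0.01, 0.1, 0.3, 0.6, 1, 3, 7]}
# Plot Bias²_tot vs h and Var_tot vs h; h* minimizes their sum.
```

Plotting `results[h][0]` and `results[h][1]` against h gives the two required plots for (i), and the h minimizing their sum answers (ii).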