CSE 544, Probability and Statistics for Data Science
Assignment 6: Bayesian Inference and Regression


1. Posterior for Normal (Total 10 points)
Let X₁, X₂, …, Xₙ be distributed as Normal(θ, σ²), where σ is assumed to be known. You are also given that the prior for θ is Normal(a, b²).
(a) Show that the posterior of θ is Normal(x, y²), such that: (6 points)

$$x = \frac{b^2 \bar{X} + se^2\, a}{b^2 + se^2} \qquad \text{and} \qquad y^2 = \frac{b^2\, se^2}{b^2 + se^2},$$

where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $se^2 = \sigma^2 / n$.
(Hint: the algebra is less messy if you ignore the constants, but please justify why you can ignore them.)
(b) Compute the (1‐α) posterior interval for θ. (4 points)
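A quick way to sanity-check the closed-form posterior in part (a) before writing up the derivation is to compare it against a brute-force grid posterior. The sketch below does exactly that; the prior parameters, σ, the sample size, and the simulated data are all made-up placeholders, not part of the assignment.

```python
import numpy as np
from scipy.stats import norm

# Illustrative (made-up) parameters: prior Normal(a, b^2), likelihood Normal(theta, sigma^2).
a, b, sigma, n = 1.0, 2.0, 3.0, 50
rng = np.random.default_rng(0)
X = rng.normal(loc=0.7, scale=sigma, size=n)      # simulated data with "true" theta = 0.7

# Closed-form conjugate update from part (a).
se2 = sigma**2 / n
x_post = (b**2 * X.mean() + se2 * a) / (b**2 + se2)
y2_post = (b**2 * se2) / (b**2 + se2)

# Brute-force check: evaluate prior * likelihood on a grid and normalize numerically.
theta = np.linspace(-5, 5, 20001)
log_post = norm.logpdf(theta, loc=a, scale=b) + norm.logpdf(X[:, None], loc=theta, scale=sigma).sum(axis=0)
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, theta)

grid_mean = np.trapz(theta * post, theta)
grid_var = np.trapz((theta - grid_mean) ** 2 * post, theta)
print(f"closed form: mean = {x_post:.4f}, variance = {y2_post:.4f}")
print(f"grid check : mean = {grid_mean:.4f}, variance = {grid_var:.4f}")
```

The two pairs of numbers should agree to several decimal places, which is also a useful check that the constants dropped in the derivation really do cancel in the normalization.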
2. Bayesian Inference in action (Total 15 points)
You will need the q2_sigma3.dat and q2_sigma100.dat files for this question; these files are on the class
website. Each file contains 5 rows of 100 samples each. Refer back to Q1(a); you can use its result even if you have not solved that question. Submit all Python code for this question with suitable filenames.
(a) Assume that σ = 3 (meaning σ² = 9). Let the prior be the standard Normal (mean 0, variance 1). Read in the 1st row of q2_sigma3.dat and compute the new posterior. Now, assuming this posterior is your new prior, read in the 2nd row of q2_sigma3.dat and compute the new posterior. Repeat until the 5th row. Please provide your steps and draw a table with your estimates of the mean and variance of the posterior for all 5 steps (the table should have 5 rows and 2 columns). Also plot each of the 5 posterior distributions on a single graph and attach this graph. What do you observe? (7 points) (A rough sketch of this sequential update loop is given after part (c).)
(b) Now assume that σ = 100 and repeat part (a) above but with q2_sigma100.dat. Assume the same
prior of a standard Normal. Provide the table and final graph. What do you observe? (7 points)
(c) Based on the comparison of answers of (a) and (b), what can you conclude? (1 point)
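Parts (a) and (b) come down to applying the Q1(a) update five times, each time feeding the resulting posterior back in as the next prior. Below is a minimal sketch of that loop for the σ = 3 case; the delimiter assumed for the .dat file, the plotting range, and the output file name are guesses, so adjust them to the actual data and to however you want to present the results.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

sigma = 3.0                          # use 100.0 for part (b) with q2_sigma100.dat
data = np.loadtxt("q2_sigma3.dat")   # assumes whitespace-delimited rows; adjust if comma-separated

a, b2 = 0.0, 1.0                     # standard Normal prior: mean 0, variance 1
thetas = np.linspace(-3, 3, 1000)

for step, row in enumerate(data, start=1):
    se2 = sigma**2 / len(row)
    # Conjugate update from Q1(a); the old posterior becomes the new prior.
    a = (b2 * row.mean() + se2 * a) / (b2 + se2)
    b2 = (b2 * se2) / (b2 + se2)
    print(f"after row {step}: posterior mean = {a:.4f}, posterior variance = {b2:.6f}")
    plt.plot(thetas, norm.pdf(thetas, loc=a, scale=np.sqrt(b2)), label=f"after row {step}")

plt.xlabel("theta")
plt.ylabel("posterior density")
plt.legend()
plt.savefig("q2a_posteriors.png")
```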
3. Regression Analysis (Total 7 points)
Assume simple linear regression on n sample points (Y₁, X₁), (Y₂, X₂), …, (Yₙ, Xₙ); that is, Yᵢ = β₀ + β₁Xᵢ + εᵢ, where E[εᵢ] = 0.
(a) Derive the estimates of β₀ and β₁ by minimizing the sum of squared errors, and show that:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \qquad \text{and} \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X},$$

where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. (4 points)
(b) Show that the above estimators, given the Xᵢ's, are unbiased. (Hint: treat the X's as constants.) (3 points)
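For part (a), the derivation amounts to setting the two partial derivatives of the sum of squared errors to zero; the normal equations below are only a sketch of that setup, with the final substitution left to you.

```latex
% Minimize  S(\beta_0, \beta_1) = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 .
\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i) = 0
\quad\Longrightarrow\quad
\bar{Y} = \beta_0 + \beta_1 \bar{X},
\qquad
\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} X_i (Y_i - \beta_0 - \beta_1 X_i) = 0 .
% Substituting \beta_0 = \bar{Y} - \beta_1 \bar{X} into the second equation and
% simplifying the sums yields the stated expression for \hat{\beta}_1.
```

For part (b), one natural route is to take expectations of these estimators with the Xᵢ treated as constants and use E[εᵢ] = 0.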
4. More on Regression and Time series analysis (Total 10 points)
In this problem, we use data from an Azure trace; refer to the q4.dat dataset on the website. The file contains 576 values. Each value represents the number of VMs running in a data center during a 5-minute interval. Thus, the data spans exactly 2 days. Report all answers in your submission; you do not need to submit any code.
(a) Split the dataset into 4 equal parts. For each quarter of the data, using simple linear regression (include the β₀ term), plot the original data and the regression fit (using the corresponding quarter of the data as training), and calculate the SSE in all 4 cases. (5 points)
(b) Split the dataset into 2 equal parts. Use the first half of the data as the training set. Predict the data points for the second half of the data using an exponential moving average (α = 0.5), auto-regression (p = 3), and a seasonal average (s = 288). For each technique, report the average error across all 288 predictions. Note that you may have to use predicted data for training. From the original data, use only the first 288 values as part of training (you can augment them with predictions for the 289th point, 290th point, etc.), and use the final 288 points for computing the error. (5 points)
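As a rough sketch of the three predictors in part (b) (not the required submission): the code below assumes q4.dat holds one value per line and reports mean absolute error, since the question only says "average error"; swap in whichever error metric and AR-fitting interpretation your course expects.

```python
import numpy as np

# Assumption: q4.dat holds one value per line; adjust the loading call if the format differs.
data = np.loadtxt("q4.dat")
train, test = data[:288], data[288:]

def ewma_forecast(history, horizon, alpha=0.5):
    """Exponential moving average; feeding predictions back in keeps the level flat."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    # Beyond the training data each "observation" is the prediction itself,
    # so the smoothed level (and hence the forecast) stays constant.
    return np.full(horizon, level)

def ar_forecast(history, horizon, p=3):
    """AR(p) fit by least squares on the training half, then rolled forward on its own predictions."""
    X = np.column_stack([history[i:len(history) - p + i] for i in range(p)])
    X = np.column_stack([np.ones(len(X)), X])        # intercept term
    coef, *_ = np.linalg.lstsq(X, history[p:], rcond=None)
    window = list(history[-p:])
    preds = []
    for _ in range(horizon):
        pred = coef[0] + np.dot(coef[1:], window)
        preds.append(pred)
        window = window[1:] + [pred]                 # predicted data used as future lags
    return np.array(preds)

def seasonal_forecast(history, horizon, s=288):
    """Seasonal average with period s: predict each slot with the value one season earlier."""
    return np.array([history[-s + (t % s)] for t in range(horizon)])

for name, preds in [("EWMA (alpha=0.5)", ewma_forecast(train, len(test))),
                    ("AR(3)", ar_forecast(train, len(test))),
                    ("Seasonal (s=288)", seasonal_forecast(train, len(test)))]:
    print(f"{name}: mean absolute error = {np.mean(np.abs(preds - test)):.2f}")
```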
5. Bayesian hypothesis testing (Total 18 points)
You are tired of studying probs and stats and have finally decided to give up your current life and turn to your one true passion: farming. Lucky for you, there is a lot of farmland on Long Island, and you have your heart set on a particular farm that is available for purchase. However, you do not know whether the soil in the farm is good or not. Say the soil in the farm is a discrete random variable H that can only take values in the set {0, 1}, where 0 represents good soil and 1 represents bad soil. We formulate this as a hypothesis test as follows: H₀: H = 0 and H₁: H = 1. Let the prior probabilities be P(H₀) = P(H = 0) = p and P(H₁) = P(H = 1) = 1 − p. The water content in the soil depends on the type of soil. If we model the water content as a RV W, then f_W(w | H = 0) = N(w; −μ, σ²) and f_W(w | H = 1) = N(w; μ, σ²). To test which of the two hypotheses is correct, you take n samples of the soil from different patches of the farm and measure the water content of each sample; the resulting data sample set is w = {w₁, w₂, w₃, …, wₙ}. Assume that the samples are conditionally independent given the hypothesis/soil type.
(a) If we denote the hypothesis chosen as a RV C, where C ∈ {0, 1}, then according to MAP (maximum a posteriori) we have

$$C = \begin{cases} 0 & \text{if } P(H = 0 \mid \mathbf{w}) \ge P(H = 1 \mid \mathbf{w}) \\ 1 & \text{otherwise.} \end{cases}$$

This implies that the hypothesis H = 0 is chosen (i.e., C = 0) when P(H = 0 | w) ≥ P(H = 1 | w). Derive a condition for choosing the hypothesis that the soil in the farm is of type 0, in terms of p, μ, and σ. (4 points)
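As a starting point for part (a) (a sketch only, not the full derivation): by Bayes' rule and the conditional independence of the samples, the posterior comparison reduces to comparing prior-weighted likelihoods, which then simplifies after taking logs.

```latex
P(H = 0 \mid \mathbf{w}) \;\ge\; P(H = 1 \mid \mathbf{w})
\;\iff\;
p \prod_{i=1}^{n} N(w_i; -\mu, \sigma^2) \;\ge\; (1 - p) \prod_{i=1}^{n} N(w_i; \mu, \sigma^2).
% Taking logs and expanding the Normal densities leaves a linear inequality in
% \sum_{i} w_i, whose threshold involves only p, \mu, and \sigma.
```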
(b) Write a Python function MAP_descision() in a script named Q5_b.py, where your function takes as input (i) the list of observations w, and (ii) the prior probability of H₀, and returns the chosen hypothesis (value of C) according to the MAP criterion. Report the result for the 10 different instances of observations from the q5.csv dataset and for each prior probability p = [0.1, 0.3, 0.5, 0.8], for the value of (μ, σ²) = (0.5, 1.0). (10 points)
Output format:
For P(H₀) = 0.1, the hypotheses selected are :: 0 1 0 1 0 0 1 0 0 1
For P(H₀) = 0.3, the hypotheses selected are :: 1 1 0 1 1 0 0 0 0 1
For P(H₀) = 0.5, the hypotheses selected are :: 1 1 0 1 1 0 0 0 0 1
For P(H₀) = 0.8, the hypotheses selected are :: 1 1 0 1 1 0 0 0 0 1
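A minimal sketch of what such a function could look like is below; it assumes each row of q5.csv is one instance of observations and that the file is comma-separated with no header (both assumptions about the data layout), and it keeps the function name exactly as spelled in the question.

```python
import numpy as np
from scipy.stats import norm

def MAP_descision(w, p, mu=0.5, sigma2=1.0):
    """Return the MAP choice C in {0, 1} for observations w and prior P(H0) = p."""
    w = np.asarray(w, dtype=float)
    sigma = np.sqrt(sigma2)
    # Work in log space for numerical stability: log prior + log likelihood under each hypothesis.
    log_post0 = np.log(p) + norm.logpdf(w, loc=-mu, scale=sigma).sum()
    log_post1 = np.log(1 - p) + norm.logpdf(w, loc=mu, scale=sigma).sum()
    return 0 if log_post0 >= log_post1 else 1

if __name__ == "__main__":
    # Assumption: each row of q5.csv is one instance of observations.
    instances = np.loadtxt("q5.csv", delimiter=",")
    for p in [0.1, 0.3, 0.5, 0.8]:
        choices = [MAP_descision(row, p) for row in instances]
        print(f"For P(H0) = {p}, the hypotheses selected are ::", *choices)
```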
(c) Denoting the hypothesis selected as a RV C, where C ∈ {0, 1}, the average error probability via the MAP criterion is given by AEP = P(C = 0 | H = 1) P(H = 1) + P(C = 1 | H = 0) P(H = 0). Given the observations w = {w₁, w₂, w₃, …, wₙ}, derive AEP in terms of μ, σ, Φ(·), and p. (4 points)
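For part (c), note that if the condition from part (a) turns out to be a threshold test on the sample sum, then each conditional error probability is a Normal tail area. A hedged sketch of that step, with τ standing for whatever threshold part (a) gives, is:

```latex
% Assuming the MAP rule from (a) has the form "choose C = 0 when \sum_i w_i \le \tau":
% conditionally on H = 0, \sum_{i=1}^{n} w_i \sim N(-n\mu,\, n\sigma^2), so
P(C = 1 \mid H = 0)
= P\!\left( \sum_{i=1}^{n} w_i > \tau \,\middle|\, H = 0 \right)
= 1 - \Phi\!\left( \frac{\tau + n\mu}{\sigma \sqrt{n}} \right),
% and P(C = 0 \mid H = 1) is obtained the same way under H = 1; weighting the two
% terms by P(H = 0) = p and P(H = 1) = 1 - p gives the AEP.
```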