ROB313: Introduction to Learning from Data Assignment 4


Q1) 8pts Perform Bayesian inference for a logistic regression model with a Bernoulli likelihood (as we had considered in assignment 3), and with a zero-centered, uncorrelated Gaussian prior on the weights, as follows:

$$\Pr(y \mid \mathbf{w}, \mathbf{x}) = \hat{f}(\mathbf{x}; \mathbf{w})^{\,y} \bigl(1 - \hat{f}(\mathbf{x}; \mathbf{w})\bigr)^{1-y},$$
$$\Pr(\mathbf{w}) = \prod_{i=0}^{D} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{w_i^2}{2\sigma^2}\right) = \mathcal{N}\bigl(\mathbf{w} \,\big|\, \mathbf{0},\, \sigma^2 I\bigr),$$

where $\hat{f}(\mathbf{x}; \mathbf{w}) = \Pr(y{=}1 \mid \mathbf{w}, \mathbf{x})$ gives the class-conditional probability of class 1 by mapping $\mathbb{R}^D \to [0, 1]$, and $\mathbf{w} = \{w_0, w_1, \dots, w_D\} \in \mathbb{R}^{D+1}$. Also, $\hat{f}$ is a logistic sigmoid acting on a linear model as follows

$$\hat{f}(\mathbf{x}; \mathbf{w}) = \mathrm{sigmoid}\!\left(w_0 + \sum_{i=1}^{D} w_i x_i\right),$$

where $\mathrm{sigmoid}(z) = \frac{1}{1 + \exp(-z)}$.
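For concreteness, the two definitions above can be sketched in NumPy (a minimal illustration; the names `sigmoid` and `f_hat` are my own, not prescribed by the assignment):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, sigmoid(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def f_hat(x, w):
    """Class-1 probability f_hat(x; w) = sigmoid(w0 + sum_i w_i * x_i)
    for a single input x in R^D and weights w in R^(D+1)."""
    return sigmoid(w[0] + np.dot(w[1:], x))
```

Vectorizing over all training points (so `f_hat` acts on an $(N, D)$ matrix at once) is straightforward and useful for the later parts.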

Making the assumption that all training examples are i.i.d., the log-likelihood and log-prior, along with their gradient ($\nabla$) and Hessian ($\nabla^2$), can be written as follows

$$\log \Pr(\mathbf{y} \mid \mathbf{w}, X) = \sum_{i=1}^{N} \left[ y^{(i)} \log \hat{f}(\mathbf{x}^{(i)}; \mathbf{w}) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - \hat{f}(\mathbf{x}^{(i)}; \mathbf{w})\bigr) \right],$$
$$\nabla \log \Pr(\mathbf{y} \mid \mathbf{w}, X) = \sum_{i=1}^{N} \bigl(y^{(i)} - \hat{f}(\mathbf{x}^{(i)}; \mathbf{w})\bigr)\, \bar{\mathbf{x}}^{(i)},$$
$$\nabla^2 \log \Pr(\mathbf{y} \mid \mathbf{w}, X) = \sum_{i=1}^{N} \hat{f}(\mathbf{x}^{(i)}; \mathbf{w}) \bigl(\hat{f}(\mathbf{x}^{(i)}; \mathbf{w}) - 1\bigr)\, \bar{\mathbf{x}}^{(i)} \bar{\mathbf{x}}^{(i)T},$$
$$\log \Pr(\mathbf{w}) = -\frac{D+1}{2} \log(2\pi) - \frac{D+1}{2} \log(\sigma^2) - \sum_{i=0}^{D} \frac{w_i^2}{2\sigma^2},$$
$$\nabla \log \Pr(\mathbf{w}) = -\frac{\mathbf{w}}{\sigma^2},$$
$$\nabla^2 \log \Pr(\mathbf{w}) = -\frac{1}{\sigma^2} I,$$

where $\bar{\mathbf{x}}^{(i)} = \bigl[1, x_1^{(i)}, \dots, x_D^{(i)}\bigr]^T \in \mathbb{R}^{D+1}$.
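These quantities translate almost line-for-line into NumPy. The sketch below (the function name and interface are my own choices, not part of the assignment) evaluates the log-joint $g(\mathbf{w}) = \log \Pr(\mathbf{y} \mid \mathbf{w}, X) + \log \Pr(\mathbf{w})$ together with its gradient and Hessian:

```python
import numpy as np

def log_joint_terms(w, X_bar, y, sigma2):
    """Log-likelihood plus log-prior of the logistic model, with its
    gradient and Hessian. X_bar is the (N, D+1) input matrix with a
    leading column of ones; y is an (N,) vector of 0/1 labels."""
    f = 1.0 / (1.0 + np.exp(-X_bar @ w))          # f_hat at every training point
    log_lik = np.sum(y * np.log(f) + (1 - y) * np.log(1 - f))
    D1 = w.size                                   # D + 1
    log_prior = (-0.5 * D1 * np.log(2 * np.pi) - 0.5 * D1 * np.log(sigma2)
                 - np.sum(w ** 2) / (2 * sigma2))
    grad = X_bar.T @ (y - f) - w / sigma2         # gradients of the two terms add
    hess = (X_bar * (f * (f - 1))[:, None]).T @ X_bar - np.eye(D1) / sigma2
    return log_lik + log_prior, grad, hess
```

A finite-difference check of the gradient against the log-joint value is a cheap way to catch sign errors before moving on to parts (a) and (b).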

All studies will be done on the iris dataset, with the training and validation sets merged, and considering only the second response to determine whether the flower is an iris versicolour or not¹. You are encouraged to re-use code from previous assignments where possible.

¹Use x_train, x_test = np.vstack((x_train, x_valid)), x_test and
y_train, y_test = np.vstack((y_train[:,(1,)], y_valid[:,(1,)])), y_test[:,(1,)]

(a) (4pts) Consider the prior variances $\sigma^2 = 0.5$, $\sigma^2 = 1$, and $\sigma^2 = 2$. Which of these priors gives a model with a higher complexity? Explain why. Choose between these prior variances by approximating the log marginal likelihood using a Laplace approximation. Report your results.
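One way to carry this out (a sketch, assuming a routine that returns the log-joint $g(\mathbf{w})$, its gradient, and its Hessian is available; the helper names are hypothetical) is to find the MAP weights with Newton's method and then apply the standard Laplace formula $\log Z \approx g(\mathbf{w}_{\mathrm{MAP}}) + \tfrac{D+1}{2}\log(2\pi) - \tfrac{1}{2}\log\lvert -\nabla^2 g(\mathbf{w}_{\mathrm{MAP}})\rvert$:

```python
import numpy as np

def laplace_log_marginal(log_joint_grad_hess, w0, n_iter=50):
    """Laplace approximation of the log marginal likelihood.
    log_joint_grad_hess(w) must return (g, grad g, Hessian g) where
    g(w) = log Pr(y|w, X) + log Pr(w)."""
    w = w0.copy()
    for _ in range(n_iter):                       # Newton's method for the MAP weights
        _, grad, hess = log_joint_grad_hess(w)
        w = w - np.linalg.solve(hess, grad)
    g, _, hess = log_joint_grad_hess(w)
    _, logdet = np.linalg.slogdet(-hess)          # -H is positive definite at the MAP
    return g + 0.5 * w.size * np.log(2 * np.pi) - 0.5 * logdet
```

A useful sanity check: when the log-joint is an exactly normalized Gaussian log-density, the Laplace approximation is exact and returns $\log Z = 0$.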

(b) (4pts) Choose a proposal distribution and use importance sampling to estimate the most probable predictive posterior class on each element of the test set using a prior variance of $\sigma^2 = 1$. Report your test set accuracy results and justify your chosen proposal. Also, analyze and comment on the accuracy of your proposal distribution. It may help to visualize the values of the posterior evaluated at your samples to help justify your answer. You may find the scipy.stats module useful for sampling.
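As one possible approach (not the only valid proposal), a Gaussian proposal centered on the MAP weights works well; the self-normalized estimator below is a sketch with hypothetical helper names, estimating $\Pr(y^*{=}1 \mid \mathbf{x}^*, \mathbf{y}) \approx \sum_s \tilde{r}_s \hat{f}(\mathbf{x}^*; \mathbf{w}_s)$ with normalized importance ratios $\tilde{r}_s$:

```python
import numpy as np
from scipy import stats

def importance_predict(X_bar_test, log_joint, mean, cov, n_samples=5000, seed=0):
    """Self-normalized importance sampling estimate of the predictive
    class-1 probabilities, using a Gaussian proposal q = N(mean, cov).
    log_joint(w) must return log Pr(y|w, X) + log Pr(w)."""
    rng = np.random.default_rng(seed)             # seeded for reproducibility
    ws = rng.multivariate_normal(mean, cov, size=n_samples)
    log_q = stats.multivariate_normal.logpdf(ws, mean, cov)
    log_r = np.array([log_joint(w) for w in ws]) - log_q   # log importance ratios
    r = np.exp(log_r - log_r.max())               # stabilized; constant cancels below
    weights = r / r.sum()                         # self-normalized weights
    f = 1.0 / (1.0 + np.exp(-X_bar_test @ ws.T))  # (N_test, n_samples) sigmoid values
    return f @ weights                            # predictive class-1 probabilities
```

Thresholding the returned probabilities at 0.5 gives the most probable predictive class; inspecting the spread of `weights` (e.g. the effective sample size) speaks directly to the accuracy of the chosen proposal.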

Q2) 4pts Construct a Bayesian linear model on the mauna loa dataset. Consider the following model

$$\Pr(\mathbf{y} \mid \mathbf{w}, X) = \mathcal{N}\bigl(\mathbf{y} \,\big|\, \Phi \mathbf{w},\, \sigma^2 I\bigr),$$
$$\Pr(\mathbf{w}) = \mathcal{N}\bigl(\mathbf{w} \,\big|\, \mathbf{0},\, I\bigr),$$

where $\sigma^2 = 10^{-4}$ and $\Phi \in \mathbb{R}^{N \times M}$ contains the $M$ features at all $N$ training examples defined by the following function (please copy this code and use it in your solution)
import numpy as np

def features(x):
    """
    evaluates phi(x)
    Inputs:
        x : (N, 1) input datapoints
    Outputs:
        phi : (N, M) features for each datapoint
    """
    year = 0.057  # equal to one year in input space
    phi = np.hstack(
        # add polynomial terms
        [np.power(x, np.arange(11))]
        # add periodic terms:
        + [np.sin(x * 2 * np.pi * factor / year) for factor in range(1, 11)]
        + [np.cos(x * 2 * np.pi * factor / year) for factor in range(1, 11)]
    )
    return phi

The basis functions were designed for the mauna loa dataset by observing that the
response appears to have a periodic annual sawtooth pattern on top of a smooth
multi-year curve.

To model the long-term multi-year curve, polynomial features up to order ten are included, and to model the periodic pattern, sine and cosine features are included with a yearly period or an integer fraction of a yearly period (to enable recovery of something like a Fourier transform of the observed sawtooth pattern).

Use both the training and validation sets to predict on the test set. Plot the predictive posterior relative to the test data: plot the predictive posterior mean as a line and plot the 99.7% confidence interval (the predictive posterior mean ± three standard deviations) as a shaded region².

On the same plot, also show the test data and discuss
the quality of the predictive posterior. How might you improve this model? Use the
Cholesky decomposition for the predictive posterior computations.
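A minimal sketch of those computations (the function name is my own; it assumes $\Phi$ and $\Phi^*$ have already been built with `features`): with posterior precision $A = \Phi^T\Phi/\sigma^2 + I$, the predictive mean is $\Phi^* A^{-1}\Phi^T\mathbf{y}/\sigma^2$ and the predictive variance at a test feature vector $\boldsymbol{\phi}^*$ is $\sigma^2 + \boldsymbol{\phi}^{*T} A^{-1}\boldsymbol{\phi}^*$, both computed through the Cholesky factor of $A$:

```python
import numpy as np

def predictive_posterior(Phi, y, Phi_test, sigma2=1e-4):
    """Predictive posterior of the Bayesian linear model
    Pr(y|w, X) = N(y | Phi w, sigma2 I), Pr(w) = N(w | 0, I),
    computed with a Cholesky factorization of the posterior precision.
    Returns the predictive mean and standard deviation at Phi_test."""
    M = Phi.shape[1]
    A = Phi.T @ Phi / sigma2 + np.eye(M)          # posterior precision, (M, M)
    L = np.linalg.cholesky(A)                     # A = L L^T
    # posterior mean of w: solve A m = Phi^T y / sigma2 by two triangular solves
    mean_w = np.linalg.solve(L.T, np.linalg.solve(L, Phi.T @ y / sigma2))
    mu = Phi_test @ mean_w                        # predictive mean
    V = np.linalg.solve(L, Phi_test.T)            # so phi A^{-1} phi^T = ||V||^2 columnwise
    var = sigma2 + np.sum(V ** 2, axis=0)         # predictive variance per test point
    return mu, np.sqrt(var)
```

The shaded region in the plot is then `mu - 3*std` to `mu + 3*std`, passed to `fill_between`.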

Submission guidelines: Submit an electronic copy of your report in pdf format, and documented python scripts. You should include a file named "README" outlining how the scripts should be run. Upload a single tar or zip file containing all files to Quercus. You are expected to verify the integrity of your tar/zip file before uploading. Do not include (or modify) the supplied *.npz data files or the data_utils.py module in your submission.

The
report must contain
• Objectives of the assignment
• A brief description of the structure of your code, and strategies employed
• Relevant figures, tables, and discussion

Do not use scikit-learn for this assignment; the intention is that you implement the simple algorithms required from scratch. Also, for reproducibility, always set a seed for any random number generator used in your code. For example, you can set the seed in numpy using numpy.random.seed.

²Use the function matplotlib.pyplot.fill_between to plot the shaded region. Use the keyword argument alpha=0.3 to give some transparency to the shaded region.