## Description

Question 1 (30%)

Design a classifier that achieves minimum probability of error for a three-class problem where the class priors are P(L = 1) = 0.15, P(L = 2) = 0.35, P(L = 3) = 0.50, and the class-conditional data distributions are all Gaussians for two-dimensional data vectors:

N([−1, 0]^T, [1, −0.4; −0.4, 0.5]),   N([1, 0]^T, [0.5, 0; 0, 0.2]),   N([0, 1]^T, [0.1, 0; 0, 0.1]).

Generate 10000 samples according to this data distribution, keeping track of the true class label for each sample. Apply the optimal classifier designed as described above to this dataset and obtain a decision label for each sample. Report the following:

• actual number of samples that were generated from each class;

• the confusion matrix for your classifier, consisting of the number of samples decided as class r ∈ {1,2,3} when their true labels were class c ∈ {1,2,3}, using r, c as row/column indices;

• the total number of samples misclassified by your classifier;

• an estimate of the probability of error your classifier achieves, based on these samples;

• a visualization of the data as a 2-dimensional scatter plot, with true labels and decision labels indicated using two separate visualization cues, such as marker shape and marker color;

• a clear but brief description of the results presented as described above.

Note: See the attached generateData_Exam1Question1.m MATLAB script for data generation.
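The attached script handles data generation in MATLAB; as one possible sketch of the full pipeline (generation, minimum-error MAP decision rule, confusion matrix), an equivalent in Python/NumPy might look like the following (the seed and variable names are illustrative, not prescribed by the assignment):

```python
import numpy as np

rng = np.random.default_rng(0)

# Class priors and Gaussian parameters from the problem statement
priors = np.array([0.15, 0.35, 0.50])
means = [np.array([-1.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
covs = [np.array([[1.0, -0.4], [-0.4, 0.5]]),
        np.array([[0.5, 0.0], [0.0, 0.2]]),
        np.array([[0.1, 0.0], [0.0, 0.1]])]

N = 10000
labels = rng.choice(3, size=N, p=priors)   # true labels 0,1,2 (classes 1,2,3)
X = np.stack([rng.multivariate_normal(means[c], covs[c]) for c in labels])

def log_gaussian(X, mu, S):
    """Log-density of N(mu, S) evaluated at each row of X."""
    d = X - mu
    Sinv = np.linalg.inv(S)
    _, logdet = np.linalg.slogdet(S)
    quad = np.einsum('ni,ij,nj->n', d, Sinv, d)
    return -0.5 * (quad + logdet + 2.0 * np.log(2.0 * np.pi))

# Minimum-probability-of-error rule: argmax over classes of prior * class pdf
scores = np.stack([np.log(priors[c]) + log_gaussian(X, means[c], covs[c])
                   for c in range(3)], axis=1)
decisions = scores.argmax(axis=1)

# Confusion matrix: row index r = decision, column index c = true class
conf = np.zeros((3, 3), dtype=int)
for r, c in zip(decisions, labels):
    conf[r, c] += 1
n_err = N - np.trace(conf)
print(conf)
print(n_err, n_err / N)   # misclassified count and error-probability estimate
```

The diagonal of `conf` holds the correctly classified samples, so the error estimate is simply one minus the trace divided by N.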


Question 2 (35%)

An object at true position [x_T, y_T]^T in 2-dimensional space is to be localized using distance (range) measurements to K reference (landmark) coordinates {[x_1, y_1]^T, ..., [x_i, y_i]^T, ..., [x_K, y_K]^T}. These range measurements are r_i = d_{Ti} + n_i for i ∈ {1, ..., K}, where d_{Ti} = ||[x_T, y_T]^T − [x_i, y_i]^T|| is the true distance between the object and the i-th reference point, and n_i is zero-mean Gaussian distributed measurement noise with known variance σ_i^2. The noise in each measurement is independent of the others.

Assume that we have the following prior knowledge regarding the position of the object:

p([x, y]^T) = (2π σ_x σ_y)^{−1} exp( −(1/2) [x, y] [σ_x^2, 0; 0, σ_y^2]^{−1} [x, y]^T )    (1)

where [x, y]^T indicates a candidate position under consideration.

Express the optimization problem that needs to be solved to determine the MAP estimate of the object position. Simplify the objective function so that the exponentials and the additive/multiplicative terms that do not impact the determination of the MAP estimate [x_MAP, y_MAP]^T are removed appropriately, for computational savings when evaluating the objective.
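For reference, one way this simplification can proceed (a sketch, not necessarily the required form of your derivation): take the negative logarithm of the posterior p(x, y | r_1, ..., r_K) ∝ p([x, y]^T) ∏_i p(r_i | x, y) and drop all terms that do not depend on (x, y), which leaves

```latex
[\hat{x}_{\mathrm{MAP}}, \hat{y}_{\mathrm{MAP}}]^T
= \arg\min_{x,\,y}\;
\frac{x^2}{2\sigma_x^2} + \frac{y^2}{2\sigma_y^2}
+ \sum_{i=1}^{K}
\frac{\bigl(r_i - \lVert [x,y]^T - [x_i,y_i]^T \rVert\bigr)^2}{2\sigma_i^2}.
```

All normalization constants and the monotonic exponential have been removed, so only quadratic penalty terms remain to be evaluated.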

Implement the following as computer code. Set the true object location to be inside the circle with unit radius centered at the origin. For each K ∈ {1, 2, 3, 4} repeat the following. Place K evenly spaced landmarks on a circle with unit radius centered at the origin. Set the measurement noise standard deviation to 0.3 for all range measurements. Generate K range measurements according to the model specified above (if a range measurement turns out to be negative, reject it and resample; all range measurements need to be nonnegative).
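As one possible sketch of this step (in Python/NumPy rather than MATLAB; the true position and seed below are placeholders you would replace with your own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_r = 0.3                       # measurement noise standard deviation
true_pos = np.array([0.3, -0.2])    # hypothetical true location inside the unit circle

def make_measurements(K):
    """Evenly spaced landmarks on the unit circle, plus rejection-sampled ranges."""
    angles = 2.0 * np.pi * np.arange(K) / K
    landmarks = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    d_true = np.linalg.norm(true_pos - landmarks, axis=1)
    r = np.empty(K)
    for i in range(K):
        ri = -1.0
        while ri < 0.0:             # reject negative ranges and resample
            ri = d_true[i] + sigma_r * rng.normal()
        r[i] = ri
    return landmarks, r

landmarks, r = make_measurements(4)
```

The rejection loop enforces the nonnegativity requirement stated above without otherwise changing the Gaussian noise model.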

Plot the equilevel contours of the MAP estimation objective for the range of horizontal and

vertical coordinates from −2 to 2; superimpose the true location of the object on these equilevel

contours (e.g. use a + mark), as well as the landmark locations (e.g. use a o mark for each one).

Provide plots of the MAP objective function contours for each value of K. When preparing

your final contour plots for different K values, make sure to plot contours at the same function

value across each of the different contour plots for easy visual comparison of the MAP objective

landscapes.
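A sketch of the contour step follows (Python with Matplotlib rather than MATLAB; σ_x, σ_y, the true position, the grid resolution, and the level values are illustrative choices, and noise-free ranges are used here just to keep the example self-contained). The key point is that `levels` is fixed once and reused for every K:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # non-interactive backend for script use
import matplotlib.pyplot as plt

sigma_x = sigma_y = 0.25         # hypothetical prior standard deviations
sigma_r = 0.3                    # range-noise standard deviation
true_pos = np.array([0.3, -0.2]) # hypothetical true location

def map_objective(xg, yg, landmarks, r):
    """Simplified negative-log-posterior evaluated on a coordinate grid."""
    obj = xg**2 / (2 * sigma_x**2) + yg**2 / (2 * sigma_y**2)
    for (lx, ly), ri in zip(landmarks, r):
        d = np.sqrt((xg - lx)**2 + (yg - ly)**2)
        obj += (ri - d)**2 / (2 * sigma_r**2)
    return obj

xs = np.linspace(-2, 2, 201)
xg, yg = np.meshgrid(xs, xs)
levels = np.linspace(0.0, 60.0, 25)   # same levels for every K, for comparability

K = 3
angles = 2.0 * np.pi * np.arange(K) / K
landmarks = np.stack([np.cos(angles), np.sin(angles)], axis=1)
r = np.linalg.norm(true_pos - landmarks, axis=1)  # noise-free ranges, illustration only

obj = map_objective(xg, yg, landmarks, r)
plt.contour(xg, yg, obj, levels=levels)
plt.plot(*true_pos, '+', markersize=12)           # true object location
plt.plot(landmarks[:, 0], landmarks[:, 1], 'o')   # landmark locations
plt.savefig(f"map_contours_K{K}.png")
```

In your actual experiment the ranges come from the noisy rejection-sampled model, and the loop over K ∈ {1, 2, 3, 4} reuses the same `levels` array.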

Supplement your plots with a brief description of how your code works. Comment on the

behavior of the MAP estimate of position (visually assessed from the contour plots; roughly center

of the innermost contour) relative to the true position. Does the MAP estimate get closer to the

true position as K increases? Does it get more certain? Explain how your contours justify your

conclusions.

Suggestion: For σ_x and σ_y consider values around 0.25, and for the noise variances σ_i^2 consider values around 0.1, which yield illustrative posterior functions; you may choose values different from what is suggested here, but make sure to specify your values in the numerical results presented.

Note: The additive Gaussian distributed noise used in this question is actually not appropriate,

since it could lead to negative measurements, which are not legitimate for a proper distance sensor.

However, in this question, we will ignore this issue and proceed with this noise model for

the sake of illustration. In practice, a multiplicative log-normal distributed noise may be more

appropriate than an additive normal distributed noise.


Question 3 (35%)

We have two-dimensional real-valued data (x, y) that is generated by the following procedure, where all polynomial coefficients are real-valued and v ∼ N(0, σ^2):

y = a x^3 + b x^2 + c x + d + v    (2)

Let w = [a, b, c, d]^T be the parameter vector for this polynomial relationship. Given knowledge of σ, the fact that the relationship between x and y is a cubic polynomial corrupted by additive noise as shown above, iid samples D = {(x_1, y_1), ..., (x_N, y_N)} generated by the procedure using the true value of the parameters (say w_true), and a Gaussian prior w ∼ N(0, γ^2 I), where I is the 4×4 identity matrix, determine the MAP estimate for the parameter vector.
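Since the model is linear in w with Gaussian noise and a Gaussian prior, the MAP estimate has a closed form: ridge regression with regularizer σ^2/γ^2. A minimal sketch (the true coefficients, sample size, and noise level below are illustrative, not taken from the assignment):

```python
import numpy as np

def map_estimate(x, y, sigma, gamma):
    """Closed-form MAP estimate for y = A w + v, v ~ N(0, sigma^2), w ~ N(0, gamma^2 I).
    Equivalent to ridge regression with regularizer sigma^2 / gamma^2."""
    A = np.stack([x**3, x**2, x, np.ones_like(x)], axis=1)
    lam = sigma**2 / gamma**2
    return np.linalg.solve(A.T @ A + lam * np.eye(4), A.T @ y)

# Sanity check with placeholder values: tiny noise and a broad prior should
# recover the true coefficients closely.
rng = np.random.default_rng(2)
w_true = np.array([1.0, 0.0, -0.25, 0.0])   # roots of x^3 - 0.25x lie in [-1, 1]
x = rng.uniform(-1, 1, 50)
sigma = 1e-3
y = np.polyval(w_true, x) + sigma * rng.normal(size=50)
w_map = map_estimate(x, y, sigma, gamma=1e3)
```

As γ → ∞ the regularizer vanishes and the estimate reduces to ordinary least squares, which is the ML behavior noted below.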

Write code to generate N = 10 samples according to the model: draw iid x ∼ Uniform[−1, 1], and choose the true parameters so that the real roots of the polynomial lie (for simplicity) in the interval [−1, 1]. Pick a value for σ that makes the noise level sufficiently large, and keep it constant for the experiments. Repeat the following for different values of γ (note that as γ increases the MAP estimate approaches the ML estimate).

Generate samples of x and v, then determine the corresponding values of y. Given this particular

realization of the dataset D, for each value of γ, find the MAP estimate of the parameter vector and

calculate the squared L2 distance between the true parameter vector and this estimate.

For each value of γ perform at least 100 experiments, where the data is independently generated according to the procedure while keeping the true parameters fixed. Report the minimum, 25%, median, 75%, and maximum of these squared-error values, ||w_true − w_MAP||_2^2, for the MAP estimator for each value of γ in a single plot. How do these curves behave as this parameter for the prior changes?
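The experiment loop and percentile summary might be sketched as follows (Python/NumPy; the true parameters, σ, and the choice B = 3 are placeholder values, and the closed-form ridge solution for the MAP estimate is written inline so the sketch is self-contained):

```python
import numpy as np

rng = np.random.default_rng(3)
w_true = np.array([1.0, 0.0, -0.25, 0.0])  # hypothetical true cubic, roots in [-1, 1]
sigma, N, n_exp = 0.5, 10, 100
gammas = np.logspace(-3, 3, 13)            # 10^-B ... 10^B with B = 3

def map_estimate(x, y, sigma, gamma):
    """Closed-form MAP (ridge) estimate of the cubic's coefficients."""
    A = np.stack([x**3, x**2, x, np.ones_like(x)], axis=1)
    return np.linalg.solve(A.T @ A + (sigma / gamma)**2 * np.eye(4), A.T @ y)

# percentiles[:, j] = [min, 25%, median, 75%, max] of squared errors at gammas[j]
percentiles = np.empty((5, len(gammas)))
for j, g in enumerate(gammas):
    errs = []
    for _ in range(n_exp):
        x = rng.uniform(-1, 1, N)
        y = np.polyval(w_true, x) + sigma * rng.normal(size=N)
        w_map = map_estimate(x, y, sigma, g)
        errs.append(np.sum((w_true - w_map)**2))
    percentiles[:, j] = np.percentile(errs, [0, 25, 50, 75, 100])
```

Plotting the five rows of `percentiles` against `gammas` on a log-scaled horizontal axis gives the single requested plot; for very small γ the estimate shrinks toward zero, so the squared error approaches ||w_true||^2.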

Note: Make sure to vary gamma over a sufficiently broad range to see its effects at multiple scales. To achieve this, you might select values for this hyperparameter as powers of 10 with exponents linearly spaced from −B to +B, so that you cover the interval [10^−B, 10^B] logarithmically. Choose B > 0 well.
