Programming Assignment #2 Expectation Maximization

$35.00

Category: Tags: , , , , You will Instantly receive a download link for .zip solution file upon Payment || To Order Original Work Click Custom Order?

Description

5/5 - (2 votes)

Introduction
This assignment asks you to implement an algorithm for learning parameters for a
simple Bayesian network from missing data.
The algorithm
Missing data are prevalent in real-world problems and appear as either hidden
variables that are never observed or missing values for some features. Learning from such
datasets can be solved using an algorithm called Expectation Maximization (EM). The
idea is to start with a model completed with random parameters and to repeat the
following two steps until convergence:
• Estimate the missing data using the current complete model (E-step),
• Learn a new set of parameters using the data set “completed” with the missing
data just estimated (M-step).
You should set a small threshold (say 0.001) for the change of likelihoods
between two iterations to detect the convergence of your algorithm. Since EM is sensitive
to the starting point, you should try multiple starting points in order to find a good
solution.
EM is also sensitive to the amount of missing data. Five data sets with missing
rates being 10%, 30%, 50%, 70%, and 100% respectively are provided. Learn a model
for each data set.
The model
The Bayesian network has the following variables: Gender, Weight and Height,
whose relations are shown in the following graph.

The datasets (download from course website) have 20 data points each with occasional
missing values for Gender, denoted as “-”. All the variables are binary: Gender (M/0,
F/1), Weight (greater_than_130/0, less_than_130/1), and Height (greater_than_55/0,
less_than_55/1). The parameters of this model are to be estimated from the datasets.
Gender
Weight Height
Implementation
Implement the EM algorithm for learning the parameters for the above model in
your preferred programming language. Hint: E-step of the EM algorithm is essentially
estimating the probabilities of different values of Gender given that we know a person’s
Weight and Height, i.e., P(Gender | Weight, Height), and use these estimations as if they
are our (expected) counts.
Evaluation and analysis
Test your EM algorithm on the five datasets. For each dataset, try several different
starting points. In your report you should at least include the following results:
• The starting points of the learning
• The final conditional probability tables for each learning
• Plots of the likelihood vs number of iterations to demonstrate the convergence of
your algorithm.
• Try the following starting parameters and report your results.
P(gender=M)=0.7;
P(weight=greater_than_130|gender=M)=0.8;
P(weight=greater_than_130|gender=F)=0.4;
P(height= greater_than_55|gender=M)=0.7;
P(height= greater_than_55|gender=F)=0.3;
And provide analysis on the following issues at least:
• Do multiple starting points help in finding better solutions?
• Do some of the different solutions have the same likelihood scores?
• How does the data missing rate affect your algorithm and the results?
Deliverables
Submit a single .zip file named LastName.FirstInitial.HW# containing the
following via email to me at changhe.yuan@qc.cuny.edu. In addition, submit a hardcopy
of your report and code in class on due date.
1) Well commented code;
2) A readme file explaining how to compile and run your program;
3) A written report explaining your experimental results.