1 Overview
In Programming Assignment 1, you implemented a rudimentary inference engine that could
correctly answer queries over small networks. Unfortunately, in Programming Assignments 2
and 3, your inference engine got passed over in favor of third-party inference engines. It’s time
to bring your inference engine back!
We have provided you with the correct code for Assignment 1’s ComputeMarginal and all
of the functions that it calls, as well as a genetic inheritance network for a 9-person pedigree
from Assignment 2. At your own risk, try finding the marginal of variable 1 in this network by
running the following code:
load('PA4Sample.mat', 'NinePersonPedigree');
ComputeMarginal(1, NinePersonPedigree, []);
As you can see, the brute force inference engine that you implemented in Programming
Assignment 1 is unable to handle anything larger than tiny networks: its running time is
proportional to the number of entries in the joint distribution over the entire network, which
even in this small 9-person example is on the order of 10^7 (nine 3-valued genotype variables
and nine 2-valued phenotype variables give 3^9 · 2^9 ≈ 10^7 entries).
Where brute force inference fails, however, other forms of exact inference can still be very
efficient. We will explore clique tree message passing in this assignment, and by its end, you
will have created an inference engine powerful enough to handle probabilistic queries and find
MAP assignments over the genetic inheritance networks from Programming Assignment 2 and
the OCR networks from Programming Assignment 3, respectively.
2 Belief Propagation
Let’s start by taking a look at the belief propagation algorithm, specialized to exact inference
over clique trees:
1. Construct a clique tree from a given set of factors Φ.
2. Assign each factor φk ∈ Φ to a clique Cα(k) such that Scope[φk] ⊆ Cα(k), where α(k)
returns the index of the clique to which φk is assigned.
3. Compute the initial potentials ψi(Ci) = ∏_{k : α(k) = i} φk.
4. Designate an arbitrary clique as the root, and pass messages upwards from the leaves
towards the root clique.
5. Pass messages from the root down towards the leaves.
6. Compute the beliefs for each clique: βi(Ci) = ψi × ∏_{k ∈ Ni} δk→i, where Ni is the set of
neighbors of clique i and δk→i is the message from clique k to clique i.
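In code, the whole procedure will come down to two calls that you will implement over the
course of this assignment. A minimal sketch (the driver variable names, and passing the
evidence E to CreateCliqueTree, are assumptions; see the function descriptions below):
P = CreateCliqueTree(F, E);    % steps 1-3: build the tree, assign the factors,
                               % and compute the initial potentials psi_i
P = CliqueTreeCalibrate(P, 0); % steps 4-6: upward pass, downward pass, beliefs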
As an example, consider the following genetic inheritance network:
Figure 1: A pedigree of 6 people and its corresponding Bayesian network.
From the list of factors corresponding to this network, we can create the following clique tree:
Figure 2: One possible clique tree for the genotype network above. Its nine cliques are
{1,2,3} (designated as the root), {1,3,4}, {3,9}, {4,10}, {1,7}, {2,8}, {2,5,6}, {5,11},
and {6,12}.
Next, we initialize the clique potentials in this tree by assigning each of the original factors
to a clique. Arbitrarily choosing the clique with variables 1, 2, and 3 to be the root, we can start
passing messages up from the leaves to the root, and then down from the root to the leaves. In
this clique tree, there are 9 cliques, so 2 · (9 − 1) = 16 messages suffice to correctly compute all
beliefs. The dashed lines show the messages from the leaves to the root, and the dotted lines
show the messages from the root to the leaves. Finally, we can use the calibrated clique beliefs
to answer probabilistic queries on the original network.
In the subsequent sections we will work our way through the steps of this algorithm, implementing two variants: sum-product message passing and max-sum message passing.
3 Clique Tree Construction
In this and future assignments, a CliqueTree is a struct with the following fields:
• cliqueList This is a struct array of cliques. The representation of a clique is exactly
like that of a factor (which you have seen in Assignments 1-3). The .val field for each
clique i should be initialized to its initial potential ψi.
• edges This is an adjacency matrix representing the edges between cliques in the clique tree.
It is an n × n matrix, where n is the number of cliques. If clique i and clique j share an
edge, then .edges(i,j) and .edges(j,i) are both 1; otherwise, they are 0.
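For instance, a chain of three cliques C1 - C2 - C3 would be encoded as
edges = [0 1 0;
         1 0 1;
         0 1 0];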
To perform clique tree message passing, we first need to take in a list of factors and construct an
appropriate clique tree. We have provided a CreateCliqueTree function that creates a clique
tree using a modified variable elimination procedure, as described in the Clique Trees and VE
lecture.
CreateCliqueTree relies on the ComputeInitialPotentials function, which you will have
to write:
• ComputeInitialPotentials.m (10 points) This function should take in a set of factors
(possibly already reduced by evidence, which is handled earlier in CreateCliqueTree) and
a clique tree skeleton corresponding to those factors, and return a cliqueTree with the
.val fields correctly filled in. Concretely, it should assign each factor to a clique, and for
each clique compute a factor product over all factors assigned to it (see the sketch below).
For debugging, you can look at the sample input clique tree InitPotential.INPUT and the
sample output clique tree InitPotential.RESULT in PA4Sample.mat.
The clique tree skeleton that CreateCliqueTree produces satisfies both family preservation and
the running intersection property, so you will not have to worry about writing error-checking /
input-validation code.
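A minimal sketch of ComputeInitialPotentials follows. The skeleton field names (.nodes for
the per-clique variable lists, .factorList for the factors) and the first-fit assignment rule are
assumptions patterned on the course starter code; for brevity, it also omits details such as
reordering each clique's variables and handling cliques that receive no factors.
function P = ComputeInitialPotentials(C)
N = length(C.nodes);
P.edges = C.edges;
used = false(1, length(C.factorList));
for i = 1:N
    psi = struct('var', [], 'card', [], 'val', []); % start from the empty factor
    for k = 1:length(C.factorList)
        % family preservation: the factor's scope must fit inside the clique
        if ~used(k) && all(ismember(C.factorList(k).var, C.nodes{i}))
            psi = FactorProduct(psi, C.factorList(k));
            used(k) = true; % assign each factor to exactly one clique
        end
    end
    P.cliqueList(i) = psi;
end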
4 Message Passing in a Clique Tree
4.1 Message Ordering
With our correctly-initialized clique tree in hand, we now need to come up with a correct message
passing order. Recall that a clique is ready to transmit a message upstream toward the root after
it has received all of the messages from its downstream neighbors, and vice versa; it is ready to
transmit a message downstream after it has received its upstream message. (Since we’re working
with clique trees, each clique will only receive one upstream message.)
More generally, we say that a clique Ci is ready to transmit to its neighbor Cj when Ci has
received messages from all of its neighbors except Cj. In clique tree message passing, each
message is passed exactly once. The easiest way to come up with a valid message passing order
is to write the following function:
• GetNextCliques.m (10 points) This function looks at all the messages that have been
passed so far, and returns indices i and j such that clique Ci is ready to pass a message
to its neighbor Cj. Do not return i and j if Ci has already passed a message to Cj.
A sketch appears below. For debugging, you can look at the sample input clique trees
GetNextC.INPUTx and the sample output clique trees GetNextC.RESULTx in PA4Sample.mat.
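A minimal sketch, assuming messages are kept in an N × N struct array in which
messages(a, b) has an empty .var field until the message from clique a to clique b has been
passed (this representation is an assumption based on the starter code):
function [i, j] = GetNextCliques(P, messages)
N = length(P.cliqueList);
for i = 1:N
    for j = 1:N
        % consider only edges whose i -> j message has not yet been sent
        if P.edges(i, j) == 1 && isempty(messages(i, j).var)
            ready = true;
            for k = 1:N
                % C_i must have heard from every neighbor other than C_j
                if k ~= j && P.edges(k, i) == 1 && isempty(messages(k, i).var)
                    ready = false;
                    break;
                end
            end
            if ready
                return;
            end
        end
    end
end
i = 0; j = 0; % every message has already been passed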
Note that there are computationally cheaper but slightly more involved methods of obtaining the
correct message passing order over the tree. The main computational bottleneck in our message
passing scheme does not lie in calculating the message passing ordering, though, so this has little
effect in practice.
4.2 Sum-Product Message Passing
We can now begin calibrating the clique tree. Recall that a clique tree is calibrated if all pairs
of adjacent cliques are calibrated. Two adjacent cliques Ci and Cj are calibrated if they agree
on the marginals over their shared variables, i.e.,
∑_{Ci − Si,j} βi(Ci) = ∑_{Cj − Si,j} βj(Cj) = µi,j(Si,j)   ∀ Si,j
Write the following function:
• CliqueTreeCalibrate.m (20 points) This function should perform clique tree calibration
by running the sum-product message passing algorithm. It takes in an uncalibrated
CliqueTree, calibrates it (i.e., sets the .val fields of the cliqueList to the final beliefs
βi(Ci)), and returns it. To avoid numerical underflow, normalize each message as it
is passed, such that it sums to 1. For consistency with our autograder, do not
normalize the initial potentials or the final cluster beliefs. This function also takes
in an isMax flag that toggles between sum-product and max-sum message passing. For
this part, it will always be set to 0; a sketch of the message-passing loop follows this bullet.
For debugging, you can look at SumProdCalibrate.INPUT and SumProdCalibrate.RESULT
in PA4Sample.mat.
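A minimal sketch of the message-passing loop (the message array and its initialization are
assumptions; FactorProduct and FactorMarginalization are the Assignment 1 functions):
N = length(P.cliqueList);
MESSAGES = repmat(struct('var', [], 'card', [], 'val', []), N, N);
[i, j] = GetNextCliques(P, MESSAGES);
while i ~= 0
    % multiply psi_i with the messages from every neighbor except C_j
    delta = P.cliqueList(i);
    for k = 1:N
        if k ~= j && P.edges(k, i) == 1
            delta = FactorProduct(delta, MESSAGES(k, i));
        end
    end
    % sum out the variables not in the sepset, then normalize the message
    sepset = intersect(P.cliqueList(i).var, P.cliqueList(j).var);
    delta = FactorMarginalization(delta, setdiff(delta.var, sepset));
    delta.val = delta.val / sum(delta.val);
    MESSAGES(i, j) = delta;
    [i, j] = GetNextCliques(P, MESSAGES);
end
% beliefs: multiply each initial potential with all of its incoming messages
for i = 1:N
    for k = 1:N
        if P.edges(k, i) == 1
            P.cliqueList(i) = FactorProduct(P.cliqueList(i), MESSAGES(k, i));
        end
    end
end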
Finally, let’s bring together all the steps described above to compute the exact marginals for the
variables in our network.
• ComputeExactMarginalsBP.m (15 points) This function should take a network, a
set of initial factors, and a vector of evidence, and compute the marginal probability
distribution for each individual variable in the network. A network is defined by a set
of factors; you should be able to extract all the network information that you need from
the list of factors. Marginals for every variable should be normalized at the end, since
they represent valid probability distributions. As before, this function takes in an isMax
flag, which should be set to 0 for now (at this point, you need to write the function
only for isMax=0); a sketch follows this bullet. For debugging, you can look at
ExactMarginal.INPUT and ExactMarginal.RESULT in PA4Sample.mat and use
Evidence = [] (the sample case makes this assumption). You may also try different
values for Evidence if you like.
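A minimal sketch for the isMax=0 case (the loop structure and the per-variable clique lookup
are assumptions; once the tree is calibrated, any clique containing a variable yields the same
marginal for it):
function M = ComputeExactMarginalsBP(F, E, isMax)
P = CreateCliqueTree(F, E);        % evidence is incorporated here
P = CliqueTreeCalibrate(P, isMax);
vars = unique([F(:).var]);
for v = vars
    % find any calibrated clique containing v and marginalize down to it
    for i = 1:length(P.cliqueList)
        c = P.cliqueList(i);
        if ismember(v, c.var)
            m = FactorMarginalization(c, setdiff(c.var, v));
            m.val = m.val / sum(m.val); % normalize: a valid distribution
            M(v) = m;
            break;
        end
    end
end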
Congratulations! You have successfully implemented an efficient exact inference algorithm. Now
you can use the factors for the genotype network from PA2 and run inference using your own
inference engine. You can compare the marginals that you get to the ones from the inference
engines that were provided in PA2.
To test your inference engine on the genotype network shown in the figure above, run
load('PA4Sample.mat', 'SixPersonPedigree');
ComputeExactMarginalsBP(SixPersonPedigree, [], 0);
This 6-person network is small enough that we can use the brute-force implementation in
Assignment 1 to double-check the results. Use
ComputeMarginal(1, SixPersonPedigree, []);
for instance. Play around with observing evidence, and make sure that it works!
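For example, assuming evidence is given in the same m × 2 [variable, value] format as
ObserveEvidence from Assignment 1 (adjust this to whatever format your implementation
uses), you could fix variable 1 to its first value:
ComputeExactMarginalsBP(SixPersonPedigree, [1 1], 0);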
4.3 Max-Sum Message Passing
A typical OCR task could be to scan in a hand-written manuscript, decipher the writing, and
give others a typed version of the manuscript that they can easily read. In such a scenario,
we would want to output the most likely text corresponding to a set of images, instead of
calculating marginal probabilities over individual characters. To do this, we will use max-sum
message passing to calculate the MAP assignment.
Our aim is to output the most probable instantiation of the variables in the network instead
of solving for the marginals of each of them. As mentioned in lecture, we will be working in
log-space for this part of the assignment; the probability of any single assignment in a large
enough network, even the MAP assignment, is typically low enough to cause numerical issues
if we do not work in log-space. Note that this algorithm is nearly identical to sum-product
message passing, with the sums replaced by maxima and the products replaced by sums in the
definitions. You should therefore have to write relatively little code.
Just as we used FactorMarginalization in the previous algorithm, let’s start by writing a
function FactorMaxMarginalization:
• FactorMaxMarginalization.m (10 points) Similar to FactorMarginalization (but
with sums replaced by maxima), this function takes in a factor and a set of variables to
marginalize out. For each assignment to the remaining variables, it finds the maximum
factor value over all possible assignments to the marginalized variables. A sketch appears below.
For debugging, in PA4Sample.mat, the sample input factor is FactorMax.INPUT1, the sample input variables are in FactorMax.INPUT2, and the sample output factor is FactorMax.RESULT.
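A minimal sketch, patterned on FactorMarginalization from Assignment 1
(IndexToAssignment and AssignmentToIndex are the helpers from that assignment):
function B = FactorMaxMarginalization(A, V)
[B.var, mapB] = setdiff(A.var, V);
B.card = A.card(mapB);
B.val = -inf(1, prod(B.card)); % identity for max, and safe in log-space
assignments = IndexToAssignment(1:length(A.val), A.card);
indxB = AssignmentToIndex(assignments(:, mapB), B.card);
for i = 1:length(A.val)
    % keep the best value seen for each assignment to the remaining variables
    B.val(indxB(i)) = max(B.val(indxB(i)), A.val(i));
end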
With this, modify CliqueTreeCalibrate to do max-sum message passing when isMax=1:
• CliqueTreeCalibrate.m (20 points) This function should perform clique tree calibration
by running the max-sum message passing algorithm when the isMax flag is set to 1. It takes
in an uncalibrated CliqueTree, does a log-transform of the values in the factors/cliques,
max-calibrates it (i.e., sets the .val fields of the cliqueList to the final beliefs βi(Ci)),
and returns it in log-space. We are working in log-space, so do not normalize each
message as it is passed. For consistency with our autograder, do not normalize
the initial potentials or the final cluster beliefs. This function takes in an isMax
flag that toggles between sum-product and max-sum message passing. For this part, it
will always be set to 1, but make sure that it still does sum-product message passing
correctly when isMax=0.
In sum-product message passing, we accomplished the product part of sum-product by
running FactorProduct. You might want to write a similar helper function, FactorSum,
that accomplishes the sum part of max-sum message passing; a sketch appears after this
bullet. You can use if statements involving the isMax flag to decide when to run
FactorProduct and when to run FactorSum.
For debugging, you may look at MaxSumCalibrate.INPUT and MaxSumCalibrate.RESULT
in PA4Sample.mat.
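A minimal sketch of the suggested FactorSum helper: it mirrors the Assignment 1
FactorProduct, with the final elementwise product replaced by a sum (appropriate because
the values are in log-space):
function C = FactorSum(A, B)
if isempty(A.var), C = B; return; end
if isempty(B.var), C = A; return; end
C.var = union(A.var, B.var);
[~, mapA] = ismember(A.var, C.var);
[~, mapB] = ismember(B.var, C.var);
C.card = zeros(1, length(C.var));
C.card(mapA) = A.card;
C.card(mapB) = B.card;
assignments = IndexToAssignment(1:prod(C.card), C.card);
indxA = AssignmentToIndex(assignments(:, mapA), A.card);
indxB = AssignmentToIndex(assignments(:, mapB), B.card);
C.val = A.val(indxA) + B.val(indxB); % log-space: sum instead of product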
Now we are ready to compute the max-marginals for the variables in our network. Modify
ComputeExactMarginalsBP.m to support max-marginal calculation:
• ComputeExactMarginalsBP.m (5 points) This function should take a network, a set
of initial factors, and a vector of evidence, and compute the max-marginal for each
individual variable in the network (including variables for which there is evidence).
Max-marginals for every variable should not be normalized at the end. Leave the
max-marginals in log-space; do not re-exponentiate them. As before, this function takes
in an isMax flag, which will be set to 1 now; make sure it still works for computing the
(non-max) marginals when it is set to 0.
For debugging, you can look at MaxMarginals.INPUT and MaxMarginals.RESULT in PA4Sample.mat.
Finally, write a function that takes the max-marginals and extracts the MAP assignment:
• MaxDecoding (10 points) This function should take a list of max-marginals returned
by ComputeExactMarginalsBP and return a vector A, with A(i) the value of the i-th
variable in the MAP assignment. You may assume that the max-marginals passed into
this function will be unambiguous (so no backtracking etc. is necessary for decoding). A
sketch follows below.
In PA4Sample.mat, the sample input is MaxDecoded.INPUT, and the sample output is
MaxDecoded.RESULT.
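A minimal sketch (since the max-marginals are assumed unambiguous, each variable's value
is simply the argmax of its singleton max-marginal):
function A = MaxDecoding(M)
A = zeros(1, length(M));
for i = 1:length(M)
    [~, A(i)] = max(M(i).val); % index of the best value of variable i
end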
To test out your MAP estimation pipeline, let’s try it on a sample word from Assignment 3.
Run the following commands:
load('PA4Sample.mat', 'OCRNetworkToRun');
maxMarginals = ComputeExactMarginalsBP(OCRNetworkToRun, [], 1);
MAPAssignment = MaxDecoding(maxMarginals);
PredictedWord = DecodedMarginalsToChars(MAPAssignment)
You should see the word mornings!
5 Conclusion
Congratulations! You have successfully built an inference engine that can take relatively large
networks, such as the OCR network and the genetic inheritance network, and run exact inference
on them. Give yourself a pat on the back – in just 4 weeks, you have written end-to-end programs
that can do credit scoring, genetic counseling, and OCR.
However, the powers of exact inference are limited. Programming Assignment 5 will teach you
more about that, and explore approximate inference methods that will let us scale to significantly
larger networks. Stay tuned!