Description
Pictorial Structures
The goal of this exercise is to implement a simplified version of the pictorial
structures model, although using an efficient algorithm, and test it on a pedestrian dataset. You are going to use a star model over (upper and lower) legs,
head, and torso as follows:
LUL
LLL
LT
LH
LUR
LLR
Each Li
in this distribution is actually a two-dimensional random variable
representing image coordinates of the center of each body part: Li = (xi
, yi).
Scale and rotation is not considered here. Unary factors represent likelihoods
1
pi(Li) and pairwise factors stand for kinematic priors pij (Li
, Lj ). You will
learn the priors from a training dataset. Then, you will perform inference on
test images, for which you are given the corresponding likelihood maps.
The first step is to download assignment02-data.zip from the lecture’s
website and import the data.mat. The variable likelihoods{k,i} is the 2D
likelihood map pi(Li) for test image number k. The images can be found in the
directory testset for reference. The index i is defined as follows:
i Body part
1 Lower Left leg
2 Upper Left leg
3 Upper Right leg
4 Lower Right leg
5 Head
6 Torso
The variable train contains the training data. More precisely, each row k
of train{i} represents the coordinates L
k
i
of body part i in training image k.
1 Learning Kinematic Priors
Points: 4
We set the prior as a 2D Gaussian pij (Li
, Lj ) ∼ N (Li − Tij (Lj ); 0, Σij )
and we define the transformation Tij (Lj ) = Lj + µij . This is a reasonable
approximation for small joint rotations, as it is the case for pedestrians. The
parameters to be determined for each prior are thus the covariance matrix Σij
and the mean µij .
Implement the function pairwisePots = learnPairwisePots(train)
which, for each body part i, computes the maximum-likelihood estimate of µij
as a row vector pairwisePots{i,1} and of Σij as pairwisePots{i,2}. Note
that there are only potentials between the torso and the remaining body parts,
hence we assume j = 6 is fixed. Use the built-in Matlab functions mean and
cov.
2 Maximal Marginal States
Points: 6
The goal is to compute maximal marginals of the model using the sumproduct algorithm. To this end, implement the corresponding function maxstates
= sumproduct(pairwisePots, unaryPots). It returns a 6×2 matrix of x, y coordinates.
• Note that the graph is not a chain but a tree, so you have to think about
the correct message scheduling.
2
• You will need computations like f(Li) = P
Lj N (Li − Tij (Lj ))g(Lj ) in
your code. However, summing over Lj is impractical due to the large
state space. That is why you should implement the sum as a convolution
(N ∗ (g ◦ T
−1
ij ))(Li) := P
s N (Li − s)g(T
−1
ij (s)) with s = Tij (Lj ) using
built-in functions. We additionally assume a diagonal covariance so that
the Gaussian is separable (simply ignore the off-diagonal entries of Σij ).
The Matlab functions fspecial, conv2 and the prepared file shiftimg.m
(it shifts a given image for a given 2D offset) will be useful here. Take
care when computing messages in the opposite direction.
• You can visualize your maxima using drawmaxima.m. You can compare
your results for the first 10 images with the official solution in the directory
solution.
• Hint: Recall that we work with (x, y) vectors but Matlab indexes its
matrices by (row, column).
3 Modes
Points: 6
The goal is to compute the maximum state of the joint distribution using the
min-sum algorithm (i.e. using negated log potentials). Implement the function
maxstates = minsum(pairwisePots, unaryPots). It returns a 6×2 matrix of
x, y coordinates.
• For reasons of efficiency, we compute the minima using the generalized
distance transform
DT− log[g(T
−1
ij (·))](Li) = min
s
δ(Li
, s) − log[g(T
−1
ij (s))]
= min
Lj
− log[N (Li − Tij (Lj ))] − log[g(Lj )].
You can find the code in DT.m, the function takes the covariance matrix
of the Gaussian as the second argument.
• Unfortunately, DT does not give you the argmin, only the min. For this
reason, you cannot backtrack and you need to implement the min-sum
algorithm as in the lecture on max-sum algorithm earlier, i.e. computing
all messages (two messages per edge) and taking a node-wise minimum.
This will probably fail on potential ties (multiple modes) but that is fine
in this exercise.
4 Evaluation
Points: 4
Now you are going to evaluate the model as a person detector by writing a
script evaluation.m. To keep it simple, fix a bounding box of size 80x200px
3
around the torso (horizontally centered at it, vertically offset in 1:2 ratio). A
predicted bounding box is considered correct if it overlaps more than 50% with
a ground-truth bounding box, otherwise the bounding box is considered a false
positive detection. Ground truth can be found in the variable GT in the supplied
mat file, each row is a rectangle [x1, y1, w, h]. Bounding box overlap is computed
using the boxoverlap.m function which is provided for your convenience.
Compute bounding boxes for each test image using three ways of choosing
the torso:
• Torso as the result of min-sum.
• Torso as the result of sum-product.
• Torso as the maximum of a torso’s likelihood, which corresponds to using
no model at all.
Your script should output the accuracy of each method, i.e. the percentage of
correctly predicted bounding boxes. Note that you can visualize your bounding
boxes and detections using drawmaxima.m.
4