$30.00 $18.00
1 Lucas-Kanade Tracking
In this section you will be implementing a simple Lucas & Kanade tracker with one single
template. In the scenario of two-dimensional tracking with a pure translation warp function,
W(x; p) = x + p . (1)
The problem can be described as follows: starting with a rectangular neighborhood of
pixels N ∈ {xd}
D
d=1 on frame It, the Lucas-Kanade tracker aims to move it by an offset
1
p = [px, py]
T
to obtain another rectangle on the next frame It+1, so that the pixel squared
difference in the two rectangles is minimized:
p
∗ = arg minp =
X
x∈N
||It+1(x + p) − It(x)||2
2
(2)
=
It+1(x1 + p)
.
.
.
It+1(xD + p)
−
It(x1)
.
.
.
It(xD)
2
2
(3)
Q1.1 (5 points) Starting with an initial guess of p (for instance, p = [0, 0]T
), we can
compute the optimal p
∗
iteratively. In each iteration, the objective function is locally
linearized by first-order Taylor expansion,
It+1(x
0 + ∆p) ≈ It+1(x
0
) + ∂It+1(x
0
)
∂x0T
∂W(x; p)
∂pT ∆p (4)
where ∆p = [∆px, ∆py]
T
, is the change in template offset. Further, x
0 = W(x; p) = x + p
and ∂I(x
0
)
∂x0T is a vector of the x− and y− image gradients at pixel coordinate x
0
. In a similar
manner to Equation 3 one can incorporate these linearized approximations into a vectorized
form such that,
arg min
∆p
||A∆p − b||2
2
(5)
Then, we update with the minimizer as p ← p + ∆p at each iteration. We repeat until
convergence.
• What is ∂W(x;p)
∂pT ?
• What is A and b?
• What conditions must AT A meet so that a unique solution to ∆p can be found?
Q1.2 (15 points) Implement a function with the following signature
LucasKanade(It, It1, rect, p0 = np.zeros(2))
that computes the optimal local motion from frame It to frame It+1 that minimizes Equation
3. Here It is the image frame It, It1 is the image frame It+1, rect is the 4-by-1 vector that
represents a rectangle describing all the pixel coordinates within N within the image frame
It, and p0 is the initial parameter guess (δx, δy). The four components of the rectangle
are [x1, y1, x2, y2]
T
, where [x1, y1]
T
is the top-left corner and [x2, y2]
T
is the bottom-right
corner. The rectangle is inclusive, i.e., it includes all the four corners. You will also need to
iterate the estimation until the change in ||∆p||2
2
is below a threshold.
To deal with fractional movement of the template, you will need to interpolate the image. We suggest using RectBivariateSpline from the scipy.interpolate package. Read
the documentation of defining the spline ( RectBivariateSpline) as well as evaluating the
spline using RectBivariateSpline.ev carefully. Please note that RectBivariateSpline.ev
can be used to evaluate the image values and gradients.
2
Q1.3 (10 points) Write a script testCarSequence.py that loads the video frames from
carseq.npy, and runs the Lucas-Kanade tracker that you have implemented in the previous
task to track the car. carseq.npy can be located in the data directory and it contains one
single three-dimensional matrix: the first two dimensions correspond to the height and width
of the frames respectively, and the third dimension contain the indices of the frames (that is,
the first frame can be visualized with imshow(frames[:, :, 0])). The rectangle in the first
frame is [x1, y1, x2, y2]
T = [59, 116, 145, 151]T
. Report your tracking performance (image +
bounding rectangle) at frames 1, 100, 200, 300 and 400 in a format similar to Figure 1.
Also, create a file called carseqrects.npy, which contains one single n × 4 matrix, where
each row stores the rect that you have obtained for each frame, and n is the total number
of frames. You are encouraged to play with the parameters defined in the scripts and report
the best results.
Similarly, write a script testGirlSequence.py that loads the video from girlseq.npy
and runs your Lucas-Kanade tracker on it. The rectangle in the first frame is [x1, y1, x2, y2]
T =
[280, 152, 330, 318]T
. Report your tracking performance (image + bounding rectangle like
in Figure 1) at frames 1, 20, 40, 60 and 80 in a format similar to Figure 1. Also, create a file
called girlseqrects.npy, which contains one single n × 4 matrix, where each row stores
the rect that you have obtained for each frame.
Figure 1: Lucas-Kanade Tracking with One Single Template
Q1.4 (20 points) As you might have noticed, the image content we are tracking in the first
frame differs from the one in the last frame. This is understandable since we are updating
the template after processing each frame and the error can be accumulating. This problem
is known as template drifting. There are several ways to mitigate this problem. One possible
approach comes from Iain Matthews et al.(2003).
https://www.ri.cmu.edu/publication_view.html?pub_id=4433
We want you to implement their approach (strategy 3 in section 2.1), so please read the
paper carefully. Write two scripts with a similar functionality to Q1.3 but with this template correction routine incorporated: testCarSequenceWithTemplateCorrection.py and
testGirlSequenceWithTemplateCorrection.py. Save the rects as carseqrects-wcrt.npy
and girlseqrects-wcrt.npy, and also report the performance at those frames. An example is given in Figure 2. Again, you are encouraged to play with the parameters defined in
the scripts to see how each parameter affects the tracking results.
Here the blue rectangles are created with the baseline tracker in Q1.3, the red ones with
the tracker in Q1.4. The tracking performance has been improved non-trivially. Note that
you do not necessarily have to draw two rectangles in each frame, but make sure that the
3
Figure 2: Lucas-Kanade Tracking with Template Correction
performance improvement can be easily visually inspected.
2 Affine Motion Subtraction
In this section, you will implement a tracker for estimating dominant affine motion in a
sequence of images and subsequently identify pixels corresponding to moving objects in the
scene. You will be using the images in the file aerialseq.npy, which consists of aerial views
of moving vehicles from a non-stationary camera.
2.1 Dominant Motion Estimation
In the first section of this homework we assumed the the motion is limited to pure translation.
In this section you shall implement a tracker for affine motion using a planar affine warp
function. To estimate dominant motion, the entire image It will serve as the template to
be tracked in image It+1, that is, It+1 is assumed to be approximately an affine warped
version of It. This approach is reasonable under the assumption that a majority of the pixels
correspond to the stationary objects in the scene whose depth variation is small relative to
their distance from the camera.
Using a planar affine warp function you can recover the vector ∆p = [p1, . . . , p6]
T
,
x
0 = W(x; p) =
1 + p1 p2
p4 1 + p5
x
y
+
p3
p6
. (6)
One can represent this affine warp in homogeneous coordinates as,
x˜
0 = Mx˜ (7)
where,
M =
1 + p1 p2 p3
p4 1 + p5 p6
0 0 1
. (8)
Here M represents W(x; p) in homogeneous coordinates as described in [1]. Also note
that M will differ between successive image pairs. Starting with an initial guess of p = 0
(i.e. M = I) you will need to solve a sequence of least-squares problem to determine ∆p
4
such that p → p + ∆p at each iteration. Note that unlike previous examples where the
template to be tracked is usually small in comparison with the size of the image, image
It will almost always not be contained fully in the warped version It+1. Hence, one must
only consider pixels lying in the region common to It and the warped version of It+1 when
forming the linear system at each iteration. Hint Use Equation 4 and determine ∂W(x;p)
∂pT
for the affine transform.
Q2.1 (15 points) Write a function with the following signature
LucasKanadeAffine(It, It1)
which returns the affine transformation matrix M, and It and It1 are It and It+1 respectively.
LucasKanadeAffine should be relatively similar to LucasKanade from the first section and
again we suggest you use RectBivariateSpline.
2.2 Moving Object Detection
Once you are able to compute the transformation matrix M relating an image pair It and
It+1, a naive way for determining pixels lying on moving objects is as follows: warp the
image It using M so that it is registered to It+1 and subtract it from It+1; the locations
where the absolute difference exceeds a threshold can then be declared as corresponding
to locations of moving objects. To obtain better results, you can check out the following
scipy.morphology functions: binary erosion, and binary dilation.
Figure 3: Lucas-Kanade Tracking with Motion Detection
Q2.2 (10 points) Using the function you have developed for dominant motion estimation,
write a function with the following signature
SubtractDominantMotion(image1, image2)
where image1 and image2 form the input image pair, and the return value mask is a binary image of the same size that dictates which pixels are considered to be corresponding
to moving objects. You should invoke LucasKanadeAffine in this function to derive the
transformation matrix M, and produce the aforementioned binary mask accordingly.
5
Q2.3 (10 points) Write two scripts testAntSequence.py and testAerialSequence.py
that load the image sequence from antseq.npy and aerialseq.npy and run the motion
detection routine you have developed to detect the moving objects. Try to implement
testAntSequence.py first as it involves little camera movement and can help you debug
your mask generation procedure. Report the performance at frames 30, 60, 90 and 120
with the corresponding binary masks superimposed, as exemplified in Figure 3. Feel free
to visualize the motion detection performance in a way that you would prefer, but please
make sure it can be visually inspected without undue effort.
3 Efficient Tracking
3.1 Inverse Composition
The inverse compositional extension of the Lucas-Kanade algorithm (see [1]) has been used
in literature to great effect for the task of efficient tracking. When used in tracking, the
main idea is that in each iteration we are trying to find the inverse warping W(x; ∆p)
−1
that best aligns our current tracking rectangle It+1(W(x; p) in the next frame with the
template in the current frame It(x). This is equivalent to finding the warping on the
template It(W(x; ∆p) that best aligns it with It+1(W(x; p). Thus, we can view the Inverse
Compositional Algorithm for tracking as attempting to minimize
p
∗ = arg minp =
X
x∈N
||It(W(x; ∆p) − It+1(W(x; p)||2
2
(9)
where we linearize just around the current frame It (instead of around the next frame It+1)
as
It(W(x; 0 + ∆p) ≈ It(x) + ∂It(x)
∂xT
∂W(x; 0)
∂pT ∆p . (10)
In a similar manner to the conventional Lucas-Kanade algorithm we can incorporate these
linearized approximations into a vectorized form such that
arg min
∆p
||A0∆p − b
0
||2
2
(11)
Then, we update our parameters using W(x; p) ← W(W(x; ∆p)
−1
; p). Note that for
our specific case of using an affine warp where p ⇔ M and ∆p ⇔ ∆M this results in the
update M ← M(∆M)
−1
. You can refer to the Lucas Kanade Lecture or section 2.2 of [2]:
https://www.ri.cmu.edu/pub_files/pub3/baker_simon_2003_3/baker_simon_2003_
3.pdf
Q3.1 (15 points) Reimplement the function LucasKanadeAffine(It,It1) as
InverseCompositionAffine(It,It1) using the inverse compositional method. In your
own words please describe why the inverse compositional approach is more computationally
efficient than the classical approach? Hint Is there anything special we can do if A0 doesn’t
depend on p?
6
4 HW3 Submission Checklist
The assignment (code and writeup) should be submitted to Gradescope with each part on
its own page. The code should be submitted as a zip named .zip. The zip
when uncompressed should produce the following files.
• code/LucasKanade.py
• code/LucasKanadeAffine.py
• code/SubtractDominantMotion.py
• code/InverseCompositionAffine.py
• code/testCarSequence.py
• code/testCarSequenceWithTemplateCorrection.py
• code/testGirlSequence.py
• code/testGirlSequenceWithTemplateCorrection.py
• code/testAntSequence.py
• code/testAerialSequence.py
• result/carseqrects.npy
• result/carseqrects-wcrt.npy
• result/girlseqrects.npy
• result/girlseqrects-wcrt.npy
*Do not include the data directory in your submission.
5 Frequently Asked Questions (FAQs)
Q1: Why do we need to use RectBivariateSpline for moving the rectangle template?
A1: When moving the rectangle template with ∆p, the new points can have fractional
coordinates. So, you need to use RectBivariateSpline to sample the image intensity at
those fractional coordinates.
Q2: What’s the right way of computing the image gradients Ix(x) and Iy(x)?
A2: You should first compute the entire image gradients Ix and Iy and then sample them
at x.
Q3: Can I use pseudo-inverse for the least-squared problem arg min∆p ||A∆p − b||2
2
?
A3: Yes, the pseudo-inverse solution of A∆p = b is also ∆p = (AT A)
−1AT b when A has
full column ranks, i.e., the linear system is overdetermined.
Q4: For inverse compositional Lucas Kanade, how should I deal with points outside out
the image?
7
A4: Since the Hessian in inverse compositional Lucas Kanade is precomputed, we cannot
simply remove points when they are outside the image since it can result in dimension
mismatch. However, we can set the error It+1(W(x; p)) − It(x) to 0 for x outside the
image.
Q4: How can I debug my inverse compositional Lucas Kanade?
A4: The easiest way is by checking whether you are getting similar results as Q2.3
on antseq.npy and aerialseq.npy. Additionally, InverseCompositionAffine(It,It1)
should be running much faster than LucasKanadeAffine(It,It1). For our implementation,
InverseCompositionAffine(It,It1) runs about 2 to 3 times faster.
8
WhatsApp us