COMP4901L Homework Assignment 7 Tracking Objects in Videos

$30.00

Category: Tags: , , , , You will Instantly receive a download link for .zip solution file upon Payment || To Order Original Work Click Custom Order?

Description

5/5 - (3 votes)

Overview
One incredibly important aspect of human and animal vision is the ability to follow objects and
people in our view. Whether it is a tiger chasing its prey, or you trying to catch a basketball, tracking
is so integral to our everyday lives that we forget how much we rely on it. In this assignment, you
will be implementing an algorithm that will track an object in a video.
You will first implement the Lucas-Kanade tracker, and then a more computationally efficient
version called the Matthew-Baker (or inverse compositional) method. This method is one of the
most commonly used methods in computer vision due to its simplicity and wide applicability. We
have provided two video sequences: a car on a road, and a helicopter approaching a runway.
To initialize the tracker you need to define a template by drawing a bounding box around the
object to be tracked in the first frame of the video. For each of the subsequent frames the tracker
will update an affine transform that warps the current frame so that the template in the first frame
is aligned with the warped current frame.
For extra credit, we will also look at ways to make tracking more robust, by incorporating
illumination invariance and implementing the algorithm in a pyramid fashion.
Preliminaries
An image transformation or warp is an operation that acts on pixel coordinates and maps pixel
values from one place to another in an image. Translation, rotation and scaling are all examples of
warps. We will use the symbol W to denote warps. A warp function W has a set of parameters p
associated with it and maps a pixel with coordinates x = [u v]
T
to x
0 = [u
0 v
0
]
T
.
x
0 = W(x; p) (1)
An affine transform is a warp that can include any combination of translation, anisotropic scaling
and rotations. An affine warp can be parametrized in terms of 6 parameters p = [p1 p2 p3 p4 p5 p6]
T
.
2
One of the convenient things about an affine transformation is that it is linear; its action on a point
with coordinates x = [u v]
T
can be described as a matrix operation



u
0
v
0
1


 = W(p)



u
v
1


 (2)
where W(p) is a 3 × 3 matrix such that
W(p) =



1 + p1 p3 p5
p2 1 + p4 p6
0 0 1


 (3)
Note that for convenience when we want to refer to the warp as a function we will use W(x; p)
and when we want to refer to the matrix for an affine warp we will use W(p). Table 1 contains a
summary of the variables used in the next two sections. It will be useful to keep these in mind.
Table 1: Summary of Variables
Symbol Vector/Matrix Size Description
u 1 × 1 Image horizontal coordinate
v 1 × 1 Image vertical coordinate
x 2 × 1 or 1 × 1 pixel coordinates: (u, v) or unrolled
I m × 1 Image unrolled into a vector (m pixels)
T m × 1 Template unrolled into a vector (m pixels)
W(p) 3 × 3 Affine warp matrix
p 6 × 1 parameters of affine warp
∂I
∂u m × 1 partial derivative of image wrt u
∂I
∂v m × 1 partial derivative of image wrt v
∂T
∂u m × 1 partial derivative of image wrt u
∂T
∂v m × 1 partial derivative of image wrt u
∇I m × 2 image gradient ∇I(x) = h
∂I
∂u
∂I
∂v i
∇T m × 2 image gradient ∇I(x) = h
∂T
∂u
∂T
∂v i
∂W
∂p
2 × 6 Jocobian of affine warp wrt its parameters
J m × 6 Jacobian of error function L wrt p
H 6 × 6 Pseudo Hessian of L wrt p
Lucas-Kanade: Forward Additive Alignment
A Lucas Kanade tracker maintains a warp W(x; p) which aligns a sequence of images It to a
template T. We denote pixel locations by x, so I(x) is the pixel value at location x in image
I. For the purposes of this derivation, I and T are treated as column vectors (think of them as
unrolled image matrices). W(x; p) is the point obtained by warping x with a transform that has
parameters p. W can be any transformation that is continuous in its parameters p. Examples of
valid warp classes for W include translations (2 parameters), affine transforms (6 parameters) and
3
full projective transforms (8 parameters). The Lucas Kanade tracker minimizes the pixel-wise sum
of square difference between the warped image I(W(x; p)) and the template T.
In order to align an image or patch to a reference template, we seek to find the parameter vector
p that minimizes L, where:
L =
X
x
[T(x) − I(W(x; p))]2
(4)
In general this is a difficult non-linear optimization, but if we assume we already have a close
estimate p of the correct warp, then we can assume that a small linear change ∆p is enough to
get the best alignment. This is the forward additive form of the warp. The objective can then be
written as:
L ≈
X
x
[T(x) − I(W(x; p + ∆p))]2
(5)
Expanding this to the first order with Taylor Series gives us:
L ≈
X
x
[T(x) − I(W(x; p)) − ∇I(x)
∂W
∂p
∆p]
2
(6)
Here ∇I(x) = h
∂I(x)
∂u
∂I(x)
∂v i
, which is the vector containing the horizontal and vertical gradient
at pixel location x. Rearranging the Taylor expansion, it can be rewritten as a typical least squares
approximation ∆p
∗ = arg min∆p ||A∆p − b||2
∆p
∗ = arg min
∆p
X
x

∇I
∂W
∂p
∆p − {T(x) − I(W(x; p))}
2
(7)
This can be solved with ∆p
∗ = (AT A)
−1AT
b where:
(A
T A) = H =
X
x

∇I
∂W
∂p
T
∇I
∂W
∂p

(8)
A =
X
x

∇I
∂W
∂p

(9)
b = T(x) − I(W(x; p)) (10)
Once ∆p is computed, the best estimate warp can be updated p ← p + ∆p, and the whole
procedure can be repeated again, stopping when ∆p is less than some threshold.
Matthew-Baker: Inverse Compositional Alignment
While Lucas-Kanade alignment works very well, it is computationally expensive. The inverse
compositional method is similar, but requires less computation, as the Hessian and Jacobian only
need to be computed once. One caveat is that the warp needs to be invertible. Since affine warps
are invertible, we can use this method.
In the previous section, we combined two warps by simply adding one parameter vector to
another parameter vector, and produce a new warp W(x, p + p
0
). Another way of combining
warps is through composition of warps. After applying a warp W(x; p) to an image, another warp
W(x; q) can be applied to the warped image. The resultant (combined) warp is
W(x; q) ◦ W(x; p) = W(W(x; p), q) (11)
4
Since affine warps can be implemented as matrix multiplications, composing two affine warps
reduces to multiplying their corresponding matrices
W(x; q) ◦ W(x; p) = W(W(x; p), q) = W(W(p)x, q) = W(q)W(p)x (12)
An affine transform can also be inverted. The inverse warp of W(p) is simply the matrix inverse
of W(p), W(p)
−1
. In this assignment it will sometimes be simpler to consider an affine warp as a
set of 6 parameters in a vector p and it will sometimes be easier to work with the matrix version
W(p). Fortunately, switching between these two forms is easy (Equation 3).
The minimization is performed using an iterative procedure by making a small change (∆p) to
p at each iteration. It is computationally more efficient to do the minimization by finding the p that
helps align the template to the image, than applying the inverse warp to the image. This is because
the image will change with each frame of the video, but the template is fixed at initialization. We
will see soon that doing this allows us to write the Hessian and Jacobian in terms of the template,
and so this can be computed once at the beginning of the tracking. Hence at each step, we want
to find the p to minimize
L =
X
x
[T(W(x; ∆p)) − I(W(x; ∆p))]2
(13)
For tracking a patch template, the summation is performed only over the pixels lying inside the
template region. We can expand T(W(x; ∆p)) in terms of its first order linear approximation to
get
L =
X
x

T(x) + ∇T(x)
∂W
∂p
∆p − I(W(x; ∆p))2
(14)
where T(x) = h
∂T(x)
∂u
∂T(x)
∂v i
. To minimize we need to take the derivative of L and set it to zero
∂L
∂∆p
= 2X
x

∇T
∂W
∂p
T
T(x) + ∇T(x)
∂W
∂p
∆p − I(W(x; p))
(15)
Setting to zero, switching from summation to vector notation and solving for ∆p we get
∆p = H−1J
T
[I(W(x; p)) − T] (16)
where J is the Jacobian of T(W(x; ∆p)), J = T∂W
∂p
, H is the approximated Hessian H = J
T J
and I(W(x; p)) is the warped image. Note that for a given template, the Jacobian J and Hessian
H are independent of p. This means they only need to be computed once and then they can be
reused during the entire tracking sequence.
Once ∆p has been solved for, it needs to be inverted and composed with p to get the new warp
parameters for the next iteration.
W(x; p) ← W(x; p) ◦ W(x; ∆p)
−1
(17)
The next iteration solves Equation 16 starting with the new value of p. Possible termination
criteria include the absolute value of ∆p falling below some value or running for some fixed number
of iterations.
5
1 Theory Questions (25 points)
Type down your answers for the following questions in your write-up. Each question
should only take a couple of lines. In particular, the “proofs” do not require any lengthy calculations.
If you are lost in many lines of complicated algebra you are doing something much too complicated
(or wrong).
Q1.1: Calculating the Jacobian (15 points)
Assuming the affine warp model defined in Equation 3, derive the expression for the Jacobian
Matrix J in terms of the warp parameters p = [p1 p2 p3 p4 p5 p6]
0
.
Q1.2: Computational complexity (10 points)
Find the computational complexity (Big O notation) for the initialization step (Precomputing J and
H−1
) and for each runtime iteration (Equation 16) of the Inverse Compositional method. Express
your answers in terms of n, m and p where n is the number of pixels in the template T, m is the
number of pixels in an input image I and p is the number of parameters used to describe the warp
W. How does this compare to the run time of the regular Lucas-Kanade method?
2 Lucas-Kanade Tracker (60 points)
For this section, TAs will grade your tracker based on the performance you achieved on the two provided video sequences: (1) data/car1/ and (2) data/landing/. The provided script files lk demo.m
and mb demo.m handle reading in images, template region marking, making tracker function calls
and displaying output onto the screen. The function prototypes provided are guidelines. Please
make sure that your code runs functionally with the original script and generates the outputs we
are looking for (a frame sequence with the bounding box of the target being tracked on each frame)
so that we can replicate your results.
Note that the only thing TAs would do for you during grading is change the input data directory,
and initialize your tracker based on what you mentioned in your write-up. Please submit one video
for each of them in the results/ directory, with file name car.mp4 and landing.mp4. Also, please
mention the initialization coordinates of your tracker for both video sequences in your write-up and
in your code.
Q2.1: Write a Lucas-Kanade Tracker for a Flow Warp (20 points)
Write the function with the following function signature:
[u,v] = LucasKanade(It, It1, rect)
that computes the optimal local motion from frame It to frame It+1 that minimizes Equation 1.
Here It is the image frame It, It1 is the image frame It+1 , and rect is the 4 × 1 vector that
represents a rectangle on the image frame It. The four components of the rectangle are [x, y, w,
h], where (x, y) is the top-left corner and (w, h) is the width and height of the bounding box.
The rectangle is inclusive, i.e., in includes all the four corners. To deal with fractional movement
of the template, you will need to interpolate the image using the Matlab function interp2. You
will also need to iterate the estimation until the change in warp parameters (u, v) is below a
threshold. Use the forward compositional (Lucas-Kanade method) for this question.
6
Q2.2: Initializing the Matthew-Baker Tracker (10 points)
Write the function initAffineMBTracker() that initializes the inverse compositional tracker by
precomputing important matrices needed to track a template patch.
function [affineMBContext] = initAffineMBTracker(img, rect)
The function will input a greyscale image (img) along with a bounding box (rect) (in the
format [x y w h]).
The function should output a Matlab structure affineMBContext that contains the Jacobian
of the affine warp with respect to the 6 affine warp parameters and the inverse of the approximated
Hessian matrix (J and H−1
in Equation 16).
Q2.3: The Main Matthew-Baker Tracker (20 points)
Write the function affineMBTracker() that does the actual template tracking.
function [Wout] = affineMBTracker(img, tmp, rect, Win, context)
The function will input a greyscale image of the current frame (img), the template image (tmp),
the bounding box rect that marks the template region in tmp, The affine warp matrix for the
previous frame (Win) and the precomputed J and H−1 matrices context.
The function should output the 3 × 3 matrix Wout that contains the new affine warp matrix
updated so that it aligns the current frame with the template.
You can either used a fixed number of gradient descent iterations or formulate a stopping criteria
for the algorithm. You can use the included image warping function to apply affine warps to images.
Q2.4: Tracking a Car (5 points)
Test your trackers on the short car video sequence (data/car1/) by running the wrapper scripts
lk demo.m and mb demo.m. What sort of templates work well for tracking? At what point does the
tracker break down? Why does this happen?
In your write-up: Submit your best video of the car being tracked. Save it as results/car.mp4.
Q2.5: Tracking Runway Markings (5 points)
Try running your tracker on the landing video (data/landing/). This video was taken during a
runway approach. With a model of the markings (the lengths of the line segments etc) the output
of the tracker can be used to estimate the camera position with respect to the runway and can be
used in an automated landing system.
In your write-up: Submit your best video of the runway markings being tracked. Save it as
results/landing.mp4.
Figure 2: Tracking in the car and runway image sequences
7
3 Extra Credit (40 points)
Q3.1x: Adding Illumination Robustness (20 points)
The LK tracker as it is formulated now breaks down when there is a change in illumination because
the sum of squared distances error it tries to minimize is sensitive to illumination changes. There are
a couple of things you could try to do to fix this. The first is to scale the brightness of pixels in each
frame so that the average brightness of pixels in the tracked region stays the same as the average
brightness of pixels in the template. The second is to use a more robust M-estimator instead of least
squares (so, e.g. a Huber or a Tukey M-estimator) that does not let outliers adversely affect the
cost function evaluation. Note that doing this will modify our least squares problem to a weighted
least squares problem, i.e. for each residual term you will have a corresponding weight Λii and
your minimization function will look like
L =
X
x
Λii[T(W(x; ∆p)) − I(W(x; p))]2
(18)
leading your jacobian computation to be a weighted jacobian instead of what you have seen earlier
(Eq. 8)
A
T ΛA∆p = A
T Λb (19)
⇒ ∆p = (A
T ΛA)
−1Λb (20)
Here Λ is a diagonal matrix of weights corresponding to the residual term computed as per
the choice of the robust M-estimator used, A is the jacobian matrix for each pixel of the template
considered in the cost function, and b is the corresponding vector of residuals. Implement these
two methods and test your new tracker on the car video sequence again (data/car/).
Implement these modifications for the forward compositional Lucas Kanade tracker that you
implemented for Q2.1. Use the same function names with Robust appended. Include both versions
of the code in your submission.
Q3.2x: LK Tracking on an Image Pyramid (20 points)
If the target being tracked moves a lot between frames, the LK tracker can break down. One way
to mitigate this problem is to run the LK tracker on a set of image pyramids instead of a single
image. The Pyramid tracker starts by performing tracking on a higher level (smaller image) to
get a course alignment estimate, propagating this down into the lower level and repeating until a
fine aligning warp has been found at the lowest level of the pyramid (the original sized image). In
addition to being more robust, the pyramid version of the tracker is much faster because it needs
to run fewer gradient descent iterations on the full scale image due to its coarse to fine approach.
Use the same function names as before but append ‘Pyramid’ to the function names while
submitting your code. Include both versions of the code in your submission. Implement these
modifications again for the forward compositional Lucas Kanade tracker on a flow warp.
4 Submission Summary
• Q1.1 Derive the expression for the Jacobian Matrix
• Q1.2 What is the computational complexity of inverse compositional method?
8
• Q2.1 Write the forward compositional tracker (LK Tracker)
• Q2.2 Initialize the inverse compositional tracker (MB Tracker)
• Q2.3 Write the inverse compositional tracker (MB Tracker)
• Q2.4 Run the inverse compositional tracker on the car dataset. What templates does work
well with? When does the tracker break down? Why does this happen?
• Q2.5 Run the inverse compositional tracker on the run markings dataset.
• Q3.1x Add illumination robustness
• Q3.2x LK Tracking on an image pyramid.
References
1. Simon Baker, et al. Lucas-Kanade 20 Years On: A Unifying Framework: Part 1, CMURITR-02-16, Robotics Institute, Carnegie Mellon University, 2002
2. Simon Baker, et al. Lucas-Kanade 20 Years On: A Unifying Framework: Part 2, CMURITR-03-35, Robotics Institute, Carnegie Mellon University, 2003
3. Bouguet, Jean-Yves. Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the algorithm, Intel Corporation, 2001
9