Description
Part I: Theoretical Problems (70 marks)
[Question 1] RANSAC (10 marks)
We have two images of a planar object (e.g. a painting) taken from different viewpoints and
we want to align them. We have used SIFT to find a large number of point correspondences
between the two images and visually estimate that at least 70% of these matches are correct
with only small potential inaccuracies. We want to find the true transformation between the
two images with a probability greater than 99.5%.
1. (5 marks) Calculate the number of iterations needed for fitting a homography.
2. (5 marks) Without calculating, briefly explain whether you think fitting an affine
transformation would require fewer or more RANSAC iterations and why.
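For reference, the standard RANSAC stopping rule picks the smallest N such that 1 - (1 - w^s)^N >= p, where w is the inlier ratio, s is the minimal sample size, and p is the desired probability of drawing at least one all-inlier sample. The sketch below evaluates that rule in Python. The function name `ransac_iterations` and the example values (line fitting, not the homography case asked about above) are illustrative assumptions.

```python
import math


def ransac_iterations(inlier_ratio: float, sample_size: int, confidence: float) -> int:
    """Smallest N with 1 - (1 - w**s)**N >= confidence."""
    if not 0.0 < inlier_ratio < 1.0:
        raise ValueError("inlier_ratio must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    # Probability that one random minimal sample contains only inliers.
    p_good_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))


# Illustrative values only: fitting a 2D line needs s = 2 points per sample.
n_line = ransac_iterations(inlier_ratio=0.5, sample_size=2, confidence=0.99)
print(n_line)
```

The same function can be called with the inlier ratio, confidence, and minimal sample size of whichever model is being fitted.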
[Question 2] Camera Models (30 marks)
Assume a plane passing through the point $\vec{P}_0 = [X_0, Y_0, Z_0]^T$ with normal $\vec{n}$. The corresponding
vanishing points for all the lines lying on this plane form a line called the horizon. In this
question, you are asked to prove the existence of the horizon line by following the steps below:
1. (15 marks) Find the pixel coordinates of the vanishing point corresponding to a line
$L$ passing through the point $\vec{P}_0$ and going along direction $\vec{d}$.
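Hint: $\vec{P} = \vec{P}_0 + t\,\vec{d}$ are the points on line $L$, and
$$\vec{p} = \begin{bmatrix} \omega x \\ \omega y \\ \omega \end{bmatrix} = K \vec{P} = K \begin{bmatrix} X_0 + t\,d_x \\ Y_0 + t\,d_y \\ Z_0 + t\,d_z \end{bmatrix}$$
are the pixel coordinates of the same line in the image, where
$$K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix},$$
$f$ is the camera focal length, and $(p_x, p_y)$ is the principal point.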
2. (15 marks) Prove the vanishing points of all the lines lying on the plane form a line.
Hint: all the lines on the plane are perpendicular to the plane’s normal $\vec{n}$; that is,
$\vec{n} \cdot \vec{d} = 0$, or $n_x d_x + n_y d_y + n_z d_z = 0$.
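Before attempting the proof, it can help to see the behaviour numerically. The sketch below projects points far along several lines that share a plane and checks that (a) the image of each line settles at a fixed pixel as t grows and (b) those limiting pixels are approximately collinear. It is a sanity check under assumed values, not a proof: the intrinsics, the point, the normal, and the directions are all made up for illustration.

```python
import numpy as np


def project(K: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Pinhole projection of a 3D point (camera coordinates) to pixel coordinates."""
    p = K @ P            # homogeneous pixel coordinates [w*x, w*y, w]
    return p[:2] / p[2]  # divide by the homogeneous coordinate


# Assumed camera and scene values, chosen only for illustration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P0 = np.array([1.0, -0.5, 4.0])
n = np.array([0.0, 1.0, 0.2])
directions = [np.array([1.0, -0.2, 1.0]),
              np.array([-1.0, -0.2, 1.0]),
              np.array([0.5, -0.4, 2.0])]

# Every direction lies in the plane, i.e. is perpendicular to the normal.
assert all(abs(n @ d) < 1e-12 for d in directions)

# (a) Image of one line for growing t: the pixel position converges.
track = [project(K, P0 + t * directions[0]) for t in (1e1, 1e3, 1e6, 1e9)]

# (b) Approximate each vanishing point with a very distant point on the line.
far = np.array([project(K, P0 + 1e9 * d) for d in directions])

# Three pixels are collinear when det([[x, y, 1], ...]) is zero.
collinearity = np.linalg.det(np.column_stack([far, np.ones(3)]))
print(track[-1], collinearity)
```

The directions here all have a nonzero z component; a direction parallel to the image plane has its vanishing point at infinity and would need separate handling.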
[Question 3] Homogeneous Coordinates (30 marks)
Using the homogeneous coordinates:
1. (15 marks) (a) Show that the intersection of the 2D lines $l$ and $l'$ is the 2D point
$p = l \times l'$ (here $\times$ denotes the cross product).
2. (15 marks) (b) Show that the line that goes through the 2D points $p$ and $p'$ is $l = p \times p'$.
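A small numerical example may make the two identities concrete before proving them. In the sketch below a line ax + by + c = 0 is stored as the vector (a, b, c) and a point (x, y) as (x, y, 1). The helper names `intersect`, `join`, and `to_inhomogeneous` are hypothetical, and the lines and points are chosen only for illustration.

```python
import numpy as np


def intersect(l1: np.ndarray, l2: np.ndarray) -> np.ndarray:
    """Homogeneous intersection point of two homogeneous 2D lines."""
    return np.cross(l1, l2)


def join(p1: np.ndarray, p2: np.ndarray) -> np.ndarray:
    """Homogeneous line through two homogeneous 2D points."""
    return np.cross(p1, p2)


def to_inhomogeneous(p: np.ndarray) -> np.ndarray:
    """Convert (wx, wy, w) to (x, y); fails for points at infinity (w = 0)."""
    if np.isclose(p[2], 0.0):
        raise ValueError("point at infinity has no inhomogeneous coordinates")
    return p[:2] / p[2]


line_a = np.array([1.0, 0.0, -1.0])   # the line x = 1
line_b = np.array([0.0, 1.0, -2.0])   # the line y = 2
point = intersect(line_a, line_b)      # homogeneous form of the point (1, 2)

origin = np.array([0.0, 0.0, 1.0])
corner = np.array([1.0, 1.0, 1.0])
diagonal = join(origin, corner)        # the line -x + y = 0

# A point lies on a line exactly when their dot product is zero.
print(to_inhomogeneous(point), diagonal, diagonal @ origin, diagonal @ corner)
```

Two parallel lines intersect at a homogeneous point whose last coordinate is zero, which is why `to_inhomogeneous` rejects that case.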
Part II: Implementation Tasks (90 marks)
[Question 4] Homography (60 marks)
You are given three images hallway1.jpg, hallway2.jpg, hallway3.jpg which were shot
with the same camera (i.e. same internal camera parameters), but held at slightly different
positions/orientations (i.e. with different external parameters).
[Figure: the three input images hallway1.jpg, hallway2.jpg, and hallway3.jpg]
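Consider the homographies $H$,
$$\begin{bmatrix} \tilde{w}\tilde{x} \\ \tilde{w}\tilde{y} \\ \tilde{w} \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix},$$
that map corresponding points of one image $I$ to a second image $\tilde{I}$, for three cases: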
A. The right wall of $I$ = hallway1.jpg to the right wall of $\tilde{I}$ = hallway2.jpg.
B. The right wall of $I$ = hallway1.jpg to the right wall of $\tilde{I}$ = hallway3.jpg.
C. The floor of $I$ = hallway1.jpg to the floor of $\tilde{I}$ = hallway3.jpg.
For each of these three cases:
1. (10 marks) Use a Data Cursor to select corresponding points by hand. Select more
than four pairs of points. (Four pairs will give a good fit for those points, but may give
a poor fit for other points.) Also, avoid choosing three (or more) collinear points, since
these do not provide independent information. This is trickier for case C. Make two
figures showing the gray-level images of $I$ and $\tilde{I}$ with a colored square marking each
of the selected points. You can convert the image $I$ or $\tilde{I}$ to gray level using an RGB to
grayscale function (or the formula gray = 0.2989 × R + 0.5870 × G + 0.1140 × B).
2. (10 marks) Fit a homography H to the selected points. Include the estimated H in
the report, and describe its effect using words such as scale, shear, rotate, translate,
if appropriate. You are not allowed to use any homography estimation function in
OpenCV or other similar packages.
3. (10 marks) Make a figure showing the $\tilde{I}$ image with red squares that mark each of
the selected $(\tilde{x}, \tilde{y})$, and green squares that mark the locations of the estimated $(\tilde{x}, \tilde{y})$;
that is, use the homography to map the selected $(x, y)$ to the $(\tilde{x}, \tilde{y})$ space.
4. (25 marks) Make a figure showing a new image that is larger than the original one(s).
The new image should be large enough that it contains the pixels of the I image as a
subset, along with all the inverse mapped pixels of the $\tilde{I}$ image. The new image should
be constructed as follows:
• RGB values are initialized to zero,
• The red channel of the new image must contain the rgb2gray values of the I
image (for the appropriate pixel subset only);
• The blue and green channels of the new image must contain the rgb2gray values
of the corresponding pixels $(\tilde{x}, \tilde{y})$ of $\tilde{I}$. The correspondence is computed as follows:
for each pixel $(x, y)$ in the new image, use the homography H to map this pixel to
the $(\tilde{x}, \tilde{y})$ domain (not forgetting to divide by the homogeneous coordinate), and
round the value so you get an integer grid location. If this $(\tilde{x}, \tilde{y})$ location indeed
lies within the domain of the $\tilde{I}$ image, then copy the rgb2gray’ed value from
$\tilde{I}(\tilde{x}, \tilde{y})$ into the blue and green channels of pixel $(x, y)$ in the new image. (This
amounts to an inverse mapping.)
If the homography is correct and if the surface were Lambertian∗, then corresponding
points in the new image would have the same values of R, G, and B, and so the
new image would appear to be gray at these pixels.
• Based on your results, what can you conclude about the relative 3D positions and
orientations of the camera? Give only qualitative answers here. Also, what can
you conclude about the surface reflectance of the right wall and floor, namely are
they more or less Lambertian? Limit your discussion to a few sentences.
(5 marks) Along with your writeup, hand in the program that you used to solve the problem. You should have a switch statement that chooses between cases A, B, C.
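Item 2 above asks for a homography fitted without a library estimation routine. One common way to do this is the direct linear transform (DLT): each point pair contributes two linear equations in the nine entries of H, and the least-squares solution is the right singular vector with the smallest singular value. The sketch below uses only NumPy. The names `fit_homography` and `apply_homography` are hypothetical, coordinate normalisation (often recommended for numerical stability) is omitted for brevity, and the final scaling assumes the bottom-right entry of H is nonzero.

```python
import numpy as np


def fit_homography(src, dst) -> np.ndarray:
    """Least-squares H (3x3) with dst ~ H @ src, from N >= 4 point pairs of shape (N, 2)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2 or len(src) < 4:
        raise ValueError("need two (N, 2) arrays with N >= 4")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h1 x + h2 y + h3) / (h7 x + h8 y + h9), and likewise for v.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    return H / H[2, 2]        # fix the arbitrary scale (assumes H[2, 2] != 0)


def apply_homography(H: np.ndarray, pts) -> np.ndarray:
    """Map (N, 2) points through H, dividing by the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


# Illustrative usage with made-up points related by a pure translation of (5, -3).
demo_src = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [40, 30]], dtype=float)
demo_dst = demo_src + np.array([5.0, -3.0])
demo_H = fit_homography(demo_src, demo_dst)
print(np.round(demo_H, 6))
```

Mapping the hand-selected points with `apply_homography` and comparing them with their selected counterparts gives the red and green markers described in item 3.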
∗ Lambertian reflectance is the property that defines an ideal “matte” or diffusely reflecting
surface. The apparent brightness of a Lambertian surface to an observer is the same regardless
of the observer’s angle of view. Unfinished wood exhibits roughly Lambertian reflectance, but
wood finished with a glossy coat of polyurethane does not, since the glossy coating creates
specular highlights. Specular reflection, or regular reflection, is the mirror-like reflection of
waves, such as light, from a surface. Reflections on still water are an example of specular
reflection.
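The inverse-mapping construction in item 4 can be sketched as follows. This is one possible NumPy implementation under stated assumptions, not a reference solution: pixel coordinates are zero-based with (x, y) = (column, row), H maps first-image coordinates to second-image coordinates, H is invertible, and the corners of the second image map back to finite, moderately sized coordinates. The function name `composite_overlay` is hypothetical.

```python
import numpy as np


def composite_overlay(gray_first: np.ndarray, gray_second: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Red = first image; green and blue = second image sampled through H."""
    h1, w1 = gray_first.shape
    h2, w2 = gray_second.shape

    # Map the corners of the second image back into first-image coordinates.
    corners = np.array([[0, 0, 1], [w2 - 1, 0, 1],
                        [w2 - 1, h2 - 1, 1], [0, h2 - 1, 1]], dtype=float).T
    back = np.linalg.inv(H) @ corners
    back = back[:2] / back[2]

    # Canvas bounds: union of the first image and the back-projected corners.
    x_min = int(np.floor(min(0.0, back[0].min())))
    y_min = int(np.floor(min(0.0, back[1].min())))
    x_max = int(np.ceil(max(w1 - 1.0, back[0].max())))
    y_max = int(np.ceil(max(h1 - 1.0, back[1].max())))

    out = np.zeros((y_max - y_min + 1, x_max - x_min + 1, 3), dtype=gray_first.dtype)
    out[-y_min:-y_min + h1, -x_min:-x_min + w1, 0] = gray_first  # red channel

    # Every canvas pixel, expressed in first-image coordinates, goes through H.
    ys, xs = np.mgrid[y_min:y_max + 1, x_min:x_max + 1]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    mapped = H @ pts
    with np.errstate(divide="ignore", invalid="ignore"):
        xt = np.rint(mapped[0] / mapped[2])  # divide by the homogeneous coordinate
        yt = np.rint(mapped[1] / mapped[2])
    inside = (np.isfinite(xt) & np.isfinite(yt)
              & (xt >= 0) & (xt < w2) & (yt >= 0) & (yt < h2))

    rows = ys.ravel()[inside] - y_min
    cols = xs.ravel()[inside] - x_min
    values = gray_second[yt[inside].astype(int), xt[inside].astype(int)]
    out[rows, cols, 1] = values  # green channel
    out[rows, cols, 2] = values  # blue channel
    return out
```

With a correct homography and a roughly Lambertian surface, the overlapping region of the returned image should look gray, while misaligned or specular regions show a red or cyan tint.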
[Question 5] Mean Shift Tracking (30 marks)
In Tutorial 10, we learned about mean shift and CamShift tracking. In this question,
we first attempt to evaluate the performance of mean shift tracking in a single case and will
then implement a small variation of the standard mean shift tracking. For both parts you
can use the attached short video KylianMbappe.mp4 or, alternatively, you can record and
use a short (2-3 second) video of yourself. You can use any OpenCV (or other) functions you
want in this question.
1. (20 marks) Performance Evaluation
• Use the Viola-Jones face detector to detect the face on the first frame of the video.
The default detector can detect the face in the first frame of the attached video. If
you record a video of yourself, make sure your face is visible and facing the camera
in the first frame (and throughout the video) so the detector can detect your face
in the first frame.
• Construct the hue histogram of the detected face on the first frame using appropriate saturation and value thresholds for masking. Use the constructed hue
histogram and mean shift tracking to track the bounding box of the face over the
length of the video (from frame #2 until the last frame). So far, this is similar to
what we did in the tutorial.
• Also, use the Viola-Jones face detector to detect the bounding box of the face in
each video frame (from frame #2 until the last frame).
• Calculate the intersection over union (IoU) between the tracked bounding box and
the Viola-Jones detected box in each frame. Plot the IoU over time. The x axis
of the plot should be the frame number (from 2 until the last frame) and the y
axis should be the IoU on that frame.
• In your report, include a sample frame in which the IoU is large (e.g. over 50%)
and another sample frame in which the IoU is low (e.g. below 10%). Draw the
tracked and detected bounding boxes in each frame using different colors (and
indicate which is which).
• Report the percentage of frames in which the IoU is larger than 50%.
• Look at the detected and tracked boxes at frames in which the IoU is small (< 10%)
and report which (Viola-Jones detection or tracked bounding box) is correct more
often (we don’t need a number, just eyeball it). Very briefly (1-2 sentences) explain
why that might be.
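The IoU computation used in the bullets above can be sketched as below, assuming both boxes are given as (x, y, w, h) tuples, which is the rectangle format OpenCV uses for cascade detections and mean shift track windows. The helper names `iou` and `percent_above` are hypothetical. How to score a frame in which the detector finds no face is not specified above; treating it as IoU 0 would be one possible convention.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def percent_above(values, threshold: float = 0.5) -> float:
    """Percentage of values strictly greater than the threshold."""
    values = list(values)
    if not values:
        raise ValueError("no IoU values given")
    return 100.0 * sum(v > threshold for v in values) / len(values)


# Illustrative boxes: two 10x10 squares offset by (5, 5) overlap in a 5x5 region.
example = iou((0, 0, 10, 10), (5, 5, 10, 10))
print(round(example, 4))
```

Collecting one `iou` value per frame gives the series to plot against the frame number, and `percent_above` of that series gives the reported percentage.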
2. (10 marks) Implement a Simple Variation
• In the examples in Tutorial 10 (and the previous part of this question) we used
a hue histogram for mean shift tracking. Here, we implement an alternative in
which a histogram of gradient direction values is used instead.
• After converting to grayscale, use blurring and the Sobel operator to first generate image gradients in the x and y directions (Ix and Iy). You can then use
cartToPolar (with angleInDegrees=True) to get the gradient magnitude and
angle at each frame. You can use 24 histogram bins and [0,360] (i.e. not [0,180])
directions.
• When constructing hue histograms, we thresholded saturation and value channels to create a mask. Here, you can threshold the gradient magnitude to create
a mask. For example, you can mask out pixels in the region of interest in which
the gradient magnitude is less than 10% of the maximum gradient magnitude in
the RoI.
• Calculate the intersection over union (IoU) between the tracked bounding box and
the Viola-Jones detected box in each frame. Plot the IoU over time. The x axis
of the plot should be the frame number (from 2 until the last frame) and the y
axis should be the IoU on that frame.
• In your report, include a sample frame in which the IoU is large (e.g. over 50%)
and another sample frame in which the IoU is low (e.g. below 10%). Draw the
tracked and detected bounding boxes in each frame using different colors (and
indicate which is which).
• Report the percentage of frames in which the IoU is larger than 50%.
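The masked gradient-direction histogram described in this part can be sketched as follows. To keep the example dependency-light, it swaps in plain NumPy for the OpenCV calls named above: a hand-written 3x3 Sobel filter stands in for the Sobel operator, and `np.hypot` with `np.arctan2` stands in for `cartToPolar` with angles in degrees. The function names are hypothetical, the input is assumed to be an already blurred grayscale region of interest, and the 24 bins, [0, 360] range, and 10% magnitude threshold follow the suggestions above.

```python
import numpy as np


def sobel_gradients(gray: np.ndarray):
    """3x3 Sobel derivatives (Ix, Iy) with edge replication at the borders."""
    g = np.pad(gray.astype(np.float64), 1, mode="edge")
    ix = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]) \
        - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2])
    iy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]) \
        - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:])
    return ix, iy


def direction_histogram(gray_roi: np.ndarray, bins: int = 24,
                        rel_threshold: float = 0.1) -> np.ndarray:
    """Histogram of gradient directions in [0, 360], ignoring weak gradients."""
    ix, iy = sobel_gradients(gray_roi)
    magnitude = np.hypot(ix, iy)
    if magnitude.max() == 0:
        return np.zeros(bins, dtype=np.int64)  # flat region: no direction information
    angle = np.degrees(np.arctan2(iy, ix)) % 360.0
    # Keep only pixels whose gradient is at least 10% of the strongest in the RoI.
    mask = magnitude >= rel_threshold * magnitude.max()
    hist, _ = np.histogram(angle[mask], bins=bins, range=(0.0, 360.0))
    return hist


# Illustrative RoI: brightness increases left to right, so every gradient points along +x.
ramp = np.tile(10.0 * np.arange(16), (12, 1))
print(direction_histogram(ramp))
```

A histogram built this way from the first-frame face region plays the role that the hue histogram played in the previous part.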