Description
Part I
Theory (Harsh)
Before implementing our own 3D reconstruction, let’s take a look at some simple theory questions
that may arise. The answers to the below questions should be relatively short, consisting of a few
lines of math and text (maybe a diagram if it helps your understanding).
Q1.1 (5 points) Suppose two cameras fixate on a point x (see Figure 1) in space such that
their principal axes intersect at that point. Show that if the image coordinates are normalized
so that the coordinate origin (0, 0) coincides with the principal point, the F33 element of the
fundamental matrix is zero.
(0, 0)
(0, 0)
P
C1 C2
Figure 1: Figure for Q1.1. C1 and C2 are the optical centers. The principal axes intersect at point
w (P in the figure).
Q1.2 (5 points) Consider the case of two cameras viewing an object such that the second
camera differs from the first by a pure translation that is parallel to the x-axis. Show that the
epipolar lines in the two cameras are also parallel to the x-axis. Backup your argument with
relevant equations.
Q1.3 (5 points) Suppose we have an inertial sensor which gives us the accurate positions (Ri
and ti
, the rotation matrix and translation vector) of the robot at time i. What will be the effective
rotation (Rrel) and translation (trel) between two frames at different time stamps? Suppose the
camera intrinsics (K) are known, express the essential matrix (E) and the fundamental matrix (F)
in terms of K, Rrel and trel.
Q1.4 (10 points) Suppose that a camera views an object and its reflection in a plane mirror.
Show that this situation is equivalent to having two images of the object which are related by
a skew-symmetric fundamental matrix. You may assume that the object is flat, meaning that
all points on the object are of equal distance to the mirror (Hint: draw the relevant vectors to
understand the relationships between the camera, the object and its reflected image.)
2
Figure 2: Figure for Q1.3. C1 and C2 are the optical centers. The rotation and the translation is
obtained using inertial sensors. Rrel and trel are the relative rotation and translation between two
frames.
Part II
Practice
1 Overview
In this part you will begin by implementing the two different methods seen in class to estimate
the fundamental matrix from corresponding points in two images (Section 2). Next, given the fundamental matrix and calibrated intrinsics (which will be provided) you will compute the essential
matrix and use this to compute a 3D metric reconstruction from 2D correspondences using triangulation (Section 3). Then, you will implement a method to automatically match points taking
advantage of epipolar constraints and make a 3D visualization of the results (Section 4). Finally,
you will implement RANSAC and bundle adjustment to further improve your algorithm (??).
2 Fundamental matrix estimation (Rishi)
In this section you will explore different methods of estimating the fundamental matrix given a pair
of images. In the data/ directory, you will find two images (see Figure 3) from the Middlebury multiview dataset1
, which is used to evaluate the performance of modern 3D reconstruction algorithms.
1
http://vision.middlebury.edu/mview/data/
3
Figure 3: Temple images for this assignment
Figure 4: displayEpipolarF in helper.py creates a GUI for visualizing epipolar lines
The Eight Point Algorithm
The 8-point algorithm (discussed in class, and outlined in Section 10.1 of Forsyth & Ponce) is
arguably the simplest method for estimating the fundamental matrix. For this section, you can use
provided correspondences you can find in data/some corresp.npz.
Q2.1 (10 points) Finish the function eightpoint in submission.py. Make sure you follow
the signature for this portion of the assignment:
F = eightpoint(pts1, pts2, M)
where pts1 and pts2 are N × 2 matrices corresponding to the (x, y) coordinates of the N points
in the first and second image repectively. M is a scale parameter.
• You should scale the data as was discussed in class, by dividing each coordinate by M (the
maximum of the image’s width and height). After computing F, you will have to “unscale”
the fundamental matrix.
Hint: If xnormalized = Tx, then Funnormalized = TT FT.
You must enforce the singularity condition of the F before unscaling.
• You may find it helpful to refine the solution by using local minimization. This probably won’t
fix a completely broken solution, but may make a good solution better by locally minimizing
a geometric cost function.
4
For this we have provided a helper function refineF in helper.py taking in F and the two
sets of points, which you can call from eightpoint before unscaling F.
• Remember that the x-coordinate of a point in the image is its column entry, and y-coordinate
is the row entry. Also note that eight-point is just a figurative name, it just means that you
need at least 8 points; your algorithm should use an over-determined system (N > 8 points).
• To visualize the correctness of your estimated F, use the supplied function displayEpipolarF
in helper.py, which takes in F, and the two images. This GUI lets you select a point in one
of the images and visualize the corresponding epipolar line in the other image (Figure 4).
• Output: Save your matrix F, scale M to the file q2 1.npz.
In your write-up: Write your recovered F and include an image of some example output
of displayEpipolarF.
The Seven Point Algorithm
Q2.2 (15 points) Since the fundamental matrix only has seven degrees of freedom, it is
possible to calculate F using only seven point correspondences. This requires solving a polynomial
equation. In the section, you will implement the seven-point algorithm (described in class, and
outlined in Section 15.6 of Forsyth and Ponce). Manually select 7 points from provided point in
data/some corresp.npz, and use these points to recover a fundamental matrix F. The function
should have the signature:
Farray = sevenpoint(pts1, pts2, M)
where pts1 and pts2 are 7 × 2 matrices containing the correspondences and M is the normalizer
(use the maximum of the images’ height and width), and Farray is a list array of length either 1
or 3 containing Fundamental matrix/matrices. Use M to normalize the point values between [0, 1]
and remember to “unnormalize” your computed F afterwards.
• Use displayEpipolarF to visualize F and pick the correct one.
• Output: Save your matrix F, scale M, 2D points pts1 and pts2 to the file q2 2.npz.
In your write-up: Write your recovered F and print an output of displayEpipolarF.
Also, include an image of some example output of displayEpipolarF using the seven point
algorithm.
• Hints: You can use Numpy’s function roots(). The epipolar lines may not match exactly
due to imperfectly selected correspondences, and the algorithm is sensitive to small changes
in the point correspondences. You may want to try with different sets of matches.
3 Metric Reconstruction (Shumian)
You will compute the camera matrices and triangulate the 2D points to obtain the 3D scene structure. To obtain the Euclidean scene structure, first convert the fundamental matrix F to an essential
matrix E. Examine the lecture notes and the textbook to find out how to do this when the internal
camera calibration matrices K1 and K2 are known; these are provided in data/intrinsics.npz.
5
Q3.1 (5 points) Write a function to compute the essential matrix E given F, K1 and K2
with the signature:
E = essentialMatrix(F, K1, K2)
In your write-up: Write your estimated E using F from the eight-point algorithm.
Given an essential matrix, it is possible to retrieve the projective camera matrices M1 and M2
from it. Assuming M1 is fixed at [I, 0], M2 can be retrieved up to a scale and four-fold rotation
ambiguity. For details on recovering M2, see section 7.2 in Szeliski. We have provided you with
the function camera2 in python/helper.py to recover the four possible M2 matrices given E.
Note: The M1 and M2 here are projection matrices of the form: M1 =
I|0
and M2 =
R|t
.
Q3.2 (10 points) Using the above, write a function to triangulate a set of 2D coordinates in
the image to a set of 3D points with the signature:
[w, err] = triangulate(C1, pts1, C2, pts2)
where pts1 and pts2 are the N ×2 matrices with the 2D image coordinates and w is an N ×3 matrix
with the corresponding 3D points per row. C1 and C2 are the 3 × 4 camera matrices. Remember
that you will need to multiply the given intrinsics matrices with your solution for the canonical
camera matrices to obtain the final camera matrices. Various methods exist for triangulation –
probably the most familiar for you is based on least squares (see Szeliski Chapter 7 if you want to
learn about other methods):
For each point i, we want to solve for 3D coordinates wi =
xi
, yi
, zi
T
, such that when they
are projected back to the two images, they are close to the original 2D points. To project the
3D coordinates back to 2D images, we first write wi
in homogeneous coordinates, and compute
C1w˜i and C2w˜i to obtain the 2D homogeneous coordinates projected to camera 1 and camera 2,
respectively.
For each point i, we can write this problem in the following form:
Aiwi = 0,
where Ai
is a 4 × 4 matrix, and w˜i
is a 4 × 1 vector of the 3D coordinates in the homogeneous
form. Then, you can obtain the homogeneous least-squares solution (discussed in class) to solve
for each wi
.
In your write-up: Write down the expression for the matrix Ai
.
Once you have implemented triangulation, check the performance by looking at the reprojection
error:
err =
X
i
kx1i
, xc1ik
2 + kx2i
, xc2ik
2
where xc1i = P roj(C1, wi) and xc2i = P roj(C2, wi).
Note: C1 and C2 here are projection matrices of the form: C1 = K1M1 = K1
I|0
and
C2 = K2M2 = K2
R|t
.
Q3.3 (10 points) Write a script findM2.py to obtain the correct M2 from M2s by testing the
four solutions through triangulations. Use the correspondences from data/some corresp.npz.
Output: Save the correct M2, the corresponding C2, and 3D points P to q3 3.npz.
6