ROB501 Assignment 2: Camera Pose Estimation


Category: You will Instantly receive a download link for .zip solution file upon Payment


5/5 - (1 vote)


Camera pose estimation is a very robotic vision common task—we typically wish to know the pose (position
and orientation) of the camera in the environment at all times. Pose estimation is also closely related to camera

The general problem of pose estimation can be tricky (because you need to know something about
the scene geometry). In this assignment, you will learn how to estimate the pose of the camera relative to a
known object, in this case a planar checkerboard camera calibration target of fixed size.

The goals are to:
• provide practical experience with image smoothing and subpixel feature extraction, and
• assist in understanding the nonlinear least squares optimization algorithm.
The due date for assignment submission

All submissions
will be in Python 3 via Autolab; you may submit as many times as you wish until the deadline. To complete
the assignment, you will need to review some material that goes beyond that discussed in the lectures—more
details are provided below. The project has four parts, worth a total of 50 points.

Please clearly comment your code and ensure that you only make use of the Python modules and functions listed
at the top of the code templates. We will view and run your code.

Implementation Details

For problems in which the pose of the camera must be determined with a high degree of accuracy (e.g., camera
tracking for high fidelity 3D reconstruction), it is not unusual to insert a known calibration object into the
scene; knowledge of the geometry of the calibration object can then be used to assist with pose estimation
(assuming that the object is visible).

As part of this assignment, you will estimate the camera pose relative to a checkerboard target whose squares
are 63.5 mm in size (on a side). You may assume that the checkerboard is perfectly flat, that is, the z
coordinate of any point lying on the board is exactly zero (in the frame of the target). Sample images are
shown in Figure 1.

You may also assume that each image has already been unwarped to remove any lens
distortion effects (you will be estimating pose parameters only; the intrinsic parameters are also assumed to
be known already).
(a) (b) (c) (d) (e)
Figure 1: Sample images for use when testing your pose estimation algorithm.

Part 1: Image Smoothing and Subpixel Feature Extraction

To determine the camera pose, you will need to carefully extract a set of known feature points from an image
of the target. Correspondences between the observed 2D image coordinates and the known 3D coordinates
of the feature points (landmarks) then allows the pose to be determined (see Part 4).

Typically, for a planar
checkerboard target, cross-junction points are used, for two reasons: 1) they are easy to extract and 2) they
are invariant to perspective transformations. A cross-junction is defined as the (ideal) point at which the
diagonal black and white squares meet.

In the sample images, the number of cross-junctions is 8 × 6 = 48.
There are variety of ways to identify the cross-junctions (for example, using the Harris corner detector).
Usually, the coarse estimates of the cross-junction positions in each image are then refined using a saddle
point detector, such as the one described in the following paper (included in the assignment archive):
L. Lucchese and S. K. Mitra, “Using Saddle Points for Subpixel Feature Detection in Camera Calibration Targets,” in Proceedings of the Asia-Pacific Conference on Circuits and Systems (APCCAS),
vol. 2, (Singapore), pp. 191–195, December 2002.

The saddle point is the best subpixel estimate of the true position of the cross-junction. Images are typically
smoothed with a Gaussian filter prior to computing the saddle points—this is done because, in many cases,
the cross-junctions are not clearly defined (due to, e.g., image quantization errors). If you zoom in on one of
the sample images, you may notice this effect.

Your first task is to write a Python function that computes the position of the saddle point in a small image
patch. This can be carried out by fitting a hyperbolic paraboloid to the smoothed intensity surface. The
relevant fitting problem is defined by Eqn. (4) in the Lucchese paper, and is solved (unsurprisingly) using
linear least squares! For this portion of the assignment, you should submit:

• A single function in, which accepts a small image patch as input and attempts to find
a saddle point using the algorithm described in the paper. Note that the image coordinates of the saddle
point should be returned with subpixel precision (i.e., as double-precision floating point numbers).

You will have access to the SciPy gaussian_filter function, which will perform image blurring (smoothing with a symmetric Gaussian kernel of fixed standard deviation)—feel free to try it out! For convenience,
all of the Autolab tests for Part 1 use patches that have been pre-blurred in advance.

Note that two steps are required to find the saddle point: Eqn. (4) must be solved first, for the parameters α,
β, γ, δ, ϵ, and ζ; the coordinates of the saddle point are then found using the next equation in the paper.

Part 2: Extracting All Cross-Junctions

Your second task is to implement a Python function that returns an ordered list (row-major, from top left)
of the cross-junction points on the target, to subpixel precision as determined by the saddle point detector.

In every case, we will provide a bounding polygon that encloses the target (where the bounding polygon
has points ordered clockwise from the upper left corner of the target); the upper left cross-junction should
be taken as the origin of the target frame, with the x-axis pointing to the right in the image and the z-axis
extending into the page.

Using this information, you will be able compute the metric coordinates of each
(ideal) cross-junction (in metres)—we have also provided a world_points.npy file that contains this set
of 3D coordinates.

For this portion of the assignment, you should submit:
• A single function in, which accepts an image and a bounding polygon and extracts
all of the cross-junctions on the planar target, returning their coordinates with subpixel precision (in rowmajor order). The function should also accept the 3D (world) coordinates of the points (landmarks).

number of image features you extract must be the same as the number of landmarks.
The first set of (x, y) bounding polygon coordinates will always be closest to the upper leftmost cross-junction
on the target (this should make ordering easier). Note that you may need to copy and paste the ‘innards’ of
your function into the appropriate section of the function.

should develop a novel way to coarsely localize the cross-junctions, followed by subpixel refinement with the
saddle point detector. Note that this part of the assignment may require substantial effort.

Part 3: Camera Pose Jacobians

Upon completing Parts 1 and 2 of the assignment, you should have a function (in
that produces a series of 2D-3D feature correspondences that can be used for pose estimation. You will implement pose estimation using a nonlinear least squares (NLS) procedure that incorporates all of the available

The solution to the PnP problem (i.e., Perspective-n-Points) is described in Section 6.2 of the Szeliski text.
Although there is a linear algorithm for the problem, it does not work when all points are coplanar. Instead,
we will provide you with an initial guess for the camera pose (±10◦ and 20 cm, approximately).

You will
need to know the camera intrinsic calibration matrix, which in this case is
K =

564.9 0 337.3
0 564.3 226.5
0 0 1

 ,
where the focal lengths and principal point values are in pixels.

To make things easier, we will use an Euler
angle parameterization for the camera orientation. It will be necessary to solve for six parameters: the x, y,
and z camera translation, and the roll, pitch, and yaw Euler angles that define the camera orientation.

that we wish to solve for the pose of the camera relative to the target, TW C (i.e., a 4 × 4 homogeneous
pose matrix). When expressed in terms of the roll (ϕ), pitch (θ), and yaw (ψ) Euler angles, the rotation
matrix that defines the orientation of the camera frame relative to the world (target) frame is
CW C (ψ, θ, ϕ) = C(ψ) C(θ) C(ϕ) (1)

cos ψ − sin ψ 0
sin ψ cos ψ 0
0 0 1

cos θ 0 sin θ
0 1 0
− sin θ 0 cos θ

1 0 0
0 cos ϕ − sin ϕ
0 sin ϕ cos ϕ

 (2)

cos ψ cos θ cos ψ sin θ sin ϕ − sin ψ cos ϕ cos ψ sin θ cos ϕ + sin ψ sin ϕ
sin ψ cos θ sin ψ sin θ sin ϕ + cos ψ cos ϕ sin ψ sin θ cos ϕ − cos ψ sin ϕ
− sin θ cos θ sin ϕ cos θ cos ϕ

 . (3)

In order to compute the NLS solution, you will need the (full) Jacobian matrix for the image plane points
(observations) with respect to the (six) pose parameters. The full Jacobian is composed of a series of 2 × 6
sub-matrices, stacked vertically.

For this portion of the assignment, you should submit:
• A single function in that computes a 2 × 6 Jacobian matrix for each image plane
(feature) observation with respect to the camera pose parameters.
We will use the pinhole projection model, where the image plane coordinates of the projection of landmark
j, with 3D position p¯j , are
x˜ij = K˜ (TW Ci
p¯j (4)
for camera pose i. Two helper functions to convert between Euler angles and rotation matrices are available
on Autolab (and in the code package that accompanies this assignment document) to assist you.

Part 4: Camera Pose Estimation

The final step is to set up, and then solve, the nonlinear system of equations for pose estimation, using
nonlinear least squares.

For this portion of the assignment, you should submit:
• A function in, which accepts an intrinsic calibration matrix, a set of 2D-3D
correspondences (image-target) and an initial guess for the camera pose, and performs a nonlinear least
squares optimization procedure to compute an updated, optimal estimate of the camera pose.
For testing, you may use the example images included in the assignment archive.

Points for each portion of the assignment will be determined as follows:
• Saddle point function – 12 points (4 tests × 3 points per test)
Each test uses a different (pre-blurred) image patch containing one cross-junction. The estimate of the
(subpixel) position of the saddle point must be within 1.0 pixels of the reference position to pass.

• Cross-junction extraction – 15 points (3 tests × 5 points per test)
Each test uses a different image (of the calibration target); to pass, the extraction function must return
the full set of image plane points (48), and the average position error (over all cross-junctions) must be
less than 3 pixels.

• Jacobian function – 11 points (2 tests; 6 points and 5 points)
The computed Jacobian matrix (for each test) is checked for accuracy—there is an exact solution for
each camera pose and landmark point.

• NLS pose estimation function – 12 points (3 tests × 4 points per test)
There are three tests, each of which uses a different (holdout) image of the calibration target. The
returned, optimized pose solution must be ‘close’ to the reference solution (within 4 cm and 2◦
Total: 50 points
Grading criteria include: correctness and succinctness of the implementation of support functions, proper
overall program operation and code commenting, and a correct composite image output (subject to some
variation). Please note that we will test your code and it must run successfully. Code that is not properly
commented or that looks like ‘spaghetti’ may result in an overall deduction of up to 10%.
4 of 4