## Description

## Instructions

• This assignment is designed to familiarize you with stereo reconstruction and non-linear optimization.

• Code can be written in Python or C++ only. OpenCV or any equivalent library can be used for image input & output, and wherever specified. Make sure your code is modular, since you might be reusing it for future assignments.

• Submit your code files and a report (or as a single Jupyter notebook if you wish) as a zip file, named as

〈team id〉 assignment3.zip.

• The report should include all outputs and results and a description of your approach. It should also briefly

describe what each member worked on. The scoring will be primarily based on the report.

• Refer to the late days policy and plagiarism policy in the course information document.

• Start early! This assignment may take fairly long.

## Q1) Stereo dense reconstruction

3D point clouds are very useful in robotics for several tasks such as object detection, motion estimation

(3D-3D matching or 3D-2D matching), SLAM, and other forms of scene understanding. Stereo cameras

provide us with a convenient way to generate dense point clouds. Dense here, in contrast to sparse,

means all the image points are used for the reconstruction. In this part of the assignment you will be

generating a dense 3D point cloud reconstruction of a scene from stereo images as shown.

Figure 1: A reference reconstruction. The empty spaces are regions that could not be matched properly

or that have very little disparity. Courtesy: Dellaert et al.

Download the data from [here]. It includes a set of rectified and synchronized stereo image pairs from

a KITTI sequence, the calibration data, and the absolute ground truth poses of each stereo pair. The

procedure is as follows:

• Generate a disparity map for each stereo pair. Use OpenCV (e.g. StereoSGBM) for this. Note

that the images provided are already rectified and undistorted.

• Then, using the camera parameters and baseline information generate colored point clouds from

each disparity map. Some points will have invalid disparity values, so ignore them. Use [Open3D]

for storing your point clouds.

• Register (or transform) all the generated point clouds into your world frame by using the provided

ground truth poses.

• Visualize the registered point cloud data, in color. Use Open3D for this.
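The back-projection step in the list above can be sketched as follows. This is a minimal sketch under a rectified pinhole model, where depth follows from similar triangles as Z = f·B/d. The function name `disparity_to_points` and its parameters are hypothetical; `f`, `cx`, `cy` and `baseline` are assumed to come from the provided calibration data.

```python
import numpy as np

def disparity_to_points(disparity, image_bgr, f, cx, cy, baseline, min_disp=0.0):
    """Back-project a disparity map into a colored point cloud (camera frame).

    disparity : (H, W) float array, in pixels
    image_bgr : (H, W, 3) uint8 left image in OpenCV channel order
    f, cx, cy : focal length and principal point, in pixels (assumed fx == fy)
    baseline  : stereo baseline, in metres
    """
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    valid = disparity > min_disp                     # ignore invalid disparities
    z = f * baseline / disparity[valid]              # depth from similar triangles
    x = (u[valid] - cx) * z / f
    y = (v[valid] - cy) * z / f
    points = np.stack([x, y, z], axis=1)             # (N, 3)
    colors = image_bgr[valid][:, ::-1] / 255.0       # BGR -> RGB, scaled to [0, 1]
    return points, colors
```

If the disparity comes from OpenCV's StereoSGBM, note that `compute` returns fixed-point values scaled by 16, so divide by 16.0 before calling a function like this. The two returned arrays can then be handed to Open3D through `o3d.utility.Vector3dVector`.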

Report your observations with a screenshot of your reconstruction. [Bonus] Compare different stereo

matching algorithms.
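The registration step described above amounts to applying each frame's pose to its point cloud and concatenating the results. The sketch below assumes that each pose is a 3 × 4 matrix [R | t] mapping camera coordinates to world coordinates, stored as 12 row-major values per line. Both are assumptions about the provided data and should be checked against it. The helper names are hypothetical.

```python
import numpy as np

def parse_pose(line):
    """Parse one line of 12 whitespace-separated values into a (3, 4) pose (assumed row-major)."""
    return np.array([float(v) for v in line.split()]).reshape(3, 4)

def to_world(points_cam, pose):
    """Map (N, 3) camera-frame points into the world frame.

    pose : (3, 4) matrix [R | t], ASSUMED to map camera coordinates to world coordinates.
    """
    R, t = pose[:, :3], pose[:, 3]
    return points_cam @ R.T + t                      # row-vector form of R @ p + t

def merge_clouds(clouds, poses):
    """Transform every per-frame cloud with its pose and stack them into one array."""
    return np.vstack([to_world(p, T) for p, T in zip(clouds, poses)])
```

If the poses turn out to map world to camera instead, the inverse transform `(points_cam - t) @ R` would be needed.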

## Q2) Motion estimation using iterative PnP

Using the generated reconstruction from the previous part, synthesize a new image taken by a virtual

monocular camera fixed at any arbitrary position and orientation. Your task in this part is to recover

this pose using an iterative Perspective-n-Point (PnP) algorithm.
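One possible way to synthesize such a view is to project the reconstructed points with a pose of your choosing and keep the points that land inside the image. The sketch below assumes that `R` and `t` map world coordinates to the virtual camera frame, matching P = K[R | t]. The function name and the image size arguments are hypothetical.

```python
import numpy as np

def make_correspondences(K, R, t, X, width, height):
    """Project (N, 3) world points with P = K [R | t] and return matched 2D-3D pairs.

    Only points in front of the camera that fall inside the image are kept.
    Returns (M, 2) pixel coordinates and the corresponding (M, 3) world points.
    """
    Xc = X @ R.T + t                                 # world -> camera frame
    front = Xc[:, 2] > 1e-6                          # discard points behind the camera
    uvw = Xc[front] @ K.T                            # homogeneous pixel coordinates
    x = uvw[:, :2] / uvw[:, 2:3]                     # perspective division
    inside = ((x[:, 0] >= 0) & (x[:, 0] < width) &
              (x[:, 1] >= 0) & (x[:, 1] < height))
    return x[inside], X[front][inside]
```

Because the image is generated from the point cloud itself, each returned pixel is already paired with its 3D point. This sketch ignores occlusion.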

The steps are as follows:

• Obtain a set of 2D-3D correspondences between the image and the point cloud. Since you are generating the image yourself, these should be easy to obtain.

• For this set of correspondences compute the total reprojection error $c = \sum_i \lVert x_i - P_k X_i \rVert^2$, where $P_k = K[R_k \mid t_k]$, $X_i$ is the 3D point in the world frame, and $x_i$ is its corresponding projection.

• Solve for the pose $T_k$ that minimizes this non-linear reprojection error using a Gauss-Newton (GN) scheme. Recall that in GN we start with some initial estimate $x_0$ and iteratively refine it using $x_1 = \Delta x + x_0$, where $\Delta x$ is obtained by solving the normal equations $J^T J \Delta x = -J^T e$, until convergence.

The main steps in this scheme are computing the corresponding Jacobians and updating the estimates correctly. For our problem, use a $12 \times 1$ vector parameterization for $T_k$ (the top $3 \times 4$ submatrix). Run the optimization for different choices of initialization and report your observations.

• [Bonus++] Solve the same optimization problem but using a smaller 1 × 6 se(3) parameterization

for the pose (ξ). The result in this case should theoretically be better. The Jacobian here would

be,

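$$
\frac{\partial e_i}{\partial \xi_k} =
\begin{bmatrix}
f_x \frac{1}{z} & 0 & -f_x \frac{x}{z^2} & -f_x \frac{xy}{z^2} & f_x \left(1 + \frac{x^2}{z^2}\right) & -f_x \frac{y}{z} \\
0 & f_y \frac{1}{z} & -f_y \frac{y}{z^2} & -f_y \left(1 + \frac{y^2}{z^2}\right) & f_y \frac{xy}{z^2} & f_y \frac{x}{z}
\end{bmatrix}
$$

where $e_i = x_i - P_k X_i$ and $(x, y, z)$ are the coordinates of $X_i$ in the camera frame.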

The update step here would not be just $\xi_{k+1} = \xi_k + \Delta\xi$ but $\xi_{k+1} = \log(\exp(\Delta\xi) \cdot \exp(\xi_k))$. More details are available [here].
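The composition-based update above needs the exponential and logarithm maps of SE(3). A minimal closed-form sketch follows. It assumes the ordering ξ = (v, ω), with the translational part first to match the column order of the Jacobian, and it does not handle rotation angles close to π. All function names are hypothetical.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ p == np.cross(w, p)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def _rot_and_V(w):
    """Rodrigues rotation R and the matrix V that maps v to the translation t = V v."""
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:                                 # first-order series near identity
        return np.eye(3) + W, np.eye(3) + 0.5 * W
    A = np.sin(theta) / theta
    B = (1.0 - np.cos(theta)) / theta**2
    C = (theta - np.sin(theta)) / theta**3
    return np.eye(3) + A * W + B * W @ W, np.eye(3) + B * W + C * W @ W

def se3_exp(xi):
    """Map xi = (v, w) in se(3) to a 4x4 homogeneous transform."""
    R, V = _rot_and_V(xi[3:])
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ xi[:3]
    return T

def se3_log(T):
    """Inverse of se3_exp (not valid for rotation angles near pi)."""
    R, t = T[:3, :3], T[:3, 3]
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    # Half the vee of (R - R^T), which equals sin(theta) times the rotation axis.
    s = 0.5 * np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    w = s if theta < 1e-8 else s * theta / np.sin(theta)
    _, V = _rot_and_V(w)
    return np.concatenate([np.linalg.solve(V, t), w])

def update_pose(xi, delta):
    """Left-multiplicative update: xi_new = log(exp(delta) . exp(xi))."""
    return se3_log(se3_exp(delta) @ se3_exp(xi))
```

For example, `update_pose(xi, np.zeros(6))` returns `xi` unchanged up to rounding, and two pure translations simply add.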