Description
1. (25 points) This problem is about constructing a camera matrix and applying it to project
points onto an image plane. The command line of your program should simply be
python p1_camera.py params.txt points.txt
Here params.txt contains parameters that can be used form the 3×4 camera matrix. Specifically, the following ten floating point values will appear in the file on three lines:
rx ry rz
tx ty tz
f d ic jc
Here’s what these mean: Relative to the world coordinate system, the camera is rotated by
rz degrees about the z-axis, then ry degrees about the y-axis, then rx degrees about the
x-axis. Then it is translated by vector (tx,ty,tz) millimeters. The focal length of the lens is
f millimeters, and the pixels are square with d microns on each side. The image is 4000×6000
(rows and columns) and the optical axis pierces the image plane in row ic, column jc. Use
this to form the camera matrix M. In doing so, please explicitly form the three rotations
matrices (see lecture notes) and compose them. Overall on this problem, be very, very careful
about the meaning of each parameter and its units. My results were obtained by converting
length measurements to millimeters.
Please output the 12 terms in the resulting matrix M, with one row per line. All values
should be accurate to 2 decimal places. I have provided two examples and in my example
I’ve also printed R and K – the test cases will not have these.
Next, apply the camera matrix M to determine the image positions of the points in points.txt.
Each line of this file contains three floating points numbers giving the x, y and z values of a
point in the world coordinate system. Compute the image locations of the points and determine if they are inside or outline the image coordinate system. Output six numerical values
on each line: the index of the point (first point has index 0), the x, y and z values that you
input for the point, and the row, col values. Also output on each line, the decision about
whether the point is inside or outside. For example, you might have
0: 45.1 67.1 89.1 => 3001.1 239.1 inside
1: -90.1 291.1 89.1 => -745.7 898.5 outside
1
All floating values should be accurate to just one decimal place.
One thing this problem does not address yet is whether the points are in front of or behind the
camera, and therefore are or not truly visible. Addressing this requires finding the center of
the camera and the direction of the optical axis of the camera. Any point is considered visible
if it is in front of the plane defined by the center of projection (the center of the hypothetical
lens) and the axis direction. As an example to illustrate, in our simple model that we started
with, the center of the camera is at (0, 0, 0) and the direction of the optical axis is the positive
z-axis (direction vector (0, 0, 1)) so any point with z > 0 is visible. To test that you have
solved this, as a final step, print the indices of the points that are and are not visible, with
one line of output for each. For example, you might output
visible: 0 3 5 6
hidden: 1 2 4
If there are no visible values (or no hidden values), the output should be empty after the
word visible:. This will be at the end of your output.
2. (15 points) You are given a series of images (all in one folder) taken of the same scene,
and your problem is to simply determine which image is focused the best. Since defocus
blurring is similar to Gaussian smoothing and we know that Gaussian smoothing reduces the
magnitude of the image’s intensity gradients, our approach is simply to find the image that
has the largest average squared gradient magnitude across all images. This is value is closely
related to what is referred to as the “energy” of the image. More specifically, this is
E(I) = 1
MN
M
X−1
i=0
N
X−1
j=0
∂I
∂x(i, j)
2
+
∂I
∂y (i, j)
2
.
Note that using the squared gradient magnitude is important here. In order to ensure
consistency across our implementations, use the two OpenCV Sobel kernels to compute the
x and y derivatives and then combine into the square gradient magnitude as in the above
equation.
The command-line of your program will be
python p2_best_focus.py image_dir
where image_dir is the path to the directory that contains the images to test. Assume all
images are JPEGs with the extension .jpg (in any combination of capital and small letters).
Sort the image names using the python list sort function. Read the images as grayscale
using the built-in cv2.imread. Then output for each image the average squared gradient
magnitude across all pixels. (On each line output just the name of the image and the average
squared gradient magnitude, accurate to two decimal places.) Finally output the name of the
best-focused image.
3. (40 points) Ordinarily, image resize functions, like the one in OpenCV, treat each pixel
equally — everything gets reduced or increased by the same amount. In 2007, Avidan and
Shamir published a paper called “Seam Carving for Content-Aware Image Resizing” in SIGGRAPH that does the resizing along contours in an image — a “seam” — where there is
not a lot of image content. The technique they described is now a standard feature in image
manipulation software such at Adobe Photoshop.
2
Here is an example of an image with a vertical seam drawn on it in red.
A vertical seam in an image contains one pixel per row, and the pixels on the seam are 8-
connected between rows, meaning that pixels in adjacent rows in a seam differ by at most
one column. Formally a vertical seam in an image with M rows and N columns is the set of
pixels
sr = {(i, c(i))}
M−1
i=0 s.t. ∀, i|c(i) − c(i − 1)| ≤ 1. (1)
In reading this, think of i as the row, and c(i) is the chosen column in each row. Similarly, a
horizontal seam in an image contains one row per column and is defined as the set of pixels
sc = {(r(j), j)}
N−1
j=0 s.t. ∀, j|r(j) − r(j − 1)| ≤ 1. (2)
Here, think of j as the column and r(j) as the row for that column.
Once a seam is selected — suppose for now that it is a vertical seam — the pixels on the
seam are removed from the image, shifting the pixels that are to the right of the seam over
to the left by one. This will create a new image that has M rows and N − 1 columns. (There
are also ways to use this to add pixels to images, but we will not consider this here!) Here is
an example after enough vertical seams have been removed to make the image square.
The major question is how to select a seam to remove from an image. This should be the
seam that has the least “energy”. Energy is defined in our case as the sum of the derivative
3
magnitudes at each pixel:
e[i, j] =|
∂I
∂x(i, j) | + |
∂I
∂y (i, j).
for i ∈ 1 . . . M − 2, j ∈ 1 . . . N − 2. (Note that this is neither the gradient magnitude nor
the sum-squared gradient magnitude.) The minimum vertical seam is defined as the one that
minimizes
M
X−1
i=0
e[i, c(i)]/M
over all possible seams c(·). Finding this seam appears to be a hard task because there is an
exponential number of potential seams.
Fortunately, our old friend (from CSCI 2300) dynamic programming comes to the rescue,
allowing us to find the best seam in linear time in the number of pixels. To realize this, we
need to recursively compute a seam cost function W[i, j] at each pixel, that represents the
minimum cost seam that runs through that pixel. Recursively this is defined as
W[0, j] = e[0, j] ∀j
W[i, j] = e[i, j] + min
W[i − 1, j − 1], W[i − 1, j], W[i − 1, j + 1]
∀i > 0, ∀j
Even if you don’t know dynamic programming, computing M[i, j] is pretty straightforward
(except for a few NumPy tricks — see below).
Once you have M, you have to trace back through it to find the actual seam. This is also
defined recursively. The seam pixels, as defined by the function c(·) from above, are
c(N − 1) = argmin
1≤j≤N−1
W[N − 1, j]
c(i) = argmin
j∈{c(i+1)−1,c(i+1),c(i+1)+1}
W[i, j] for i ∈ N − 2 downto 0
In other words, in the last row, the column with the minimum weight (cost) is the end point
of the seam. From this end point we trace back up the image, one row at a time, and at
each row we choose from the three possible columns that are offset by -1, 0 or +1 from the
just-established seam column in the next row.
A couple of quick notes on this.
• You need to be careful not to allow the seam to reach the leftmost or rightmost column.
The easiest way to do this is to introduce special case handling of columns 0 and N − 1
in each row, assigning an absurdly large weight.
• The trickiest part of this from the NumPy perspective is handling the computation of
the minimum during the calculation of W. While you clearly must explicitly iterate over
the rows (when finding the vertical seam), I don’t want you iterating over the columns.
Instead, use slicing in each row to create a view of the row that is shifted by +1, -1
or 0 and then take the minimum. For example, here is code that determines at each
location in an array, if the value at that is greater than the value at either its left or
right neighbors.
import numpy as np
a = np.random.randint( 0,100, 20 )
print(a)
is_max = np.zeros_like(a, dtype=np.bool)
left = a[ :-2]
right = a[ 2: ]
center = a[ 1:-1 ]
is_max[ 1:-1 ] = (center > right) * (center > left)
is_max[0] = a[0] > a[1]
is_max[-1] = a[-1] > a[-2]
print(“Indices of local maxima in a:”, np.where(is_max)[0])
’’’
Example output:
[93 61 57 56 49 40 51 85 5 13 28 89 31 56 11 10 60 93 26 86]
Indices of local maxima in a: [ 0 7 11 13 17 19]
’’’
Your actual work: Your program should take an image as input and remove enough rows
or columns to make the image square. The command-line should be
python p3_seam_carve.py in_img out_img graph_file
You must compute the sum of the magnitudes of the x and y derivatives of the image produced
by the OpenCV function Sobel. For the 0th, 1st and last seam, please print the index of the
seam (0, 1, …), whether the seam is vertical or horizontal, the starting, middle and end pixel
locations on the seam (e.g. if there are 10 pixels, output pixels 0, 5 and 9), and the average
energy of the seam (accurate to two decimal places). At the end, output the final resized
image to out_img. Finally, output a PyPlot plot to graph_file that shows the seam index
along the horizontal axis and the energy of the seam along the vertical axis.
Written Problems
1. (15 points) Give an algebraic proof that a straight line in the world projects onto a straight
line in the image. In particular
(a) Write the parametric equation of a line in three-space.
(b) Use the simple form of the perspective projection camera from the start of the Lecture 5
notes to project points on the line into the image coordinate system. This will give you
equations for the pixel locations x and y in terms of t.
(c) Combine the two equations to remove t and rearrange the result to show that it is in
fact a line. You should get the implicit form of the line.
(d) Finally, under what circumstances is the line a point? Show this algebraically.
2. (15 points) Evaluate the quality of the results of seam carving on several images. In particular, find images for which the results are good, and find images for which the results are
poor. Show these images before and after applying your code, and include the energy graphs.
Explain the results both intuitively in what you see in the images and more quantitatively in
terms of what you see in the the graphs.
5
3. (10 points, Grad Only!) Suppose you are given three pairs of points in 2d, (xi
, yi)
>,(ui
, vi)
>,
i = 1, 2, 3. If these points are not collineaer there is a unique 2×3 affine transformation matrix
A =
a11 a12 a13
a21 a22 a23
0 0 1
such that
A
xi
yi
1
=
ui
vi
1
.
Derive a matrix equation that gives the parameters of ai,j from the three pairs of points,
showing the details of your derivation. At some point you will end up with an inverse of
either a 3 × 3 or a 6 × 6 matrix. You do not have to explicitly compute the inverse.
4. (2 points) Please estimate the number of hours you spend on this assignment.
6