Description
Part I: Theoretical Problems (50 marks)
[Question 1] Laplacian of Gaussian (25 marks)
The Laplacian of Gaussian operator is defined as:
∇2G(x, y, σ) = ∂
2G(x, y, σ)
∂x2
+
∂
2G(x, y, σ)
∂y2
=
1
πσ4
x
2 + y
2
2σ
2
− 1
e
−
x
2+y
2
2σ2
,
where the Gaussian filter G is:
G(x, y, σ) = 1
2πσ2
e
−
x
2+y
2
2σ2
The characteristic scale is defined as the scale that produces the peak value (minimum or
maximum) of the Laplacian response.
1
1. (10 marks) What scale (i.e. what value of σ) maximises the magnitude of the response of the Laplacian filter to an image of a black circle with diameter D on a white
background? Justify your answer.
2. (5 marks) What scale should we use if we want to instead detect a white circle of the
same size on a black background?
3. (10 marks) Experimentally find the value of σ that maximizes the magnitude of the response for a black square of size 100×100 pixels on a sufficiently large white background.
Hint: You can simply implement this task and automatically test for a large set of
samples. You may also want to first generate the samples in log-domain to accurately
locate the optimal value in a large spectrum.
[Question 2] Corner Detection (25 marks)
For corner detection, we defined the Second Moment Matrix as follows:
M =
X
x
X
y
w(x, y)
I
2
x
IxIy
IxIy I
2
y
Let’s denote the 2×2 matrix used in the equation by N; i.e.:
N =
I
2
x
IxIy
IxIy I
2
y
1. (10 marks) Compute the eigenvalues of N denoted by λ1 and λ2?
2. (15 marks) Prove that matrix M is positive semi-definite.
2
Part II: Implementation Tasks (100 marks)
[Question 3] Local Descriptor: Histogram of oriented gradients (60 marks)
The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision
and image processing for the purpose of object detection. The technique counts occurrences
of gradient orientation in localized portions of an image. This method is similar to that of
scale-invariant feature transform (SIFT) descriptors, and shape contexts (a similar technique
we have not seen in class), but differs in the sense that it is computed on a dense grid
of uniformly spaced cells and uses overlapping local contrast normalization for improved
accuracy. Until deep learning, HOG was one of the long-standing top representations for
object detection.
In this assignment, you will implement a variant of HOG. Given an input image, your
algorithm will compute the HOG feature and visualize it as shown in Figure 1 (the line
directions are perpendicular to the gradient to show edge alignment).
Figure 1: HOG features plotted on an example image.
The orientation and magnitude of the red lines represent the gradient components in a
local cell. A HOG descriptor is formed at a specified image location as follows:
1. Compute image gradient magnitudes and directions over the whole image, thresholding
small gradient magnitudes to zero. You should empirically set a reasonable value for
the threshold for each of the input images.
2. Center a cell grid (m × n) on the image. To create this grid cell, assume the grid cells
are square and we have a fixed-size length for each of the cells in this grid; let us call
that size τ . For example, if your image size is 1021 ×975 and τ = 8, then you will have
a grid size of (m = 127) × (n = 121). You can ignore the boundary of the image that
can not be fit into a grid (on either end), i. e., just consider the crop of the image that
fits to the grid perfectly, which is 1016 × 968 in this example.
3. For each cell, form an orientation histogram by quantizing the gradient directions and,
for each such orientation bin, add the (thresholded) gradient magnitudes. This process
3
can be done in two steps: Imagine gradient orientations are discretized by 6 bins:
[−15◦
, 15◦
), [15◦
, 45◦
), [45◦
, 75◦
), [75◦
, 105◦
), [105◦
, 135◦
), [135◦
, 165◦
)
Remember 165◦
is equivalent to −15◦ where the orientation is not directed. Now create a 3D array (m × n × 6) where in element (i, j, k) of this 3D array you will store
the accumulated gradient magnitudes over all the pixels in the cell (i, j) with gradient
orientations corresponding to bin k.
Another approach for constructing the HOG, is to collect the number of occurrences
in each bin, rather than accumulating the magnitudes of occurrences; i.e. in element
(i, j, k) of the histogram, we store the number of pixels in cell (i, j) with gradient orientations corresponding to bin k
Choose reasonable values for the threshold and cell size, and then visualize the resulting
3D arrays (using both approaches) on the given images similar to the quiver plot of Figure 1. Briefly, compare the two approaches by inspecting the visualizations.(15 marks)
Hint: You can use any package/function for creating the visualization in Figure 1.
One way to do that is to superimpose 6 quiver plots (one for each bin), generated by
quiver function in matplotlib package.
For the remaining tasks, you can use either approaches for constructing HOG. Make
sure to explicitly mention your choice in the report.
4. To account for changes in illumination and contrast, the gradient strengths must be
locally normalized, which requires grouping the cells together into larger, spatially
connected blocks (adjacent cells). Given the histogram of oriented gradients, you apply
L2 normalization as follows:
• Build a descriptor for the first block by concatenating the HOG within the block.
You can use block size = 2, i.e., 2 × 2 block will contain 2 × 2 × 6 entries that will
be concatenated to form one long vector.
• Normalize the descriptor as follows:
hˆ
i = p
hi
P
i
h
2
i + e
2
where hi
is the i
th element of the vector and hˆ
i
is the normalized histogram. e is
the normalization constant to prevent division by zero (e.g., e = 0.001).
• Assign the normalized histogram to the first cell of a new histogram array, i.e. cell
(1,1).
• Move to the next block of old histogram array with stride 1 and iterate steps 1-3
above, to compute the next cell of the new histogram array.
4
The resulting new histogram array will have the size of (m − 1) × (n − 1) × 24. Compute
normalized histogram arrays for the provided images, and store them in a single line text file
where the data is stored row by row (first row then second row etc.). Submit both your code
and the files that are generated by your code. Please note that the file should have the same
name as the image (e.g. ‘image.jpg’ → ‘image.txt’). (15 marks)
In addition to the provided images, use your own camera (e.g. smartphone camera) to
capture two images of the same scene, one with flash and one without flash. Convert the
images to gray-scale, and down-sample the images if needed to avoid excessive computation
overhead.
First, compute the original HOG arrays for these two images and visualize them similar
to Figure 1. (5 marks)
Second, compute the normalized histogram arrays for each of these two images, and store
them in two txt files as instructed earlier. (5 marks)
Third, by comparing the results, argue why or why not the normalization of HOG may
be beneficial. Limit your discussion to a paragraph, containing the main points. You can
compare the histograms visually or you may want to define a quantifiable measure to compare
the histograms for pair of with-flash/no-flash images. If you choose to visually compare,
provide the details of your visualization approach for normalized HOG; alternatively, if you
decide to quantitatively compare the histograms, include the function you used and your
justification in the report. (20 marks)
5
[Question 4] Corner Detection (40 marks)
Download two images (I1 and I2) of the Sandford Fleming Building taken under two
different viewing directions:
• https://commons.wikimedia.org/wiki/File:University College, University of Toronto.jpg
• https://commons.wikimedia.org/wiki/File:University College Lawn, University of Toronto, Canada.jpg
1. Calculate the eigenvalues of the Second Moment Matrix (M) for each pixel of I1 and
I2.
2. Show the scatter plot of λ1 and λ2 for all the pixels in I1 (5 marks) and the same
scatter plot for I2 (5 marks). Each point shown at location (x, y) in the scatter plot,
corresponds to a pixel with eigenvalues: λ1 = x and λ2 = y.
3. Based on the scatter plots, pick a threshold for min(λ1, λ2) to detect corners. Illustrate
detected corners on each image using the chosen threshold (10 marks).
4. Constructing matrix M involves the choice of a window function w(x, y). Often a
Gaussian kernel is used. Repeat steps 1, 2, and 3 above, using a significantly different Gaussian kernel (i.e. a different σ) than the one used before. For example,
choose a σ that is significantly (e.g. 5 times, or 10 times) larger than the previous one
(10 marks). Explain how this choice influenced the corner detection in each of the
images (10 marks).
6