Lab 1: Image Processing


2) Introduction
Have you ever watched CSI or another procedural (http://www.imdb.com/title/tt0113243/) investigative
(https://www.youtube.com/watch?v=Vxq9yj2pVWk) fantasy (https://www.youtube.com/watch?
v=1Y2zo0JN2HE) show (http://www.imdb.com/title/tt3560060/)? If so, the words “Enhance this image!” must
sound familiar indeed. Reality is not quite so generous, and images cannot simply be enhanced
(https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_footnote_1). In the real world, we do have a lot of
techniques to analyze images. Photoshop (http://www.photoshop.com/products/photoshop) and GIMP
(http://www.gimp.org/) are rich toolkits of image-manipulation algorithms that give a user great abilities to
evaluate, manipulate, and transform any digital image.
In this lab, you will build a few tools for manipulating images. Interestingly, many classic image filters are
implemented using the same ideas we’ll develop over the course of this lab.
2.1) Digital Image Representation and Color Encoding
Before we can get to manipulating images, we first need a way to represent images in Python
(https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_footnote_2). While digital images can be represented in
myriad ways, the most common has endured the test of time: a rectangular mosaic of pixels — colored dots,
which together make up the image. An image, therefore, can be defined by specifying a width, a height, and an
array of pixels, each of which is a color value. This representation emerged from the early days of analog
television and has survived many technology changes. While individual file formats employ different encodings,
compression, and other tricks, the pixel-array representation remains central to most digital images.
For this lab, we will simplify things a bit by focusing on grayscale images. Each pixel is encoded as a single
integer in the range [0, 255] (a single byte can represent 256 distinct values), with 0 being the deepest black
and 255 the brightest white we can represent. The full range is shown below:
[grayscale gradient from 0 (black) at left to 255 (white) at right]
For this lab, we’ll represent images as instances of the Image class, which has three instance variables:
width : the width of the image (in pixels),
height : the height of the image (in pixels), and
pixels : a Python list of pixels stored in row-major order (https://en.wikipedia.org/wiki/Row-major_order)
(listing the top row left-to-right, then the next row, and so on)
For example, consider a 2×3 image (2 pixels wide and 3 pixels tall) whose rows, from top to bottom, contain
the brightness values (0, 50), (50, 100), and (100, 255). This image would be encoded as the following instance:
i = Image(2, 3, [0, 50, 50, 100, 100, 255])
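Since pixels is stored in row-major order, the pixel at column x and row y lives at index y * width + x in
the list. As a minimal sketch (the lab skeleton may already provide accessors like these, possibly under
different names):

def get_pixel(self, x, y):
    # Row y starts at index y * self.width, because each row of
    # self.width pixels is stored contiguously.
    return self.pixels[y * self.width + x]

def set_pixel(self, x, y, c):
    self.pixels[y * self.width + x] = c

For the image above, i.get_pixel(1, 2) would return 255, the bottom-right pixel.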
2.2) Loading, Saving, and Displaying Images
We have provided three methods near the bottom of the Image class that may be helpful for debugging:
load , save , and show . Each of these functions is explained via a docstring
(https://en.wikipedia.org/wiki/Docstring#Python).
You do not need to dig deeply into the actual code in those methods, but take a look at those docstrings, and
try using the methods to:
load an image,
save it under a different filename, and
display it using show .
There are several example images in the test_images directory inside the lab's code distribution, and you are
welcome to test with images of your own as well.
As you implement the various filters, these functions can provide a way to visualize your output, which can
help with debugging, and they also make it easy to show off your cool results to friends and family.
3) Image Filtering via Per-Pixel Transformations
As our first task in manipulating images, we will look at an inversion filter, which reflects pixels about the middle
gray value (0, black, becomes 255, white, and vice versa). For example, here is a photograph of Adam H's cat.
On the left side is the original image, and on the right is an inverted version.
Most of the implementation of the inversion filter has been completed for you (it is invoked by calling the
method called inverted ), but some pieces have not been implemented correctly. Your task in this part of the
lab is to fix the implementation of the inversion filter.
Before you do that, however, let’s add a simple test case so that we can test whether our code is working.
Let’s start with a 1×4 image that is defined with the following parameters:
height: 1
width: 4
pixels: [30, 81, 141, 220]
If we were to run this image through a working inversion filter, what would the expected output be? In the box
below, enter a Python list representing the expected pixels attribute of the resulting image:
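If you want to double-check your arithmetic, note that inversion as described above maps each pixel p to
255 - p, so the expected list can be computed in a Python shell with:

[255 - p for p in [30, 81, 141, 220]]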
3.1) Adding a Test Case
Let’s also add this test case to the lab’s regular tests so that it is run when we execute test.py . If you open
test.py in a text editor, you will see that it is a Python file that makes use of Python’s unittest module
(https://docs.python.org/3/library/unittest.html) for unit testing.
Each class in this file serves as a collection of test cases, and each method within a class represents a
particular test case.
Running test.py will cause Python to run and report on all of the tests in the file. However, you can make
Python run only a subset of the tests by running, for example:
$ python3 test.py TestImage
This will run all the tests under the TestImage class. If you run this command, you should get a brief report
indicating that the lone test case passed.
If you want to be even more specific, you can tell unittest to run only a single test case, for example:
$ python3 test.py TestInvert.test_invert_1
(Note that this test case should fail right now because the given implementation of the inversion filter has
bugs!)
To add a test case, you can add a new method to one of the classes. Importantly, for it to be recognized as a
test case, its name must begin with the word test
(https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_footnote_3).
In the skeleton you were given, there is a trivial test case called test_invert_2 in the TestInvert group.
Modify this method so that it implements the test from above. Within that test case, you can define the expected
result as an instance of the Image class, and you can compare it against the result from calling the inverted
method of the original image.
For now, we should also expect this test case to fail, but we can expect it to pass once all the bugs in the
inversion filter have been fixed.
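For instance, a minimal sketch of what test_invert_2 might look like (this assumes Image instances can be
compared with assertEqual , as the provided tests do; if not, comparing the width , height , and pixels
attributes individually works as well):

    def test_invert_2(self):
        im = Image(4, 1, [30, 81, 141, 220])
        expected = Image(4, 1, [255 - p for p in [30, 81, 141, 220]])
        self.assertEqual(im.inverted(), expected)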
Throughout the lab, you are welcome to (and may find it useful to) add your own test cases for other parts of
the code, as you are debugging lab.py and any extensions or utility functions you write.
3.2) Debugging
Now, work through the process of finding and fixing the errors in the code for the inversion filter. Happy
debugging! When you are done and your code passes all the tests in the TestInvert test group (including,
especially, the one you just created), run your inversion filter on the test_images/bluegill.png image, save
the result as a PNG image, and upload it below (choose the appropriate file and click "Submit"). If your image is
correct, you will see a green check mark; if not, you will see a red X.
Inverted bluegill.png :
4) Image Filtering via Correlation
Next, we’ll explore some slightly more advanced image processing techniques involving correlation of images
with various kernels.
Given an input image I and a kernel k, applying k to I yields a new image O (perhaps with non-integer,
out-of-range pixels), equal in height and width to I, whose pixels are calculated according to the rules
described by k.
The process of applying a kernel k to an image I is performed as a correlation: the pixel at position (x, y) in
the output image, which we'll denote as O_{x,y} (with O_{0,0} being the upper-left corner), is expressed as a
linear combination of the pixels around position (x, y) in the input image, where the weights are given by the
kernel k.
As an example, let's start by considering a 3×3 kernel. When we apply this kernel to an image I, each output
pixel O_{x,y} is a linear combination of the 9 pixels nearest to (x, y) in I, where each input pixel's value is
multiplied by the associated value in the kernel. In particular, for a 3×3 kernel k, we have:

O_{x,y} = I_{x-1,y-1} × k_{0,0} + I_{x,y-1} × k_{1,0} + I_{x+1,y-1} × k_{2,0} +
          I_{x-1,y}   × k_{0,1} + I_{x,y}   × k_{1,1} + I_{x+1,y}   × k_{2,1} +
          I_{x-1,y+1} × k_{0,2} + I_{x,y+1} × k_{1,2} + I_{x+1,y+1} × k_{2,2}

Consider one step of correlating an image with the following kernel:
0.00 -0.07 0.00
-0.45 1.20 -0.25
0.00 -0.12 0.00
Here is a portion of a sample image, with the specific luminosities for some pixels given:
[sample image with highlighted pixel not reproduced]
What will be the value of the pixel in the output image at the location indicated by the red highlight? Enter a
single number in the box below. Note that at this point we have not yet rounded or clipped the value.
4.1) Edge Effects
When computing the pixels at the perimeter of O, fewer than 9 input pixels are available. For a specific
example, consider the top left pixel at position (0, 0). In this case, all of the pixels to the top and left of (0, 0)
are out of bounds. One option we have for dealing with these edge effects is to treat every out-of-bounds pixel
as having a value of 0. However, this can lead to unfortunate artifacts on the edges of our images.
When implementing correlation, we will instead consider these out-of-bounds pixels in terms of an extended
version of the input image, in which every out-of-bounds pixel takes the value of the nearest in-bounds pixel:
values to the left of the image are treated as having the values of the first (leftmost) column, values above the
image as having the values of the first (top) row, and so on, as illustrated in the following diagram
(https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_footnote_4) (note, however, that the image should be
extended in all four directions, not just to the upper-left):
To accomplish this, you may wish to implement an alternative to get_pixel , which returns pixel values from
within the image normally, but which handles out-of-bounds pixels by returning appropriate values as
discussed above rather than raising an exception. Your correlation code, then, will not need to worry about
whether pixels are in bounds.
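As a minimal sketch, such an accessor only needs to clamp each coordinate into the valid range before
indexing (the name get_pixel_extended is illustrative, not part of the provided skeleton):

def get_pixel_extended(self, x, y):
    # Clamp out-of-bounds coordinates to the nearest edge, so pixels
    # beyond the border take the value of the closest in-bounds pixel
    # (this extends the image in all four directions).
    x = min(max(x, 0), self.width - 1)
    y = min(max(y, 0), self.height - 1)
    return self.pixels[y * self.width + x]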
4.2) Example Kernels
There is a world of interesting operations that can be expressed as image kernels (some examples can be
seen below), and many scientific programs also use this pattern, so feel free to experiment.
Note that the output of a correlation need not be a legal 6.009 image (pixels may be outside the [0,255]
range or may be floats).
The final step in every image-processing function, then, is to clip negative pixel values to 0 and values greater
than 255 to 255, and to ensure that all values in the image are integers. Hint: you may wish to implement a
function to perform this operation on an image.
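A minimal sketch of such a clean-up helper (the name clipped is illustrative):

def clipped(image):
    # Round each pixel to the nearest integer and clamp it into the
    # legal [0, 255] range, yielding a valid grayscale image.
    pixels = [max(0, min(255, int(round(p)))) for p in image.pixels]
    return Image(image.width, image.height, pixels)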
4.2.1) Identity
0 0 0
0 1 0
0 0 0
The above kernel represents an identity transformation: applying it to an image yields the input image,
unchanged.
4.2.2) Translation
0 0 0 0 0
0 0 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
The above kernel shifts the input image two pixels to the right, discarding the rightmost two columns of the
input; because of the edge extension described above, the two leftmost columns of the output are copies of the
input's leftmost column.
4.2.3) Average
0.0 0.2 0.0
0.2 0.2 0.2
0.0 0.2 0.0
The above kernel produces an output image in which each pixel is the average of the corresponding input pixel
and its four orthogonal neighbors (5 pixels in total).
4.3) Correlation
If you have not done so already, you will want at this point to write a function to handle these correlations in a
general sense (for an arbitrary image and an arbitrary kernel). It is up to you to choose how to represent a
kernel within your code.
(To help with debugging, you may wish to write a few test cases correlating test_images/centered_pixel.png
or another simple image with a few kernels.)
You can use the above kernels to help test that your code produces the expected results. You may wish to first
write your code specifically for 3-by-3 kernels, and then to generalize to kernels of arbitrary size (however, you
can assume that kernels will always be square and that they will have an odd number of rows and columns).
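One possible shape for such a function, as a sketch: here a kernel is represented as a list of lists (one inner
list per row), and the edge-extending accessor sketched in section 4.1 is assumed:

def correlate(image, kernel):
    # kernel is a square list-of-lists with odd side length n.  The
    # output pixel at (x, y) is the weighted sum of the n*n input
    # pixels centered on (x, y), using edge extension for neighbors
    # that fall out of bounds.  No rounding or clipping happens here.
    n = len(kernel)
    offset = n // 2
    result = Image(image.width, image.height,
                   [0] * (image.width * image.height))
    for y in range(image.height):
        for x in range(image.width):
            total = 0
            for ky in range(n):
                for kx in range(n):
                    total += (kernel[ky][kx] *
                              image.get_pixel_extended(x + kx - offset,
                                                       y + ky - offset))
            result.pixels[y * image.width + x] = total
    return result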
When you have implemented your code, try running it on test_images/pigbird.png with the following 9×9
kernel:
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
Save the result as a PNG image and upload it below:
Result:
5) Blurring and Sharpening
5.1) Blurring
For this part of the lab, we will implement a simple box blur (https://en.wikipedia.org/wiki/Box_blur), which can
be implemented by feeding an appropriate kernel to our correlation function.
For a box blur, the kernel is an n × n square of identical values that sum to 1. Because you may be asked to
experiment with different values of n, you may wish to define a function that takes a single argument n and
returns an n -by- n box blur kernel.
Implement a method blurred(n) that takes in a number n and returns a new image, with the same height and
width as the input, that has been blurred with an appropriately sized box blur kernel.
Take care to output integer brightness values ( int(round(value)) , for example) in range [0, 255] . That is,
in many cases you will need to clip integer values explicitly to this range.
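As a sketch, the kernel builder and the method (which would live inside the Image class) might look like the
following, assuming the correlate and clipped helpers sketched earlier:

def box_blur_kernel(n):
    # An n-by-n kernel of identical values that sum to 1.
    return [[1 / (n * n)] * n for _ in range(n)]

def blurred(self, n):
    # Correlate with a box-blur kernel, then round and clip the
    # result back into a legal grayscale image.
    return clipped(correlate(self, box_blur_kernel(n)))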
In addition, create two additional test cases under the TestFilters class (you will be expected to demonstrate
and discuss these test cases during a checkoff):
Box blurs of a 6-by-5 image consisting entirely of black pixels, with two different kernel sizes. The output
is trivially identical to the input.
Box blurs for the centered_pixel.png image with two different kernel sizes. You can compute the
output image manually with relative ease.
When you are done and your code passes all the blur-related tests (including the ones you just created), run
your blur filter on the test_images/cat.png image with a box blur kernel of size 5, save the result as a PNG
image, and upload it below:
Blurred cat.png :
5.2) Sharpening
Now we’ll come as close as we can to the “Enhance!” operation in this lab. We’ll do this by implementing a
sharpen filter.
The “sharpen” operation often goes by another, more suggestive name: it is called an unsharp mask because it
results from subtracting an “unsharp” (blurred) version of the image from a scaled version of the original
image.
More specifically, if we have an image (I) and a blurred version of that same image (B), the value of the
sharpened image S at a particular location is:

S_{x,y} = int(round(2 × I_{x,y} - B_{x,y}))
One way we could implement this operation is by computing a blurred version of the image, and then, for each
pixel, computing the value given by the equation above.
However, it is actually possible to perform this operation with a single correlation (with an appropriate kernel).
Check Yourself:
If we want to use a blurred version B that was made with a 3×3 blur kernel, what kernel k could we use to
compute the entire sharpened image with a single correlation?
Implement the unsharp mask as a method sharpened(n) , where n denotes the size of the blur kernel that
should be used to generate the blurred copy of the image. You are welcome to implement this as a single
correlation or using an explicit subtraction, though if you use an explicit subtraction, make sure that you do not
do any rounding until the end (the intermediate blurred version should not be rounded or clipped in any way).
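A sketch of the subtraction-based version as a method of the Image class, assuming the unrounded correlate ,
the box_blur_kernel builder, and the clipped helper from earlier (note that the intermediate blur keeps its
fractional pixel values):

def sharpened(self, n):
    # Compute an unclipped, unrounded box blur B, form 2*I - B
    # pixel by pixel, and only then round and clip.
    b = correlate(self, box_blur_kernel(n))
    pixels = [2 * i - p for i, p in zip(self.pixels, b.pixels)]
    return clipped(Image(self.width, self.height, pixels))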
When you are done and your code passes the tests related to sharpen, run your sharpen filter on the
test_images/python.png image with a box blur kernel of size 11, save the result as a PNG image, and upload
it below:
Sharpened python.png :
6) Edge Detection
For this part of the lab, we will implement a Sobel operator (https://en.wikipedia.org/wiki/Sobel_operator),
which is useful for detecting edges in images.
This edge detector is more complex than a single image kernel: it combines the results of two image kernels,
Kx and Ky.
Kx:
-1 0 1
-2 0 2
-1 0 1
Ky:
-1 -2 -1
0 0 0
1 2 1
After computing Ox and Oy by correlating the input with Kx and Ky respectively, each pixel of the output is
the square root of the sum of squares of corresponding pixels in Ox and Oy:

O_{x,y} = int(round(√(Ox_{x,y}^2 + Oy_{x,y}^2)))
Again, take care to ensure the final image is made up of integer pixels in range [0,255] . But only clip the
output after combining Ox and Oy. If you clip the intermediate results, the combining calculation will be
incorrect.
Check Yourself:
What does each of the above kernels, on its own, do? Try running show on the results of those intermediate
correlations to get a sense of what is happening here.
Implement the edge detector as a method edges() within the Image class. It should take no arguments, and
it should return a new instance of Image resulting from the above operations.
Create a new test case: edge detection on the centered_pixel.png image. The correct result is a white ring
around the center pixel that is 1 pixel wide.
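As a sketch, assuming the correlate and clipped helpers from earlier sections:

import math

def edges(self):
    # Correlate with the two Sobel kernels, combine the unclipped
    # intermediate results pixel by pixel, then round and clip.
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    ox = correlate(self, kx)
    oy = correlate(self, ky)
    pixels = [math.sqrt(a ** 2 + b ** 2)
              for a, b in zip(ox.pixels, oy.pixels)]
    return clipped(Image(self.width, self.height, pixels))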
When you are done and your code passes the edge detection tests, run your edge detector on the
test_images/construct.png image, save the result as a PNG image, and upload it below:
Edges of construct.png :
7) Content-Aware Rescaling
As our last exercise in this lab, we will implement a version of content-aware resizing of images (often referred
to as retargeting). The goal of this technique is to scale down an image while preserving the perceptually
important parts (i.e., removing background but preserving subjects).
Two common approaches for resizing an image are cropping and scaling. However, in certain situations, both
of these methods can lead to undesirable results. The animation below shows these various kinds of resizing
in action on an image of two cats (https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_footnote_5), which is
gradually scaled down from 300 pixels wide to 150 pixels wide. Cropping is shown on the first row; naive
scaling on the second row; and a form of retargeting on the third row.
The first two strategies have shortcomings: cropping causes one of the cats to be almost completely deleted,
and scaling distorts both cats. By focusing on removing “low-energy” portions of the image, however, the seam
carving technique manages to keep both cats almost completely intact while removing mostly background.
Here is another example, on a picture of several trees. Notice again that the trees are preserved almost
exactly despite decreasing the width of the image by 75 pixels.
These images were generated using a technique called seam carving. We’ll implement a variant of this
algorithm in this section (and a full implementation is described as an optional extension in subsection 10.2
(https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_section_10_2)).
7.1) Algorithm
While normal cropping removes columns of pixels from the left and right sides of the image indiscriminately,
this approach instead removes columns of pixels with low 'energy' (i.e., where not much is changing in the
image).
Each time we want to decrease the horizontal size of the image by one pixel, we start by finding the column
that has the minimum total energy, and removing the pixels contained therein. To shrink the image further, we
can apply this process repeatedly.
Here, we will define the ‘energy’ based on the edge detection algorithm from earlier: the energy of any given
pixel will be the associated value in the output from edge detection; and the energy of a path is the sum total of
the energies of the pixels it contains.
This can be implemented by repeating the following three steps until the width of the image reaches the
desired size:
1. Compute energy map. Because we are using our edge detector as our ‘energy’ function, this should
simply involve calling the edges method of the image.
2. Find the minimum energy column by summing all the energies in each column and picking the
minimum.
3. Remove the computed path. Finally, we need to remove the pixels in the chosen column from the
image.
If we need to remove more columns, we can repeat all three steps of this process (including recomputing the
energy map).
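Put together, the loop might look like the following sketch (the function name retargeted is illustrative; it
assumes the edges method from section 6):

def retargeted(image, n_columns):
    # Repeatedly remove the single lowest-energy column, recomputing
    # the energy map after each removal.
    for _ in range(n_columns):
        energy = image.edges()                    # step 1: energy map
        col_sums = [sum(energy.pixels[y * energy.width + x]
                        for y in range(energy.height))
                    for x in range(energy.width)]
        target = col_sums.index(min(col_sums))    # step 2: min column
        pixels = [p for i, p in enumerate(image.pixels)
                  if i % image.width != target]   # step 3: remove it
        image = Image(image.width - 1, image.height, pixels)
    return image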
7.2) Implementation
Now it’s time to implement the retargeting algorithm.
You are welcome to implement this using any name you prefer, and in any location in the file. It might make
sense, also, to define some “helper” functions (perhaps corresponding to the steps above) rather than
implementing everything in one function.
You will be expected to discuss your implementation with a staff member during a checkoff.
When you are done, run your retargeting code on the test_images/pattern.png image to remove the two
lowest-energy columns. Save the result as a PNG image, and upload it below:
Resized pattern.png :
Also run your code to do the following:
Remove 75 columns from tree.png
Remove 100 columns from twocats.png
Remove 100 columns from pigbird.png
Save the resulting images and be ready to discuss them during your checkoff. Note that these tests may take a
long time (several minutes) to run.
Note that you may notice some artifacts in these images (particularly in the sky in tree.png ). If you are
interested, check out the optional extension in subsection 10.2
(https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_section_10_2) to improve on this.
8) Code Submission
When you have tested your code sufficiently on your own machine, submit your modified lab.py by clicking
on “Choose File”, then clicking the Submit button to send the file to the 6.009 server. The server will run the
tests and report back the results below.
9) Checkoff
Once you are finished with the code, please come to a tutorial, lab session, or office hour and add yourself to
the queue asking for a checkoff. You must be ready to discuss your code and test cases in detail before
asking for a checkoff.
You should be prepared to demonstrate your code (which should be well-commented, should avoid repetition,
and should make good use of helper functions). In particular, be prepared to discuss:
Your test case for inversion.
Your implementation of correlation.
Your test cases for blurring.
Your implementation of the unsharp mask.
Your implementation of content-aware resizing (including showing your results on the suggested
images).
9.1) Grade
You have not yet received this checkoff. When you have completed this checkoff, you will see a grade here.
10) Optional Additional Behaviors
If you have finished early, there are a lot of interesting image filters to tackle, and we encourage you to explore
(sorry, no additional points though!). Some ideas you may wish to consider:
10.1) Additional Filters
Implement a normalization filter that will scale pixel values linearly such that the darkest pixel in the
output has a value of 0 , and the brightest has a value of 255 .
Add capability for a Gaussian blur (https://en.wikipedia.org/wiki/Gaussian_blur) with arbitrary σ value.
Create a more complex filter with position-dependent effects (such as ripple or vignette). If you’ve
studied linear algebra (and remember it!), try rotating the image.
10.2) Seam Carving
If you take a look at your result of using our original retargeting code to resize tree.png down by 75 pixels,
you’ll notice some artifacts in the skyline. It is possible to produce much-improved results by using a variation
of this technique called seam carving (https://www.youtube.com/watch?v=6NcIJXTlugc).
You won’t be awarded any additional points for implementing this technique, but it is so cool!
This technique is very similar to the one we used earlier, except that instead of removing the lowest-energy
column in an image, we instead find and remove connected ‘paths’ of pixels from the top to the bottom of the
image, with one pixel in each row. Each time we want to decrease the horizontal size of the image by one
pixel, we start by finding the connected path from top to bottom that has the minimum total energy, and
removing the pixels contained therein. To shrink the image further, we can apply this process repeatedly.
As above, we will define the ‘energy’ based on the edge detection algorithm from earlier: the energy of any
given pixel will be the associated value in the output from edge detection; and the energy of a path is the sum
total of the energies of the pixels it contains.
The goal of finding the minimum-energy path is accomplished by an algorithm which is outlined briefly below,
and which is described in detail on the Wikipedia page for seam carving
(https://en.wikipedia.org/wiki/Seam_carving).
The algorithm works by repeating the following three steps until the width of the image reaches the desired
size:
1. Compute energy map. Because we are using our edge detector as our ‘energy’ function, this should
simply involve calling the edges method of the image.
2. Find the minimum-cost path from top to bottom. We can start by creating a “cumulative energy
map”, where the value at each position is the total cost of the least-cost path from the top of the image to
that pixel.
For the top row, this will be equal to the energy map’s top row. Subsequent rows can be computed using
the following algorithm (in pseudocode):
For each row of pixels in the image:
For each pixel of a row:
Find the minimum-cost path to reach that pixel by taking the
minimum of the costs to reach each of the three possible
preceding pixels and adding that to the cost of the current
pixel. The updated cost of the current pixel now reflects the
total cost of the minimum-cost path to reach that pixel.
The minimum seam is then calculated by backtracing from the bottom to the top edge. First, the
minimum value pixel in the bottom row of the accumulated cost map is located. This is the bottom pixel
of the minimum seam. The seam is then traced back up to the top row of the accumulated cost map by
following the neighboring pixels with the smallest cumulative costs.
3. Remove the computed path. Finally, we need to remove the pixels in the computed seam from the
image.
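As a sketch, the cumulative energy map and the backtrace might be computed as follows (the function name
is illustrative; energy is the Image produced by edges() , and the returned list gives, for each row, the
column of the seam pixel to remove):

def minimum_energy_seam(energy):
    w, h = energy.width, energy.height
    # Cumulative energy map: each entry is the pixel's own energy plus
    # the cheapest of its (up to) three neighbors in the row above.
    cost = [list(energy.pixels[y * w:(y + 1) * w]) for y in range(h)]
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Backtrace: start from the cheapest bottom-row pixel and walk up,
    # always stepping to the cheapest of the three upper neighbors.
    seam = [min(range(w), key=lambda c: cost[h - 1][c])]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        seam.append(min(range(lo, hi + 1), key=lambda c: cost[y][c]))
    seam.reverse()
    return seam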
Try running your code on pigbird.png , twocats.png , and construct.png , removing 100 seams from each.
10.3) Colors
Extend your code to work with color images. This can be accomplished by loading images via Pillow in
mode "RGB" instead of mode "L" . In this mode, pixels will be a list of (r, g, b) tuples.
Most of the filters described here can be implemented for color images by applying the black-and-white
versions to the red, green, and blue channels separately, and then re-combining. The notable exception is that
edge detection should be based on luminosity, which can be computed as:

L = (299/1000) × R + (587/1000) × G + (114/1000) × B
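For example, a sketch of converting a color image to the grayscale input that edge detection needs (assuming
pixels are (r, g, b) tuples as described above; the function name is illustrative):

def grayscale_of(color_image):
    # Standard luminosity weights: 299/1000, 587/1000, 114/1000.
    pixels = [round((299 * r + 587 * g + 114 * b) / 1000)
              for (r, g, b) in color_image.pixels]
    return Image(color_image.width, color_image.height, pixels)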
Footnotes
1. Though there is some work
(http://people.csail.mit.edu/billf/project%20pages/sresCode/Markov%20Random%20Fields%20for%20SuperResolution.html) to do the next best thing, some of it done right here at MIT. (click to return to text)
(https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_footnote_ref_1)
2. It turns out that the representation of digital images and colors as data is a rich and interesting topic. Have
you ever noticed, for example, that your photos of purple flowers are unable to capture the vibrant purple you
see in real life? This (https://en.wikipedia.org/wiki/CIE_1931_color_space) is the reason why. (click to return to
text) (https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_footnote_ref_2)
3. You can also create new test groups by creating a new class whose name begins with Test . (click to return
to text) (https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_footnote_ref_3)
4. This image, which is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License
(https://creativecommons.org/licenses/by-sa/3.0/deed.en), was created by Michael Plotke, and was obtained
via Wikimedia Commons (https://commons.wikimedia.org/wiki/File%3AExtend_Edge-Handling.png). (click to
return to text) (https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_footnote_ref_4)
5. This animation was created by Adam Hartz and is licensed under a CC BY-SA 4.0 International License
(https://creativecommons.org/licenses/by-sa/4.0/). The original image of two cats, which is licensed under a
CC BY 2.0 License (https://creativecommons.org/licenses/by/2.0/), comes from wellflat
(https://www.flickr.com/photos/95962563@N02/26830567365/) on Flickr. (click to return to text)
(https://6009.csail.mit.edu/fall17/labs/lab1#catsoop_footnote_ref_5)