Description
Project description
This project will be similar in purpose to Project 1, but will take a different approach and have a different
application. Project 1 solved mazes by searching for any path that led from the starting location to the
ending location, but that path may not have been the shortest path (that is, the one with the least
number of locations). As you recall, this search strategy is called depth-first search, or DFS.
In this project, your program will use a different search algorithm, Breadth-First Search, to find shortest
paths. However, your instead of solving a maze, your program will solve word melt puzzles.
This project makes use of the STL map and set data structures, which may be new to you. This page
describes some of the details of what you will need of these data structures for this project; you may
also want to look at this STL cheatsheet about maps and sets. You should also take a look at the STL
string class, in particular the methods insert(size_t, size_t, char) and erase(size_t, size_t). Note that if
you try to call erase with only one parameter (e.g. s.erase(3)), it will erase from that character to the
end of the string. Probably not what you want.
Word melt puzzles
A word melt puzzle has three components: a start word, a goal word, and a dictionary of valid words.
Starting with the start word, the goal is to repeatedly change the current word by one character (by
changing a letter, inserting a letter, or deleting a letter) so that it is still valid, until you reach the goal
word. No valid word in a word melt puzzle will be less than two characters long (otherwise, deleting one
letter from a 1-letter word would result in an empty string).
Let’s consider an example. If the dictionary contains (only) the words “but, bat, bar, barn, yarn, yearn.” If
the start word is “but,” and the end word is “yearn,” then the following sequence would be a solution:
“but, bat (change u to a), bar (change t to r), barn (insert n at the end), yarn (change b to y), yearn
(insert e before the second letter)”. Here the letters that changed have been emphasized.
We are going to adapt our Maze solver to solve word melt puzzles. The two problems might seem
unrelated, but you’ll find that the only substantial changes are to the Location class (since it now
represents words instead of row/column positions), and to the driver (since it must use a different type
of search). We don’t have to change much because the Location class is a good example of information
hiding in an ADT: the implementation has changed, but the interface remains the same.
By the way, you might be interested that this project is related to something called the Levenshtein
distance, which measures the smallest sequence of changes that can turn one string into another.
While the Levenshtein distance is interesting in its own right, and it has an efficient solution, for this
project we don’t use it because we only consider changes that result in valid words (Levenshtein
distance considers all changes to be valid).
Breadth-first search
In order to find the shortest path (i.e. sequence of words) that solves the puzzle, we will breadth-first
search. In the last project, one stack kept track of the current path. In this project, we will instead use a
queue to keep track of the “frontier” of unknown territory, and an STL map to keep track of all the
shortest paths at the same time.
Breadth-first search (BFS) is a common algorithm in computer science, and it is guaranteed to find a
shortest path in our puzzle. This is different from the depth-first search (DFS) algorithm we used for
project 1. BFS usually uses a FIFO structure (like a queue) to explore broadly, and it never backs up or
reconsiders previous decisions. DFS usually uses a LIFO structure (like a stack) to explore down one
path as far as possible, then it backs up when necessary and makes other choices. Both search
methods have advantages; for example DFS is not guaranteed to find the shortest path, but it can often
be implemented to use less memory than BFS. The basic idea for BFS is this:
• start with the starting location on the queue
• while the queue is not empty and the ending location has not been found, do the following:
• pull one location off the queue
• look at all valid neighbor locations, and if they haven’t been visited, then put them on the queue
Note that if we implement this algorithm with the stack instead of a queue, we get DFS instead of BFS.
That’s pretty amazing!
BFS always finds a shortest path in this puzzle, since we assume that each location change “costs” the
same. We’ll discuss this more later this semester when we learn about graph algorithms. Note that
there might be more than one shortest path, so the order of exploration is important.
Why is it guaranteed to find a shortest path? Because of the queue ordering — FIFO. The BFS
algorithm ensures that first time we pull a location off the queue we must have found the shortest path
to that location. Think about this inductively: the first location pulled off the queue is the starting location
(since it’s the only one initially on the queue), so we have found the shortest path to the starting
location. Then we put on the back of the queue the neighbor locations of the one we just pulled off.
These will be the next things we pull off the front of the queue, and we have found the shortest paths to
those locations, and so on. The key idea is: each time we find a new unvisited location (call it L), the
location that led us to L must be on the shortest path from the starting location to L.
Keeping track of all the shortest paths
For this project, we will not keep a single path from the start location to a current location. Instead, we
will keep an STL map which, given any location L that has been visited, will store which was the
location immediately before L on the shortest path from the start location to L. Then if your program
finds the end location, it can reconstruct the path from the start to the end based on this map. This map
is also useful for knowing if we have already visited some location, so that we don’t visit it twice. This
eliminates the exponential-time search we saw in project 1.
An STL map represents any function — for any input (of a type called the Key type), it can produce an
output (of a type called the Value type, possibly different than the Key type). For example, we can make
a map of string (Key) to int (Value) that is useful for looking up student IDs based on name, so that if we
look up “Bobby Baylor”, we get back 123456789. The STL map has several methods that are useful for
this project:
• operator[](const Key &) — to look up a Key type in the map, and useful for assigning a Value to
associate with a Key, like this: myMap[“Bobby Baylor”] = 123456789;. You can also look up a Value
(if you are sure it’s in the map!): cout << myMap["Bobby Baylor"]; would print 123456789. If you're
not sure the Key is in the map, then you should first use find() — otherwise, the operator[] will
automatially create an entry in the map with the given key, which is probably not what you want!
• find(const Key &) — to find if a given Key is in the map. If so, find returns an iterator pointing at the
Key/Value pair. If not, find returns the end iterator.
• end() — returns the end iterator, which is useful for comparing with the result from find() to know if it
actually found the given Key.
• One more point about map iterators — they are actually STL pairs. An STL pair is a useful structure
for putting together any two types; you can read more about STL pairs here.
Conceptually, the map will contain a tree of back-links from each location L to the previous location on
the shortest path from the start location to L. For example, suppose the dictionary contains the following
words: "red, rad, rod, bed, bad, bod, cod", and the start word is red (it doesn't matter for this example
what the end word is).
Here's a picture of the neighbors, connected by lines:
Note that all of these words have the same length, but that doesn't matter. The neighbor relation applies
to any pair of words that are one letter different (whether that difference comes from changing,
inserting, or deleting a letter).
Then after the puzzle has been completely explored, the map (let's call it m) should have this
information, logically speaking:
• m["bed"] == "red"
• m["rad"] == "red"
• m["bad"] == "bed"
• m["cod"] == "rod"
• …
We could view this information visually as:
Here the arrows indicate the back-links, which is the information kept by the map. The number in
parentheses indicates the order in which each word will be explored in the BFS algorithm. The dotted
lines are just there to show neighbor relationships.
By doing this, we can find the shortest path between the start and the end by starting at the end, finding
its previous location, and keep on going until we reach the start. Of course, we want to print it from start
to end, rather than end to start — there are several ways to handle this.
How could we denote the previous location for the start? As itself! That way every location in the
previous map has an entry (which is important for the next point…).
If we keep the back-links in the STL map up to date as we explore the puzzle's dictionary (building it as
we discover new locations to explore), then we can tell if we have visited a location before by seeing if it
has an entry in the map (using the map's find(const T &) method, which is similar to the set's find
method). Knowing where we've previously been is important so that we don't visit a location more than
once. This fixes the problem with our implementation of project 1, which gave it an exponential running
time!
We will discuss this map of previous locations in detail in class. You should refer to the STL map
documentation for more information on the relevant features of the STL map.
Changes to the Maze and Location classes
The Maze class also now uses an STL set to keep track of validLocations, which is a faster lookup than
searching through an array, since an STL set is implemented as a Red-Black tree, which is a type of
fast binary search tree. You will use the following set methods:
• insert(const T &) — to place a Location object in the validLocations set,
• find(const T &) — returns an iterator that points at the Key you were searching for (which will be the
end iterator if the Key isn't found), and
• end() — which returns the special iterator that points to one past the last valid object in the set.
See the STL set documentation for more information.
The STL set and map are ordered data structures — that means that they need to be able to place the
objects they store in an order that is computable with the less-than operator (operator<). Therefore, we
must implement this operator for the Location class so that we can store them in the STL map and set
containers.
Sample inputs
Several sample inputs are available here. You can get the correct output by capturing the output of the
sample executable on these inputs.
• input 1
• input 2
• input 3
• input 4
Provided code
You must use the .h files provided here. I have provided a lot of code in the 'prof' files. You are not
allowed to change the code in these files. Instead, write your code in and submit the 'student' files. I
structure the templated files this way (split over two files) since all templated code must be in a .h file.
• arrayqueue-prof-proj2.h
• arrayqueue-student-proj2.h
• maze-proj2.h
• location-proj2.h
Remember that when using templates, all of the code goes in the .h file. So you will turn in 4 files for
this project: arrayqueue-student-proj2.h, location-proj2.cpp, maze-proj2.cpp, and your driver (a .cpp
file). Where appropriate, you should #include the 'student' file, and the student file should #include the
corresponding 'prof' file (which it already does).
Sample executables
Here are sample executables for you. When you design test cases, you can judge your output against
the output from my correct solution. Here are my compiled solutions:
• DOS executable
• Linux (Intel) executable
• MacOSX (Universal) executable
If you give a command-line argument to these executables, they will print extra information about how
they are exploring. For example, this will execute the program like normal, redirecting input from a file
called my_input.txt:
% project2_solution_dos.exe < my_input.txt
But here is the mode of operation that will cause the program to print out what it is doing in more detail:
% project2_solution_dos.exe printMore < my_input.txt
The command-line argument doesn't have to be the word "printMore", it can be anything.
Milestones
Since this is a large project, it helps to have a plan of attack. The following milestones should be turned
in via the upload site. I will look at your code and send you an email with notes. For each milestone you
should create and upload a test driver that tests what you have written; in general this should be
submitted as the "driver-projX.cpp" file for project numbered X (even though the test driver is not what
you will ultimately use for the project). You need not upload things that are not part of the current
milestone. Milestones are graded.
Step Finish by Milest
1 Thursday, September 14
by noon
WRITE and TEST the modifications for the Location and Maze
classes.
Final notes
Remember when writing this program to adhere to the coding style guidelines. This is an individual
project, so you must work alone. No credit will be given for a solution which does not pass all the
hidden tests I create, or does not pass in the allowed time. For more detailed instructions, read the
project submission guidelines.
Copyright © 2016 Greg Hamerly.
Computer Science Department
Baylor University
2 Tuesday, September 19
by noon
WRITE and TEST the methods for the ArrayQueue.
3 Thursday, September 21
by noon
WRITE and TEST the driver. Make your own test cases. Aim to
finish early, so you can be sure you will finish the whole project
before the due date.