Description

5/5 - (2 votes)

Nice and short this time!
2.1 The Knapsack Problem
The knapsack problem, like the Travelling Salesperson Problem, cannot be solved by an algorithm on a
silicon computer in less than exponential time (unless P = NP). The problem definition is as follows.
You have a set of n items and each item has a weight and a value. The i-th item has a weight wi and a
value vi
. You also have a knapsack (bag) that can carry a total amount of weight equal to W. The problem
is to find the most valuable set of items that you can fit into the knapsack. In other words, you need to
find the most valuable subset of items whose total weight is less than W.
For the more mathematically inclined, this can be formally stated as:
Maximize
n
∑
i=1
xivi subject to
n
∑
i=1
xiwi < W. where xi is either 0 or 1 (indicating whether or not the i-th item is in the knapsack). We will investigate backtracking and greedy solutions to the knapsack problem in Questions 2 and 3. Page 2 3 Your Tasks Question 1 (30 points): For this problem you will write a method (or methods) to sort an array of strings using the MSD Radix Sort. For purposes of this assignment, you may assume that strings contain only the uppercase letters A through Z. You have been provided with an IntelliJ module RadixSortMSD-Template which includes a short main program that will load a data file containing strings to be sorted. There are several files provided named words-XXXXXX.txt where “XXXXXX” denotes the number of words in the file. The file format starts with the number of words in the file, followed by one word per line. There is also a file words-basictest.txt which is a good set of words to use to determine whether your sort is running correctly. The pseudocode for MSD Radix Sort from the notes is duplicated on the next page for your convenience. Note that we are removing the optimization of sorting short lists with insertion sort, as indicated by the strikethrough text. You may just always recursively radix sort any list with more than one element on it.1 Complete the following tasks: 1. Write your sort method(s) within the RadixSortMSD class. It should accept an array of strings as input, and return nothing. When the method returns, the array that was passed in should be in lexicographic order (i.e. dictionary order). 2. Call your sort at the spot indicated in the main() function. 3. Record in a file called a8q1.txt/doc/pdf the time in milliseconds it takes to sort 50, 100, 500, 1000, 10000, 50000, and 235884 items (there are input files with each of these numbers of words provided). Include this file in your assignment submission. 4. When you hand in RadixSortMSD.java, leave the input file set at words-basictest.txt so that it is easy for the markers to run your program on this input file to see that it works. Assessment There will be marks allotted to the following aspects of your solution: Correctness. As always, the solution has to work! Design and speed of your implementation. Design and speed will be considered together because they influence each other — a poor design choice may result in a slower runtime. Design-wise, any reasonable design will be accepted; marks will be deducted for especially poor choices. Speed-wise, the bar to get full marks here will be fairly low, but if your sort is egregiously slow we will deduct some points. Javadoc and inline commenting. As usual, include a javadoc comment with each header, and document meaningful blocks of code with inline comments to enhance understanding of the code. 1You can, however, use the insertion-sort optimization if you want to, but you will probably have to implement your own insertion sort. If you choose to attempt this optional optimization, you are permitted to use resources from the internet to implement insertion sort, but only for the insertion sort, and only if you properly attribute any code you use to its original author or website. Page 3 Implementation Hints • One of the most important decisions you have to make in your implementation is the choice of data structure to represent the array of lists used in the sortByDigit() helper method. Choose carefully! Your choice will have an impact on the speed of your sort, and the ease of implementing it. When considering which data structure to use, you may select from containers in either lib280 or the standard Java API. Case in point: on my first attempt, I made a bad decision that caused the sort of 10000 words to take several minutes. Now it takes 24ms (on my computer). • Although runtimes will vary from machine to machine, on a decent machine you should be able to sort even the 235884-word input file in less than one second (1000ms). Even on slower machines if it is taking more than a few seconds, then you’ve done something particularly inefficient. • Don’t take the pseudocode below too literally. This is very high-level pseudocode which is intended to describe the operation of the algorithm, but intentionally glosses over a lot of details that become important at implementation-time (that’s what pseudocode is for!). You need to fill in those details as you go. Don’t be afraid to do what you need to do to get the job done, but that said, you should not need to write hundreds of lines of code (if you are, you should seek help and advice from Mark or a TA). Algoirthm MsdRadixSort ( keys , R ) keys - keys to be sorted R - the radix sortByDigit ( keys , R , 0) Algorithm sortByDigit ( keys , R , i ) keys - keys to be sorted R - the radix i - digit on which to partition -- i = 0 is the left - most digit for k = 0 to R -1 list [ k ] = new list // Make a new list for each digit for each key if the i - th digit of the key has value k add the key to list k for k = 0 to R -1 if there is another digit to consider if list[k] is small use an insertion sort to sort the items in list[k] else sortByDigit ( list [ k] , i +1) keys = new list // empty the input list For k = 0 to R -1 keys = keys append list [ k ] Page 4 Question 2 (20 points): This problem deals with backtracking solutions to NP-Complete and NP-hard problems. These are problems that cannot be solved by a silicon computer in less than exponential time, that is, O(k n ) for some k. The travelling salesperson problem (TSP) is an example of such a problem. You have been provided with an IntelliJ project, TSP which contains two backtracking solutions to TSP. The “Naive” solution simply traverses the entire search space and tries every possible solution The other version prunes the search space by detecting non-feasible partial solutions. If a partial solution has a cost that exceeds the cost of the best solution found so far, the search immediately backtracks. This greatly decreases the number of path extensions that are tried. You can see the large difference when you run the program because both algorithms will count the number of extensions of partial solutions that are tried and report them. It is possible to do even better than this, if you include heuristics. The project contains several files of the from graphXX.dat where "XX" denotes the number of nodes in the graph. Try changing the input file in the main program to graphs of different size to see the effect of early detection of non-feasible partial solutions. The number of path extensions will be much lower compared with trying every solution, and that there is a corresponding increase in efficiency. However, you will also note that at a certain point, the graph gets too large for either version to solve the problem in a reasonable amount of time because the savings gained from detecting non-feasible partial solutions is rapidly outpaced by number of possible solutions. Why? Because the number of possible solutions increases exponentially with the size of the graph, but the time savings gained by detection of non-feasible partial solutions increases only sub-exponentially. Try different input files and try to determine the graph size at which the runtime exceeds 10 or 15 minutes. It will probably be smaller than you think! The knapsack is another NP-complete problem that was described in Section 2. You have been provided with a project called Knapsack-template which contains a few things to get you started. In Knapsack.java you will find definitions of: Item class: Item is a non-public class that stores the properties of an item (i.e. it’s weight and value). KnapsackInstance class: The non-public KnapsackInstance class contains all of the information about an instance of the knapsack problem to be solved, including the capacity of the knapsack, the number of items, and the items themselves (stored as an array of Item objects). Knapsack class: The Knapsack class is where you will write code to solve the knapsack problem. Here you will find an already completed method called readKnapsackInstance which can read a data file describing an instance of the knapsack problem and return an instance of KnapsackInstance. Also in the project, you will find several input files named knapsack-XXXX.dat where the "XXXX" denotes the number of items in the problem instance. Each contains the data for a different instance of the knapsack problem. The first line of each file contains the total weight capacity of the knapsack as a floating-point number. The second line of each file contains the number of items n (an integer). The subsequent n lines each contain two floating-point numbers which describe a single item: the value of the item, and the weight the item. There is also a small knapsack-basictest.dat file which you should use first to ensure your solutions work. Now, here are your tasks: 1. Study the backtracking solutions to TSP to gain an understanding of how they work. 2. Write a backtracking solution to the Knapsack problem in the Knapsack class. The approach will be similar to that of the TSP. You need to systematically try every possible solution (i.e. every subset of the set of items) while using optimizations such as detection of non-feasible solutions and heuristics where possible to prune the search space. The solution should take the form of a method that returns the total value of the most valuable set of items that can fit in the Page 5 knapsack. You may, of course, write other helper methods if you deem it appropriate. Since you are implementing a backtracking solution it is expected that you should be able to obtain the optimum solution. The optimum solution for knapsack-basictest.dat is a set of items worth 540.0. Note: you do not have to determine the actual set of items, just the value of the most valuable set of items that will fit in the knapsack. Hints Note that the quality of your solution, how you prune the search space, and choice of heuristics will determine which input files you can process in a few minutes and which you cannot. For example, on my computer, my solution (which is not especially clever) starts choking when the problem instance gets up over 500 items. Depending on your solution and your computer, you may do better or worse. Assessment This question will be assessed on correctness of your implementation. While we will not be assessing your commenting, you should still practice good commenting style to aid the markers in understanding how you designed and implemented your solution. A happy marker is a generous marker! Page 6 Question 3 (25 points): The following is a greedy algorithm for solving the knapsack problem: Given : a set of items with known weight and value , and a knapsack with known capacity . Start with an empty knapsack . while there is at least one item not in the knapsack that will fit add to knapsack the most valuable item that can still fit The idea here is that we repeatedly look at how much weight capacity remains in the knapsack, and then add the most valuable item that has weight less than or equal to the remaining capacity. This algorithm will terminate when the remaining capacity in the bag is smaller than the weights of any remaining items not in the bag. It should be clear that this algorithm can be implemented to run in polynomial time. However, note that this algorithm is not guaranteed to produce the optimal solution. Here are the tasks for this question: 1. Implement the greedy algorithm for the knapsack problem given above. Write your solution in Knapsack.java (yes, Knapsack.java should contain your code for both questions 2 and 3!). Use comments to make sure that your solution to questions 2 and 3 are clearly labeled so that the marker can identify them easily. The solution should be in the form of a method that returns the total value of the items that were packed in the knapsack. Again, you do not have to return the actual set of items in the knapsack, just the sum of the value of the items in the knapsack. By luck, the greedy algorithm manages to find the optimum set of items worth 540.0 when given knapsack-basictest.dat as input. This will not be the case for many of the other given problem instances. 2. For each input file given, compare the result of the greedy algorithm to the result for the backtracking algorithm. In a file called a8q3.txt/doc/pdf: (a) For each input file, record the total value of the items in the knapsack for each algorithm, let’s call these Vbacktrack and Vgreedy. (b) For each input file, record the ratio Vgreedy Vbacktrack . This ratio represents the percentage of the true optimum value that the greedy algorithm was able to achieve. (c) Extrapolate or hypothesize, to the extent possible, on the relative performance of the greedy algorithm vs the backtracking algorithm. Do you think that the greedy algorithm can provide any guaranteed minimum performance relative to the optimum? Will the greedy algorithm always choose items with value at least 30% of the optimum? 50%? 90%? Based on your observations, what do you think? Hints If you want to generate more problem instances to test your hypothesis for part 2(c), you can use the RandomKnapsackInstance class that is provided. At the top of the main file you can change the capacity of the knapsack, the number of items, the maximum weight, and value of an item. Run the program and it will generate a random datafile that describes an instance of the knapsack problem that can be read in by the readKnapsackInstance method of the Knapsack class. Assessment This question will be assessed on correctness of your implementation (tasks 1 and 2(a)), and the observations that you make for task 2(c). Page 7 4 Files Provided lib280-asn8: A copy of lib280 which includes: • solutions to assignment 7; RadixSortMSD-Template: The project template for question 1. TSP: Examples of backtracking solutions to the TSP. Knapsack-Template: The project template for question 2. 5 What to Hand In RadixSortMSD.java: Your completed radix sort from question 1. a8a1.txt/doc/pdf: Your timing observations from your sort in question 1. Knapsack.java: Your backtracking and greedy solutions to the knapsack problem of questions 2 and 3. a8q3.txt/doc/pdf: Your comparison of the greedy and backtracking algorithms and discussion of your observations from Question 3, task 2. Page 8

CMPT 280– Intermediate Data Structures and Algorithms Assignment 8

Description

Related products

CMPT 280– Intermediate Data Structures and Algoirthms Assignment 3

CMPT 280– Assignment 8

CMPT 280– Intermediate Data Structures and Algoirthms Assignment 4