Description
Introduction

It's a sad fact of life: computers crash. One minute everything is working just fine, and the next minute everything that was in volatile memory is gone forever. To deal with this problem on your personal machine, you've probably gotten into the habit of saving frequently. In this project, you will implement an algorithm that database systems commonly use to address the same problem without slowing the system down too much.

Some Basics of Transactions

A transaction is any one execution of a user program in a database management system (DBMS). (Transactions are covered in lecture; see chapter 16 of the textbook for a more in-depth treatment.) For example, suppose Ada and Bob both have accounts at the same bank. Ada logs onto the bank's website and instructs the bank to make a payment of $100 from her account into Bob's account. If the bank's computer runs normally, it will deduct $100 from Ada's account balance, add $100 to Bob's account, and trigger emails to Ada and Bob confirming the transfer. Those steps (deducting, adding, and confirming that it was done) make up one transaction.

But suppose that the moment after the bank's computer deducts the money from Ada's account balance, it crashes. When it reboots, the instructions about that $100 that were in its memory are gone. We would like to ensure that if this happens, the $100 has not simply vanished: either it is back in Ada's account, or it is in Bob's account. Either the whole transaction goes through, or the part of the transaction that happened before the crash gets undone. Another way of saying this is that we want a transaction to be atomic.

Now suppose the timing of the crash was a little different. The bank's computer read Ada's balance into memory and modified it there, read Bob's balance into memory and modified it there, and triggered the emails. Then, just as it was about to write the updated balances back to disk, it crashed. We want to make sure that if the system says a transaction completed successfully, the changes the transaction made will persist even through a crash. We call this durability. (In lecture, you also learn about two other properties that transactions should have, called consistency and isolation. These have to do with concurrency and are important to understand, but you don't need to worry about them for this project.)

To ensure atomicity and durability while minimizing the harm to performance, DBMSs use write-ahead logging together with a recovery algorithm such as ARIES. In this project, you will be implementing a single-threaded version of ARIES.

What you need to do:

Read. Read sections 16.7 through 16.7.2 and 18.1 through 18.6 in your textbook. This is important: the rest of this project will not make sense until you have understood this material. (Since the textbook is required, we assume you already own it. The book can also be checked out for two hours at a time from the Art Architecture & Engineering library's course reserve desk on the second floor. Older editions of Database Management Systems by Raghu Ramakrishnan (1998 and 2000) also have most of the same material, though the chapter numbering is different. Older editions may be available cheaply on Amazon.)

Download. Download the recovery simulator from Canvas and unzip it. You will see that the directory contains directories called StorageEngine, StudentComponent, output, correct, and testcases.
The section on Understanding the Simulator below explains what these pieces do. You'll also want to spend some time familiarizing yourself with the header files.

Implement. Implement ARIES in LogMgr.cpp. This is the only file you will submit to the autograder. Do not alter the other files in any way: the autograder will compile and run your LogMgr.cpp with unaltered copies of all of the other files, so it is in your best interest to test with exactly the same files. Please do not add any other header files; if you need helper functions, declare and define them in LogMgr.cpp. You may use the C++11 STL, except that you may not do file I/O or interact with the network. The autograder may reject such submissions, and the attempt will still count toward your daily submission limit.

Submit. An autograder will be available by 11/9/2015 (or possibly earlier) at https://grader484.eecs.umich.edu. Submit your LogMgr.cpp file to the autograder. You may submit up to four times per day with autograder feedback. Please take advantage of this, and start submitting early and often. We will keep your best score across submissions. If you submit both before the regular deadline and during the late days, we will take the better of your scores, max(regular score without late penalty, late score with late penalty), so there is no disincentive to submitting late as well. If there is a bug in the autograder, we reserve the right to regrade your past submissions, which may change your best score; in that case, we will notify you of the change as soon as feasible. Ultimately, you are also responsible for testing your LogMgr.cpp with your own test cases.

Grading

Your grade will be based on your code's performance on the autograder test cases. We will diff your log file and your recovered db file against our known correct files for each test case. There are hidden test cases on the autograder that are different from the public test cases we have provided you. Passing the public test cases means you are on the right path, but it does not ensure full marks on the private test cases, which may cover corner cases the public ones do not. Make sure you think carefully about all the cases the algorithm needs to handle. However, your submission will not be checked against any test cases beyond those on the autograder, so your final autograder score is your grade.

Understanding the Simulator

The simulator has two main pieces: the StorageEngine and the LogMgr. The StorageEngine handles all disk reads and writes; the LogMgr handles logging and recovery. If the LogMgr needs to write something to disk, it must go through the StorageEngine. Your implementation of LogMgr.cpp must not include any static variables or functions, and it must not write directly to disk. If LogMgr wants to write to disk, it must use StorageEngine's public methods (its interface).

The StorageEngine keeps track of Pages. Each Page has a page_id, a pageLSN, a dirty bit, and a string of data. (In a real database, of course, pages contain records, but here we've abstracted them away, so you don't need to worry about them.) All Pages are kept on disk. If a page needs to be read or written, StorageEngine adds it to the in-memory page buffer. If the page buffer is full, the StorageEngine chooses a page to flush to disk, informs the LogMgr of the flush through LogMgr's pageFlushed function (hint: that probably means LogMgr should do something when the page is flushed), and flushes it, freeing up memory for the next page.
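For example, the write-ahead logging rule means that every log record up to and including the victim page's pageLSN must reach the on-disk log before the page itself is written. Below is a minimal sketch of pageFlushed along those lines. It assumes LogMgr holds a StorageEngine* member named se and a log-tail flushing helper named flushLogTail (sketched a little further below); check the provided LogMgr.h for the real member, helper, and parameter names.

    void LogMgr::pageFlushed(int page_id) {
      // WAL rule: everything up to and including this page's pageLSN must be on
      // the on-disk log before the page itself goes to disk.
      int page_lsn = se->getLSN(page_id);
      flushLogTail(page_lsn);
    }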
StorageEngine's constructor declares the MEMORY_SIZE that it uses as its buffer pool:

    StorageEngine::StorageEngine() : MEMORY_SIZE(10) {
      page_writes_permitted = 0;
    }

Thus, in the above, the buffer pool holds 10 pages. If StorageEngine ever needs to bring an 11th page into memory, it selects a victim page and calls LogMgr's pageFlushed(page_id); see the StorageEngine::findPage(int page_id) method. For more thorough testing, you may want to change 10 to a smaller or larger value (no other change to StorageEngine should be required, and you are not submitting StorageEngine.cpp). That will cause more or less stealing, respectively. It will also change the output disk and log, and you will have to check manually whether the resulting disk and log are correct.

Sometimes LogMgr will need to write Pages; for example, recovering from a crash and aborting a transaction both require LogMgr to alter Pages. When this is necessary, LogMgr can call StorageEngine::pageWrite. You should always try to minimize the number of Pages LogMgr writes. To force you to do this, StorageEngine limits the number of pageWrites you are permitted. (The limit at any given time is set by the testcase. Sometimes we give you enough page writes to finish an operation; sometimes, to simulate a crash in the middle of an operation, you will run out of pageWrites early, and the next event in the testcase will be a crash.) If you try to exceed the limit, pageWrite will stop writing pages and return false. If this happens, LogMgr should stop what it is doing.

The LogMgr also needs to keep the log. A log is made up of LogRecords of various types. You can keep the most recent part of the log (called the log tail) in memory, but sometimes you will need to write the log to disk. LogMgr can call a LogRecord's toString method to turn the LogRecord into a string, and then pass the result to StorageEngine::updateLog to append it to the log on disk. The log on disk will have one record per line; you can append multi-line strings if you want to add more than one record at once. If LogMgr wants to read old log entries, it can call StorageEngine::getLog(), which will return a (multi-line) string. LogRecord::stringToRecordPtr can parse a line of that string and give you a pointer to a LogRecord built from that line. (A minimal sketch of a log-tail flush appears at the end of this section.)

StorageEngine provides a few other utilities LogMgr might find useful:
● StorageEngine::nextLSN provides a unique id number, assigned in monotonically increasing order.
● StorageEngine::store_master and StorageEngine::get_master write an int to stable storage and retrieve it.
● StorageEngine::getLSN(int page_id) returns the pageLSN of the specified page.

In the StorageEngine directory, you will also find main.cpp. Main takes the relative path to a testcase as a command line argument; for example, once you have compiled the whole project, you might run ./main testcases/test00 from the project directory to run testcase 0. Main calls the runTestcase function, which reads the testcase line by line and runs a simulation.
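As an illustration of that flow, here is a minimal sketch of the log-tail flush referred to above. It assumes the log tail is a std::vector<LogRecord*> member named logtail kept in increasing LSN order, se is a StorageEngine*, and LogRecord exposes its LSN through a getter (called getLSN() here); the real member and getter names are whatever the provided headers declare.

    void LogMgr::flushLogTail(int maxLSN) {
      std::string flushed;
      auto it = logtail.begin();
      while (it != logtail.end() && (*it)->getLSN() <= maxLSN) {
        flushed += (*it)->toString();   // one line per record
        delete *it;                     // this record is about to be durable on disk
        it = logtail.erase(it);
      }
      if (!flushed.empty()) {
        se->updateLog(flushed);         // append all of the flushed records in a single call
      }
    }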
Test Cases

Public testcases are given to you in the testcases folder. The first line of a testcase specifies a "database" file (db) to use for that run. For this simulation we are using an oversimplified format, so that you can see the effects of writing to pages: a database file is a text file, and each line of text represents a page of data on disk. If the first line of a database file looks like this:

    -1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

it means that the page with page_id 1 (because it is the first line of the file) has pageLSN -1, and the data currently written to that page is 50 consecutive x's.

The remaining lines of the testcase are instructions to write, abort, commit, make a checkpoint, crash, or end the simulation. Each of these is described in more detail below, with examples.

● write (e.g., 1 write 34 27 ABC): An instruction from a particular transaction (1) to write an update (ABC) to a particular page (34) at a particular offset (27). Main's runTestcase function calls StorageEngine's write function, which in turn calls LogMgr's write function, giving it the transaction id, page id, offset, input string, and what was already written at that page-offset location. StorageEngine expects to get back a pageLSN for the write; once it receives this, it updates the page. (A sketch of how LogMgr's write might build the corresponding update record appears under Reading the Log below.)

● abort (e.g., 1 abort 5): An instruction to abort a particular transaction (1), with a cap on the number of page writes the LogMgr is permitted (5). RunTestcase calls StorageEngine's abort function; StorageEngine sets the number of page writes permitted to five and then calls LogMgr::abort(1).

● commit (e.g., 1 commit): An instruction to commit a transaction (1). RunTestcase calls LogMgr::commit(1).

● checkpoint (checkpoint): An instruction to create a checkpoint. RunTestcase calls LogMgr::checkpoint().

● crash {v} (e.g., crash {2 5}): where v is a list of integers. An instruction to simulate n crashes and recovery phases, where n is the number of values in the list v. During recovery from crash i, v[i] page writes are permitted. In this example, the first crash happens and the LogMgr gets two writes before another crash happens; after the second crash, LogMgr gets five writes to recover. Sometimes the number of writes will allow a full recovery before the next crash; sometimes it will not, and the next crash will begin with StorageEngine refusing to write when LogMgr requests it. When crashing, RunTestcase destroys the current instance of LogMgr and creates a new one, then calls StorageEngine::crash(). StorageEngine destroys all of the Pages it has in memory, then calls LogMgr's recover method. LogMgr is responsible for recovering according to the ARIES protocol.

● end (end): An instruction to end the simulation. It saves the final state of the on-disk pages and exits the program.

At the end of the testcase, the simulator will have generated two files: a log and a modified version of the input db file. These can be found in output/log/ and output/dbs/, respectively, and each filename includes the number of the corresponding testcase. NOTE: if a file with the same name already exists in the log folder, the new log entries will be appended to it rather than overwriting it. You might want to rename or delete old log files to avoid some confusing bugs.

Reading the Log

The log file is saved as a multi-line string, with one line for each log entry, so that you can easily review it. The format is loosely based on Figure 18.2 in the textbook. For example, a log might have content like this (headings and column alignment added for clarity):

    LSN  prevLSN  tx_id  type    page_id  offset  before image  after image
    2    -1       1      update  5        0       xxxxxxxx      update10
    3    -1       2      update  3        0       xxxxxxxx      update20
    4    3        2      commit
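To connect this format to the write instruction described above, here is a minimal sketch of what LogMgr's write might do. Every identifier that is not named elsewhere in this document is illustrative: it assumes a log-tail member named logtail, hypothetical helpers getLastLSN/setLastLSN for per-transaction bookkeeping, a dirty page table map named dirty_page_table, and an update-record class (called UpdateLogRecord here) constructed from exactly the fields shown in the table above. The real class names, constructors, and argument order are in the provided LogMgr.h and LogRecord.h.

    int LogMgr::write(int txid, int page_id, int offset, std::string input, std::string oldtext) {
      int lsn = se->nextLSN();                    // unique, monotonically increasing
      int prev_lsn = getLastLSN(txid);            // hypothetical helper: last LSN this tx wrote, or -1
      logtail.push_back(new UpdateLogRecord(lsn, prev_lsn, txid, page_id, offset,
                                            oldtext, input));   // before image, after image
      setLastLSN(txid, lsn);                      // hypothetical helper: update the transaction table
      dirty_page_table.insert({page_id, lsn});    // recLSN is only set the first time the page is dirtied
      return lsn;                                 // StorageEngine stamps this LSN onto the page as its pageLSN
    }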
End checkpoint log records also include a record of the transaction table and the dirty page table. Each table is enclosed in curly braces, and each entry in the table is enclosed in square braces. For example:

    LSN  prevLSN  tx_id  type            TX table                Dirty Page Table
    5    4        -1     end_checkpoint  {[ 1 2 U ] [ 2 3 U ]}   {[ 3 3 ] [ 5 2 ]}

This transaction table has two entries, and so does the dirty page table.

At times you will have to read the log from disk, where it is in string form (see the sample logs under the correct/logs folder). You will have to implement a function in LogMgr.cpp that is declared in LogMgr.h:

    vector<LogRecord*> LogMgr::stringToLRVector(string logstring)

This function converts a log in string form back into a vector of log records. To help with the conversion, a helper function is available in LogRecord.cpp that converts one line into a LogRecord of the appropriate type, allocating the record with new:

    LogRecord* LogRecord::stringToRecordPtr(string rec_string)
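For illustration, here is a minimal sketch of that conversion built only from the pieces described above (one record per line, parsed with stringToRecordPtr). It assumes <sstream> is available; blank lines are skipped.

    std::vector<LogRecord*> LogMgr::stringToLRVector(std::string logstring) {
      std::vector<LogRecord*> result;
      std::istringstream stream(logstring);
      std::string line;
      while (std::getline(stream, line)) {
        if (line.empty()) continue;                               // skip blank lines
        result.push_back(LogRecord::stringToRecordPtr(line));     // record is allocated with new
      }
      return result;
    }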
Testing

We have given you some public test cases. The correct output for these can be found in the correct/logs/ and correct/dbs/ directories. You will likely need to add your own tests. Testing is challenging: your goal is to write testcases that exercise behavior not covered by the existing testcases. You can use a GNU test coverage tool called gcov that works with g++; gcov will tell you whether your tests cover every line (or at least most of the lines) in LogMgr.cpp. Makefile2 will build your project for test coverage analysis; simply run make with the -f argument, specifying Makefile2 instead of Makefile:

    % make -f Makefile2

Look at the contents of Makefile2 to see what it does. You may have to make a few edits as you add new test cases. It should run on CAEN and, hopefully, on Macs running recent versions of Xcode with the command-line tools.

Trouble Debugging?

If you are using gdb on the CAEN computers to debug, you might find that it has some difficulty displaying std::map and other data structures that your implementation might use. To fix this, you can use the pretty printers: make a folder for them wherever you like to keep such things (mine is ~/gdb_printers) and, from inside that folder, run

    svn co svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch/libstdc++-v3/python

Once you have that, create a file ~/.gdbinit (or append to it if the file already exists) and add the following lines to it:

    python
    import sys
    sys.path.insert(0, '/path/to/gdb_printers/python')
    from libstdcxx.v6.printers import register_libstdcxx_printers
    register_libstdcxx_printers (None)
    end

Make sure you replace /path/to/gdb_printers/python with the actual path. Now try running gdb --args ./main.o testcases/test00 again, and you should be able to view the maps properly.

Important Information about the Honor Code

Remember that this assignment is to be done individually. All work on the assignment must be your own. No sharing of code, use of code (or pseudo-code), or sharing of test case files written by others is permitted. We do run cheating detection software against all past and current submissions and will report any suspected violations to the Honor Council. Trying to hack the autograder or trick it into giving you a good grade without doing the assigned work is cheating (though we welcome bug reports about the autograder, including any security vulnerabilities, as long as you don't use them to your advantage). Any violations of these policies that are caught will receive a zero on the project and will be referred to the Honor Council.

Your assignment is to implement ARIES within the simulator environment. As mentioned above, your LogMgr.cpp file must not attempt to read or write any file or use network calls directly, and it must not include any static functions or variables. They are not needed for this assignment.

A Few Clarifications:

● When you're undoing a transaction, any time you see a log record for that transaction, you need to get the prevLSN from that log record and add it to your toUndo list (or end the transaction if the prevLSN is -1). If the record is a CLR or an update record, you then process it the way the book says; if it's something else (like an abort record), you just move on to the next item in your toUndo list. (A small sketch of this per-record decision appears after this list.)

● You may have noticed in lecture and in some of the problems in the textbook that the log tail is sometimes flushed to disk without a triggering commit, page write, or end checkpoint. ARIES permits regular background flushing of the log tail; for example, a certain amount of memory called a log buffer may be set aside, and any time it fills up, the log tail can be flushed to disk. You don't need to worry about that background flushing for this project: any reasonably sized log buffer would be bigger than the log tail ever gets in these test cases.

● The book's description of abort is a little brief. What you need to know is that your abort function should write an abort record, then do basically the same thing as undo, only on a smaller scale. For undo, you look at all of the loser transactions and put their most recent LSNs in your toUndo queue; for abort, you just need to find the most recent LSN for the one transaction you are aborting. Look at the log record for that LSN. Its prevLSN field will tell you which log record you need to look at next (or whether you're done); its type will tell you whether you need to process it the way you process records during undo. When you get to a log record with prevLSN = -1, process that record, then write your end log record, and you're done.
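As referenced in the first bullet, here is a minimal sketch of that per-record decision. It assumes getter names getprevLSN() and getType() on LogRecord and a record-type enumeration containing UPDATE and CLR; the real names are in the provided LogRecord.h. The helper only maintains the toUndo queue and reports whether the record carries a change that undo must act on; writing the CLR, restoring the before image, and writing the end record are left to the caller.

    #include <queue>

    // Helper defined in LogMgr.cpp. Pushes the record's prevLSN onto toUndo (if any)
    // and returns true if the record is one whose change has to be undone.
    bool needsUndoProcessing(LogRecord* rec, std::priority_queue<int>& toUndo) {
      int prev = rec->getprevLSN();
      if (prev != -1) {
        toUndo.push(prev);   // keep following this transaction's chain backward
      }
      // if prev == -1 we have reached the start of the transaction; after processing
      // this record, the caller should write an end record for the transaction
      return rec->getType() == UPDATE || rec->getType() == CLR;   // only updates and CLRs carry changes
    }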