Description
Part (I): Building an Automated Bug Detection Tool

You will learn likely invariants from call graphs, in a way similar to the SOSP'01 paper discussed in class (Coverity). In particular, you will learn which pairs of functions are called together, and then use these likely invariants to automatically detect software bugs.

(a) Inferring Likely Invariants for Bug Detection

Take a look at the following contrived code segment:

  void scope1() { A(); B(); C(); D(); }
  void scope2() { A(); C(); D(); }
  void scope3() { A(); B(); B(); }
  void scope4() { B(); D(); scope1(); }
  void scope5() { B(); D(); A(); }
  void scope6() { B(); D(); }

We can learn that functions A and B are called together three times, in scope1, scope3, and scope5. Function A is called four times, in scope1, scope2, scope3, and scope5. We infer that the one time function A is called without B, in scope2, is a bug, since A and B are called together 3 times. Please note that we count B only once in scope3, even though scope3 calls B twice.

We define the support of a pair as the number of times the pair of functions appears together. Therefore, support({A, B}) = 3. We define confidence({A, B}, {A}) = support({A, B}) / support({A}), which is 3/4. The thresholds for support and confidence are T_SUPPORT and T_CONFIDENCE, whose default values are 3 and 65%. Print only bugs with confidence of at least T_CONFIDENCE and support of at least T_SUPPORT.

For example, function B is called five times, in scope1, scope3, scope4, scope5, and scope6. The two times that function B is called alone are not printed as a bug, because the confidence is only 60% (support({A, B}) / support({B}) = 3/5), lower than T_CONFIDENCE (65%). Please note that support(A, B) and support(B, A) are the same, 3.

Perform intra-procedural analysis. For example, do not expand scope1 in scope4 into the four functions A, B, C, and D. Match function names only; do not consider function parameters.
Because parameters are ignored, scope1() and scope1(int) are considered the same function.

The sample output with the default support and confidence thresholds should be:

bug: A in scope2, pair: (A B), support: 3, confidence: 75.00%
bug: A in scope3, pair: (A D), support: 3, confidence: 75.00%
bug: B in scope3, pair: (B D), support: 4, confidence: 80.00%
bug: D in scope2, pair: (B D), support: 4, confidence: 80.00%

Generate call graphs using LLVM. Given a bitcode file, use LLVM opt to generate call graphs in plain-text format, and then analyze the textual call graphs to generate function pairs and detect bugs. The opt tool is installed on ecelinux under /usr/local/bin. If you want to work on your own computer, you need to use LLVM 3.0 (64-bit) to avoid compatibility issues. LLVM 3.0 is available as a package in many popular Linux distributions.

A sample call graph in text format is shown as follows:

Call graph node for function: 'ap_get_mime_headers_core'<<0x42ff770>>  #uses=4
  CS<0x4574458> calls function 'ap_rgetline_core'
  CS<0x4575958> calls external node
  CS<0x4575a50> calls external node
  CS<0x4575b20> calls function 'apr_table_setn'
  CS<0x45775a8> calls external node
  CS<0x45776a0> calls external node
  CS<0x4577770> calls function 'apr_table_setn'
  CS<0x45783b8> calls external node
  CS<0x4579dc0> calls function 'apr_table_setn'
  CS<0x4579f78> calls function 'strchr'
  CS<0x457a948> calls external node
  CS<0x457aa40> calls external node
  CS<0x457ab10> calls function 'apr_table_setn'
  CS<0x457d9d0> calls function 'apr_table_addn'
  CS<0x457e4e8> calls function 'apr_table_compress'

and

Call graph node <<null function>><<0x2411e40>>  #uses=0

The following explanation should help you understand the call graph:
• The call graph above shows that the function ap_get_mime_headers_core calls the functions ap_rgetline_core, apr_table_setn, etc.
• #uses= gives the number of times the caller is called by other functions. For example, ap_get_mime_headers_core is called 4 times.
• CS denotes a call site.
• 0x?????
indicates the memory address of a data structure, i.e., a call graph node or a call site.
• An external node denotes a function that is not implemented in this object file. Ignore external node functions.
• A call graph node of null function represents all external callers. Ignore null function nodes entirely.

Performance. We will evaluate the performance/scalability of your program on a pass/fail basis. The largest program that we test will contain up to 20k nodes and 60k edges. Each test case is given a timeout of two minutes; a timed-out test case receives zero points.

Hints:
• Avoid string comparison. Use some kind of ID instead.
• Avoid using strings as keys in hash tables. Use some kind of ID instead.
• Avoid nested loops. An O(n lg n) approach is MUCH more efficient than an O(n^2) approach.
• In a nested loop, pruning an iteration in the outer loop is more beneficial than pruning one in the inner loop.
• Profile your code if the provided test cases take more than 5 seconds to run.

Submission protocol. Please follow this protocol and read it CAREFULLY:
• Submit your source code. Please document/comment your code properly so that it is understandable. You must use the skeleton files and include a Makefile to compile your program from source.
• Put your Makefile directly under the pi directory. To reduce the size of your submission, please remove the sample test cases and the scripts given to you from your submission.
• The script (verify.sh) that we plan to use to verify your output is given to you, or will be given to you soon. Please write your code and Makefile to fit the script.
• Please make sure your code runs on the ecelinux machines. We will use the provided scripts to grade your project on ecelinux machines. If your code does not run on ecelinux, you will receive at most 20 points out of the full 100 points. You must use C/C++ or Java for part (I).
• Marking will be done with strict line comparison.
We will count how many pairs are missing (false negatives) and how many pairs are false positives.
• Print one pair per line to stdout in this format:
  bug: %s in %s, pair: (%s %s), support: %d, confidence: %.2f\%\n
• If you use Java, make sure you round decimal numbers in the same way as C/C++ on Linux by setting the appropriate RoundingMode. For example:
  NumberFormat numf = NumberFormat.getNumberInstance();
  numf.setMaximumFractionDigits(2);
  numf.setRoundingMode(RoundingMode.HALF_EVEN);
  System.out.println("confidence:" + numf.format(confidence*100.0) + "%");
• The order of different pairs does not matter. We will run sort before diff, as you can see from the script verify.sh.
• The order of the two functions in a pair does not matter in the calculations: both (foo, bar) and (bar, foo) contribute to the count of the pair (bar, foo). However, in order to have consistent output (for easier grading), sort each pair alphabetically when printing. For example, print (bar, foo) instead of (foo, bar).
• Ignore duplicate calls to the same function. For example, for the code segment scope() { A(); B(); A(); B(); A(); }, the support for the pair (A, B), support(A, B), is 1.
• Name your program binary pipair. We should be able to run it with ./pipair <bitcode file> <support> <confidence>, e.g., ./pipair hello.bc 10 80, to specify a support of 10 and a confidence of 80%, or with ./pipair <bitcode file>, e.g., ./pipair hello.bc, to use the default support of 3 and the default confidence of 65%.
• If running your program requires a special command-line format such as "java YourProgram arguments", your pipair should be a wrapper script, e.g.:
  #!/bin/bash
  java YourProgram $@  # $@ represents all command line arguments

Skeleton files and marking.
• It is recommended to develop on ecelinux or on your own computer running a popular flavour of Linux. There have been reports that "sort", as an example, behaves differently on Mac than on Linux. Cygwin and MinGW were not tested.
• To generate the output for one test, run "make" in the test's directory. Your output should be identical to the "gold" file. Your output should be passed through "sort" before "diff"ing with the "gold" file, i.e. cat | sort | diff - gold x xx
• To run all tests together, execute "verify.sh". Logs of all output can be found in /tmp.
• clean.sh runs "make clean" in all test directories.
• For marking, we will untar your submission, copy the contents of the pi directory, run make, copy over the skeleton files and the full test suite with six tests, and then run verify.sh. Note that each test is given 2 minutes. My prototype runs test3 in 2 seconds, test6 in 25 seconds, and all other tests in tens or hundreds of milliseconds.
• Since the skeleton files and tests are copied over during marking, do NOT modify any files given in proj-skeleton, since they will be overwritten.

Common issues.
• "Can't make": this error indicates there was a problem while verify.sh tried to run make inside the test directories. The error message is usually in the testx x xx.out, or sometimes in /tmp/testing–pi-