This replication package contains the necessary files to reproduce the results of the ESEC/FSE 2015 paper "Staged Program Repair with Condition Synthesis". Our paper presents SPR, a novel patch generation system that uses the staged condition synthesis technique. The replication package provides a way to reproduce all of our experiments with SPR. It contains the following components:
1) A public AMI image for reproducing all of our experiments except fbc.
   Region: US East
   AMI id: ami-d3610ec4
   AMI name: spr-0.23-replication
2) A 32-bit VM image for reproducing all of the fbc experiments.
   http://rhino.csail.mit.edu/spr-rep/Ubuntu-14.04.1-32bit.tar.gz
3) Scenario tarballs that contain the necessary files to reproduce each of the defects in our experiments.
   http://rhino.csail.mit.edu/spr-rep/scenarios/
4) SPR generated patches for the benchmark defects:
   http://rhino.csail.mit.edu/spr-rep/spr-result/
   http://rhino.csail.mit.edu/spr-rep/spr-wsf-result/
Note that the SPR source code in the replication package is licensed under GPLv3. Relevant license documents are included in the AMI/VM images.
A user can use the replication package to reproduce all of the SPR experiments we performed in the paper. Specifically, the replication package can reproduce all SPR results in Table 1, Table 2, and Table 3 of the paper. Note that the results of GenProg and AE are obtained from previous work.
The results obtained from the replication package will in general be consistent with the results reported in the paper. For each defect for which we claim, in the paper, that SPR generates a plausible or correct patch, the user can use the replication package to obtain the same patch.
Program defects often require a specific environment setup to reproduce. To facilitate the reproduction, we packed our systems into VM images and our benchmark defects into scenario tarballs. We also provide scripts to facilitate the generation of every number in Table 1, Table 2, and Table 3. Note that our experiments include manual analysis of the generated patches to determine whether each patch is correct. We provide a description of each correct patch that SPR generates.
One benefit for future research is that SPR in this replication package can be used as a baseline system. As all the source code of SPR is available inside the provided VM images, researchers who want to build new patch generation systems can build their systems on top of SPR and reuse (part of) the SPR source code.
Another benefit for future research in the field is that the benchmark scenario tarballs in the replication package provide useful infrastructure (test cases and scripts) for evaluating future patch generation systems. Note that SPR is evaluated on the GenProg 2012 benchmark set and that we fixed several known issues in the original GenProg 2012 infrastructure.
Section 2 provides step-by-step instructions for reproducing SPR experiments with this replication package. Section 3 describes our manual analysis of each correct SPR patch. Section 4 describes how to apply SPR to new defects if desired. We recommend that users of this replication package start with the step-by-step instructions in Section 2.
As described in our paper, we performed all experiments except fbc on Amazon EC2 machines (running Ubuntu) and we performed the fbc experiments on a 32-bit VM (running Ubuntu). In this replication package, we provide an AMI image of our Amazon EC2 experiment environment and a VM image of our fbc experiment environment.
We evaluated 105 defects/changes in the GenProg 2012 benchmark set. Due to the disk space limit, we cannot package all of the defects/changes into the AMI and VM images. Instead, we separately provide a scenario tarball for each defect/change we evaluated in our experiments.
We next provide step-by-step instructions for running SPR to generate patches for the php defect php-309579-309580. Our experiments on other defects can be reproduced similarly.
(1) Launch an m3.large instance in US East with our AMI "ami-d3610ec4".
    a) Go to the website: http://aws.amazon.com/ec2
    b) Log in with your account.
    c) Click on Compute/EC2 to go to the EC2 Dashboard.
    d) If necessary, change your region to US East (N. Virginia) (upper right button).
    e) Click "Instances" (left bar).
    f) Click "Launch Instance".
    g) Click "Community AMIs" (left bar), and search for ami-d3610ec4.
    h) It should have the name "spr-0.22-replication". Click "Select". Note that if you do not find the image, make sure you changed your region to US East and try again.
    i) Select the "m3.large" instance type. Click "Review and Launch". Click "Launch".
    j) In the pop-up window, you may need to create a new private key file. Assume the key you created/downloaded is at ~/fse.pem.
    k) Click "View Instances".
    l) In a Unix shell on your local machine (not Amazon), go to your local directory ~ and type
       chmod 400 fse.pem
       to silence permission complaints.
    m) In the "Instances" view of the EC2 Dashboard, click "Connect" for the newly started instance. The only purpose is to get the ip address of the new instance. Close the window once you have it.
    n) Suppose the ip is "126.96.36.199". From your local terminal in directory ~, type
       ssh -i fse.pem firstname.lastname@example.org
(2) Go to the directory ~/Workspace/prophet/build/tests. Type "cd ~/Workspace/prophet/build/tests" in the terminal.
(3) Run "../../tests/scripts/reproduce.py php-309579-309580" to reproduce the php-309579-309580 case with SPR. This case takes approximately 40 minutes to complete.
This script automatically downloads the corresponding scenario tarball from our server, untars the tarball, and runs SPR on the defect scenario. For php-309579-309580, the untarred directory php-case-2adf58 contains all files of the scenario. 2adf58 is the revision number of the php case in the github repository. For some applications, we use different repository systems than the GenProg 2012 paper because old repository systems are often not maintained anymore.
If you want to reproduce the experiments of running SPR with a specified source file name to repair, run:
"../../tests/scripts/reproduce.py --bug-file php-309579-309580"
The "--bug-file" flag causes SPR to explore only those locations inside the specified source file that the developer patch modifies. Note that GenProg and AE require this information to run, so we provide this option to enable a fair comparison between SPR and those systems.
Note that the SSH connection may break because the script runs for a long time. If that happens, you may lose the running session. The AMI has tmux installed. You can start a tmux session before running the script to avoid this problem.
(4) The produced php-fix-2adf58XXX.c is the generated patch file, where XXX is the filename of the original source file to be modified.
(5) At the end of the execution, SPR prints a line like:
Total XXXX different repair candidate schemas!!!!
The XXXX is the size of the SPR search space. It corresponds to the number presented in the column "Search Space" in Table 3.
(6) At the end of the execution, SPR prints a line like:
Generate a patch at candidate schema no XXXX
The XXXX identifies the rank of the candidate schema at which SPR generates the first plausible patch. It corresponds to "Gen At" in Table 3.
(6) At the end of the script, it will print time information. Note that because we keep improving the SPR system, the search space numbers and the running times may differ slightly from the numbers in the submission. See our draft of the final version for the updated numbers.
The timing numbers are presented in the columns "SPR Time" and "SPR(WSF) Time" in Table 3. They are also used to compute the average time "SPR(WSF) Time" in Table 1.
(7) At the end of the script, SPR also prints two lines like:
Total cnt of passed cond schemas: XXX
Total cnt of cond schemas: YYY
The YYY corresponds to the total number of schemas that manipulate branch conditions (that SPR encounters). The XXX corresponds to the number of such schemas for which SPR discovers a sequence of abstract condition values that generates correct output for the test case inputs. For the 13 defects for which SPR generates correct patches, these two numbers correspond to the column "Condition Value Search On" in Table 2.
(8) Search for "prophet" in the generated patch source code to locate the SPR changes.
In our experiments, we manually analyzed each of the generated patches and determined whether it is correct. For php-309579-309580, SPR generates a patch that changes the condition at ext/date/php_date.c:3766. It is a correct patch and it is equivalent to the developer patch of this defect. See Section 2 of our paper for more detail. This manual analysis produces the "Result" columns for both SPR and SPR (With Specified File Name) in Table 3. The "SPR" and "SPR(WSF)" columns in Table 1 summarize these results. See Section 3 of this README for more details on each correct SPR patch.
(9) For the 20 defects for which the search space of SPR contains at least one correct patch, we manually identified the correct patch in our experiments. To obtain the position of the correct patches in the search space, we wrote scripts to dump the SPR search space and parse the dumped search space.
Run "../../tests/scripts/reproduce.py --parse-space php-309579-309580". It takes approximately 5 minutes to finish and you will see a line like:
Correct at schema XX blowup YY ratio ZZ
The XX corresponds to the position of the correct patch in the search space ("Correct At" in Table 3), and the ZZ corresponds to the search space blowup if we turn off the staged condition synthesis ("Condition Value Search Off" in Table 2).
Run "../../tests/scripts/reproduce.py --parse-space --bug-file php-309579-309580". You will see a line like:
Correct at schema XX blowup YY ratio ZZ
The XX corresponds to the position of the correct patch in the search space when running SPR with the specified source file name information ("Correct At" in Table 3).
(10) The "Init Time" column in Table 1 is the average running time for SPR to initialize a defect scenario. At the initialization step, SPR 1) verifies that the original program passes all positive test cases, 2) verifies that the original program fails all negative test cases, and 3) runs the error localization algorithm to identify potential statements (program points) to modify.
If you want to reproduce the initialization step of SPR for the defect php-309579-309580, you can run "../../tests/scripts/reproduce.py --init php-309579-309580". It takes approximately 40 minutes to finish. At the end of the execution, it will print the running time of the initialization.
(11) The rest of the cases in the benchmark set can be replicated similarly. Just replace "php-309579-309580" in the above commands with another case id. Running "../../tests/scripts/reproduce.py" without any argument prints all of the case ids. Section 4 of this README contains general instructions about how to apply SPR to other applications and defects.
(12) All SPR generated fixes are checked against all negative and positive test cases by automated scripts. For example, the php-309579-309580 fix is checked with ~/Workspace/prophet/tools/php-test.py. Internally, php-test.py invokes run-tests.php on the test cases, like our manual tests for GenProg. Unlike GenProg, our test script for php checks not only the exit code but also the output of the execution. If you want to manually test the generated fix for php-309579-309580, inside the AMI you can:
    a) Make sure the current directory is ~/Workspace/prophet/build/tests and that you have run the reproduce script in the previous steps to generate patches for php-309579-309580.
    c) Replace php-case-2adf58/php-2adf58-workdir/src/ext/date/php_date.c with the generated fix file. The name of this fix file is php-fix-2adf58ext_date_php_date.c in our example.
       "cp php-fix-2adf58ext_date_php_date.c php-case-2adf58/php-2adf58-workdir/src/ext/date/php_date.c"
    d) "cd php-case-2adf58/php-2adf58-workdir/src" and "make"
    e) The newly built php binary sits in:
       "~/Workspace/prophet/build/tests/php-case-2adf58/php-2adf58-workdir/src/sapi/cli/php"
To test the negative case, type in:
"./sapi/cli/php run-tests.php -p ./sapi/cli/php ../tests/03996.phpt"
You should see output indicating that the test case passed. In the file ../../php-2adf58.revlog, you will find the ids for the positive and negative test cases. For php, all cases have a five digit id. Open the ../../php-2adf58.revlog file and find the id 00051 among the positive cases.
"./sapi/cli/php run-tests.php -p ./sapi/cli/php ../tests/00051.phpt"
Again, you should see that the test passes.
(13) To avoid additional costs, you need to terminate the instance once you are done with the replication. You may also need to delete the volumes that were used by the instance after its termination.
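The summary lines quoted in steps (5) to (7) and (9) can be collected from a saved log with a short script. The following is a minimal sketch, not part of the replication package: the function name summarize_spr_log is hypothetical, and the regular expressions assume that real SPR output matches the wording quoted in this README with plain integers in place of XXXX, XXX, YYY, and XX. The "blowup" and "ratio" fields are left unparsed because their number format is not specified here.

```python
import re

# Patterns mirror the lines quoted in this README; the exact spacing and
# number formatting of real SPR output is an assumption.
PATTERNS = {
    "search_space": re.compile(r"Total (\d+) different repair candidate schemas"),
    "gen_at": re.compile(r"Generate a patch at candidate schema no (\d+)"),
    "passed_cond": re.compile(r"Total cnt of passed cond schemas: (\d+)"),
    "total_cond": re.compile(r"Total cnt of cond schemas: (\d+)"),
    "correct_at": re.compile(r"Correct at schema (\d+)"),
}


def summarize_spr_log(text):
    """Return the integer fields found in `text`; a later match overwrites an earlier one."""
    summary = {}
    for line in text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[key] = int(match.group(1))
    return summary


if __name__ == "__main__":
    # Made-up numbers for illustration only; they are not results from the paper.
    sample = (
        "Total 1234 different repair candidate schemas!!!!\n"
        "Generate a patch at candidate schema no 56\n"
        "Total cnt of passed cond schemas: 7\n"
        "Total cnt of cond schemas: 89\n"
    )
    print(summarize_spr_log(sample))
```

To use it, redirect the output of the reproduce script to a file and pass the file contents to summarize_spr_log.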
The instructions above also apply to the fbc experiments. The only difference is that in step (1) you need to start the provided 32-bit VM image instead of an Amazon EC2 instance. For fbc experiments:
(1) Launch the VM image.
    (a) Download the VMWare image from: http://rhino.csail.mit.edu/spr-rep/Ubuntu-14.04-32bit.tar
    (b) Untar the tarball to obtain the VMWare image directory.
    (c) Use VMWare to start this image. The login name is fanl and the password is "password".
(2)-(13) Same as the instructions in Section 2.1. There are three fbc cases "fbc-5251-5252", "fbc-5458-5459", and "fbc-5556-5557".
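Since the same reproduce.py invocations are repeated for every case id, the command lines can be generated rather than typed. The sketch below only builds argument lists from the flags documented in this README ("--bug-file", "--parse-space", "--init"); the helper name reproduce_command is hypothetical, and whether "--init" may be combined with the other flags is not stated here, so the sketch does not combine them in its example loop.

```python
# Path of the script as used throughout Section 2 (relative to build/tests).
REPRODUCE = "../../tests/scripts/reproduce.py"

# The three fbc case ids listed in this README.
FBC_CASES = ["fbc-5251-5252", "fbc-5458-5459", "fbc-5556-5557"]


def reproduce_command(case_id, bug_file=False, parse_space=False, init=False):
    """Build the argument list for one reproduce.py run (hypothetical helper)."""
    cmd = [REPRODUCE]
    if parse_space:
        cmd.append("--parse-space")
    if bug_file:
        cmd.append("--bug-file")
    if init:
        cmd.append("--init")
    cmd.append(case_id)
    return cmd


if __name__ == "__main__":
    for case in FBC_CASES:
        # Plain SPR run, then the run with the specified source file name.
        print(" ".join(reproduce_command(case)))
        print(" ".join(reproduce_command(case, bug_file=True)))
```

Each list can be passed to subprocess.run(cmd, check=True) from ~/Workspace/prophet/build/tests inside the VM; running the cases one after another is the safer default given how long each run takes.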
The current SPR implementation fails to perform initialization on six cases in the GenProg 2012 benchmark set: php-308046-308051, libtiff-ed4969a-8a184dc, python-69831-69833, wireshark-35419-35414, wireshark-37171-37170, and wireshark-37190-37191. Note that we report that SPR generates no patch for these defects in our paper.
The plausible (but incorrect) patch SPR generates for python-70019-70023 is compiler dependent and possibly machine dependent. The patch attempts to get around the test case by tampering with memory library calls. Its behavior is therefore highly dependent on the underlying implementation of the memory routine it links to. The binary generated during the SPR repair process is known to pass the test case on Amazon EC2 (compiled with clang); however, the binary generated by gcc does not pass the test case. Thanks to Alex Zhikhartsev for reporting this.
SPR generates correct patches for 12 defects. For each defect, we provide a URL that contains the developer patch and we either identify that the SPR patch is semantically equivalent to the developer patch or provide a brief analysis of why the SPR patch is correct.
The developer patch: https://github.com/php/php-src/commit/f455f8
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-result/php-307562-307561/php-fix-f455f8%5e1-f455f8ext_dom_document.c
Analysis: The SPR generated patch is identical to the developer patch. Note that this is a regression that occurs in the repository. The reference correct revision occurs before the buggy revision.
The developer patch: https://github.com/php/php-src/commit/1e91069
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-result/php-307846-307853/php-fix-1e91069ext_date_php_date.c
Analysis: The SPR generated patch is identical to the developer patch.
The developer patch: https://github.com/php/php-src/commit/1d984a7
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-result/php-308734-308761/php-fix-1d984a7ext_tokenizer_tokenizer.c
Analysis: The statement order of the developer patch is slightly different from that of the SPR generated patch, but the two patches are semantically equivalent at a high level.
The developer patch: https://github.com/php/php-src/commit/991ba131
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-result/php-309516-309535/php-fix-991ba131ext_date_php_date.c
Analysis: The SPR generated patch is identical to the developer patch.
The developer patch: https://github.com/php/php-src/commit/2adf58c
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-result/php-309579-309580/php-fix-2adf58ext_date_php_date.c
Analysis: The SPR generated patch is semantically equivalent to the developer patch. This is our motivating example in our paper. See our paper for details.
The developer patch: https://github.com/php/php-src/commit/5a8c917
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-result/php-309892-309910/php-fix-5a8c917ext_standard_string.c
Analysis: The developer patch removes an if statement block. The SPR generated patch conjoins the branch condition of the if statement with 0, which effectively nullifies the whole if statement block. The two patches are semantically equivalent.
The developer patch: https://github.com/php/php-src/commit/8ba00176
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-result/php-310991-310999/php-fix-8ba00176Zend_zend_compile.c
Analysis: The developer patch changes an if statement condition from (A || (B && C)) to ((A || B) && C). The SPR patch changes the condition to ((A || (B && C)) && C), which is semantically equivalent to the developer patch.
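The equivalence claimed above can be checked by enumerating all eight truth assignments. The sketch below treats A, B, and C as side-effect-free boolean values, which is an assumption about the real sub-expressions in zend_compile.c; the function names are illustrative only.

```python
from itertools import product


def original_condition(a, b, c):
    # Condition before either patch: (A || (B && C))
    return a or (b and c)


def developer_condition(a, b, c):
    # Developer patch: ((A || B) && C)
    return (a or b) and c


def spr_condition(a, b, c):
    # SPR patch: ((A || (B && C)) && C)
    return (a or (b and c)) and c


def patches_equivalent():
    """True if both patched conditions agree on every truth assignment."""
    return all(
        developer_condition(a, b, c) == spr_condition(a, b, c)
        for a, b, c in product([False, True], repeat=3)
    )


def inputs_changed_by_patch():
    """Assignments on which the patched condition differs from the original one."""
    return [
        (a, b, c)
        for a, b, c in product([False, True], repeat=3)
        if original_condition(a, b, c) != developer_condition(a, b, c)
    ]


if __name__ == "__main__":
    print(patches_equivalent())
    print(inputs_changed_by_patch())
```

The only assignments on which the patched condition differs from the original are those with A true and C false, which is exactly the case both patches rule out.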
The developer patch: https://github.com/php/php-src/commit/1056c57f
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-result/php-311346-311348/php-fix-1056c57fext_standard_url_scanner_ex.c
Analysis: The functionality of the developer patched code is that if "ctx->buf.len" (which holds the length of "ctx->buf") is not zero, then "handled_output" is assigned the concatenation of "ctx->buf" and "output"; otherwise "handled_output" is assigned "output". When "ctx->buf.len" is zero, the code in the then branch has the same effect as the else branch since the string "ctx->buf" is empty. So the SPR patch, which eliminates the condition and lets the program always take the then branch, is also correct.
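The argument above reduces to the fact that concatenating an empty buffer changes nothing. A small model of the two versions, using Python byte strings in place of the C buffers, illustrates this. It models only the resulting value of "handled_output", not memory allocation or ownership in the real code, and the function names are hypothetical.

```python
def developer_version(buf, output):
    """Developer patch: concatenate only when the buffer is non-empty."""
    if len(buf) != 0:
        return buf + output
    return output


def spr_version(buf, output):
    """SPR patch: always take the then branch and concatenate."""
    return buf + output


if __name__ == "__main__":
    # Illustrative inputs, including the empty-buffer case discussed above.
    for buf in (b"", b"<a href="):
        for output in (b"", b"index.php"):
            assert developer_version(buf, output) == spr_version(buf, output)
    print("both versions agree on all sample inputs")
```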
The developer patch: https://github.com/vadz/libtiff/commit/eec7ec0
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-result/libtiff-ee2ce5b7-b5691a5a/libtiff-fix-tests-eec7ec0tools_tiff2pdf.c
Analysis: The SPR patch is identical to the developer patch.
The developer patch: https://gmplib.org/repo/gmp/rev/131005cc271b
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-result/gmp-13420-13421/gmp-fix-13421mpn_generic_powm.c
Analysis: The developer patch removes the variable "b2p" and the assignment statement "b2p = tp + 2 * n". It then replaces every occurrence of "b2p" with "rp". The SPR patch simply changes the assignment "b2p = tp + 2 * n" to "b2p = rp", which is semantically equivalent to the developer patch at a high level.
The developer patch: http://git.savannah.gnu.org/cgit/gzip.git/commit/?id=f17cbd13a1d0a7
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-wsf-result/gzip-a1d3d4019d-f17cbd13a1/gzip-fix-f17cbd13a1d0a7gzip.c
Analysis: Both the developer patch and the SPR patch insert an assignment statement to initialize the variable "ifd" to 0. The two patches are semantically equivalent at a high level.
The developer patch: https://github.com/php/php-src/commit/09273098521913a
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-result/php-307914-307915/php-fix-09273098521913aext_phar_phar.c
The developer patch: https://hg.python.org/cpython/rev/69784
The SPR correct patches: http://rhino.csail.mit.edu/spr-rep/spr-wsf-result/python-69783-69784/python-fix-69784Modules_timemodule.c
Analysis: Both the developer patch and the SPR patch remove an if statement block. They are semantically equivalent. Note that this case is a deliberate functionality change during development, not a defect.
Defects often require specific environments to reproduce. We strongly recommend that users of this replication package reproduce our experiments via the provided AMI and VM images with the above instructions. However, SPR can be applied to other UNIX-like environments, other applications, and other defects. Here are general instructions about how to do so:
0) The application needs to be able to build with both gcc and clang. SPR uses clang to run its error localization algorithm. It requires llvm and clang 3.6.1.
1) Write a script that builds the application. For example, ~/Workspace/prophet/tools/fbc-build.py in the SPR AMI is the script for building fbc.
2) Write another script that tests the application. See, for example, ~/Workspace/prophet/tools/fbc-test.py in the SPR AMI. The script takes the built src directory, the testcase directory if any, and a set of testcase ids. It outputs the list of passed testcase ids.
3) Write or generate a log file that specifies the testcase ids of the positive testcase set and the testcase ids of the negative testcase set. For example, if you untar the scenario fbc-5458-5459 (fbc-5459.tar.gz), you can find the log file for the scenario at ~/Workspace/prophet/build/tests/fbc-case-5459/fbc-5459.revlog in the SPR 32-bit VM.
4) Write a configuration file that specifies:
   a) the location of the scripts
   b) the source location of the application
   c) the location of the test cases
   d) the location of the log file that specifies negative and positive testcases
   e) a line "localizer=profile" to enable the error localizer
   For example, if you untar the scenario fbc-5458-5459 (fbc-5459.tar.gz), you can find the configuration file of the scenario at:
   ~/Workspace/prophet/build/tests/fbc-case-5459/fbc-5459.conf in the SPR 32-bit VM.
5) Invoke SPR with the configuration file. For example, you can call:
   ../src/prophet ~/Workspace/prophet/build/tests/fbc-case-5459/fbc-5459.conf
   to run SPR.
When you run it in this way, SPR will create a temporary workdir. You can specify the workdir name and make the workdir permanent. You can run the initialization step first by invoking:
   ../src/prophet ~/Workspace/prophet/build/tests/fbc-case-5459/fbc-5459.conf -r workdir -init-only
SPR will create a directory called "workdir" to hold all results after initialization.
Then, after initialization, you can call
   ../src/prophet -r workdir -skip-verify
or
   ../src/prophet -r workdir -skip-verify -first-n-loc 200 -consider-all
to run SPR.
   a) The skip-verify flag tells SPR to skip the initialization step.
   b) The first-n-loc and consider-all flags tell SPR to ignore the supplied bug file information, if any, and run SPR without file information.
The benefit of having a working directory is that you can avoid running the error localization algorithm and the testcase verification again.
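The two-phase workflow described above (initialize once into a permanent workdir, then repair from it) can be scripted. The sketch below only assembles the command lines quoted in this section; the helper names init_command and repair_command are hypothetical, and the default of 200 for first-n-loc is simply the value used in the example above, not a recommended setting.

```python
# Binary path as used in the examples of this section (relative to build/tests).
PROPHET = "../src/prophet"


def init_command(conf_path, workdir):
    """Phase 1: run only the initialization step and keep its results in `workdir`."""
    return [PROPHET, conf_path, "-r", workdir, "-init-only"]


def repair_command(workdir, ignore_bug_file=False, first_n_loc=200):
    """Phase 2: reuse `workdir` and skip the initialization step."""
    cmd = [PROPHET, "-r", workdir, "-skip-verify"]
    if ignore_bug_file:
        # Ignore any supplied bug file information, as described for these flags.
        cmd += ["-first-n-loc", str(first_n_loc), "-consider-all"]
    return cmd


if __name__ == "__main__":
    conf = "~/Workspace/prophet/build/tests/fbc-case-5459/fbc-5459.conf"
    print(" ".join(init_command(conf, "workdir")))
    print(" ".join(repair_command("workdir")))
    print(" ".join(repair_command("workdir", ignore_bug_file=True)))
```

The printed lines can be pasted into a shell, which also takes care of expanding "~" in the configuration path.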