RECOMPUTING COVERAGE INFORMATION TO ASSIST REGRESSION TESTING
Written by: Pavan Kumar Chittimalli, Mary Jean Harrold
Reviewed by: Joan Baldriche

OUTLINE
Introduction
  Motivation
  Problem Statement
  Coverage Data
Background
Code Examples and Coverage Matrices
Algorithm
ReCover Tool
Empirical Studies
  Subjects
  Study 1
  Study 2
  Study 3
Related Work
Conclusions
Paper Critique
Questions

INTRODUCTION: MOTIVATION
Software is changed for a variety of reasons, such as correcting errors, adding new features, and improving performance.
After the software is changed, regression testing is applied to the modified version to ensure that it behaves as intended and that the modifications have not adversely affected its quality.
Regression testing is expensive: it can consume as much as 80% of the testing budget and up to 50% of the cost of software maintenance.

INTRODUCTION: PROBLEM STATEMENT
Need to develop a technique that saves time and resources by not rerunning the entire test suite.
Need to create a procedure that does not rely on inaccurate data, whether outdated coverage data or estimated coverage data.
Need to produce a method that offers results as precise as if all the test cases in the test suite were run.

INTRODUCTION: COVERAGE DATA
Coverage data is a measure used in software testing. It describes the degree to which the source code of a program has been tested.
Regression testing techniques use the coverage data collected when testing one version of the software to help identify the testing that should be performed on the new version.
The paper discusses three kinds of coverage data: outdated, updated, and estimated.
One approach reuses the coverage data collected when the test suite Ti is run on version Pi of the program for tasks on Pi+1 and subsequent versions, which avoids the expense of recomputing it for each subsequent version. This is called outdated coverage data.
The next approach reruns all test cases in Ti on Pi+1 to get accurate coverage data for Pi+1. This is called updated coverage data. However, because it reruns every test case in Ti, this approach defeats the purpose of techniques that aim to reduce the number of test cases that need to be rerun.
The last approach estimates coverage data for Pi+1 based on coverage data for Pi. This is called estimated coverage data.

BACKGROUND
Because rerunning all the test cases in a test suite during regression testing is very expensive, researchers have developed techniques to improve the efficiency of retesting. For example, regression test selection (RTS) techniques select a subset Ti’ of Ti and use it to test Pi+1.
The authors use an RTS technique implemented as DEJAVOO to illustrate the impact of outdated data on regression testing. DEJAVOO creates control-flow graphs for the original (Porig) and modified (Pmod) versions of a program.
The technique traverses the two graphs in depth-first order to identify dangerous edges. A dangerous edge is an edge whose sink differs between the two graphs; test cases in T that executed the edge in Porig should be rerun on Pmod because they may behave differently there.
An example of these graphs is shown in Fig. 4, obtained by running DEJAVOO on the program Grade.

CODE EXAMPLES AND COVERAGE MATRICES
[Figures: code examples and coverage matrices; control-flow graphs for (a) v0, (b) v1, (c) v2]

ALGORITHM
The algorithm RECOMPUTEMATRIX, shown in Fig. 5, recomputes coverage data after changes are made to a program. It provides the same coverage data as rerunning all test cases in the original test suite but requires running only those test cases selected to run on the modified program.
RECOMPUTEMATRIX takes four inputs: Porig, Pmod, T, and morig (the coverage matrix for Porig).
RECOMPUTEMATRIX outputs mmod, the coverage matrix for Pmod.
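The idea summarized above (select the tests that executed a dangerous edge, rerun only those, and carry over the remaining coverage) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's Fig. 5: the names select_tests, recompute_matrix, and run_selective, the dictionary-of-sets representation of the coverage matrix, and the toy entities e1 to e5 are all hypothetical, and the dangerous edges, entity mapping, and instrumented entities are supplied by hand rather than computed from control-flow graphs.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical representation: test id -> set of covered entity ids.
Coverage = Dict[str, Set[str]]


def select_tests(m_orig: Coverage, dangerous: Set[str]) -> Set[str]:
    """Select every test that covered at least one dangerous entity in Porig."""
    return {t for t, covered in m_orig.items() if covered & dangerous}


def recompute_matrix(
    m_orig: Coverage,
    dangerous: Set[str],
    entity_map: Dict[str, Optional[str]],
    ins_entities: Set[str],
    run_selective: Callable[[str], Set[str]],
) -> Coverage:
    """Build m_mod by rerunning selected tests and transferring the rest."""
    selected = select_tests(m_orig, dangerous)

    # Start with an empty row for every test in T.
    m_mod: Coverage = {t: set() for t in m_orig}

    # Rerun only the selected tests; keep coverage of instrumented entities.
    for t in selected:
        m_mod[t] |= run_selective(t) & ins_entities

    # Transfer coverage of unaffected entities through the mapping.
    for t, covered in m_orig.items():
        for e in covered:
            e_mod = entity_map.get(e)
            if e_mod is None:
                continue  # entity has no counterpart in Pmod
            if t in selected and e_mod in ins_entities:
                continue  # fresh data from the rerun takes precedence
            m_mod[t].add(e_mod)
    return m_mod


# Toy data (entirely made up for illustration).
m_orig = {
    "t1": {"e1", "e2", "e3"},
    "t2": {"e1", "e4"},
    "t3": {"e1", "e2"},
}
dangerous = {"e2"}                      # e2 was changed in Pmod
entity_map = {"e1": "e1", "e3": "e3", "e4": "e4"}
ins_entities = {"e2n", "e3", "e5"}      # affected entities in Pmod

# Stand-in for running a test on the selectively instrumented program.
fake_runs = {"t1": {"e2n", "e5"}, "t3": {"e2n", "e3"}}

selected = select_tests(m_orig, dangerous)
m_mod = recompute_matrix(
    m_orig, dangerous, entity_map, ins_entities, lambda t: fake_runs[t]
)
```

In this toy run only t1 and t3 are rerun. The row for t2 is copied through the mapping without executing t2, and t1 loses e3 because the rerun, not the old matrix, decides coverage of instrumented entities.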
ALGORITHM
RECOMPUTEMATRIX consists of five main steps:
1. Creating and initializing the coverage matrix mmod for Pmod (lines 1 and 2).
2. Identifying T’, the set of test cases in T to rerun on Pmod, and computing the entity mappings entityMap between Porig and Pmod (line 3).
3. Creating Pmod-inst, the selectively instrumented version of Pmod (line 4).
4. Running Pmod-inst with T’ to get coverage data for the affected entities in Pmod (lines 5-7).
5. Transferring the coverage data for the unaffected parts of Pmod using the mappings stored in entityMap and the affected entities in insEntities (lines 8-18).

ALGORITHM: RECOMPUTEMATRIX
ALGORITHM: SELECTIVEINSTRUMENT
ALGORITHM: MATRICES
RECOVER TOOL
EMPIRICAL STUDIES: SUBJECTS

EMPIRICAL STUDIES: STUDY 1
Question: What are the effects of the three techniques for providing coverage data (outdated, estimated, and updated) on regression test selection (RTS)?
To answer this research question, the authors used all six subjects described earlier. For these subjects, the authors generated outdated, estimated, and updated coverage data.
Outdated coverage data was obtained by running the test suite.
Estimated coverage data was obtained using JDIFF.
Updated coverage data was obtained using ReCover.
DEJAVOO was then run on the subject programs with the three different coverage data sets.
[Figures: Study 1 results for Jakarta Regexp, ProAX, Assent, NanoXML, and JABA]

EMPIRICAL STUDIES: STUDY 2
Question: What is the effect of selective instrumentation in reducing the expense of running the test suite selected by the RTS algorithm?
To answer this research question, the authors performed two experiments.
In the first experiment, the authors measured and compared the number of branches instrumented by full instrumentation and by selective instrumentation.
In the second experiment, the authors measured and compared the time to run the selected test suite T’ on Pmod instrumented with full instrumentation and the time to run T’ on Pmod instrumented with selective instrumentation.
[Figures: Study 2 results for Experiment 1 and Experiment 2]

EMPIRICAL STUDIES: STUDY 3
Question: How efficient is the technique for updating coverage data as part of a regression testing process?
To answer this question, the authors measured and compared regression-testing time for four approaches:
1. Running all test cases in T on all versions of the program P.
2. Selecting T’ using DEJAVOO and running the test cases in T’ on all modified versions of P.
3. Selecting T’ and recording the mappings using MOD-DEJAVOO, updating coverage data for T-T’ using RECOVER, instrumenting the modified versions of P with full instrumentation, and running the test cases in T’ on the fully instrumented modified versions of P.
4. Selecting T’ and recording the mappings using MOD-DEJAVOO, updating coverage data for T-T’ using RECOVER, instrumenting the modified versions of P with selective instrumentation, and running the test cases in T’ on the selectively instrumented modified versions of P.
The proposed technique using RECOVER with selective instrumentation saves, on average, 17.35 percent of the regression-testing time across all of the experimental subjects.

RELATED WORK
To the best of the authors’ knowledge, no other technique has been presented to solve the problem of providing accurate coverage information without rerunning all test cases in the test suite.
However, several techniques are related in that they confirm the existence of the problem or provide alternative approaches.

CONCLUSIONS
The paper presented a technique that provides updated coverage data for a modified program without running all test cases in the test suite that was developed for the original program and used for regression testing.
The technique is safe and precise in that it computes exactly the same information as if all test cases in the test suite were rerun.
The paper also presented the results of three empirical studies on a set of subject programs of varying sizes, along with versions of those programs and the test suites used to test them.
The first study confirms that regression test selection using outdated and estimated coverage data causes the regression test selection algorithm to both select unnecessary test cases and omit important test cases.
The second study shows that selective instrumentation reduces the number of probes required to run the test cases selected by the regression test selection algorithm. This reduction shortens the time to run the selected test cases and thus reduces the overall regression-testing time.
The third study shows that the technique with selective instrumentation reduces the time required for regression testing compared with DEJAVOO.

PAPER CRITIQUE
Benefits:
Description of a novel technique that computes accurate, updated coverage data when a program is modified, without rerunning unnecessary test cases.
Discussion of a tool, RECOVER, that implements the technique and integrates it with RTS.
Set of empirical studies that show, for the subjects studied, that the technique provides an effective and efficient way to update coverage data for use in subsequent regression-testing tasks.

QUESTIONS