TY - JOUR T1 - Developing a Single Model and Test Prioritization Strategies for Event-Driven Software JF - Software Engineering, IEEE Transactions on Y1 - 2011 A1 - Bryce,R.C. A1 - Sampath,S. A1 - Memon, Atif M. KW - EDS KW - event-driven software KW - graphical user interface KW - Graphical user interfaces KW - GUI testing KW - Internet KW - program testing KW - service-oriented architecture KW - test prioritization strategy KW - Web application testing AB - Event-Driven Software (EDS) can change state based on incoming events; common examples are GUI and Web applications. These EDSs pose a challenge to testing because there are a large number of possible event sequences that users can invoke through a user interface. While valuable contributions have been made for testing these two subclasses of EDS, such efforts have been disjoint. This work provides the first single model that is generic enough to study GUI and Web applications together. In this paper, we use the model to define generic prioritization criteria that are applicable to both GUI and Web applications. Our ultimate goal is to evolve the model and use it to develop a unified theory of how all EDS should be tested. An empirical study reveals that the GUI and Web-based applications, when recast using the new model, show similar behavior. For example, a criterion that gives priority to all pairs of event interactions did well for GUI and Web applications; another criterion that gives priority to the smallest number of parameter value settings did poorly for both. These results reinforce our belief that these two subclasses of applications should be modeled and studied together. VL - 37 SN - 0098-5589 CP - 1 M3 - 10.1109/TSE.2010.12 ER - TY - JOUR T1 - GUI Interaction Testing: Incorporating Event Context JF - Software Engineering, IEEE Transactions on Y1 - 2011 A1 - Xun Yuan A1 - Cohen,M. B A1 - Memon, Atif M. 
KW - automatic test case generation KW - automatic test pattern generation KW - combinatorial interaction testing KW - event driven nature KW - graphical user interface KW - Graphical user interfaces KW - GUI interaction testing KW - program testing AB - Graphical user interfaces (GUIs), due to their event-driven nature, present an enormous and potentially unbounded way for users to interact with software. During testing, it is important to "adequately cover" this interaction space. In this paper, we develop a new family of coverage criteria for GUI testing grounded in combinatorial interaction testing. The key motivation of using combinatorial techniques is that they enable us to incorporate "context" into the criteria in terms of event combinations, sequence length, and by including all possible positions for each event. Our new criteria range in both efficiency (measured by the size of the test suite) and effectiveness (the ability of the test suites to detect faults). In a case study on eight applications, we automatically generate test cases and systematically explore the impact of context, as captured by our new criteria. Our study shows that by increasing the event combinations tested and by controlling the relative positions of events defined by the new criteria, we can detect a large number of faults that were undetectable by earlier techniques. VL - 37 SN - 0098-5589 CP - 4 M3 - 10.1109/TSE.2010.50 ER - TY - JOUR T1 - Generating Event Sequence-Based Test Cases Using GUI Runtime State Feedback JF - Software Engineering, IEEE Transactions on Y1 - 2010 A1 - Xun Yuan A1 - Memon, Atif M. 
KW - automatic model driven technique KW - event interaction coverage equivalent counterparts KW - event semantic interaction relationships KW - event sequence based test cases KW - Graphical user interfaces KW - GUI runtime state feedback KW - program testing KW - Software quality AB - This paper presents a fully automatic model-driven technique to generate test cases for graphical user interface (GUI)-based applications. The technique uses feedback from the execution of a "seed test suite," which is generated automatically using an existing structural event interaction graph model of the GUI. During its execution, the runtime effect of each GUI event on all other events pinpoints event semantic interaction (ESI) relationships, which are used to automatically generate new test cases. Two studies on eight applications demonstrate that the feedback-based technique 1) is able to significantly improve existing techniques and helps identify serious problems in the software and 2) the ESI relationships captured via GUI state yield test suites that most often detect more faults than their code, event, and event-interaction-coverage equivalent counterparts. VL - 36 SN - 0098-5589 CP - 1 M3 - 10.1109/TSE.2009.68 ER - TY - CONF T1 - Repairing GUI Test Suites Using a Genetic Algorithm T2 - Software Testing, Verification and Validation (ICST), 2010 Third International Conference on Y1 - 2010 A1 - Huang,Si A1 - Cohen,M. B A1 - Memon, Atif M. 
KW - automated functional testing KW - genetic algorithm KW - Genetic algorithms KW - graph model KW - graphical user interface KW - Graphical user interfaces KW - GUI test suite KW - program testing KW - random algorithm KW - synthetic program KW - test case AB - Recent advances in automated functional testing of Graphical User Interfaces (GUIs) rely on deriving graph models that approximate all possible sequences of events that may be executed on the GUI, and then use the graphs to generate test cases (event sequences) that achieve a specified coverage goal. However, because these models are only approximations of the actual event flows, the generated test cases may suffer from problems of infeasibility, i.e., some events may not be available for execution causing the test case to terminate prematurely. In this paper we develop a method to automatically repair GUI test suites, generating new test cases that are feasible. We use a genetic algorithm to evolve new test cases that increase our test suite's coverage while avoiding infeasible sequences. We experiment with this algorithm on a set of synthetic programs containing different types of constraints and for test sequences of varying lengths. Our results suggest that we can generate new test cases to cover most of the feasible coverage and that the genetic algorithm outperforms a random algorithm trying to achieve the same goal in almost all cases. JA - Software Testing, Verification and Validation (ICST), 2010 Third International Conference on M3 - 10.1109/ICST.2010.39 ER - TY - CONF T1 - Using methods & measures from network analysis for gui testing T2 - Software Testing, Verification, and Validation Workshops (ICSTW), 2010 Third International Conference on Y1 - 2010 A1 - Elsaka,E. A1 - Moustafa,W. E A1 - Nguyen,Bao A1 - Memon, Atif M. 
KW - betweenness clustering method KW - event sequences KW - event-flow graph model KW - Graphical user interfaces KW - GUI quality assurance KW - GUI testing KW - network analysis KW - network centrality measures KW - program testing KW - Software quality AB - Graphical user interfaces (GUIs) for today's applications are extremely large. Moreover, they provide many degrees of freedom to the end-user, thus allowing the user to perform a very large number of event sequences on the GUI. The large sizes and degrees of freedom create severe problems for GUI quality assurance, including GUI testing. In this paper, we leverage methods and measures from network analysis to analyze and study GUIs, with the goal of aiding GUI testing activities. We apply these methods and measures on the event-flow graph model of GUIs. Results of a case study show that "network centrality measures" are able to identify the most important events in the GUI as well as the most important sequences of events. These events and sequences are good candidates for test prioritization. In addition, the "betweenness clustering" method is able to partition the GUI into regions that can be tested separately. JA - Software Testing, Verification, and Validation Workshops (ICSTW), 2010 Third International Conference on M3 - 10.1109/ICSTW.2010.61 ER - TY - CONF T1 - An Extensible Heuristic-Based Framework for GUI Test Case Maintenance T2 - Software Testing, Verification and Validation Workshops, 2009. ICSTW '09. International Conference on Y1 - 2009 A1 - McMaster,S. A1 - Memon, Atif M. KW - extensible heuristic-based framework KW - graphical user interface KW - Graphical user interfaces KW - GUI code KW - GUI test case maintenance KW - program testing KW - Software maintenance AB - Graphical user interfaces (GUIs) make up a large portion of the code comprising many modern software applications. However, GUI testing differs significantly from testing of traditional software. 
One respect in which this is true is test case maintenance. Due to the way that GUI test cases are often implemented, relatively minor changes to the construction of the GUI can cause a large number of test case executions to malfunction, often because GUI elements referred to by the test cases have been renamed, moved, or otherwise altered. We posit that a general solution to the problem of GUI test case maintenance must be based on heuristics that attempt to match an application's GUI elements across versions. We demonstrate the use of some heuristics with framework support. Our tool support is general in that it may be used with other heuristics if needed in the future. JA - Software Testing, Verification and Validation Workshops, 2009. ICSTW '09. International Conference on M3 - 10.1109/ICSTW.2009.11 ER - TY - CONF T1 - An Initial Characterization of Industrial Graphical User Interface Systems T2 - Software Testing Verification and Validation, 2009. ICST '09. International Conference on Y1 - 2009 A1 - Brooks,P.A. A1 - Robinson,B.P. A1 - Memon, Atif M. KW - Graphical user interfaces KW - GUI-based software systems KW - industrial graphical user interface systems KW - model-based GUI testing techniques KW - program testing KW - software metrics KW - source code change metrics AB - To date we have developed and applied numerous model-based GUI testing techniques; however, we are unable to provide definitive improvement schemes to real-world GUI test planners, as our data was derived from open source applications, small compared to industrial systems. This paper presents a study of three industrial GUI-based software systems developed at ABB, including data on classified defects detected during late-phase testing and customer usage, test suites, and source code change metrics. 
The results show that (1) 50% of the defects found through the GUI are categorized as data access and handling, control flow and sequencing, correctness, and processing defects, (2) system crashes exposed defects 12-19% of the time, and (3) GUI and non-GUI components are constructed differently, in terms of source code metrics. JA - Software Testing Verification and Validation, 2009. ICST '09. International Conference on M3 - 10.1109/ICST.2009.11 ER - TY - CONF T1 - Introducing a test suite similarity metric for event sequence-based test cases T2 - Software Maintenance, 2009. ICSM 2009. IEEE International Conference on Y1 - 2009 A1 - Brooks,P.A. A1 - Memon, Atif M. KW - event driven software systems KW - event sequence-based test cases KW - open source systems KW - program testing KW - public domain software KW - software metrics KW - Software testing KW - test suite similarity metric AB - Most of today's event driven software (EDS) systems are tested using test cases that are carefully constructed as sequences of events; they test the execution of an event in the context of its preceding events. Because sizes of these test suites can be extremely large, researchers have developed techniques, such as reduction and minimization, to obtain test suites that are "similar" to the original test suite, but smaller. Existing similarity metrics mostly use code coverage; they do not consider the contextual relationships between events. Consequently, reduction based on such metrics may eliminate desirable test cases. In this paper, we present a new parameterized metric, CONTeSSi(n), which uses the context of n preceding events in test cases to develop a new context-aware notion of test suite similarity for EDS. This metric is defined and evaluated by comparing four test suites for each of four open source applications. Our results show that CONTeSSi(n) is a better indicator of the similarity of EDS test suites than existing metrics. JA - Software Maintenance, 2009. 
ICSM 2009. IEEE International Conference on M3 - 10.1109/ICSM.2009.5306305 ER - TY - CONF T1 - Prioritizing component compatibility tests via user preferences T2 - Software Maintenance, 2009. ICSM 2009. IEEE International Conference on Y1 - 2009 A1 - Yoon,Il-Chul A1 - Sussman, Alan A1 - Memon, Atif M. A1 - Porter, Adam KW - compatibility testing prioritization KW - component configurations KW - computer clusters KW - Middleware KW - Middleware systems KW - object-oriented programming KW - program testing KW - software engineering KW - Software systems KW - third-party components KW - user preferences AB - Many software systems rely on third-party components during their build process. Because the components are constantly evolving, quality assurance demands that developers perform compatibility testing to ensure that their software systems build correctly over all deployable combinations of component versions, also called configurations. However, large software systems can have many configurations, and compatibility testing is often time and resource constrained. We present a prioritization mechanism that enhances compatibility testing by examining the "most important" configurations first, while distributing the work over a cluster of computers. We evaluate our new approach on two large scientific middleware systems and examine tradeoffs between the new prioritization approach and a previously developed lowest-cost-configuration-first approach. JA - Software Maintenance, 2009. ICSM 2009. IEEE International Conference on M3 - 10.1109/ICSM.2009.5306357 ER - TY - CONF T1 - Towards Dynamic Adaptive Automated Test Generation for Graphical User Interfaces T2 - Software Testing, Verification and Validation Workshops, 2009. ICSTW '09. International Conference on Y1 - 2009 A1 - Xun Yuan A1 - Cohen,M. B A1 - Memon, Atif M. 
KW - adaptive automated test generation KW - computational complexity KW - event sequence length KW - evolutionary algorithm KW - evolutionary computation KW - graphical user interface KW - Graphical user interfaces KW - GUI test case KW - program testing AB - Graphical user interfaces (GUIs) present an enormous number of potential event sequences to users. During testing it is necessary to cover this space; however, the complexity of modern GUIs has made this an increasingly difficult task. Our past work has demonstrated that it is important to incorporate "context" into GUI test cases, in terms of event combinations, event sequence length, and by considering all possible starting and ending positions for each event. Despite the use of our most refined modeling techniques, many of the generated test cases remain unexecutable. In this paper, we posit that due to the dynamic state-based nature of GUIs, it is important to incorporate feedback from the execution of tests into test case generation algorithms. We propose the use of an evolutionary algorithm to generate test suites with fewer unexecutable test cases and higher event interaction coverage. JA - Software Testing, Verification and Validation Workshops, 2009. ICSTW '09. International Conference on M3 - 10.1109/ICSTW.2009.26 ER - TY - CONF T1 - Relationships between Test Suites, Faults, and Fault Detection in GUI Testing T2 - Software Testing, Verification, and Validation, 2008 1st International Conference on Y1 - 2008 A1 - Strecker,J. A1 - Memon, Atif M. KW - Fault detection KW - fault-related factors KW - Graphical user interfaces KW - GUI testing KW - program testing KW - software-testing KW - test suites KW - test-suite-related factors AB - Software-testing researchers have long sought recipes for test suites that detect faults well. In the literature, empirical studies of testing techniques abound, yet the ideal technique for detecting the desired kinds of faults in a given situation often remains unclear. 
This work shows how understanding the context in which testing occurs, in terms of factors likely to influence fault detection, can make evaluations of testing techniques more readily applicable to new situations. We present a methodology for discovering which factors do statistically affect fault detection, and we perform an experiment with a set of test-suite- and fault-related factors in the GUI testing of two fielded, open-source applications. Statement coverage and GUI-event coverage are found to be statistically related to the likelihood of detecting certain kinds of faults. JA - Software Testing, Verification, and Validation, 2008 1st International Conference on M3 - 10.1109/ICST.2008.26 ER - TY - CONF T1 - Fault Detection Probability Analysis for Coverage-Based Test Suite Reduction T2 - Software Maintenance, 2007. ICSM 2007. IEEE International Conference on Y1 - 2007 A1 - McMaster,S. A1 - Memon, Atif M. KW - coverage-based test suite reduction KW - fault detection probability analysis KW - Fault diagnosis KW - force coverage-based reduction KW - percentage fault detection reduction KW - percentage size reduction KW - program testing KW - software reliability KW - statistical analysis AB - Test suite reduction seeks to reduce the number of test cases in a test suite while retaining a high percentage of the original suite's fault detection effectiveness. Most approaches to this problem are based on eliminating test cases that are redundant relative to some coverage criterion. The effectiveness of applying various coverage criteria in test suite reduction is traditionally based on empirical comparison of two metrics derived from the full and reduced test suites and information about a set of known faults: (1) percentage size reduction and (2) percentage fault detection reduction, neither of which quantitatively takes test coverage data into account. 
Consequently, no existing measure expresses the likelihood of various coverage criteria to force coverage-based reduction to retain test cases that expose specific faults. In this paper, we develop and empirically evaluate, using a number of different coverage criteria, a new metric based on the "average expected probability of finding a fault" in a reduced test suite. Our results indicate that the average probability of detecting each fault shows promise for identifying coverage criteria that work well for test suite reduction. JA - Software Maintenance, 2007. ICSM 2007. IEEE International Conference on M3 - 10.1109/ICSM.2007.4362646 ER - TY - JOUR T1 - Reliable Effects Screening: A Distributed Continuous Quality Assurance Process for Monitoring Performance Degradation in Evolving Software Systems JF - Software Engineering, IEEE Transactions on Y1 - 2007 A1 - Yilmaz,C. A1 - Porter, Adam A1 - Krishna,A. S A1 - Memon, Atif M. A1 - Schmidt,D. C A1 - Gokhale,A.S. A1 - Natarajan,B. KW - configuration subset KW - distributed continuous quality assurance process KW - evolving software systems KW - in house testing KW - main effects screening KW - performance bottlenecks KW - performance degradation monitoring KW - performance intensive software systems KW - process configuration KW - process execution KW - program testing KW - regression testing KW - reliable effects screening KW - software benchmarks KW - Software performance KW - software performance evaluation KW - Software quality KW - software reliability KW - tool support AB - Developers of highly configurable performance-intensive software systems often use in-house performance-oriented "regression testing" to ensure that their modifications do not adversely affect their software's performance across its large configuration space. 
Unfortunately, time and resource constraints can limit in-house testing to a relatively small number of possible configurations, followed by unreliable extrapolation from these results to the entire configuration space. As a result, many performance bottlenecks escape detection until systems are fielded. In our earlier work, we improved the situation outlined above by developing an initial quality assurance process called "main effects screening". This process 1) executes formally designed experiments to identify an appropriate subset of configurations on which to base the performance-oriented regression testing, 2) executes benchmarks on this subset whenever the software changes, and 3) provides tool support for executing these actions on in-the-field and in-house computing resources. Our initial process had several limitations, however, since it was manually configured (which was tedious and error-prone) and relied on strong and untested assumptions for its accuracy (which made its use unacceptably risky in practice). This paper presents a new quality assurance process called "reliable effects screening" that provides three significant improvements to our earlier work. First, it allows developers to economically verify key assumptions during process execution. Second, it integrates several model-driven engineering tools to make process configuration and execution much easier and less error prone. Third, we evaluate this process via several feasibility studies of three large, widely used performance-intensive software frameworks. Our results indicate that reliable effects screening can detect performance degradation in large-scale systems more reliably and with significantly fewer resources than conventional techniques. VL - 33 SN - 0098-5589 CP - 2 M3 - 10.1109/TSE.2007.20 ER - TY - CONF T1 - Using GUI Run-Time State as Feedback to Generate Test Cases T2 - Software Engineering, 2007. ICSE 2007. 
29th International Conference on Y1 - 2007 A1 - Xun Yuan A1 - Memon, Atif M. KW - application under test KW - automated test case generation KW - Feedback KW - feedback-based technique KW - Graphical user interfaces KW - GUI run-time state KW - model-driven technique KW - open-source software KW - program testing KW - public domain software KW - reverse engineering KW - reverse-engineering algorithm KW - seed test suite AB - This paper presents a new automated model-driven technique to generate test cases by using feedback from the execution of a "seed test suite" on an application under test (AUT). The test cases in the seed suite are designed to be generated automatically and executed very quickly. During their execution, feedback obtained from the AUT's run-time state is used to generate new, "improved" test cases. The new test cases subsequently become part of the seed suite. This "anytime technique" continues iteratively, generating and executing additional test cases until resources are exhausted or testing goals have been met. The feedback-based technique is demonstrated for automated testing of graphical user interfaces (GUIs). An existing abstract model of the GUI is used to automatically generate the seed test suite. It is executed; during its execution, state changes in the GUI pinpoint important relationships between GUI events, which evolve the model and help to generate new test cases. Together with a reverse-engineering algorithm used to obtain the initial model and seed suite, the feedback-based technique yields a fully automatic, end-to-end GUI testing process. A feasibility study on four large fielded open-source software (OSS) applications demonstrates that this process is able to significantly improve existing techniques and help identify/report serious problems in the OSS. In response, these problems have been fixed by the developers of the OSS in subsequent versions. JA - Software Engineering, 2007. ICSE 2007. 
29th International Conference on M3 - 10.1109/ICSE.2007.94 ER - TY - CONF T1 - Studying the Characteristics of a "Good" GUI Test Suite T2 - Software Reliability Engineering, 2006. ISSRE '06. 17th International Symposium on Y1 - 2006 A1 - Xie,Qing A1 - Memon, Atif M. KW - Fault detection KW - Fault diagnosis KW - graphical user interface testing KW - Graphical user interfaces KW - program debugging KW - program testing AB - The widespread deployment of graphical user interfaces (GUIs) has increased the overall complexity of testing. A GUI test designer needs to perform the daunting task of adequately testing the GUI, which typically has very large input interaction spaces, while considering tradeoffs between GUI test suite characteristics such as the number of test cases (each modeled as a sequence of events), their lengths, and the event composition of each test case. There are no published empirical studies on GUI testing that a GUI test designer may reference to make decisions about these characteristics. Consequently, in practice, very few GUI testers know how to design their test suites. This paper takes the first step towards assisting in GUI test design by presenting an empirical study that evaluates the effect of these characteristics on testing cost and fault detection effectiveness. The results show that two factors significantly affect the fault-detection effectiveness of a test suite: (1) the diversity of states in which an event executes and (2) the event coverage of the suite. Test designers need to improve the diversity of states in which each event executes by developing a large number of short test cases to detect the majority of "shallow" faults, which are artifacts of modern GUI design. Additional resources should be used to develop a small number of long test cases to detect a small number of "deep" faults. JA - Software Reliability Engineering, 2006. ISSRE '06. 
17th International Symposium on M3 - 10.1109/ISSRE.2006.45 ER - TY - CONF T1 - Call stack coverage for test suite reduction T2 - Software Maintenance, 2005. ICSM'05. Proceedings of the 21st IEEE International Conference on Y1 - 2005 A1 - McMaster,S. A1 - Memon, Atif M. KW - call stack coverage KW - component reuse KW - Fault detection KW - language-independent information KW - multi-language implementation KW - program testing KW - software development KW - software fault tolerance KW - Software maintenance KW - software reusability KW - space antenna-steering application KW - stringent performance requirement KW - systems analysis KW - test suite reduction algorithm AB - Test suite reduction is an important test maintenance activity that attempts to reduce the size of a test suite with respect to some criteria. Emerging trends in software development such as component reuse, multi-language implementations, and stringent performance requirements present new challenges for existing reduction techniques that may limit their applicability. A test suite reduction technique that is not affected by these challenges is presented; it is based on dynamically generated language-independent information that can be collected with little run-time overhead. Specifically, test cases from the suite being reduced are executed on the application under test and the call stacks produced during execution are recorded. These call stacks are then used as a coverage requirement in a test suite reduction algorithm. Results of experiments on test suites for the space antenna-steering application show significant reduction in test suite size at the cost of a moderate loss in fault detection effectiveness. JA - Software Maintenance, 2005. ICSM'05. Proceedings of the 21st IEEE International Conference on M3 - 10.1109/ICSM.2005.29 ER - TY - CONF T1 - Rapid "crash testing" for continuously evolving GUI-based software applications T2 - Software Maintenance, 2005. ICSM'05. 
Proceedings of the 21st IEEE International Conference on Y1 - 2005 A1 - Xie,Q. A1 - Memon, Atif M. KW - crash testing KW - graphical user interface software retesting KW - Graphical user interfaces KW - GUI-based software application KW - immediate feedback KW - program testing KW - rapid-feedback-based quality assurance KW - software evolution KW - Software maintenance KW - software prototyping KW - Software quality AB - Several rapid-feedback-based quality assurance mechanisms are used to manage the quality of continuously evolving software. Even though graphical user interfaces (GUIs) are one of the most important parts of software, there are currently no mechanisms to quickly retest evolving GUI software. We leverage our previous work on GUI testing to define a new automatic GUI re-testing process called "crash testing" that is integrated with GUI evolution. We describe two levels of crash testing: (1) immediate feedback-based in which a developer indicates that a GUI bug was fixed in response to a previously reported crash; only select crash test cases are rerun and the developer is notified of the results in a matter of seconds, and (2) between code changes in which new crash test cases are generated on-the-fly and executed on the GUI. Since the code may be changed by another developer before all the crash tests have been executed, hence requiring restarting of the process, we use a simple rotation-based scheme to ensure that all crash tests are executed over a series of code changes. We show, via empirical studies, that our crash tests are effective at revealing serious problems in the GUI. JA - Software Maintenance, 2005. ICSM'05. Proceedings of the 21st IEEE International Conference on M3 - 10.1109/ICSM.2005.72 ER - TY - JOUR T1 - Studying the fault-detection effectiveness of GUI test cases for rapidly evolving software JF - Software Engineering, IEEE Transactions on Y1 - 2005 A1 - Memon, Atif M. A1 - Xie,Q. 
KW - daily automated regression tester KW - Fault diagnosis KW - fault-detection KW - formal specification KW - formal verification KW - Graphical user interfaces KW - GUI test cases KW - program testing KW - quality assurance mechanism KW - rapidly evolving software KW - smoke regression testing technique KW - software development KW - software fault tolerance KW - Software maintenance KW - software prototyping KW - Software quality KW - test oracles AB - Software is increasingly being developed/maintained by multiple, often geographically distributed developers working concurrently. Consequently, rapid-feedback-based quality assurance mechanisms such as daily builds and smoke regression tests, which help to detect and eliminate defects early during software development and maintenance, have become important. This paper addresses a major weakness of current smoke regression testing techniques, i.e., their inability to automatically (re)test graphical user interfaces (GUIs). Several contributions are made to the area of GUI smoke testing. First, the requirements for GUI smoke testing are identified and a GUI smoke test is formally defined as a specialized sequence of events. Second, a GUI smoke regression testing process called daily automated regression tester (DART) that automates GUI smoke testing is presented. Third, the interplay between several characteristics of GUI smoke test suites including their size, fault detection ability, and test oracles is empirically studied. The results show that: 1) the entire smoke testing process is feasible in terms of execution time, storage space, and manual effort, 2) smoke tests cannot cover certain parts of the application code, 3) having comprehensive test oracles may make up for not having long smoke test cases, and 4) using certain oracles can make up for not having large smoke test suites. 
VL - 31 SN - 0098-5589 CP - 10 M3 - 10.1109/TSE.2005.117 ER - TY - CONF T1 - Empirical evaluation of the fault-detection effectiveness of smoke regression test cases for GUI-based software T2 - Software Maintenance, 2004. Proceedings. 20th IEEE International Conference on Y1 - 2004 A1 - Memon, Atif M. A1 - Xie,Qing KW - daily automated regression tester KW - daily builds KW - fault-detection effectiveness KW - graphical user interface KW - Graphical user interfaces KW - GUI-based software KW - program testing KW - Quality assurance KW - Regression analysis KW - smoke regression test cases KW - software development KW - software fault tolerance KW - Software maintenance KW - Software quality KW - software quality assurance KW - test oracle complexity KW - test oracles KW - test-case length AB - Daily builds and smoke regression tests have become popular quality assurance mechanisms to detect defects early during software development and maintenance. In previous work, we addressed a major weakness of current smoke regression testing techniques, i.e., their lack of ability to automatically (re)test graphical user interface (GUI) event interactions - we presented a GUI smoke regression testing process called daily automated regression tester (DART). We have deployed DART and have found several interesting characteristics of GUI smoke tests that we empirically demonstrate in this paper. We also combine smoke tests with different types of test oracles and present guidelines for practitioners to help them generate and execute the most effective combinations of test-case length and test oracle complexity. Our experimental subjects consist of four GUI-based applications. We generate 5000-8000 smoke tests (enough to be run in one night) for each application. 
Our results show that: (1) short GUI smoke tests with certain test oracles are effective at detecting a large number of faults; (2) there are classes of faults that our smoke tests cannot detect; (3) short smoke tests execute a large percentage of code; and (4) the entire smoke testing process is feasible in terms of execution time and storage space. JA - Software Maintenance, 2004. Proceedings. 20th IEEE International Conference on M3 - 10.1109/ICSM.2004.1357785 ER - TY - CONF T1 - Using transient/persistent errors to develop automated test oracles for event-driven software T2 - Automated Software Engineering, 2004. Proceedings. 19th International Conference on Y1 - 2004 A1 - Memon, Atif M. A1 - Xie,Qing KW - automated test oracles KW - Automatic testing KW - event driven software KW - Graphical user interfaces KW - persistent errors KW - program testing KW - resource allocation KW - resource utilization KW - software intensive systems KW - test case execution KW - transient errors AB - Today's software-intensive systems contain an important class of software, namely event-driven software (EDS). All EDS take events as input, change their state, and (perhaps) output an event sequence. EDS is typically implemented as a collection of event-handlers designed to respond to individual events. The nature of EDS creates new challenges for test automation. In this paper, we focus on those relevant to automated test oracles. A test oracle is a mechanism that determines whether software executed correctly for a test case. A test case for an EDS consists of a sequence of events. The test case is executed on the EDS, one event at a time. Errors in the EDS may "appear" and later "disappear" at several points (e.g., after an event is executed) during test case execution.
Because of the behavior of these transient (those that disappear) and persistent (those that don't disappear) errors, EDS require complex and expensive test oracles that compare the expected and actual output multiple times during test case execution. We leverage our previous work to study several applications and observe the occurrence of persistent/transient errors. Our studies show that in practice, a large number of errors in EDS are transient and that there are specific classes of events that lead to transient errors. We use the results of this study to develop a new test oracle that compares the expected and actual output at strategic points during test case execution. We show that the oracle is effective at detecting errors and efficient in terms of resource utilization JA - Automated Software Engineering, 2004. Proceedings. 19th International Conference on M3 - 10.1109/ASE.2004.1342736 ER - TY - CONF T1 - DART: a framework for regression testing "nightly/daily builds" of GUI applications T2 - Software Maintenance, 2003. ICSM 2003. Proceedings. International Conference on Y1 - 2003 A1 - Memon, Atif M. A1 - Banerjee,I. A1 - Hashmi,N. A1 - Nagarajan,A. KW - automated retesting KW - automatic test software KW - coverage evaluation KW - daily automated regression tester KW - DART KW - frequent retesting KW - graphical user interface KW - Graphical user interfaces KW - GUI software KW - instrumentation coding KW - program testing KW - regression testing KW - Software development management KW - Software development process KW - Software maintenance KW - Software quality KW - structural GUI analysis KW - Test Case Generation KW - test cases regeneration KW - Test execution KW - test oracle creation AB - "Nightly/daily building and smoke testing" have become widespread since they often reveal bugs early in the software development process. During these builds, software is compiled, linked, and (re)tested with the goal of validating its basic functionality. 
Although successful for conventional software, smoke tests are difficult to develop and automatically rerun for software that has a graphical user interface (GUI). In this paper, we describe a framework called DART (daily automated regression tester) that addresses the needs of frequent and automated re-testing of GUI software. The key to our success is automation: DART automates everything from structural GUI analysis, test case generation, test oracle creation, and code instrumentation to test execution, coverage evaluation, and the regeneration and re-execution of test cases. Together with the operating system's task scheduler, DART can execute frequently with little input from the developer/tester to retest the GUI software. We provide results of experiments showing the time taken and memory required for GUI analysis, test case and test oracle generation, and test execution. We also empirically compare the relative costs of employing different levels of detail in the GUI test cases. JA - Software Maintenance, 2003. ICSM 2003. Proceedings. International Conference on M3 - 10.1109/ICSM.2003.1235451 ER - TY - CONF T1 - What test oracle should I use for effective GUI testing? T2 - Automated Software Engineering, 2003. Proceedings. 18th IEEE International Conference on Y1 - 2003 A1 - Memon, Atif M. A1 - Banerjee,I. A1 - Nagarajan,A. KW - empirical studies KW - graphical user interface KW - Graphical user interfaces KW - GUI testing KW - oracle information KW - oracle procedure KW - oracle space requirement KW - oracle time requirements KW - program testing KW - software engineering KW - Software testing KW - test cost KW - test effectiveness KW - test oracle AB - Test designers widely believe that the overall effectiveness and cost of software testing depend largely on the type and number of test cases executed on the software. In this paper we show that the test oracle used during testing also contributes significantly to test effectiveness and cost.
A test oracle is a mechanism that determines whether software executed correctly for a test case. We define a test oracle to contain two essential parts: oracle information that represents expected output, and an oracle procedure that compares the oracle information with the actual output. By varying the level of detail of oracle information and changing the oracle procedure, a test designer can create different types of test oracles. We design 11 types of test oracles and empirically compare them on four software systems. We seed faults in software to create 100 faulty versions and execute 600 test cases on each version for all 11 types of oracles. In all, we report results of 660,000 test runs on software. We show (1) the time and space requirements of the oracles, (2) that faults are detected early in the testing process when using detailed oracle information and complex oracle procedures, although at a higher cost per test case, and (3) that employing expensive oracles results in detecting a large number of faults using a relatively smaller number of test cases. JA - Automated Software Engineering, 2003. Proceedings. 18th IEEE International Conference on M3 - 10.1109/ASE.2003.1240304 ER - TY - JOUR T1 - Hierarchical GUI test case generation using automated planning JF - Software Engineering, IEEE Transactions on Y1 - 2001 A1 - Memon, Atif M. A1 - Pollack,M. E A1 - Soffa,M. L KW - Artificial intelligence KW - automated planning KW - automatic test case generation KW - Automatic testing KW - correctness testing KW - goal state KW - Graphical user interfaces KW - hierarchical GUI test case generation KW - initial state KW - Microsoft WordPad KW - operators KW - plan-generation system KW - planning (artificial intelligence) KW - Planning Assisted Tester for Graphical User Interface Systems KW - program testing KW - software AB - The widespread use of GUIs for interacting with software is leading to the construction of more and more complex GUIs.
With the growing complexity come challenges in testing the correctness of a GUI and its underlying software. We present a new technique to automatically generate test cases for GUIs that exploits planning, a well-developed and used technique in artificial intelligence. Given a set of operators, an initial state, and a goal state, a planner produces a sequence of the operators that will transform the initial state to the goal state. Our test case generation technique enables efficient application of planning by first creating a hierarchical model of a GUI based on its structure. The GUI model consists of hierarchical planning operators representing the possible events in the GUI. The test designer defines the preconditions and effects of the hierarchical operators, which are input into a plan-generation system. The test designer also creates scenarios that represent typical initial and goal states for a GUI user. The planner then generates plans representing sequences of GUI interactions that a user might employ to reach the goal state from the initial state. We implemented our test case generation system, called Planning Assisted Tester for Graphical User Interface Systems (PATHS) and experimentally evaluated its practicality and effectiveness. We describe a prototype implementation of PATHS and report on the results of controlled experiments to generate test cases for Microsoft's WordPad VL - 27 SN - 0098-5589 CP - 2 M3 - 10.1109/32.908959 ER - TY - CONF T1 - Fault injection based on a partial view of the global state of a distributed system Y1 - 1999 A1 - Michel Cukier A1 - Chandra,R. A1 - Henke,D. A1 - Pistole,J. A1 - Sanders,W. H. 
KW - bounding technique KW - clock synchronization KW - distributed programming KW - distributed software systems KW - fault injection KW - Loki KW - post-runtime analysis KW - program testing KW - program verification KW - software reliability KW - Synchronisation AB - This paper describes the basis for and preliminary implementation of a new fault injector, called Loki, developed specifically for distributed systems. Loki addresses issues related to injecting correlated faults in distributed systems. In Loki, fault injection is performed based on a partial view of the global state of an application. In particular, facilities are provided to pass user-specified state information between nodes to provide a partial view of the global state in order to try to inject complex faults successfully. A post-runtime analysis, using an off-line clock synchronization and a bounding technique, is used to place events and injections on a single global time-line and determine whether the intended faults were properly injected. Finally, observations containing successful fault injections are used to estimate specified dependability measures. In addition to describing the details of our new approach, we present experimental results obtained from a preliminary implementation in order to illustrate Loki's ability to inject complex faults predictably M3 - 10.1109/RELDIS.1999.805093 ER - TY - CONF T1 - Using a goal-driven approach to generate test cases for GUIs T2 - Software Engineering, 1999. Proceedings of the 1999 International Conference on Y1 - 1999 A1 - Memon, Atif M. A1 - Pollack,M. E A1 - Soffa,M. L
KW - Artificial intelligence KW - automatic test case generation KW - goal state KW - goal-driven approach KW - Graphical user interfaces KW - GUIs KW - hierarchical planning operators KW - initial state KW - Microsoft Word-Pad KW - operators KW - planning (artificial intelligence) KW - program testing KW - software KW - verification commands AB - The widespread use of GUIs for interacting with software is leading to the construction of more and more complex GUIs. With the growing complexity come challenges in testing the correctness of a GUI and the underlying software. We present a new technique to automatically generate test cases for GUIs that exploits planning, a well-developed and used technique in artificial intelligence. Given a set of operators, an initial state, and a goal state, a planner produces a sequence of the operators that will change the initial state to the goal state. Our test case generation technique first analyzes a GUI and derives hierarchical planning operators from the actions in the GUI. The test designer determines the preconditions and effects of the hierarchical operators, which are then input into a planning system. With the knowledge of the GUI and the way in which the user will interact with the GUI, the test designer creates sets of initial and goal states. Given these initial and final states of the GUI, a hierarchical planner produces plans, or a set of test cases, that enable the goal state to be reached. Our technique has the additional benefit of putting verification commands into the test cases automatically. We implemented our technique by developing the GUI analyzer and extending a planner. We generated test cases for Microsoft's Word-Pad to demonstrate the viability and practicality of the approach. JA - Software Engineering, 1999.
Proceedings of the 1999 International Conference on ER - TY - CONF T1 - Performance measurement using low perturbation and high precision hardware assists T2 - The 19th IEEE Real-Time Systems Symposium, 1998. Proceedings Y1 - 1998 A1 - Mink, A. A1 - Salamon, W. A1 - Hollingsworth, Jeffrey K A1 - Arunachalam, R. KW - Clocks KW - Computerized monitoring KW - Counting circuits KW - Debugging KW - Hardware KW - hardware performance monitor KW - high precision hardware assists KW - low perturbation KW - measurement KW - MPI message passing library KW - MultiKron hardware performance monitor KW - MultiKron PCI KW - NIST KW - online performance monitoring tools KW - Paradyn parallel performance measurement tools KW - PCI bus slot KW - performance bug KW - performance evaluation KW - performance measurement KW - program debugging KW - program testing KW - real-time systems KW - Runtime KW - Timing AB - We present the design and implementation of MultiKron PCI, a hardware performance monitor that can be plugged into any computer with a free PCI bus slot. The monitor provides a series of high-resolution timers, and the ability to monitor the utilization of the PCI bus. We also demonstrate how the monitor can be integrated with online performance monitoring tools such as the Paradyn parallel performance measurement tools to reduce the overhead of key timer operations by a factor of 25. In addition, we present a series of case studies using the MultiKron hardware performance monitor to measure and tune high-performance parallel computing applications. By using the monitor, we were able to find and correct a performance bug in a popular implementation of the MPI message passing library that caused some communication primitives to run at one half their potential speed JA - The 19th IEEE Real-Time Systems Symposium, 1998.
Proceedings PB - IEEE SN - 0-8186-9212-X M3 - 10.1109/REAL.1998.739771 ER - TY - JOUR T1 - Going beyond integer programming with the Omega test to eliminate false data dependences JF - IEEE Transactions on Parallel and Distributed Systems Y1 - 1995 A1 - Pugh, William A1 - Wonnacott,D. KW - Algorithm design and analysis KW - Arithmetic KW - Computer science KW - Data analysis KW - false data dependences KW - integer programming KW - Linear programming KW - Omega test KW - Privatization KW - Production KW - production compilers KW - program compilers KW - Program processors KW - program testing KW - program transformations KW - Testing AB - Array data dependence analysis methods currently in use generate false dependences that can prevent useful program transformations. These false dependences arise because the questions asked are conservative approximations to the questions we really should be asking. Unfortunately, the questions we really should be asking go beyond integer programming and require decision procedures for a subclass of Presburger formulas. In this paper, we describe how to extend the Omega test so that it can answer these queries and allow us to eliminate these false data dependences. We have implemented the techniques described here and believe they are suitable for use in production compilers VL - 6 SN - 1045-9219 CP - 2 M3 - 10.1109/71.342135 ER -