Evaluating techniques for generating metric-based classification trees

Title	Evaluating techniques for generating metric-based classification trees
Publication Type	Journal Articles
Year of Publication	1990
Authors	Porter A, Selby RW
Journal	Journal of Systems and Software
Volume	12
Issue	3
Pagination	209 - 218
Date Published	1990/07
ISSN	0164-1212
Abstract	Metric-based classification trees provide an approach for identifying user-specified classes of high-risk software components throughout the software lifecycle. Based on measurable attributes of software components and processes, this empirically guided approach derives models of problematic software components. These models, which are represented as classification trees, are used on future systems to identify components likely to share the same high-risk properties. Example high-risk component properties include being fault-prone, change-prone, or effort-prone, or containing certain types of faults. Identifying these components allows developers to focus the application of specialized techniques and tools for analyzing, testing, and constructing software. A validation study using metric data from 16 NASA systems showed that the trees had an average classification accuracy of 79.3% for fault-prone and effort-prone components in that environment.
One fundamental feature of the classification tree generation algorithm is the method used for partitioning the metric data values into mutually exclusive and exhaustive ranges. This study compares the accuracy and the complexity of trees resulting from five techniques for partitioning metric data values. The techniques are quartiles, octiles, and three methods based on least weight subsequence (LWS-χ) analysis, where χ is the upper bound on the number of partitions. The LWS-3 and LWS-5 partition techniques resulted in trees with higher accuracy (in terms of completeness and consistency) than did quartiles and octiles. LWS-3 and LWS-5 trees were not statistically different in terms of accuracy, but LWS-3 trees had lower complexity than all other methods in terms of the number of unique metrics required. The trees from the three LWS methods (LWS-3, LWS-5, and LWS-8) had lower complexity than did the trees from quartiles and octiles.
In general, the results indicate that distribution-sensitive partition techniques that use only relatively few partitions, such as the least weight subsequence techniques LWS-3 and LWS-5, can increase accuracy and decrease complexity in classification trees. Classification analysis techniques, along with other empirically based analysis techniques for large-scale software, will be supported in the Amadeus measurement and empirical analysis system.
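As an illustration of the quartile and octile partitioning named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes equal-frequency cut points computed with Python's standard `statistics.quantiles`; the function names and the metric values are hypothetical, and the LWS-based techniques are not shown.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_cut_points(values, n_partitions):
    """Return the n_partitions - 1 cut points that split the values into
    roughly equal-frequency ranges (4 for quartiles, 8 for octiles)."""
    return quantiles(values, n=n_partitions, method="inclusive")


def partition_index(value, cut_points):
    """Map a value to the index of the range it falls in. Every value gets
    exactly one index, so the ranges are mutually exclusive and exhaustive."""
    return bisect_right(cut_points, value)


# Hypothetical metric values (e.g. source lines per component); not study data.
metric_values = [12, 40, 7, 95, 33, 58, 21, 76]

quartile_cuts = quantile_cut_points(metric_values, 4)
octile_cuts = quantile_cut_points(metric_values, 8)
quartile_bins = [partition_index(v, quartile_cuts) for v in metric_values]
```

Each entry of `quartile_bins` is an integer from 0 to 3 that could stand in for the raw metric value when a tree is built. A distribution-sensitive method such as LWS would choose the cut points differently but could reuse the same `partition_index` lookup.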
URL	http://www.sciencedirect.com/science/article/pii/016412129090041J