%0 Conference Paper
%B , Conference on Software Maintenance, 1989., Proceedings
%D 1989
%T Software metric classification trees help guide the maintenance of large-scale systems
%A Selby, R. W.
%A Porter, Adam
%K automated method
%K automatic programming
%K classification
%K Classification tree analysis
%K classification trees
%K Computer errors
%K empirically-based models
%K error-prone software objects
%K Fault diagnosis
%K feasibility study
%K high development effort
%K Large-scale systems
%K multivalued functions
%K NASA
%K NASA projects
%K recursive algorithm
%K Software algorithms
%K software engineering
%K Software maintenance
%K Software measurement
%K software metrics
%K software modules
%K Software systems
%K trees (mathematics)
%X The 80:20 rule states that approximately 20% of a software system is responsible for 80% of its errors. The authors propose an automated method for generating empirically-based models of error-prone software objects. These models are intended to help localize the troublesome 20%. The method uses a recursive algorithm to automatically generate classification trees whose nodes are multivalued functions based on software metrics. The purpose of the classification trees is to identify components that are likely to be error prone or costly, so that developers can focus their resources accordingly. A feasibility study was conducted using 16 NASA projects. On average, the classification trees correctly identified 79.3% of the software modules that had high development effort or faults
%I IEEE
%P 116 - 123
%8 1989/10/16/19
%@ 0-8186-1965-1
%G eng
%R 10.1109/ICSM.1989.65202

%0 Journal Article
%J IEEE Transactions on Software Engineering
%D 1988
%T Learning from examples: generation and evaluation of decision trees for software resource analysis
%A Selby, R. W.
%A Porter, Adam
%K Analysis of variance
%K Artificial intelligence
%K Classification tree analysis
%K Data analysis
%K decision theory
%K Decision trees
%K Fault diagnosis
%K Information analysis
%K machine learning
%K metrics
%K NASA
%K production environment
%K software engineering
%K software modules
%K software resource analysis
%K Software systems
%K Termination of employment
%K trees (mathematics)
%X A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for one problem domain, specifically, that of software resource data analysis. The purpose of the decision trees is to identify classes of objects (software modules) that had high development effort, i.e. in the uppermost quartile relative to past data. Sixteen software systems ranging from 3000 to 112000 source lines have been selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4700 objects, capture a multitude of information about the objects: development effort, faults, changes, design style, and implementation style. A total of 9600 decision trees are automatically generated and evaluated. The analysis focuses on the characterization and evaluation of decision tree accuracy, complexity, and composition. The decision trees correctly identified 79.3% of the software modules that had high development effort or faults, on the average across all 9600 trees. The decision trees generated from the best parameter combinations correctly identified 88.4% of the modules on the average. Visualization of the results is emphasized, and sample decision trees are included
%B IEEE Transactions on Software Engineering
%V 14
%P 1743 - 1757
%8 1988/12//
%@ 0098-5589
%G eng
%N 12
%R 10.1109/32.9061