Next: 2.4.1 Pruning
Up: 2 Existing work on
Previous: 2.3.3 Ordered vs. unordered
One of the main difficulties of inducing a recursive partitioning
structure is knowing when to stop. Obtaining the ``right'' sized trees
may be important for several reasons, which depend on the size of the
classification problem [117]. For moderate sized
problems, the critical issues are generalization accuracy, honest
error rate estimation
and gaining insight into the
predictive and generalization structure of the data. For very large
tree classifiers, the critical issue is optimizing structural
properties (height, balance etc.) [366,47].
Breiman et al. [29] pointed out that tree
quality depends more on good stopping rules than on splitting
rules. Effects of noise on generalization are discussed in
[270,183]. Overfitting avoidance as a
specific bias is studied in [377,317]. Effect of
noise on classification tree construction methods is studied in the
pattern recognition literature in [346].
Several techniques have been suggested for obtaining the right sized
trees. The most popular of these is pruning, whose discussion we
will defer to Section 2.4.1. The following are some
alternatives to pruning that have been attempted in the literature.
- Restrictions on minimum node size: A node is not split if it has
smaller than k objects, where k is a parameter to the tree
induction algorithm. This strategy, which is known to be not
robust, is used in some early methods [110].
- Two stage search: In this variant, tree induction is divided
into two subtasks: first, a good structure for the tree is
determined; then splits are found at all the
nodes.
The optimization method in the first
stage may or may not be related to that used in the second stage.
Lin and Fu [214] use K-means clustering for both
stages, whereas Qing-Yun and Fu [289] use
multi-variate stepwise regression for the first stage and linear
discriminant analysis for the second stage.
- Thresholds on Impurity: In this method, a threshold is imposed
on the value of the splitting criterion, such that if the splitting
criterion falls below (above) the threshold, tree growth is aborted.
Thresholds can be imposed on local (i.e., individual node) goodness
measures or on global (i.e., entire tree) goodness. The former
alternative is used in
[124,307,291,231] and the
latter in [324]. A problem with the former
method is that the value of most splitting criteria
(Section 2.3.1) varies with the size of the
training sample. Imposing a single threshold that is meaningful at
all nodes in the tree is not easy and may not even be possible. Some
feature evaluation rules, whose distribution does not depend
on the number of training samples (i.e., a goodness value of k
would have the same significance anywhere in the tree) have been
suggested in the literature
[210,382,170].
- Trees to rules conversion: Quinlan [293,297]
gave efficient procedures for converting a decision tree into a set
of production rules. Simple heuristics to generalize and combine the
rules generated from trees can act as a substitute for pruning for
Quinlan's univariate trees.
- Other: Cockett and Herrera [59] suggested a
method to reduce an arbitrary binary decision tree to an
``irreducible'' form, using discrete decision theory principles.
Every irreducible tree is optimal with respect to some expected
testing cost criterion, and the tree reduction algorithm has the
same worst-case complexity as most greedy tree induction methods. In
the context of ordered binary decision diagrams, tree compaction has
been attempted using operations that merge, delete and exchange
nodes [36].
Next: 2.4.1 Pruning
Up: 2 Existing work on
Previous: 2.3.3 Ordered vs. unordered
Sreerama Murthy
Thu Oct 19 17:40:24 EDT 1995