To build a decision tree, it is necessary to find, at each internal node, a split for the data. In case of univariate trees, finding a split amounts to finding an attribute which is the most ``useful'' in discriminating the input data, and finding a decision rule using the attribute. In case of multivariate trees, finding a split can be seen as finding a ``composite'' feature, a combination of (some of the) existing attributes that has good discriminatory power. In either of these cases, a basic task in tree building is to rank features (single or composite) according to their usefulness in discriminating the classes in the data.
The manner of growing a tree differs slightly from discipline to
discipline, but several underlying concerns are the same. In pattern
recognition and statistics literature, features are typically ranked
using feature evaluation rules, and the single best feature
or a good feature subset are chosen from the ranked list. In the
context of ordered binary decision diagrams (OBDDs), the order in
which variables are chosen at tree nodes determines the complexity of
the OBDD, and many heuristics have been evaluated for variable order
selection (eg., [339,111]). In machine
learning, feature evaluation rules are used mainly for picking the
single best feature at every node of the decision tree. Methods used
for selecting a good subset of features are typically quite different
and are used as preprocessing steps to tree induction. (We will
discuss feature subset selection methods separately in
Section 2.5.1.) Tree structure vector
quantizers, when they were proposed [42], were grown
one layer at a time, by splitting all nodes in the previous layer.
Makhoul [225] introduced an unbalanced tree
algorithm that grew the tree a node at a time. Riskin and Gray
[305] proposed a greedy method for TSVQs, which is
directly related to decision tree growing.