next up previous
Next: 2.3.1 Feature evaluation rules Up: 2 Existing work on Previous: 2.2.2 Surveys

2.3 Finding splits

   

To build a decision tree, it is necessary to find, at each internal node, a split for the data. In case of univariate trees, finding a split amounts to finding an attribute which is the most ``useful'' in discriminating the input data, and finding a decision rule using the attribute. In case of multivariate trees, finding a split can be seen as finding a ``composite'' feature, a combination of (some of the) existing attributes that has good discriminatory power. In either of these cases, a basic task in tree building is to rank features (single or composite) according to their usefulness in discriminating the classes in the data.

The manner of growing a tree differs slightly from discipline to discipline, but several underlying concerns are the same. In pattern recognition and statistics literature, features are typically ranked using feature evaluation rules, and the single best feature or a good feature subset are chosen from the ranked list. In the context of ordered binary decision diagrams (OBDDs), the order in which variables are chosen at tree nodes determines the complexity of the OBDD, and many heuristics have been evaluated for variable order selection (eg., [339,111]). In machine learning, feature evaluation rules are used mainly for picking the single best feature at every node of the decision tree. Methods used for selecting a good subset of features are typically quite different and are used as preprocessing steps to tree induction. (We will discuss feature subset selection methods separately in Section 2.5.1.) Tree structure vector quantizers, when they were proposed [42], were grown one layer at a time, by splitting all nodes in the previous layer. Makhoul [225] introduced an unbalanced tree algorithm that grew the tree a node at a time. Riskin and Gray [305] proposed a greedy method for TSVQs, which is directly related to decision tree growing. gif





next up previous
Next: 2.3.1 Feature evaluation rules Up: 2 Existing work on Previous: 2.2.2 Surveys



Sreerama Murthy
Thu Oct 19 17:40:24 EDT 1995