Machine Learning: CART
A decision tree recursively splits the feature space into disjoint subspaces, and each leaf subspace carries a prediction; several leaves can end up with the same prediction. The goal is to grow the tree until each leaf is as pure as possible. To specify a tree we therefore need its depth, the arity K of the K-ary splits, the threshold value at every internal node, and the value stored at every leaf node. CART stands for Classification And Regression Tree. For classification, minimizing the error in a region starts from the class probabilities in that region, estimated from class frequencies (which can be regularized).

The first question is how to split the root node: which feature to pick and at which threshold. Generally we pick the split that most reduces uncertainty across the whole training set. This reduction in uncertainty from a split is called information gain: the higher the information gain, the better the split, since gaining information means reducing uncertainty. The two most popular measures of uncertainty (impurity) are the Gini index and entropy; CART uses the Gini index. A third measure, the misclassification error 1 - max_k p_k, ranges from 0 for a pure node up to 1 - 1/K when all K classes are equally likely.
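The splitting step described above can be sketched in Python. This is a minimal illustration, not the exact algorithm of any library: the toy dataset, the function names, and the exhaustive midpoint-threshold search are all assumptions made for the example.

```python
import numpy as np

def gini(y):
    """Gini index of a label array: 1 - sum_k p_k^2 (0 means a pure node)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(y):
    """Entropy in bits: -sum_k p_k log2 p_k (also 0 for a pure node)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, y_left, y_right, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(y)
    return (impurity(y)
            - (len(y_left) / n) * impurity(y_left)
            - (len(y_right) / n) * impurity(y_right))

def best_split(X, y, impurity=gini):
    """Search every feature and every midpoint threshold; keep the best gain."""
    best = (None, None, -np.inf)  # (feature index, threshold, gain)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:  # midpoints between sorted values
            mask = X[:, j] <= t
            gain = information_gain(y, y[mask], y[~mask], impurity)
            if gain > best[2]:
                best = (j, t, gain)
    return best

# Toy dataset: one feature, two classes cleanly separated around 6.5.
X = np.array([[2.0], [3.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])
j, t, gain = best_split(X, y)
print(j, t, gain)  # → 0 6.5 0.5 (parent Gini 0.5, both children pure)
```

Swapping `impurity=entropy` into `best_split` selects the split by entropy-based information gain instead; on this toy data both criteria choose the same threshold.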