# Decision Tree Classifier

## 🎯 Purpose
This QuickRef outlines how to use Decision Tree Classifiers for supervised classification tasks. It covers fit logic, splitting rules, hyperparameters, and interpretation essentials.
## 📦 1. When to Use
| Condition | Use DT Classifier? |
|---|---|
| You need interpretable splits | ✅ Yes |
| Non-linear relationships exist | ✅ Yes |
| Mixed data types (cat + num) | ✅ Yes |
| Small data with fast results | ✅ Yes |
| You need top-tier performance | ❌ Consider an ensemble (RF/XGBoost) |
## 🌳 2. Core Logic
- Recursively splits the data on feature thresholds to minimize impurity in the child nodes
- Each leaf = a class prediction (the majority class of the training samples that reach it)
### 🔍 Splitting Criteria
| Criterion | Meaning |
|---|---|
| Gini (default) | Measures node impurity (chance of misclassifying a random sample drawn from the node) |
| Entropy | Information gain (reduction in entropy from a split) |
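To make these concrete, here is a minimal NumPy sketch of both measures computed from a node's class counts (the counts are made up for illustration):

```python
import numpy as np

def gini(counts):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    p = np.asarray(counts) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    """Shannon entropy: -sum(p_k * log2(p_k)), skipping empty classes."""
    p = np.asarray(counts) / np.sum(counts)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical node with 8 samples of class A and 2 of class B
print(gini([8, 2]))     # 0.32 -> lower is purer
print(entropy([8, 2]))  # ~0.72 bits
```

Whichever criterion is used, a split is chosen to maximize the impurity decrease from the parent node to its (sample-weighted) children.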
## 🛠️ 3. Fitting in sklearn
```python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(criterion='gini', max_depth=3)
model.fit(X_train, y_train)

# Visualization (simple trees)
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plot_tree(model, feature_names=list(X.columns))
plt.show()
```
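The snippet above assumes `X`, `X_train`, and `y_train` already exist. For a self-contained, runnable version, here is a sketch on scikit-learn's built-in iris dataset (variable names are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Built-in toy dataset, loaded as a DataFrame so column names survive
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # mean accuracy on held-out data

plot_tree(model, feature_names=list(X.columns), filled=True)
plt.show()
```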
## 🧪 4. Key Hyperparameters
| Param | Description |
|---|---|
| `max_depth` | Limits tree depth (reduces overfitting) |
| `min_samples_split` | Min samples required to split a node |
| `min_samples_leaf` | Min samples required in a leaf node |
| `max_features` | Limits the features considered at each split |
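A common way to tune these together is a small grid search. The sketch below reuses `X_train`/`y_train` from the fitting example; the grid values are arbitrary starting points, not recommendations:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Candidate values for the main pre-pruning knobs
param_grid = {
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 10, 50],
    'min_samples_leaf': [1, 5, 20],
}

search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=5, scoring='accuracy')
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```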
## 📊 5. Evaluation
| Metric | Use When... |
|---|---|
| Accuracy | Balanced classes |
| Precision/Recall | Imbalanced classes |
| AUC/ROC | Probabilistic ranking (via `predict_proba`) |
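A sketch of computing these with `sklearn.metrics`, reusing the fitted model and test split from the fitting example (iris is three-class, so `roc_auc_score` is shown in its one-vs-rest form):

```python
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score

y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))  # per-class precision/recall/F1

# ROC AUC needs scores, not labels; multi_class='ovr' handles >2 classes
y_proba = model.predict_proba(X_test)
print(roc_auc_score(y_test, y_proba, multi_class='ovr'))
```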
⚠️ Use cross-validation or pruning to reduce overfitting.
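scikit-learn supports both via `cross_val_score` and cost-complexity pruning (the `ccp_alpha` parameter). A minimal sketch, again reusing `X_train`/`y_train`:

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 5-fold cross-validated accuracy of the depth-limited tree
scores = cross_val_score(
    DecisionTreeClassifier(max_depth=3, random_state=42), X_train, y_train, cv=5)
print(scores.mean(), scores.std())

# Cost-complexity pruning: larger ccp_alpha -> smaller, simpler tree
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train)
print(path.ccp_alphas)  # candidate alphas to cross-validate over
```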
## ✅ Checklist
- [ ] Split criterion chosen (gini or entropy)
- [ ] Tree depth and node size parameters tuned
- [ ] Class imbalance reviewed (consider the `class_weight` option; see the sketch after this list)
- [ ] Visual interpretation exported (tree plot)
- [ ] Overfitting controlled via early stopping or pruning
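As flagged in the checklist, imbalance can be handled with the `class_weight` parameter; a minimal sketch (hyperparameter values are illustrative):

```python
from sklearn.tree import DecisionTreeClassifier

# 'balanced' reweights classes inversely to their frequency, so errors on
# the minority class cost more when impurity decreases are computed
model = DecisionTreeClassifier(class_weight='balanced', max_depth=3,
                               random_state=42)
model.fit(X_train, y_train)
```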
## 💡 Tip
“Decision Trees let you explain your predictions, one split at a time.”