
Random Forest


🎯 Purpose

This QuickRef explains how to use the Random Forest Classifier, a powerful ensemble method that reduces overfitting and improves predictive accuracy over a single decision tree.


📦 1. When to Use

Condition                                            Use RF?
You want better generalization than a single tree    ✅ Yes
Mixed data types or missing values                   ✅ Yes (robust, but scikit-learn may require imputing missing values)
Need feature importance estimates                    ✅ Yes
Must deploy a simple/explainable model               ❌ No: use a shallow tree or logistic regression

🌲 2. Core Logic

  • Builds many randomized decision trees
  • Aggregates predictions via majority vote (classification)
  • Reduces variance while preserving flexibility (see the sketch below)
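
A minimal from-scratch sketch of this logic, for intuition only. The helper names are hypothetical and it assumes NumPy arrays with non-negative integer class labels; in practice, use RandomForestClassifier as shown in the next section:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=42):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample: draw rows with replacement
        # max_features="sqrt" randomizes which features each split considers
        trees.append(DecisionTreeClassifier(max_features="sqrt").fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    votes = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
    # Majority vote per sample (assumes integer labels)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)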

🛠️ 3. Fitting in sklearn

from sklearn.ensemble import RandomForestClassifier

# 100 trees, unlimited depth; fix random_state for reproducibility
model = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=42)
model.fit(X_train, y_train)
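
After fitting, predictions and a quick holdout check follow the standard estimator API (assumes an X_test / y_test split matching X_train above):

y_pred = model.predict(X_test)           # hard class labels
y_proba = model.predict_proba(X_test)    # per-class probabilities (averaged over trees)
print(model.score(X_test, y_test))       # mean accuracy on the holdout set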

🔧 4. Key Hyperparameters

Param               Description
n_estimators        Number of trees in the forest
max_depth           Maximum depth of each tree
max_features        Number of features considered at each split
min_samples_split   Minimum samples required to split an internal node
bootstrap           Whether each tree is trained on a bootstrap sample
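
These are the usual first knobs to tune. A minimal grid-search sketch (the parameter ranges below are illustrative placeholders, not recommendations):

from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 30],
    "max_features": ["sqrt", "log2"],
    "min_samples_split": [2, 10],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)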

📊 5. Feature Importance

import matplotlib.pyplot as plt

# Impurity-based importances; assumes X is a pandas DataFrame
importances = model.feature_importances_
plt.barh(X.columns, importances)
plt.show()

✔️ Use permutation importance or SHAP for deeper insight (permutation sketch below)
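
Impurity-based importances can be biased toward high-cardinality features; permutation importance measured on held-out data is often more reliable. A sketch using scikit-learn's inspection module (assumes the X_test / y_test split from above):

from sklearn.inspection import permutation_importance

# Shuffle one feature at a time and measure the resulting score drop
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
plt.barh(X.columns, result.importances_mean)
plt.show()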


⚠️ 6. Tips & Limitations

  • Less interpretable than a single tree
  • Slower to train & predict on large datasets
  • Can overfit if individual trees are too deep or the dataset is too small

✅ Checklist

  • [ ] Class imbalance reviewed (consider class_weight='balanced')
  • [ ] n_estimators tuned for performance
  • [ ] Tree depth + node size limited to prevent overfitting
  • [ ] Feature importances visualized
  • [ ] Cross-validated results confirmed (see the sketch below)
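
A quick cross-validation sketch for the last checklist item (assumes the training split from above):

from sklearn.model_selection import cross_val_score

# 5-fold CV accuracy; report the spread, not just the mean
scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=42),
                         X_train, y_train, cv=5)
print(scores.mean(), scores.std())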

💡 Tip

"Random forests don't overfit easily, but that doesn't mean they're immune to noise."