# Threshold Tuning
## Purpose
Use this card to determine whether your logistic regression model needs threshold tuning, and how to choose a better cutoff than the default 0.5.
## 1. Why Thresholds Matter
- By default, `predict_proba() > 0.5` → positive class.
- But when classes are imbalanced or costs are asymmetric, you may want a different cutoff (see the sketch below), for example:
  - 0.3 (catch more positives → higher recall)
  - 0.7 (reduce false positives → higher precision)
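Applying a custom cutoff is a one-liner once you have probabilities. This is a minimal sketch assuming a fitted scikit-learn classifier `clf` and a feature matrix `X_test` (both names are illustrative):

```python
# Probability of the positive class from a fitted scikit-learn classifier
# (clf and X_test are assumed names, not defined in this card)
y_proba = clf.predict_proba(X_test)[:, 1]

# A lower cutoff (e.g. 0.3) favors recall; a higher one (e.g. 0.7) favors precision
custom_threshold = 0.3
y_pred = (y_proba >= custom_threshold).astype(int)
```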
## 2. When to Tune the Threshold
| Situation | Action |
|---|---|
| Class imbalance present | Tune; the default 0.5 may misclassify the minority class (quick check below) |
| Precision vs. recall tradeoff required | Tune to align with the priority |
| Stakeholders need risk tiers or scores | Tune thresholds for each tier |
| F1 score weak even though accuracy is high | Tune to rebalance the output |
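A quick way to review the first row's condition is to look at the class distribution of the labels. A minimal sketch, assuming the training labels live in an array or Series named `y_train`:

```python
import numpy as np

# Share of each class in the training labels (y_train is an assumed name)
classes, counts = np.unique(y_train, return_counts=True)
print(dict(zip(classes, counts / counts.sum())))
```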
## 3. How to Find the Best Threshold
```python
from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(y_true, y_proba)

# precision/recall have one more entry than thresholds, so drop the last point;
# the small epsilon guards against division by zero when precision + recall == 0
f1_scores = 2 * (precision[:-1] * recall[:-1]) / (precision[:-1] + recall[:-1] + 1e-12)

# Find the threshold where F1 is maximized
best_idx = f1_scores.argmax()
optimal_threshold = thresholds[best_idx]
```
Use this approach to optimize for F1, precision, or recall, depending on your goal.
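Once `optimal_threshold` is found, apply it to produce hard labels and check the resulting metrics. A small sketch, reusing `y_proba` and `y_true` from the snippet above:

```python
from sklearn.metrics import classification_report

# Convert probabilities to hard labels using the tuned cutoff
y_pred = (y_proba >= optimal_threshold).astype(int)
print(classification_report(y_true, y_pred))
```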
## 4. Strategy by Business Goal
| Priority | Threshold Strategy |
|---|---|
| Maximize recall (catch all positives) | Lower the threshold (e.g. 0.3–0.4) |
| Maximize precision (avoid false alarms) | Raise the threshold (e.g. 0.6–0.8) |
| Balanced F1 or AUC | Optimize empirically from curves |
| Risk banding (low/med/high) | Create multiple threshold bins (see the sketch below) |
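For the risk-banding row, one way to map scores into tiers is `numpy.digitize`. The cut points below (0.3 and 0.7) are illustrative assumptions, not recommended values:

```python
import numpy as np

# Illustrative cut points; choose bands that match your own risk appetite
bins = [0.3, 0.7]
labels = np.array(["low", "medium", "high"])

# digitize returns 0, 1, or 2 depending on which band each score falls into
risk_tier = labels[np.digitize(y_proba, bins)]
```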
## Threshold Tuning Checklist
- [ ] Used `predict_proba()` instead of hard 0/1 predictions
- [ ] Target imbalance reviewed
- [ ] Optimal threshold selected based on metric (F1, recall, etc.)
- [ ] Stakeholders informed of threshold shift
- [ ] Model outputs mapped to final decision logic
## Tip
"Don't let 0.5 make the call; let your business priorities set the bar."