# Threshold Tuning
## Purpose
Use this card to determine whether your logistic regression model needs threshold tuning, and how to choose a better cutoff than the default 0.5.
## 1. Why Thresholds Matter
- By default, `predict_proba() > 0.5` → positive class.
- But when classes are imbalanced or costs are asymmetric, you may want a different cutoff (see the sketch below), for example:
  - 0.3 (catch more positives → higher recall)
  - 0.7 (reduce false positives → higher precision)
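Applying a custom cutoff is a one-liner once you have probabilities. This is a minimal sketch assuming a fitted scikit-learn classifier `clf` and a feature matrix `X_test` (both names are illustrative):

```python
# Probability of the positive class from a fitted scikit-learn classifier
# (clf and X_test are assumed names, not defined in this card)
y_proba = clf.predict_proba(X_test)[:, 1]

# A lower cutoff (e.g. 0.3) favors recall; a higher one (e.g. 0.7) favors precision
custom_threshold = 0.3
y_pred = (y_proba >= custom_threshold).astype(int)
```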
## 2. When to Tune the Threshold
| Situation | Action |
|---|---|
| Class imbalance present | Tune; the default 0.5 may misclassify the minority class (quick check below) |
| Precision vs. recall tradeoff required | Tune to align with the priority |
| Stakeholders need risk tiers or scores | Tune thresholds for each tier |
| F1 score weak even though accuracy is high | Tune to rebalance the output |
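A quick way to review the first row's condition is to look at the class distribution of the labels. A minimal sketch, assuming the training labels live in an array or Series named `y_train`:

```python
import numpy as np

# Share of each class in the training labels (y_train is an assumed name)
classes, counts = np.unique(y_train, return_counts=True)
print(dict(zip(classes, counts / counts.sum())))
```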
## 3. How to Find the Best Threshold
```python
from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(y_true, y_proba)

# precision/recall have one more entry than thresholds, so drop the last point;
# the small epsilon guards against division by zero when precision + recall == 0
f1_scores = 2 * (precision[:-1] * recall[:-1]) / (precision[:-1] + recall[:-1] + 1e-12)

# Find the threshold where F1 is maximized
best_idx = f1_scores.argmax()
optimal_threshold = thresholds[best_idx]
```
Use this approach to optimize for F1, precision, or recall, depending on your goal.
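Once `optimal_threshold` is found, apply it to produce hard labels and check the resulting metrics. A small sketch, reusing `y_proba` and `y_true` from the snippet above:

```python
from sklearn.metrics import classification_report

# Convert probabilities to hard labels using the tuned cutoff
y_pred = (y_proba >= optimal_threshold).astype(int)
print(classification_report(y_true, y_pred))
```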
## 4. Strategy by Business Goal
| Priority | Threshold Strategy |
|---|---|
| Maximize recall (catch all positives) | Lower the threshold (e.g. 0.3–0.4) |
| Maximize precision (avoid false alarms) | Raise the threshold (e.g. 0.6–0.8) |
| Balanced F1 or AUC | Optimize empirically from curves |
| Risk banding (low/med/high) | Create multiple threshold bins (see the sketch below) |
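For the risk-banding row, one way to map scores into tiers is `numpy.digitize`. The cut points below (0.3 and 0.7) are illustrative assumptions, not recommended values:

```python
import numpy as np

# Illustrative cut points; choose bands that match your own risk appetite
bins = [0.3, 0.7]
labels = np.array(["low", "medium", "high"])

# digitize returns 0, 1, or 2 depending on which band each score falls into
risk_tier = labels[np.digitize(y_proba, bins)]
```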
## Threshold Tuning Checklist
- [ ] Used `predict_proba()` instead of hard 0/1 predictions
- [ ] Target imbalance reviewed
- [ ] Optimal threshold selected based on metric (F1, recall, etc.)
- [ ] Stakeholders informed of threshold shift
- [ ] Model outputs mapped to final decision logic
## Tip
"Don't let 0.5 make the call; let your business priorities set the bar."