Skip to content

Logistic Regression

🎯 Purpose

This QuickRef guides you through EDA prep, fitting, diagnostics, and interpretation for binary logistic regression. Designed for notebook use, it merges statistical intuition with model-readiness.


πŸ“¦ 1. EDA Prep for Logistic Regression

Check Code
Target balance df['target'].value_counts(normalize=True)
Class imbalance warning >80% in one class β†’ stratify or use SMOTE
Correlation matrix sns.heatmap(df.corr())
VIF check variance_inflation_factor(X.values, i)
Feature skew df.skew() or histograms

βœ”οΈ Ensure target is binary (0/1 or Yes/No encoded) βœ”οΈ Drop or flag highly imbalanced fields


πŸ”„ 2. Logit Linearity & Transform Triggers

Problem Action
X vs logit not linear Use log/sqrt transform or binning
Predictor skewed Consider log or Yeo-Johnson transform
Interaction suspected Add interaction term (X1 * X2)
# Visual: Box-Tidwell or residual plots
sns.regplot(x='X', y=np.log(p / (1 - p)))

πŸ” 3. Model Fitting (statsmodels or sklearn)

# statsmodels
import statsmodels.api as sm
model = sm.Logit(y, sm.add_constant(X)).fit()
model.summary()

# sklearn
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression().fit(X_train, y_train)

βœ”οΈ Use C=1/lambda for sklearn regularization tuning


πŸ“ 4. Output Interpretation

Metric Interpretation
Coef > 0 Predictor ↑ β†’ higher odds of event
Coef < 0 Predictor ↑ β†’ lower odds of event
p < 0.05 Statistically significant effect
Odds Ratio np.exp(coef) = multiplicative change in odds
# Odds ratio calculation
np.exp(model.params)

πŸ“Š 5. Model Evaluation

Metric Use When...
Accuracy Balanced classes and cost-insensitive task
Precision False positives are costly
Recall False negatives are costly
AUC/ROC General ranking ability (probabilistic power)
from sklearn.metrics import roc_auc_score, confusion_matrix, classification_report

βœ”οΈ Use threshold tuning (predict_proba() > t) when needed


βœ… Logistic Modeling Checklist

  • [ ] Target class checked for balance
  • [ ] Predictors tested for logit linearity
  • [ ] Model fit using proper library (statsmodels/sklearn)
  • [ ] Coefficients interpreted and odds calculated
  • [ ] Evaluation metric selected based on real-world cost

πŸ’‘ Tip

β€œLogistic regression doesn’t just predict events β€” it explains the odds of them.”