Logistic Regression EDA
Purpose
This checklist outlines the key exploratory data analysis (EDA) steps to perform before fitting a logistic regression model. It focuses on checking class balance, predictor relationships, and key assumptions.
Class Variable
- [ ] Target is binary (0/1 or Yes/No)
- [ ] Class imbalance assessed
- [ ] Considered SMOTE / reweighting if imbalanced
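The class-variable checks above can be sketched as follows. This is a minimal illustration on a hypothetical toy target `y`; the weights use the common "balanced" heuristic (n_samples / (n_classes * class_count)), which is one reweighting option among several and not necessarily the one intended here.

```python
import pandas as pd

# Hypothetical toy target: 90 negatives and 10 positives.
y = pd.Series([0] * 90 + [1] * 10, name="target")

# Check that the target takes exactly two values.
is_binary = y.nunique() == 2

# Class proportions show how imbalanced the target is.
class_share = y.value_counts(normalize=True).sort_index()

# "Balanced" class weights: n_samples / (n_classes * class_count).
counts = y.value_counts().sort_index()
class_weight = len(y) / (len(counts) * counts)

print(class_share)
print(class_weight)
```

Weights like these can typically be passed to a model's class-weight option instead of resampling the data.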
Categorical Predictors
- [ ] Crosstabs and bar plots created
- [ ] Chi-square tests considered
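As a rough sketch of the crosstab and chi-square steps, assuming a hypothetical DataFrame with a categorical predictor `plan` and a binary target `churned` (both names and the data are invented for illustration):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: plan type versus a binary churn flag.
df = pd.DataFrame({
    "plan": ["basic"] * 50 + ["premium"] * 50,
    "churned": [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40,
})

# Raw counts per category and class.
counts = pd.crosstab(df["plan"], df["churned"])

# Row-normalized table: the outcome rate within each category.
rates = pd.crosstab(df["plan"], df["churned"], normalize="index")

# Chi-square test of independence on the count table.
chi2, p_value, dof, expected = chi2_contingency(counts)

print(rates)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
```

The `rates` table is also a convenient input for a bar plot, for example `rates.plot(kind="bar")` when matplotlib is installed.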
Numeric Predictors
- [ ] Boxplots and KDEs by class
- [ ] Checked for skewness or separation
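A numeric companion to the plots above: per-class summary statistics and skewness. The column names `income` and `target` and the simulated data are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed predictor whose level differs by class.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "income": np.concatenate([
        rng.lognormal(10.0, 0.8, n),  # class 0
        rng.lognormal(10.5, 0.8, n),  # class 1
    ]),
    "target": [0] * n + [1] * n,
})

# Quartiles by class: the numbers a boxplot would draw.
by_class = df.groupby("target")["income"].describe()

# Sample skewness by class; large positive values suggest a long right tail.
skew_by_class = df.groupby("target")["income"].skew()

print(by_class[["25%", "50%", "75%"]])
print(skew_by_class)
```

If the quartiles of the two classes barely overlap, the predictor separates the classes strongly, which is worth noting before fitting.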
Linearity of Logit
- [ ] Binned predictor plotted vs outcome rate
- [ ] Curves → applied transformation or engineered feature
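One way to run the binned check is to compute the empirical logit per bin and see whether it is roughly linear in the bin mean. The sketch below simulates a hypothetical predictor `x` and outcome `y`; the decile binning and the 0.5 smoothing constant are illustrative choices, not requirements.

```python
import numpy as np
import pandas as pd

# Hypothetical data simulated from a model that is linear in the logit.
rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, 5000)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x)))
df = pd.DataFrame({"x": x, "y": rng.binomial(1, p)})

# Split the predictor into deciles and summarize each bin.
df["bin"] = pd.qcut(df["x"], q=10)
summary = df.groupby("bin", observed=True).agg(
    x_mean=("x", "mean"),
    rate=("y", "mean"),
    n=("y", "size"),
)

# Empirical logit per bin; adding 0.5 avoids log(0) in bins with 0% or 100% events.
events = summary["rate"] * summary["n"]
summary["emp_logit"] = np.log((events + 0.5) / (summary["n"] - events + 0.5))

print(summary[["x_mean", "rate", "emp_logit"]])
```

Plotting `emp_logit` against `x_mean` should give a roughly straight line when the assumption holds; a clear bend is the cue to transform the predictor or engineer a feature.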
Multicollinearity
- [ ] Correlation matrix reviewed
- [ ] Feature-to-feature: checked for high correlation between numeric predictors
- [ ] VIFs calculated and acted on
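The correlation and VIF checks can be sketched with pandas and NumPy alone. The example uses the identity that the VIF of predictor j, 1 / (1 - R_j^2), equals the j-th diagonal entry of the inverse correlation matrix. The columns `a`, `b`, and `a_copy` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical predictors: "a_copy" is almost a duplicate of "a".
rng = np.random.default_rng(1)
n = 1000
a = rng.normal(size=n)
b = rng.normal(size=n)
X = pd.DataFrame({
    "a": a,
    "b": b,
    "a_copy": a + rng.normal(scale=0.1, size=n),
})

# Pairwise Pearson correlations between predictors.
corr = X.corr()

# VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of the inverse correlation matrix.
vif = pd.Series(np.diag(np.linalg.inv(corr.to_numpy())), index=X.columns)

print(corr.round(2))
print(vif.round(1))
```

A common rule of thumb treats VIFs above roughly 5 to 10 as a sign to drop or combine predictors. The `variance_inflation_factor` function in statsmodels is an alternative; it expects a design matrix that already contains a constant column.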
Optional Engineering
- [ ] Applied log/sqrt transformation for skewed predictors
- [ ] Created bins with `qcut` if useful
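A short sketch of these engineering steps on a hypothetical right-skewed predictor `spend`. The choice of `log1p`, the quartile binning, and the bin labels are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed, non-negative predictor.
rng = np.random.default_rng(7)
spend = pd.Series(rng.lognormal(3.0, 1.0, 1000), name="spend")

# log1p handles zeros safely; sqrt is a milder alternative.
spend_log = np.log1p(spend)
spend_sqrt = np.sqrt(spend)

# Quantile bins: each bin holds roughly the same number of rows.
spend_bin = pd.qcut(spend, q=4, labels=["q1", "q2", "q3", "q4"])

print(f"skew raw={spend.skew():.2f}, log1p={spend_log.skew():.2f}, sqrt={spend_sqrt.skew():.2f}")
print(spend_bin.value_counts().sort_index())
```

Comparing skewness before and after is a quick way to confirm that the transformation actually helped.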
Final Tip
"For logistic regression, EDA is about finding separation. Look for any variable that splits your target class, even slightly."