Logistic Regression EDA

🎯 Purpose

This checklist outlines the key exploratory data analysis (EDA) steps to perform before fitting a logistic regression model. It focuses on checking class balance, predictor relationships, and key assumptions.


🧭 Class Variable

  • [ ] Target is binary (0/1 or Yes/No)
  • [ ] Class imbalance assessed
  • [ ] Considered resampling (e.g., SMOTE) or class reweighting if imbalanced
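
As a minimal sketch of the class checks above, the snippet below uses a made-up target series `y` (an assumption for illustration, not data from this checklist). It confirms the target is binary, reports class shares, and derives "balanced" class weights using the formula n_samples / (n_classes * class_count).

```python
import pandas as pd

# Hypothetical binary target with a 90/10 split; replace with your own column.
y = pd.Series([0] * 90 + [1] * 10, name="target")

# A binary target should have exactly two distinct values.
is_binary = y.nunique() == 2

# Class proportions show how severe the imbalance is.
class_share = y.value_counts(normalize=True)

# "Balanced" weights: n_samples / (n_classes * class_count).
class_weights = len(y) / (y.nunique() * y.value_counts())

print(is_binary)
print(class_share)
print(class_weights)
```

Weights computed this way could be passed to a model as a per-class weight dictionary instead of resampling the data; which option works better depends on the dataset.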

📊 Categorical Predictors

  • [ ] Crosstabs and bar plots created
  • [ ] Chi-square tests considered
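
One way to run the crosstab and chi-square checks is sketched below. The `plan` and `churned` columns and their values are hypothetical, chosen only to make the example self-contained. The test comes from `scipy.stats.chi2_contingency`, which applies a continuity correction to 2x2 tables by default.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: one categorical predictor and a binary target.
df = pd.DataFrame({
    "plan": ["basic"] * 50 + ["premium"] * 50,
    "churned": [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40,
})

# Raw counts per category and class.
counts = pd.crosstab(df["plan"], df["churned"])

# Row-normalized table: outcome rate within each category.
rates = pd.crosstab(df["plan"], df["churned"], normalize="index")

# Chi-square test of independence between predictor and target.
chi2, p_value, dof, expected = chi2_contingency(counts)

print(counts)
print(rates)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
```

The row-normalized table is usually the more readable input for a bar plot, since it compares outcome rates rather than raw counts.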

📈 Numeric Predictors

  • [ ] Boxplots and KDEs by class
  • [ ] Checked for skewness and for separation between classes (perfect separation can keep the model from converging)
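
The numeric checks can also be done without plots. The sketch below assumes a synthetic, right-skewed `income` column (hypothetical, generated here) and compares per-class summary statistics and skewness.

```python
import numpy as np
import pandas as pd

# Synthetic data: a right-skewed predictor whose location differs by class.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "target": np.repeat([0, 1], n),
    "income": np.concatenate([
        rng.lognormal(mean=10.0, sigma=0.8, size=n),
        rng.lognormal(mean=10.5, sigma=0.8, size=n),
    ]),
})

# Per-class quartiles: the numbers a boxplot would draw.
summary = df.groupby("target")["income"].describe()

# Per-class skewness: large positive values suggest a log-type transform.
skew_by_class = df.groupby("target")["income"].skew()

print(summary)
print(skew_by_class)
```

If the per-class medians differ, the predictor carries some signal. If the class ranges do not overlap at all, that points to perfect separation.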

🔍 Linearity of Logit

  • [ ] Binned predictor plotted vs outcome rate
  • [ ] If the plot is curved → applied a transformation or engineered a feature
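
A rough sketch of the binned check follows, on simulated data where the true logit is linear in `x` (the data and column names are assumptions for illustration). The 0.5 added to each count is a common smoothing choice that avoids taking the log of zero in bins with no events or no non-events.

```python
import numpy as np
import pandas as pd

# Simulated data with a logit that is linear in x by construction.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=2000)
p = 1 / (1 + np.exp(-(-3 + 0.6 * x)))
df = pd.DataFrame({"x": x, "y": rng.binomial(1, p)})

# Split the predictor into ten equal-frequency bins.
df["x_bin"] = pd.qcut(df["x"], q=10)

# Mean predictor value, outcome rate, and size for each bin.
binned = df.groupby("x_bin", observed=True).agg(
    x_mean=("x", "mean"),
    rate=("y", "mean"),
    n=("y", "size"),
)

# Empirical logit per bin, smoothed by 0.5 to avoid log(0).
events = binned["rate"] * binned["n"]
binned["logit"] = np.log((events + 0.5) / (binned["n"] - events + 0.5))

print(binned)
```

Plotting `logit` against `x_mean` should give a roughly straight line when the assumption holds. A clear bend suggests trying a transformation or an added term.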

🧪 Multicollinearity

  • [ ] Correlation matrix reviewed
  • [ ] Pairwise (feature-to-feature) correlations between numeric predictors checked for high values
  • [ ] VIFs calculated and acted on
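
The sketch below shows one way to get both the correlation matrix and the VIFs using only NumPy and pandas, on synthetic predictors `a`, `b`, and `c` (hypothetical names). It relies on the identity that the diagonal of the inverse correlation matrix equals 1 / (1 - R²) for each predictor regressed on the others. A dedicated library routine would work equally well.

```python
import numpy as np
import pandas as pd

# Synthetic predictors: b is nearly a copy of a, c is independent.
rng = np.random.default_rng(2)
n = 1000
a = rng.normal(size=n)
b = a + rng.normal(scale=0.1, size=n)
c = rng.normal(size=n)
X = pd.DataFrame({"a": a, "b": b, "c": c})

# Pairwise Pearson correlations between predictors.
corr = X.corr()

# VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of the inverse correlation matrix.
vif = pd.Series(np.diag(np.linalg.inv(corr.to_numpy())), index=X.columns, name="VIF")

print(corr.round(3))
print(vif.round(2))
```

A commonly cited rule of thumb treats a VIF above roughly 5 to 10 as worth acting on, for example by dropping or combining one of the correlated predictors. The exact threshold is a judgment call.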

🧰 Optional Engineering

  • [ ] Applied log/sqrt transformation for skewed predictors
  • [ ] Created bins with qcut if useful
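
A small sketch of both engineering steps, assuming a synthetic right-skewed `spend` column (hypothetical). `np.log1p` is used rather than a plain log so that zero values stay defined, and `pd.qcut` produces equal-frequency bins.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed predictor.
rng = np.random.default_rng(3)
s = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000), name="spend")

# log(1 + x) compresses the long right tail and tolerates zeros.
log_s = np.log1p(s)

# Quartile bins: each label covers about a quarter of the rows.
bins = pd.qcut(s, q=4, labels=["q1", "q2", "q3", "q4"])

print(f"skew before: {s.skew():.2f}, after: {log_s.skew():.2f}")
print(bins.value_counts().sort_index())
```

Binning discards information within each bin, so it tends to help mainly when the relationship with the outcome is strongly non-monotonic or when interpretability matters more than fit.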

🧠 Final Tip

"For logistic regression, EDA is about finding separation. Look for any variable that splits your target class, even slightly."