
EDA Workflow

🎯 Purpose

This checklist provides a high-level workflow for conducting exploratory data analysis (EDA). It covers the essential steps from initial data inspection to pre-modeling cleanup.

Related Guides

This checklist is a companion to the 📊 General EDA Guidebook, 📊 Visual EDA Interpretation Guide, and 📘 Advanced EDA Guidebook.

📦 1. Dataset Structure

  • [ ] .shape, .info(): Dimensions & column types
  • [ ] .describe(include='all'): Descriptive stats
  • [ ] .duplicated().sum(): Count duplicate rows
  • [ ] .memory_usage(deep=True): Memory footprint, for optimization
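A minimal pandas sketch of the structure checks above. It assumes your data is already loaded into a DataFrame named df. The toy data here is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the last two rows are exact duplicates
df = pd.DataFrame({
    "id": [1, 2, 3, 3],
    "age": [34, 29, 41, 41],
    "city": ["Oslo", "Lima", "Oslo", "Oslo"],
})

print(df.shape)                    # (rows, columns)
df.info()                          # dtypes and non-null counts per column
print(df.describe(include="all"))  # numeric and categorical summaries together

n_dupes = df.duplicated().sum()    # rows identical to an earlier row
print(f"duplicate rows: {n_dupes}")

# deep=True measures the real size of object (string) columns instead of just pointers
mem_bytes = df.memory_usage(deep=True).sum()
print(f"memory: {mem_bytes / 1024:.1f} KiB")
```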

๐Ÿ” 2. Variable Exploration

Numeric

  • [ ] Histograms + KDE
  • [ ] Skew, kurtosis
  • [ ] Outlier detection (boxplot, z-score, IQR)
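For the numeric checks, a small pandas sketch can summarise shape statistics and count values outside the conventional 1.5 × IQR fences, which is the same rule boxplot whiskers use by default. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical numeric data: "income" has one extreme value, "age" is symmetric
df = pd.DataFrame({
    "income": [32_000, 35_000, 38_000, 41_000, 45_000, 250_000],
    "age": [23, 31, 38, 45, 52, 60],
})
num = df.select_dtypes(include="number")

# pandas skew() and kurt() are bias-corrected; kurt() reports excess kurtosis (normal = 0)
shape_stats = pd.DataFrame({"skew": num.skew(), "kurtosis": num.kurt()})
print(shape_stats)

# Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], computed per column
q1, q3 = num.quantile(0.25), num.quantile(0.75)
iqr = q3 - q1
outside = (num < q1 - 1.5 * iqr) | (num > q3 + 1.5 * iqr)
print(outside.sum())  # number of flagged values per column
```

This sketch only flags values. Whether to filter or cap them is a separate decision, covered in the Outliers section.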

Categorical

  • [ ] Frequency counts
  • [ ] Unique counts / high cardinality flags
  • [ ] Encoding strategy (if needed)
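For the categorical checks, the sketch below covers frequency counts, a cardinality flag, and a one-hot encoding of the remaining columns. It is only an illustration. The 0.5 unique-to-rows ratio is an arbitrary threshold chosen for the example, and one-hot encoding is just one possible strategy.

```python
import pandas as pd

# Hypothetical categorical data
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Pune", "Oslo", "Lima"],
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
})
cat_cols = df.select_dtypes(exclude="number").columns  # non-numeric columns

for col in cat_cols:
    print(df[col].value_counts(dropna=False))  # frequency counts, NaN included

# Flag high cardinality: more unique values than half the rows (arbitrary cutoff)
cardinality = df[cat_cols].nunique()
high_card = cardinality[cardinality / len(df) > 0.5].index.tolist()
print(high_card)

# One-hot encode the low-cardinality columns; column names become "<col>_<value>"
low_card = [c for c in cat_cols if c not in high_card]
encoded = pd.get_dummies(df[low_card])
print(encoded.columns.tolist())
```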

📈 3. Distribution & Relationship Plots

  • [ ] Boxplot by category
  • [ ] Correlation heatmap (numeric only)
  • [ ] Pairplot or scatter matrix
  • [ ] Crosstabs (for categorical pairs)
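The tables behind two of these plots can be computed directly with pandas: the correlation matrix that a heatmap displays, and crosstabs for categorical pairs. The sketch below leaves out the plotting library and uses hypothetical data.

```python
import pandas as pd

# Hypothetical mixed data
df = pd.DataFrame({
    "height_cm": [160, 165, 170, 175, 180, 185],
    "weight_kg": [55, 60, 66, 70, 77, 82],
    "sex": ["F", "F", "F", "M", "M", "M"],
    "smoker": ["no", "yes", "no", "no", "yes", "yes"],
})

# Pearson correlation on numeric columns only (the matrix a heatmap would colour)
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))

# Crosstab for a categorical pair: raw counts, then row proportions
counts = pd.crosstab(df["sex"], df["smoker"])
row_share = pd.crosstab(df["sex"], df["smoker"], normalize="index")
print(counts)
print(row_share.round(2))
```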

🧪 4. Missing Values

  • [ ] df.isnull().sum() summary
  • [ ] Heatmap of missingness
  • [ ] Consider: drop, fill, predictive imputation
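A sketch of the missing-value summary followed by two of the simpler handling options. The 50% drop cutoff and the median/mode fills are illustrative choices, not fixed rules, and the data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps
df = pd.DataFrame({
    "age": [25, np.nan, 40, 35, np.nan],
    "city": ["Oslo", "Lima", None, "Oslo", "Oslo"],
    "notes": [np.nan, np.nan, np.nan, np.nan, "ok"],
})

# Count and percentage of missing values per column, worst first
missing = pd.DataFrame({
    "n_missing": df.isnull().sum(),
    "pct_missing": df.isnull().mean() * 100,
}).sort_values("pct_missing", ascending=False)
print(missing)

# Option 1: drop mostly-empty columns (50% cutoff is an arbitrary example)
to_drop = missing[missing["pct_missing"] > 50].index
cleaned = df.drop(columns=to_drop)

# Option 2: simple fills, median for numeric and most frequent value for categorical
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())
cleaned["city"] = cleaned["city"].fillna(cleaned["city"].mode()[0])
print(cleaned.isnull().sum().sum())  # total remaining gaps
```

For predictive imputation, scikit-learn's KNNImputer is one option. It fills each gap from the most similar rows rather than from a single column statistic.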

📉 5. Outliers

  • [ ] IQR filtering
  • [ ] Z-score filtering
  • [ ] Context/domain filtering (caps/floors)
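The three filtering styles side by side on one hypothetical series. The 3-sigma z-score threshold is a common convention. The 0 to 60 ms cap is a made-up stand-in for a real domain rule.

```python
import pandas as pd

# Hypothetical response times with one extreme value
s = pd.Series([12, 14, 15, 15, 16, 17, 18, 95], name="response_ms")

# IQR filter: keep values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
iqr_kept = s[s.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Z-score filter: keep values within 3 sample standard deviations of the mean
z = (s - s.mean()) / s.std()
z_kept = s[z.abs() <= 3]

# Domain capping: clip to known-plausible bounds instead of dropping rows
capped = s.clip(lower=0, upper=60)

print(len(iqr_kept), len(z_kept), capped.max())
```

Here the IQR filter removes 95, but the z-score filter keeps it. With only eight values, the extreme point inflates the standard deviation so much that no |z| can exceed (n − 1)/√n ≈ 2.47. This is one reason the IQR rule is often preferred on small samples.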

๐Ÿ” 6. Scaling & Transformation

  • [ ] Log/sqrt transforms for skew
  • [ ] StandardScaler, MinMaxScaler, RobustScaler
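To make the transforms concrete, the sketch below writes out in pandas/NumPy the arithmetic that scikit-learn's StandardScaler, MinMaxScaler and RobustScaler perform with their default settings. It avoids a scikit-learn dependency; in real code the scaler classes are usually more convenient. The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed values
x = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0], name="amount")

# Skew-reducing transforms (both require x >= 0); log1p computes log(1 + x), so zeros are safe
log_x = np.log1p(x)
sqrt_x = np.sqrt(x)

# Standardisation: StandardScaler divides by the population std (ddof=0)
standard = (x - x.mean()) / x.std(ddof=0)

# Min-max scaling to [0, 1], MinMaxScaler's default feature_range
minmax = (x - x.min()) / (x.max() - x.min())

# Robust scaling: centre on the median, divide by the 25th-75th percentile range
q1, q3 = x.quantile([0.25, 0.75])
robust = (x - x.median()) / (q3 - q1)

print(pd.DataFrame({"log1p": log_x, "sqrt": sqrt_x, "standard": standard,
                    "minmax": minmax, "robust": robust}).round(2))
```

In a modelling pipeline, compute these statistics on the training split only and reuse them on validation and test data. The scikit-learn scalers follow this pattern: call fit on the training data and transform on the rest.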

🧠 7. Feature Engineering (Optional)

  • [ ] Bin continuous vars (if needed)
  • [ ] Polynomial or interaction terms
  • [ ] Flag rare categories or consolidate groups
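A short sketch of the three feature-engineering steps. The bin edges, labels and the 20% rarity threshold are hypothetical choices for illustration only.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "age": [19, 27, 35, 44, 58, 63],
    "income": [20, 35, 50, 65, 80, 70],
    "browser": ["Chrome", "Chrome", "Firefox", "Chrome", "Opera", "Lynx"],
})

# Binning with fixed edges; right=False makes left-closed intervals such as [18, 30)
df["age_band"] = pd.cut(df["age"], bins=[18, 30, 45, 65], right=False,
                        labels=["18-29", "30-44", "45-64"])

# Polynomial and interaction terms built by hand
df["age_sq"] = df["age"] ** 2
df["age_x_income"] = df["age"] * df["income"]

# Consolidate rare categories: any level under a 20% share becomes "Other"
share = df["browser"].value_counts(normalize=True)
rare = share[share < 0.2].index
df["browser_grouped"] = df["browser"].where(~df["browser"].isin(rare), "Other")

print(df)
```

For many columns at once, scikit-learn's PolynomialFeatures generates squared and interaction terms automatically.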

🧹 8. Pre-Model Cleanup

  • [ ] Drop ID / index / time features if not useful
  • [ ] Recheck all dtypes
  • [ ] Align feature set with model goals
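Here is a sketch of the cleanup pass on hypothetical customer data. The column names, and the choice to keep only the signup month from the timestamp, are illustrative assumptions.

```python
import pandas as pd

# Hypothetical raw data as it might arrive from a CSV
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "signup_date": ["2024-01-05", "2024-02-11", "2024-03-02", "2024-03-30"],
    "plan": ["basic", "pro", "basic", "pro"],
    "monthly_spend": ["10.5", "30.0", "12.0", "29.5"],  # numbers stored as text
    "churned": [0, 1, 0, 1],
})

# Drop an identifier column that carries no predictive signal
df = df.drop(columns=["customer_id"])

# Recheck dtypes and fix columns that were read in as strings
df["monthly_spend"] = pd.to_numeric(df["monthly_spend"])
df["signup_date"] = pd.to_datetime(df["signup_date"])
df["plan"] = df["plan"].astype("category")
print(df.dtypes)

# The raw timestamp is not used directly; keep a derived month feature instead
df["signup_month"] = df["signup_date"].dt.month
df = df.drop(columns=["signup_date"])

# Align the feature set with the model goal: separate the target from the features
y = df["churned"]
X = df.drop(columns=["churned"])
```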

🧠 Final Tip

"EDA is an iterative process. Don't be afraid to circle back to earlier steps as you uncover new insights."

Ready to export to a modeling script or notebook?