EDA Workflow
Purpose
This checklist provides a high-level workflow for conducting exploratory data analysis (EDA). It covers the essential steps from initial data inspection to pre-modeling cleanup.
Related Guides
This checklist is a companion to the General EDA Guidebook, Visual EDA Interpretation Guide, and Advanced EDA Guidebook.
1. Dataset Structure
- [ ] `.shape`, `.info()`: dimensions & column types
- [ ] `.describe(include='all')`: descriptive stats
- [ ] `.duplicated().sum()`: check for row duplication
- [ ] `.memory_usage()`: memory optimization
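The structure checks above can be run in one pass. This is a minimal sketch on a small made-up DataFrame (the column names and values are illustrative, not from any real dataset); the `category` conversion at the end is one common memory saving, assumed here as an example rather than a required step.

```python
import pandas as pd

# Hypothetical toy data; substitute your own DataFrame.
df = pd.DataFrame({
    "id": [1, 2, 3, 3],
    "city": ["Oslo", "Lima", "Pune", "Pune"],
    "price": [10.0, 12.5, 9.0, 9.0],
})

print(df.shape)                    # (rows, columns)
df.info()                          # column dtypes and non-null counts
print(df.describe(include="all"))  # stats for numeric and non-numeric columns

# Count rows that repeat an earlier row in every column.
n_duplicates = int(df.duplicated().sum())
print(n_duplicates)

# deep=True measures the real size of object (string) columns.
print(df.memory_usage(deep=True))

# Low-cardinality string columns often take less memory as 'category'.
df["city"] = df["city"].astype("category")
```

On real data, compare `memory_usage(deep=True)` before and after the conversion, since the saving depends on how many distinct values the column holds.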
2. Variable Exploration
Numeric
- [ ] Histograms + KDE
- [ ] Skew, kurtosis
- [ ] Outlier detection (boxplot, z-score, IQR)
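As a rough illustration of the numeric checks, the sketch below computes skew and kurtosis and flags outliers with the IQR and z-score rules. The sample values, the 1.5 × IQR fences, and the |z| > 3 cutoff are conventional choices assumed for the example.

```python
import pandas as pd

# Hypothetical sample with one obvious extreme value.
values = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 40], dtype="float64")

skewness = values.skew()         # > 0 indicates a longer right tail
excess_kurtosis = values.kurt()  # Fisher definition: a normal distribution gives 0

# IQR rule: flag points beyond 1.5 * IQR from the quartiles.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
iqr_outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]

# z-score rule: flag points more than 3 sample standard deviations from the mean.
z = (values - values.mean()) / values.std()
z_outliers = values[z.abs() > 3]

print(skewness, excess_kurtosis)
print(iqr_outliers.tolist())
print(z_outliers.tolist())
```

In this ten-point sample the IQR rule flags 40 while the z-score rule flags nothing, because the extreme value inflates the standard deviation it is measured against. That is a reason to run both checks on small or heavy-tailed samples.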
Categorical
- [ ] Frequency counts
- [ ] Unique counts / high cardinality flags
- [ ] Encoding strategy (if needed)
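The categorical checks might look like the sketch below. The data, the `CARDINALITY_RATIO` threshold, and the choice of one-hot encoding are assumptions made for illustration; the right threshold and encoding depend on the dataset and the model.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "red", "blue"],
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
})

color_counts = df["color"].value_counts()               # frequency counts
color_share = df["color"].value_counts(normalize=True)  # proportions
unique_counts = df.nunique()                            # distinct values per column

# Assumed rule of thumb: flag columns where more than half the rows are distinct.
CARDINALITY_RATIO = 0.5
ratio = unique_counts / len(df)
high_cardinality = ratio[ratio > CARDINALITY_RATIO].index.tolist()

# One possible encoding for a low-cardinality column: one-hot.
encoded = pd.get_dummies(df["color"], prefix="color")

print(color_counts)
print(high_cardinality)
```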
3. Distribution & Relationship Plots
- [ ] Boxplot by category
- [ ] Correlation heatmap (numeric only)
- [ ] Pairplot or scatter matrix
- [ ] Crosstabs (for categorical pairs)
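The plots in this section are drawn from a few tables, which can be inspected directly. The sketch below builds those tables with pandas only, on made-up data, and leaves the drawing itself as comments, since the plotting library is a matter of preference.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b", "a"],
    "passed": ["yes", "no", "yes", "yes", "no", "yes"],
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "score": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
})

# Per-category quartiles: the numbers a boxplot by category summarizes.
# To draw it: df.boxplot(column="score", by="group")  (needs matplotlib)
by_group = df.groupby("group")["score"].describe()

# Correlation matrix over numeric columns only: the input to a heatmap.
corr = df.select_dtypes(include="number").corr()

# Contingency table for a pair of categorical columns.
table = pd.crosstab(df["group"], df["passed"])

print(by_group)
print(corr)
print(table)
```

For a scatter matrix, `pandas.plotting.scatter_matrix(df.select_dtypes(include="number"))` is one option when matplotlib is installed.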
4. Missing Values
- [ ] `df.isnull().sum()` summary
- [ ] Heatmap of missingness
- [ ] Consider: drop, fill, predictive imputation
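A minimal sketch of the missing-value summary and the two simplest remedies follows. The data is made up, and median/mode filling is just one assumed strategy; predictive imputation is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Oslo", None, "Lima", "Lima"],
    "score": [1.0, 2.0, 3.0, 4.0],
})

missing_counts = df.isnull().sum()  # missing values per column
missing_share = df.isnull().mean()  # fraction missing per column
# df.isnull() itself is the boolean mask a missingness heatmap would display.

# Option 1: drop rows that contain any missing value.
dropped = df.dropna()

# Option 2: fill numeric gaps with the median, categorical gaps with the mode.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

print(missing_counts)
print(filled)
```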
5. Outliers
- [ ] IQR filtering
- [ ] Z-score filtering
- [ ] Context/domain filtering (caps/floors)
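The three treatments above could be sketched as follows. The sample, the 1.5 × IQR fences, the |z| ≤ 3 cutoff, and the cap of 50 are all assumptions for illustration; domain limits in particular should come from knowledge of the data.

```python
import pandas as pd

# Hypothetical measurements with one extreme value.
s = pd.Series([10, 12, 11, 13, 12, 11, 10, 13, 12, 11, 12, 95], dtype="float64")

# IQR filtering: keep points within 1.5 * IQR of the quartiles.
q1 = s.quantile(0.25)
q3 = s.quantile(0.75)
iqr = q3 - q1
iqr_filtered = s[s.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Z-score filtering: keep points within 3 sample standard deviations of the mean.
z = (s - s.mean()) / s.std()
z_filtered = s[z.abs() <= 3]

# Domain filtering: cap and floor at assumed plausible limits instead of dropping rows.
capped = s.clip(lower=0, upper=50)

print(len(s), len(iqr_filtered), len(z_filtered))
print(capped.max())
```

Filtering removes rows, while capping keeps every row and only changes the extreme values, which matters when other columns in those rows are still useful.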
6. Scaling & Transformation
- [ ] Log/sqrt transforms for skew
- [ ] StandardScaler, MinMaxScaler, RobustScaler
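To make the differences between these options concrete, the sketch below writes out the arithmetic in plain pandas and NumPy. It is not the scikit-learn API: it reproduces what `StandardScaler`, `MinMaxScaler`, and `RobustScaler` compute with their default settings on a single column, assuming a made-up series. `log1p` is used instead of a plain log so that zeros are handled.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed column.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

# Transforms for skew.
log_x = np.log1p(x)  # log(1 + x), defined at x = 0
sqrt_x = np.sqrt(x)

# StandardScaler equivalent: zero mean, unit variance (population std, ddof=0).
standard = (x - x.mean()) / x.std(ddof=0)

# MinMaxScaler equivalent: rescale to the [0, 1] range.
min_max = (x - x.min()) / (x.max() - x.min())

# RobustScaler equivalent: center on the median, scale by the IQR.
iqr = x.quantile(0.75) - x.quantile(0.25)
robust = (x - x.median()) / iqr

print(standard.round(3).tolist())
print(min_max.round(3).tolist())
print(robust.round(3).tolist())
```

In a modeling pipeline, the scaling statistics would normally be learned from the training split only and then applied to validation and test data.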
7. Feature Engineering (Optional)
- [ ] Bin continuous vars (if needed)
- [ ] Polynomial or interaction terms
- [ ] Flag rare categories or consolidate groups
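One way these three steps might look in pandas is sketched below. The bin edges, the labels, the rare-category threshold of 2, and the `"Other"` bucket are hypothetical choices for the example.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "age": [5, 17, 30, 45, 70, 88],
    "income": [0.0, 1.0, 40.0, 60.0, 30.0, 20.0],
    "country": ["US", "US", "US", "NO", "PE", "US"],
})

# Bin a continuous variable; intervals are right-inclusive: (0, 18], (18, 65], (65, 120].
df["age_band"] = pd.cut(
    df["age"], bins=[0, 18, 65, 120], labels=["minor", "adult", "senior"]
)

# Polynomial and interaction terms.
df["age_sq"] = df["age"] ** 2
df["age_x_income"] = df["age"] * df["income"]

# Consolidate categories seen fewer than MIN_COUNT times into one bucket.
MIN_COUNT = 2
counts = df["country"].value_counts()
rare = counts[counts < MIN_COUNT].index
df["country_grouped"] = df["country"].where(~df["country"].isin(rare), "Other")

print(df)
```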
8. Pre-Model Cleanup
- [ ] Drop ID / index / time features if not useful
- [ ] Recheck all dtypes
- [ ] Align feature set with model goals
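A short sketch of the cleanup pass, assuming a made-up table in which `customer_id` and `signup_ts` carry no predictive value and `churned` is the target. Those column roles are assumptions; decide them per project.

```python
import pandas as pd

# Hypothetical toy data; 'amount' arrived as strings.
df = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "signup_ts": ["2024-01-01", "2024-02-01", "2024-03-01"],
    "amount": ["10.5", "20.0", "7.25"],
    "churned": [0, 1, 0],
})

# Fix dtypes first: numbers stored as text become numeric.
df["amount"] = pd.to_numeric(df["amount"])

# Drop identifier and time columns that are not useful for this model.
features = df.drop(columns=["customer_id", "signup_ts"])

# Recheck dtypes before handing off.
print(features.dtypes)

# Align with the modeling goal: separate predictors from the target.
X = features.drop(columns=["churned"])
y = features["churned"]
```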
Final Tip
"EDA is an iterative process. Don't be afraid to circle back to earlier steps as you uncover new insights."
Ready to export to a modeling script or notebook?