Python Script Resources
This section contains a library of starter scripts and boilerplate code for common data analysis, cleaning, EDA, and machine learning workflows. Each document is designed to be a self-contained reference for quick reuse and adaptation in your own projects.
Exploratory Data Analysis (EDA)
- EDA Templates: A comprehensive set of functions for initial data exploration, from summary statistics to core visualizations.
- EDA Extras: Advanced and specialized visualizations like cluster maps, Andrews curves, and dendrograms for deeper insights.
Data Cleaning
- Quick Clean Statistics Runner: A script to quickly generate key data quality metrics like missing values, skew, and outlier counts.
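
As a rough illustration of the kind of metrics described above, the following sketch tabulates missing values, skew, and outlier counts per column with pandas. The function name `quick_quality_stats`, the `iqr_multiplier` parameter, and the 1.5 × IQR outlier rule are assumptions made for this example; the actual runner may use different names and thresholds.

```python
import pandas as pd


def quick_quality_stats(df, iqr_multiplier=1.5):
    """Hypothetical helper: per-column missing counts, skew, and IQR outliers."""
    numeric = df.select_dtypes(include="number")

    # Tukey-style fences: values beyond q1/q3 -/+ multiplier * IQR count as outliers.
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - iqr_multiplier * iqr) | (numeric > q3 + iqr_multiplier * iqr)

    # Non-numeric columns get NaN for skew and outlier counts.
    return pd.DataFrame(
        {
            "missing": df.isna().sum(),
            "missing_pct": df.isna().mean() * 100,
            "skew": numeric.skew(),
            "outliers_iqr": outside.sum(),
        }
    )


# Small made-up frame, used only to exercise the helper.
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 100],
        "b": [1.0, None, 3.0, 4.0, 5.0],
        "c": list("vwxyz"),
    }
)
stats = quick_quality_stats(df)
print(stats)
```

The IQR rule is only one of several common outlier definitions; a z-score threshold would slot into the same structure.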
Modeling & Diagnostics
- Linear Modeling Extensions: Runner functions for OLS, Ridge, Lasso, and ElasticNet regression models.
- Logistic Modeling Extensions: Runner functions for binary and multinomial logistic regression, plus related generalized linear models such as Poisson regression.
- Regression Model Diagnostics: A suite of functions for post-hoc OLS model evaluation, including residual plots and statistical tests.
- Logistic Model Diagnostics: Functions for evaluating classification models, including confusion matrices, ROC curves, and calibration plots.
- Tree and Boost Diagnostics: Helper functions for extracting and tabulating results from `GridSearchCV` for tree-based models.
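
Tabulating grid search results for a tree-based model could look something like the sketch below, which turns scikit-learn's `cv_results_` dictionary into a ranked table. The helper name `tabulate_grid_results`, the synthetic dataset, and the parameter grid are all assumptions for illustration, not the library's actual functions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def tabulate_grid_results(search, top_n=5):
    """Hypothetical helper: best parameter combinations from a fitted search."""
    results = pd.DataFrame(search.cv_results_)
    param_cols = [c for c in results.columns if c.startswith("param_")]
    cols = param_cols + ["mean_test_score", "std_test_score", "rank_test_score"]
    return (
        results[cols]
        .sort_values("rank_test_score")
        .head(top_n)
        .reset_index(drop=True)
    )


# Synthetic data and a small grid, chosen only to keep the example fast.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]},
    cv=3,
)
search.fit(X, y)

table = tabulate_grid_results(search)
print(table)
```

The same helper should work for any fitted `GridSearchCV`, including boosted models, since it reads only `cv_results_`.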
Utilities
- General Utilities: Core helper functions, including a simple train-test split and a VIF calculator.
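
For readers unfamiliar with the variance inflation factor (VIF), here is a minimal NumPy sketch of the calculation. It regresses each feature on the others and reports 1 / (1 - R²), which equals the ratio of total to residual sum of squares. The function name `vif` and the synthetic data are assumptions for this example; the utility in this library may be implemented differently.

```python
import numpy as np


def vif(X):
    """Hypothetical helper: VIF for each column of a 2-D feature array."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    factors = np.empty(n_cols)
    for j in range(n_cols):
        target = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        design = np.column_stack([np.ones(n_rows), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        # 1 / (1 - R^2) simplifies to ss_tot / ss_res.
        factors[j] = np.inf if ss_res == 0 else ss_tot / ss_res
    return factors


# Synthetic features: x3 is nearly a copy of x1, x2 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.1 * rng.normal(size=500)
factors = vif(np.column_stack([x1, x2, x3]))
print(factors)
```

Here the first and third columns should show large factors because they are nearly collinear, while the independent second column stays close to 1.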