Statistical Interpretation


🎯 Purpose

This guide explains how to interpret key statistical summaries and tests commonly used in exploratory data analysis. It supports analysts in understanding distributional patterns, variable relationships, and data quality before modeling.


🧮 1. Descriptive Statistics

| Metric | Meaning | Use Case |
|---|---|---|
| Mean | Average value of a feature | Central tendency (symmetric distributions) |
| Median | Middle value | Central tendency (skewed distributions) |
| Std / Variance | Spread of values | Dispersion detection |
| Min / Max | Range endpoints | Outlier spotting |
| Skewness | Asymmetry in distribution | Transformation decision |
| Kurtosis | Tail heaviness | Outlier likelihood |

Code Example:

df.describe().T
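describe() does not report skewness or kurtosis, so a slightly fuller summary can be sketched as below. The DataFrame and its column names are made up for illustration; only the pandas calls come from the library.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one roughly symmetric column and one right-skewed column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "symmetric": rng.normal(loc=50, scale=5, size=1000),
    "right_skewed": rng.lognormal(mean=0.0, sigma=1.0, size=1000),
})

summary = df.describe().T           # count, mean, std, min, quartiles, max
summary["median"] = df.median()     # same as the 50% column, named explicitly
summary["skew"] = df.skew()         # near 0 for symmetric data
summary["kurtosis"] = df.kurt()     # excess kurtosis: near 0 for a normal distribution
print(summary.round(2))
```

In the skewed column the mean sits above the median, which is the usual signal to report the median and consider a transformation.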

📉 2. Distribution Normality Tests

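| Test | Interpretation |
|---|---|
| Shapiro-Wilk | p < 0.05 → reject normality at the 5% level |
| D’Agostino’s K² | Omnibus test combining skewness and kurtosis; p < 0.05 → reject normality |
| Anderson-Darling | Weights the tails more heavily; compare the statistic to its critical values |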

Code Example:

from scipy.stats import shapiro, normaltest

✔️ Use the results to justify a transformation, bootstrapping, or robust statistics
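As a minimal sketch of how the two imported tests might be applied, the example below runs them on synthetic samples. The sample names, sizes, and the 0.05 threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import shapiro, normaltest

# Hypothetical samples: one drawn from a normal, one from an exponential.
rng = np.random.default_rng(0)
samples = {
    "normal": rng.normal(size=500),
    "exponential": rng.exponential(size=500),
}

ALPHA = 0.05  # assumed significance level
results = {}
for name, x in samples.items():
    _, p_shapiro = shapiro(x)       # Shapiro-Wilk
    _, p_k2 = normaltest(x)         # D'Agostino's K² (skewness + kurtosis)
    results[name] = {"shapiro_p": p_shapiro, "k2_p": p_k2}
    verdict = "reject normality" if min(p_shapiro, p_k2) < ALPHA else "no evidence against normality"
    print(f"{name}: shapiro p={p_shapiro:.3g}, K² p={p_k2:.3g} -> {verdict}")
```

A p-value above the threshold does not prove normality; it only means the test found no evidence against it at that sample size.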


🔍 3. Correlation and Association

🔹 Pearson Correlation

  • Measures linear relationship (range: –1 to +1)
  • Sensitive to outliers

🔹 Spearman Rank Correlation

  • Measures monotonic relationships, whether linear or not
  • Less sensitive to outliers and skew than Pearson

🔹 Cramer’s V (Categorical)

  • Association strength between two categorical variables (range: 0 to 1)

Code Example:

from scipy.stats import pearsonr, spearmanr
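The three measures above can be compared on synthetic data, as in this sketch. SciPy's pearsonr, spearmanr, and chi2_contingency are real functions; the cramers_v helper is a hypothetical convenience written for this example (plain formula, no bias correction), and the data are invented.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr, chi2_contingency

rng = np.random.default_rng(0)

# Monotonic but strongly nonlinear relationship.
x = rng.uniform(0, 5, size=300)
y = np.exp(x)
r, _ = pearsonr(x, y)       # understates the relationship because it is not linear
rho, _ = spearmanr(x, y)    # rank-based, so a perfectly monotonic curve scores 1
print(f"Pearson r={r:.2f}, Spearman rho={rho:.2f}")

def cramers_v(a, b):
    """Hypothetical helper: Cramer's V from the chi-squared statistic (no bias correction)."""
    table = pd.crosstab(a, b).to_numpy()
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

# Hypothetical categorical columns.
cat_a = pd.Series(rng.choice(["red", "green", "blue"], size=300))
cat_b = pd.Series(rng.choice(["s", "m", "l"], size=300))   # generated independently of cat_a
v_same = cramers_v(cat_a, cat_a)     # identical columns: perfect association
v_indep = cramers_v(cat_a, cat_b)    # independent columns: close to 0
print(f"V(identical)={v_same:.2f}, V(independent)={v_indep:.2f}")
```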

🔗 4. Feature-to-Target Relationship Tests

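| Scenario | Test | Purpose |
|---|---|---|
| Numeric vs Numeric | Pearson / Spearman correlation | Linear / monotonic relationship |
| Categorical vs Numeric | ANOVA / Kruskal-Wallis | Differences in group means (ANOVA) or in group distributions (Kruskal-Wallis, rank-based) |
| Categorical vs Categorical | Chi-Squared / Cramer’s V | Test of independence (Chi-Squared) and association strength (Cramer’s V) |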
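The categorical rows of the table could be exercised as follows. This is a sketch on invented data: the group names, sample sizes, and the contingency counts are assumptions, not values from any real dataset.

```python
import numpy as np
from scipy.stats import f_oneway, kruskal, chi2_contingency

rng = np.random.default_rng(0)

# Categorical feature vs numeric target: three hypothetical groups, one shifted upward.
group_a = rng.normal(10, 2, size=100)
group_b = rng.normal(10, 2, size=100)
group_c = rng.normal(13, 2, size=100)

f_stat, p_anova = f_oneway(group_a, group_b, group_c)     # assumes roughly normal groups with similar variance
h_stat, p_kruskal = kruskal(group_a, group_b, group_c)    # rank-based alternative with fewer assumptions
print(f"ANOVA p={p_anova:.3g}, Kruskal-Wallis p={p_kruskal:.3g}")

# Categorical vs categorical: hypothetical 2x2 table of counts
# (rows = feature level, columns = target class).
observed = np.array([[40, 10],
                     [15, 35]])
chi2, p_chi2, dof, expected = chi2_contingency(observed)
print(f"Chi-squared={chi2:.2f}, dof={dof}, p={p_chi2:.3g}")
```

A small p-value here says the groups differ or the variables are not independent; it does not say how strongly, which is why an effect-size measure such as Cramer’s V is listed alongside.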

🧪 5. Outlier Detection (Statistical)

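| Method | Description |
|---|---|
| Z-Score | Observations more than 3 SD from the mean (\|z\| > 3) |
| IQR Method | Below Q1 − 1.5×IQR or above Q3 + 1.5×IQR |
| Mahalanobis Distance | Multivariate outlier detection that accounts for correlation between features |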

✔️ Flag but don’t remove outliers without cause/context
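The three rules can be sketched with NumPy and SciPy as below. The data, the planted extreme points, and the 0.999 chi-squared cutoff for the Mahalanobis distance are illustrative assumptions; the code only flags points and removes nothing.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Univariate: 200 ordinary values plus two planted extremes (hypothetical data).
x = np.append(rng.normal(0, 1, size=200), [8.0, -9.0])

# Z-score rule: more than 3 standard deviations from the mean.
z = (x - x.mean()) / x.std(ddof=1)
z_flags = np.abs(z) > 3

# IQR rule: below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
iqr_flags = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

# Multivariate: two positively correlated features plus one point that goes
# against the correlation without being extreme on either axis alone.
X = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=200)
X = np.vstack([X, [2.5, -2.5]])
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)    # squared Mahalanobis distance per row
maha_flags = d2 > chi2.ppf(0.999, df=X.shape[1])      # assumed cutoff, valid for roughly normal data

print(z_flags.sum(), iqr_flags.sum(), maha_flags.sum())
```

The mean and standard deviation are themselves pulled by extreme values, so the IQR rule is often the steadier choice for skewed or contaminated columns.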


📋 Analyst Summary Table Elements

| Field | Use |
|---|---|
| Mean / Median | Central location |
| Std / IQR | Dispersion / spread |
| Skewness / Kurtosis | Transformation, normality |
| Missing % | Imputation plan |
| # Unique Values | Cardinality (for encoding/grouping) |
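One way to assemble these fields into a single per-column table is sketched below. The analyst_summary function and the example columns are hypothetical names introduced for this illustration; numeric-only fields are left as NaN for non-numeric columns.

```python
import numpy as np
import pandas as pd

def analyst_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with the fields listed above."""
    num = df.select_dtypes(include="number")
    out = pd.DataFrame(index=df.columns)
    out["mean"] = num.mean()
    out["median"] = num.median()
    out["std"] = num.std()
    out["iqr"] = num.quantile(0.75) - num.quantile(0.25)
    out["skew"] = num.skew()
    out["kurtosis"] = num.kurt()
    out["missing_pct"] = df.isna().mean() * 100   # share of missing values per column
    out["n_unique"] = df.nunique()                # distinct non-missing values
    return out

# Made-up example data with one numeric and one categorical column.
df = pd.DataFrame({
    "age": [23, 35, 41, np.nan, 29],
    "city": ["Oslo", "Lima", "Oslo", None, "Pune"],
})
summary = analyst_summary(df)
print(summary.round(2))
```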

💡 Final Tip

“Stats don’t tell you the answer—but they tell you where to look. Use summary patterns to guide deeper investigation.”

Use with: General & Advanced EDA Guidebook