OLS + Robust
Purpose
This QuickRef consolidates EDA prep, model fitting, diagnostics, and decision rules for using Ordinary Least Squares (OLS) and Robust Linear Regression. Designed for modelers who need a single, notebook-friendly reference.
1. EDA Prep for Linear Regression
Step | Code |
---|---|
Check distributions | df.hist() or sns.histplot() |
Skew/kurtosis | df.skew(), df.kurtosis() |
Correlation heatmap | sns.heatmap(df.corr()) |
VIF check | variance_inflation_factor(X.values, i) for each column i (X should include a constant column) |
✔️ Flag skewed variables for log/sqrt transform
✔️ Remove or combine highly correlated variables (VIF > 5–10)
2. Feature Transformation Triggers
Condition | Suggestion |
---|---|
Skew > 1 or < -1 | Try log or Yeo-Johnson |
Absolute correlation > 0.85 between two predictors | Drop one or use PCA/interactions |
Heteroskedasticity (Breusch-Pagan p < 0.05) | Consider log transform or robust fit |
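As one way to apply the skew trigger, the sketch below flags columns whose absolute skew exceeds 1 and applies a Yeo-Johnson transform with `scipy.stats.yeojohnson`. The DataFrame and its column names (`income`, `age`) are hypothetical, and the cutoff of 1 is simply the rule of thumb from the table.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: one right-skewed column, one roughly symmetric column.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "income": rng.lognormal(mean=0.0, sigma=1.0, size=1000),
    "age": rng.normal(loc=40, scale=10, size=1000),
})

# Flag columns whose sample skewness is beyond +/- 1.
skew = df.skew()
to_transform = skew[skew.abs() > 1].index.tolist()

transformed = df.copy()
for col in to_transform:
    # With lmbda left unset, yeojohnson returns (transformed values, fitted lambda).
    transformed[col], _ = stats.yeojohnson(df[col].to_numpy())

print("Skew before:", skew.round(2).to_dict())
print("Skew after: ", transformed.skew().round(2).to_dict())
```

Yeo-Johnson also accepts zero and negative values, which is why it is often preferred over a plain log when a column is not strictly positive.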
3. Model Assumptions (OLS)
Assumption | Diagnostic |
---|---|
Linearity | Residuals vs Fitted plot |
Normality | Histogram / QQ plot of residuals |
No multicollinearity | VIF < 5–10 |
Homoscedasticity | Breusch-Pagan, White's test |
No influential outliers | Cook's D, leverage plot |
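# Breusch-Pagan test (model is a fitted OLS results object)
from statsmodels.stats.diagnostic import het_breuschpagan
# Returns (LM stat, LM p-value, F stat, F p-value); a small p-value suggests heteroskedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, model.model.exog)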
4. When to Use Robust Regression
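Problem | Robust approach |
---|---|
Heteroskedasticity persists | Use HC0–HC3 covariance correction |
Outliers or high-leverage points | Use RLM (M-estimators); note that M-estimators downweight large residuals and give only limited protection against extreme X values |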
Many small violations of OLS | Consider robust SE before switching model |
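# Robust SE example (model is a fitted OLS results object; coefficients are unchanged)
robust_results = model.get_robustcov_results(cov_type='HC3')
# Robust regression (M-estimation; the default norm is Huber's T)
import statsmodels.api as sm
# X should already include a constant column, e.g. via sm.add_constant
rlm_results = sm.RLM(y, X).fit()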
5. Output Interpretation (OLS & Robust)
Output | Interpretation |
---|---|
Positive β | 1-unit ↑ in X → β-unit ↑ in Y (holding others fixed) |
Negative β | 1-unit ↑ in X → Y ↓ by the absolute value of β (holding others fixed) |
p < 0.05 | Predictor is statistically significant at the 5% level |
Adj. R² | Better than R² for comparing models with different numbers of predictors |
✔️ Robust standard errors (HC0–HC3) change SEs and t-stats but not coefficients; RLM re-estimates the coefficients as well
Final Model Checklist
• Residuals reviewed (linearity + normality)
• VIF < 10 for all features
• Heteroskedasticity test passed OR robust SE used
• Outliers flagged, Cook's D < 1
• Model fit (Adj. R², F-stat) interpreted in context
Tip
“OLS tells you what's ideal; robust tells you what survives reality.”