OLS + Robust
Purpose
This QuickRef consolidates EDA prep, model fitting, diagnostics, and decision rules for using Ordinary Least Squares (OLS) and Robust Linear Regression. Designed for modelers who need a single, notebook-friendly reference.
1. EDA Prep for Linear Regression
| Step | Code |
|---|---|
| Check distributions | df.hist() or sns.histplot() |
| Skew/kurtosis | df.skew(), df.kurtosis() |
| Correlation heatmap | sns.heatmap(df.corr(numeric_only=True)) |
| VIF check | variance_inflation_factor(X.values, i), where X includes the constant and i is the column index |
✔️ Flag skewed variables for log/sqrt transform
✔️ Remove or combine highly correlated variables (VIF > 5–10)
2. Feature Transformation Triggers
| Condition | Suggestion |
|---|---|
| Skew > 1 or < -1 | Try log or Yeo-Johnson |
| Correlation > 0.85 | Drop one or use PCA/interactions |
| Heteroskedasticity (BP test fail) | Consider log transform or robust fit |
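The skew trigger can be sketched as follows, assuming a hypothetical DataFrame with one right-skewed column and one roughly symmetric column. `scipy.stats.yeojohnson` is used for the transform here; a plain `np.log1p` would be a simpler alternative for strictly non-negative data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: "spend" is lognormal (right-skewed), "age" is roughly symmetric.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "spend": rng.lognormal(mean=0.0, sigma=1.0, size=1000),
    "age": rng.normal(loc=40, scale=10, size=1000),
})

# Flag columns whose skewness is above 1 or below -1.
skew_before = df.skew()
flagged = skew_before[skew_before.abs() > 1].index.tolist()

transformed = df.copy()
for col in flagged:
    # With lmbda=None, yeojohnson returns (transformed values, fitted lambda).
    transformed[col], fitted_lambda = stats.yeojohnson(df[col].to_numpy())

skew_after = transformed.skew()
print(pd.DataFrame({"before": skew_before, "after": skew_after}).round(2))
```

Only the flagged columns are transformed, so coefficients on the untouched columns keep their original units.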
3. Model Assumptions (OLS)
| Assumption | Diagnostic |
|---|---|
| Linearity | Residuals vs Fitted plot |
| Normality | Histogram / QQ plot of residuals |
| No multicollinearity | VIF < 5–10 |
| Homoscedasticity | Breusch-Pagan, White's test |
| No influential outliers | Cook's D, leverage plot |
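# Breusch-Pagan test (model is a fitted OLS results object)
from statsmodels.stats.diagnostic import het_breuschpagan
residuals = model.resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(residuals, model.model.exog)
# A small lm_pvalue (e.g. < 0.05) suggests heteroskedasticity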
4. When to Use Robust Regression
| Problem | Robust remedy |
|---|---|
| Heteroskedasticity persists | Use HC0–HC3 covariance correction |
| Outliers or high-leverage points | Use RLM (M-estimators); most effective for outliers in Y, less so for high leverage in X |
| Many small violations of OLS | Try robust SEs before switching models |
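# Robust SE example (model is a fitted OLS results object; coefficients stay the same)
robust_results = model.get_robustcov_results(cov_type='HC3')
# Robust regression (M-estimation; HuberT is the statsmodels default norm)
import statsmodels.api as sm
rlm_results = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()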
5. Output Interpretation (OLS & Robust)
| Coefficient | Interpretation |
|---|---|
| Positive β | 1-unit ↑ in X → β-unit ↑ in Y (holding others fixed) |
| Negative β | 1-unit ↑ in X → Y ↓ by the magnitude of β (holding others fixed) |
| p < 0.05 | Statistically significant predictor at the 5% level |
| Adj. R² | Better for comparing models with different numbers of predictors |
✔️ Robust standard errors (HC0–HC3) change SEs and t-stats but not coefficients; robust regression (RLM) re-estimates the coefficients as well
Final Model Checklist
• Residuals reviewed (linearity + normality)
• VIF < 10 for all features
• Heteroskedasticity test passed OR robust SE used
• Outliers flagged, Cook's D < 1
• Model fit (Adj. R², F-stat) interpreted in context
Tip
"OLS tells you what's ideal; robust tells you what survives reality."