
Advanced Guidebook


🎯 Purpose

This guidebook expands on the core linear regression modeling guide by detailing advanced techniques, assumptions, diagnostic methods, and specialized modeling extensions. It is intended for analysts building interpretable, assumption-validated, and production-ready linear models.


🧠 1. Model Fit vs Inference vs Prediction

| Goal | Focus | Tools |
| --- | --- | --- |
| Explanation | Interpret coefficients | OLS, WLS, RLM |
| Inference | Validate statistical relationships | P-values, confidence intervals |
| Prediction | Minimize test error | Regularization, CV, transformations |

โœ”๏ธ Frame model usage before optimization.


🧮 2. Advanced Assumption Testing

๐Ÿ” Normality of Residuals

  • Shapiro-Wilk, Anderson-Darling, Kolmogorov-Smirnov (KS) tests
  • Histogram + QQ plot

📉 Homoscedasticity

  • Breusch-Pagan, White Test
  • Residuals vs Fitted visual

๐Ÿ” Autocorrelation

  • Durbin-Watson, Ljung-Box
  • Residual lag plots

🔢 Multicollinearity

  • VIF (Variance Inflation Factor)
  • Correlation heatmap, condition index

✅ Python Snippets

from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy.stats import shapiro

🛠 3. Robust and Modified Linear Models

| Method | Use Case |
| --- | --- |
| RLM (M-estimators) | Outliers distort OLS |
| WLS | Non-constant error variance |
| HC3 Errors | Heteroskedasticity, small samples |

import statsmodels.api as sm

# Refit the coefficient covariance with HC3 (heteroskedasticity-consistent) errors
model = sm.OLS(y, X).fit()
robust_model = model.get_robustcov_results(cov_type='HC3')

🧪 4. Residual Diagnostics

| Diagnostic | Plot / Test | Goal |
| --- | --- | --- |
| Residual linearity | Residuals vs Fitted | Flat band, no trend |
| Normality | QQ plot, histogram, Shapiro test | Bell shape, diagonal line |
| Constant variance | BP / White, scale-location plot | Uniform spread |
| Autocorrelation | DW stat, residual autocorrelation plot | No serial correlation |
| Influential obs. | Leverage vs residual, Cook's D | Flag outliers or high leverage |

๐Ÿ” 5. Transformations & Nonlinear Patterns

Transformation Use Case Visual Test
Log(y) Right-skewed response Histogram, Residuals vs Fitted
Log(x) Exponential predictor pattern X vs Y scatter
Box-Cox Non-normal target, heteroscedasticity QQ, residual plot
Polynomial Curved relationships X vs Residuals, LOESS
from sklearn.preprocessing import PolynomialFeatures
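A short sketch combining Box-Cox (via scipy.stats.boxcox, which assumes a strictly positive target) with the PolynomialFeatures import above; the data are synthetic:

```python
import numpy as np
from scipy.stats import boxcox
from sklearn.preprocessing import PolynomialFeatures

# Right-skewed positive target: Box-Cox searches for a variance-stabilizing power
rng = np.random.default_rng(3)
y = np.exp(rng.normal(size=500))          # log-normal, strongly right-skewed
y_transformed, lmbda = boxcox(y)          # lmbda near 0 ~ equivalent to log(y)

# Polynomial expansion of a predictor for curved relationships
x = rng.uniform(-1, 1, size=(500, 1))
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)  # columns: x, x^2
```

Because the target is log-normal here, the estimated lambda lands near zero, confirming that a plain log transform would do.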

🧠 6. Interaction Effects

✔️ Capture relationships that change by group or depend on the level of another predictor.

import statsmodels.formula.api as smf

# 'X1 * X2' expands to X1 + X2 + X1:X2 (main effects plus their interaction)
smf.ols('y ~ X1 * X2', data=df).fit()

Visuals:

  • Simple slopes plot
  • Grouped regression lines
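A self-contained sketch showing that the `X1 * X2` formula recovers a true interaction coefficient; the data-generating values are invented for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data where the slope of X1 depends on X2 (a true interaction of 2.0)
rng = np.random.default_rng(4)
df = pd.DataFrame({'X1': rng.normal(size=300), 'X2': rng.normal(size=300)})
df['y'] = (1.0 + 0.5 * df.X1 + 0.5 * df.X2
           + 2.0 * df.X1 * df.X2
           + rng.normal(scale=0.2, size=300))

fit = smf.ols('y ~ X1 * X2', data=df).fit()
interaction_coef = fit.params['X1:X2']    # patsy names the interaction term 'X1:X2'
```

The fitted interaction coefficient should sit close to the true 2.0, with its own CI and p-value in `fit.summary()`.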

🧩 7. Regularization Extensions

| Method | Goal |
| --- | --- |
| Ridge | Shrink correlated coefficients |
| Lasso | Shrink and select (some to 0) |
| ElasticNet | Blend Ridge + Lasso |

from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV
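A hedged example of LassoCV on a synthetic sparse problem, showing the shrink-and-select behaviour noted in the table:

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic sparse problem: only 3 of 20 features carry signal
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 20))
coef = np.zeros(20)
coef[:3] = [3.0, -2.0, 1.5]
y = X @ coef + rng.normal(scale=0.5, size=200)

lasso = LassoCV(cv=5, random_state=0).fit(X, y)   # alpha chosen by cross-validation
n_selected = int(np.sum(lasso.coef_ != 0))        # Lasso zeroes out irrelevant features
```

The CV-chosen alpha typically retains the three true features (slightly shrunk) and zeroes out most of the noise columns; RidgeCV and ElasticNetCV follow the same fit interface.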

Visuals:

  • Coefficient paths
  • Validation curves (RMSE vs Alpha)

📊 8. Model Comparison Tools

| Tool | Purpose |
| --- | --- |
| Adjusted R² | Compare fit with a penalty for extra features |
| AIC / BIC | Penalized log-likelihood criteria |
| Cross-validation | Estimate test error |
| RMSE / MAE | Fit error on a holdout/test set |

from sklearn.model_selection import cross_val_score
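A sketch of cross_val_score estimating holdout RMSE on synthetic data; sklearn scorers follow a higher-is-better convention, hence the sign flip:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic well-specified data with noise scale 0.5
rng = np.random.default_rng(6)
X = rng.normal(size=(150, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=150)

# Negative RMSE per fold; negate the mean to report an ordinary RMSE
scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring='neg_root_mean_squared_error')
cv_rmse = -scores.mean()          # should sit near the noise scale of 0.5
```

Comparing `cv_rmse` across candidate models gives a test-error estimate that Adjusted R² and AIC/BIC only approximate in-sample.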

📋 9. Reporting Template Elements

| Field | Description |
| --- | --- |
| Model Type | OLS / Ridge / RLM / Elastic Net |
| Fit Metrics | R², Adj R², RMSE, MAE |
| Assumption Results | Normality, homoscedasticity, VIF |
| Residual Plots Reviewed | QQ, Residuals vs Fitted, Cook's D |
| Coefficient Table | With CIs and p-values |
| Regularization Path | Optional (if used) |
| Notes / Caveats | Outliers, limitations, transformation notes |

🧠 Final Tip

"Linear regression isn't fragile — it's transparent. Use diagnostics to tune it, not discard it."

Use with: Linear Regression Summary Sheet, Visual Guide, and Residual Diagnostics Runner.