Advanced Visual Interpretation
🎯 Purpose
This guide deepens the visual analysis of linear regression models by incorporating diagnostics for assumption testing, robustness, and model complexity. It extends the standard visual evaluation companion and supports high-quality model QA and reporting.
1. Actual vs Predicted (Model Fit Check)
Goal: Evaluate prediction accuracy and potential bias.
✔️ Look for tight clustering around the 45° line. ⚠️ Curvature suggests a missed nonlinearity. A systematic offset from the line suggests bias or omitted variables.
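```python
import matplotlib.pyplot as plt
import seaborn as sns
sns.scatterplot(x=y_test, y=y_pred)  # actual on x, predicted on y
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')  # 45° reference line
```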
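The snippet above assumes `y_test` and `y_pred` already exist. Here is a minimal end-to-end sketch that produces both arrays and draws the same plot with axis labels. It assumes synthetic data from `make_regression` and a plain scikit-learn `LinearRegression`; this guide prescribes neither. The reference line spans the range of both actual and predicted values, so predictions outside the observed range stay visible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (assumption for illustration only).
X, y = make_regression(n_samples=300, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

fig, ax = plt.subplots(figsize=(5, 5))
sns.scatterplot(x=y_test, y=y_pred, ax=ax)
# Span both actual and predicted ranges so no point falls outside the reference line.
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax.plot(lims, lims, "r--", label="perfect prediction (45° line)")
ax.set_xlabel("Actual")
ax.set_ylabel("Predicted")
ax.legend()

# A mean residual near 0 indicates no overall bias; the plot shows where bias occurs.
bias = np.mean(y_test - y_pred)
print(f"mean residual (actual - predicted): {bias:.3f}")
```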
2. Residuals vs Fitted (Homoscedasticity)
Goal: Check that residual variance is constant and that residuals show no systematic pattern.
✔️ Random, cloud-like spread around 0 = good. ⚠️ Funnel shape = heteroscedasticity. ⚠️ Curve = nonlinear trend not captured by the model.
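```python
residuals = y_test - y_pred  # observed minus predicted; reused in the plots below
sns.scatterplot(x=y_pred, y=residuals)
plt.axhline(0, color='red', linestyle='--')  # zero-residual reference line
```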
3. Histogram of Residuals (Normality Check)
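Goal: Check whether residuals are roughly bell-shaped. Normality matters mainly for inference: t-tests, F-tests, and confidence intervals.
✔️ Roughly symmetric, single-peaked bell = OK. ⚠️ Skew or multiple peaks = normality assumption likely violated.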
```python
sns.histplot(residuals, kde=True)  # KDE overlay shows the overall shape
```
4. QQ Plot (Normality Diagnostic)
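Goal: Compare residual quantiles against those of a normal distribution. This exposes tail behavior more clearly than a histogram does.
✔️ Points on the line = good. ⚠️ Bowed curve = skewed residuals. ⚠️ Ends bending away from the line in an S-shape = heavy (or light) tails. ⚠️ Isolated points far off the line = possible outliers.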
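```python
import statsmodels.api as sm
sm.qqplot(residuals, line='45', fit=True)  # fit=True standardizes residuals so the 45° line is meaningful
```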
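To see what a QQ plot encodes, `scipy.stats.probplot` returns the theoretical normal quantiles, the ordered sample values, and the correlation `r` of a straight-line fit through them. An `r` close to 1 means the points hug the line. The sketch below contrasts simulated normal residuals with heavy-tailed Student t residuals (2 degrees of freedom). Both sets are assumptions for illustration, and SciPy's plotting positions may differ slightly from those `sm.qqplot` uses.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
samples = {
    "normal": rng.normal(0, 1, 500),
    "heavy-tailed (t, df=2)": rng.standard_t(2, 500),
}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
r_values = {}
for ax, (name, resid) in zip(axes, samples.items()):
    # osm: theoretical normal quantiles; osr: residuals sorted ascending.
    (osm, osr), (slope, intercept, r) = stats.probplot(resid, dist="norm")
    ax.scatter(osm, osr, s=8)
    ax.plot(osm, slope * osm + intercept, "r--")  # least-squares reference line
    ax.set_title(f"{name}: r = {r:.3f}")
    ax.set_xlabel("Theoretical quantiles")
    ax.set_ylabel("Ordered residuals")
    r_values[name] = r
```

In the heavy-tailed panel, the lowest points fall below the line and the highest rise above it. That is the S-shape described above.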
🧪 5. Scale-Location Plot
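Goal: Detect non-constant variance. Plotting √|standardized residual| folds negative and positive residuals together, which often makes a trend in spread easier to see than in the residuals vs fitted plot.
✔️ Flat horizontal band = homoscedastic. ⚠️ Upward trend = residual spread increasing with the fitted value.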
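```python
import numpy as np
std_resid = residuals / np.std(residuals)  # simple standardization; ignores leverage
sns.scatterplot(x=y_pred, y=np.sqrt(np.abs(std_resid)))
plt.axhline(y=np.mean(np.sqrt(np.abs(std_resid))), color='red', linestyle='--')
```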
🧭 6. Influence & Leverage Diagnostics
Goal: Identify influential points or high-leverage outliers.
Plot | What to Look For |
---|---|
Cook's Distance | Large spikes (often judged against a 4/n cutoff) = influential observations |
Leverage vs Residual | Far right with a large residual, top or bottom = danger zone |
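```python
import matplotlib.pyplot as plt
influence = model.get_influence()  # model: a fitted statsmodels OLS results object
cooks_d, _ = influence.cooks_distance  # tuple of (distances, p-values)
plt.stem(cooks_d)
plt.axhline(4 / len(cooks_d), color='red', linestyle='--')  # common 4/n rule of thumb
```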
7. Visualizing Model Extensions
Regularization (Ridge/Lasso)
- Plot coefficients vs alpha (log scale)
- Use `RidgeCV` and `LassoCV` with a grid of `alphas`
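A coefficient-path plot shows how each coefficient shrinks as the penalty grows. The sketch below uses scikit-learn `Ridge` over a log-spaced grid and marks the `alpha_` chosen by `RidgeCV` on the same grid. The synthetic data and the grid bounds are assumptions. Features are standardized first because the penalty treats all coefficients alike. Swapping in `Lasso` and `LassoCV` gives the analogous path, in which coefficients hit exactly zero.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration).
X, y = make_regression(n_samples=200, n_features=6, noise=15.0, random_state=0)
X = StandardScaler().fit_transform(X)  # penalties assume comparable feature scales

alphas = np.logspace(-2, 4, 60)
# One row of coefficients per alpha: shape (60, 6).
coefs = np.array([Ridge(alpha=a).fit(X, y).coef_ for a in alphas])

best_alpha = RidgeCV(alphas=alphas).fit(X, y).alpha_

fig, ax = plt.subplots()
ax.plot(alphas, coefs)  # one line per feature
ax.set_xscale("log")
ax.axvline(best_alpha, color="black", linestyle="--", label=f"RidgeCV alpha = {best_alpha:.3g}")
ax.set_xlabel("alpha (log scale)")
ax.set_ylabel("coefficient")
ax.legend()
```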
Polynomial Regression
- Overlay the fitted curve on a scatter of the actual data
- Compare residual patterns across polynomial degrees
```python
from sklearn.preprocessing import PolynomialFeatures
```
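The two bullets above can be combined in one figure. The sketch below fits `PolynomialFeatures` + `LinearRegression` pipelines of several degrees to synthetic quadratic data, which is an assumption. It overlays each fitted curve on the training data and plots held-out residuals per degree. Too low a degree leaves a curved residual pattern, while a higher degree should be justified by held-out error, not by training fit.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic truth with Gaussian noise (assumption for illustration).
rng = np.random.default_rng(5)
x = rng.uniform(-3, 3, 200).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + rng.normal(0, 0.5, 200)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
grid = np.linspace(-3, 3, 200).reshape(-1, 1)

fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(11, 4))
ax_fit.scatter(x_train.ravel(), y_train, s=10, alpha=0.5, label="train data")
test_mse = {}
for degree in (1, 2, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(x_train, y_train)
    ax_fit.plot(grid.ravel(), model.predict(grid), label=f"degree {degree}")
    pred = model.predict(x_test)
    ax_res.scatter(pred, y_test - pred, s=10, label=f"degree {degree}")  # held-out residuals
    test_mse[degree] = mean_squared_error(y_test, pred)
ax_res.axhline(0, color="red", linestyle="--")
ax_fit.legend()
ax_res.legend()
print(test_mse)
```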
🧪 8. Visual Summary Table
Visual | Diagnosis Target |
---|---|
Actual vs Predicted | General fit & bias |
Residuals vs Fitted | Homoscedasticity |
Histogram of Residuals | Normality |
QQ Plot | Normality (tail behavior) |
Scale-Location Plot | Variance diagnostics |
Leverage vs Residual | Influential obs / outliers |
Analyst Visual Review Checklist
- [ ] Actual vs Predicted: Tight line fit?
- [ ] Residuals: Random cloud?
- [ ] Histogram: Bell-shaped?
- [ ] QQ Plot: Aligned with diagonal?
- [ ] Scale-location: Flat trend?
- [ ] Influential point plots reviewed?
- [ ] If using Ridge/Lasso, coefficient paths reviewed?
💡 Final Tip
Always blend residual visuals, fit diagnostics, and robustness checks for trustworthy regression results.
Use this with: Advanced Linear Regression Guidebook, Statistical Summary Sheet, and Evaluation Checklist.