
Ridge


🎯 Purpose

This QuickRef walks you through the core logic, fitting workflow, parameter tuning, and interpretation of Ridge Regression, the go-to model when you want to shrink coefficients without dropping features.


📦 1. When to Use Ridge

Scenario                  | Why Ridge Works
--------------------------|------------------------------------------------------
High multicollinearity    | Stabilizes estimates by shrinking correlated weights
Many weak predictors      | Reduces variance without removing them
Overfitting with OLS      | Adds a penalty to control model complexity
All features need to stay | Unlike Lasso, it keeps all coefficients

βš™οΈ 2. How It Works

  • Adds an L2 penalty to the loss function:

$$ \text{Loss} = \text{RSS} + \alpha \sum_i w_i^2 $$

  • Penalizes large coefficients, encouraging smoother models
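
For intuition, this penalized objective has a closed-form solution: w = (XᵀX + αI)⁻¹Xᵀy. A minimal NumPy sketch, using toy data invented here for illustration (the variable names and values are not from the original):

import numpy as np

# Toy data (illustrative only): 100 samples, 3 features, known true weights
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

alpha = 1.0
# Closed-form ridge solution: w = (X^T X + alpha * I)^-1 X^T y
# (intercept ignored; assumes roughly centered data)
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)  # alpha = 0 recovers OLS

print("ridge:", w_ridge)  # pulled toward zero relative to OLS
print("OLS:  ", w_ols)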

πŸ› οΈ 3. Fitting Ridge in sklearn

from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Always scale features before fitting
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)
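
To inspect the shrunken weights, pull the Ridge step back out of the pipeline. A minimal sketch; the step key 'ridge' is assumed from make_pipeline's convention of naming steps after their lowercased class:

ridge = model.named_steps['ridge']
print(ridge.coef_)       # one shrunken weight per (scaled) feature
print(ridge.intercept_)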

βœ”οΈ Use RidgeCV() to cross-validate alpha


πŸ” 4. Tuning Alpha

from sklearn.linear_model import RidgeCV
alphas = [0.1, 1.0, 10.0]
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)
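# The selected penalty is exposed as model.alpha_ after fitting
print(model.alpha_)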

Alpha | Effect
------|--------------------------------------
Low   | Behaves like OLS (minimal shrinkage)
High  | Stronger shrinkage, more bias
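
To see this table's effect directly, refit over a grid of alphas and watch the weight norm shrink. A minimal sketch, reusing X_train/y_train from section 3:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

for alpha in [0.01, 1.0, 100.0]:
    pipe = make_pipeline(StandardScaler(), Ridge(alpha=alpha)).fit(X_train, y_train)
    # Higher alpha shrinks the weight vector toward zero
    print(f"alpha={alpha:>6}: ||w|| = {np.linalg.norm(pipe.named_steps['ridge'].coef_):.3f}")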

📊 5. Output Interpretation

Coefficients        | Meaning
--------------------|------------------------------------------------------------
Small but ≠ 0       | Feature retained but shrunken
Close to OLS        | When alpha is low
Still interpretable | Unlike PCA-style transforms, weights map to original features

βœ”οΈ Use RΒ² / Adjusted RΒ² / RMSE to evaluate fit


✅ Modeling Checklist

  • [ ] All features scaled (standardized)
  • [ ] Alpha selected via CV or tuning
  • [ ] Coefficients interpreted in context of shrinkage
  • [ ] Baseline OLS compared for performance gain (see the sketch below)
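
For that last item, a minimal baseline comparison, assuming X and y from section 4 are available:

from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ols = make_pipeline(StandardScaler(), LinearRegression())
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# Mean cross-validated R^2; Ridge tends to win when predictors are correlated
print("OLS  :", cross_val_score(ols, X, y, cv=5).mean())
print("Ridge:", cross_val_score(ridge, X, y, cv=5).mean())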

💡 Tip

“Ridge regression won’t choose your features, but it’ll stop them from yelling over each other.”