Feature Transformation (Logistic)


🎯 Purpose

Use this card to decide which feature transformations to apply before fitting logistic regression, Poisson, or negative binomial models. It focuses on skew handling, scale sensitivity, and model assumptions.


πŸ” 1. Trigger Logic by Feature Type

🔹 Numeric (Continuous)

| Condition | Transformation |
|---|---|
| Right-skewed distribution (skew > 1) | ✅ Log or sqrt transform |
| Wide magnitude spread | ✅ StandardScaler or MinMaxScaler (esp. for regularized logistic) |
| Count-based exposure (e.g., minutes, visits) | ✅ Log or offset transform (Poisson/Negative Binomial) |
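
The skew trigger in the first row can be checked programmatically. The sketch below is one possible way to do it, not a prescribed method: the helper name `suggest_log_columns`, the example columns, and the data are all hypothetical, and the threshold of 1 simply mirrors the rule of thumb above.

```python
import numpy as np
import pandas as pd

def suggest_log_columns(df: pd.DataFrame, threshold: float = 1.0) -> list[str]:
    """Return numeric, non-negative columns whose sample skewness exceeds threshold."""
    flagged = []
    for col in df.select_dtypes(include="number").columns:
        values = df[col].dropna()
        # log1p is only defined for values > -1, so restrict to non-negative data
        if len(values) > 2 and values.min() >= 0 and values.skew() > threshold:
            flagged.append(col)
    return flagged

# Hypothetical data: 'visits' is heavily right-skewed, 'age' is roughly symmetric
df = pd.DataFrame({
    "visits": [0, 1, 1, 2, 2, 3, 3, 4, 5, 250],
    "age": [22, 25, 31, 34, 38, 41, 45, 52, 58, 63],
})

to_log = suggest_log_columns(df)
for col in to_log:
    # log1p(x) = log(1 + x), which maps zero counts to zero instead of -inf
    df[f"log_{col}"] = np.log1p(df[col])
```

A log transform reduces skew but does not always remove it, so it is worth re-checking the transformed column rather than assuming the problem is solved.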

🔹 Categorical

| Condition | Transformation |
|---|---|
| Nominal (unordered) | ✅ One-hot Encoding |
| Ordinal (known rank) | ✅ Ordinal Encoding |
| High-cardinality | 🔁 Grouping, embedding, or collapse (manual) |
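
As a rough illustration of the three categorical cases, the sketch below uses plain pandas. The column names (`plan`, `city`), the rank order, and the `min_count` cutoff are assumptions made up for the example; in practice the rank order and the collapse threshold come from domain knowledge.

```python
import pandas as pd

# Hypothetical data: 'plan' is ordinal, 'city' stands in for a high-cardinality nominal
df = pd.DataFrame({
    "plan": ["basic", "pro", "basic", "enterprise", "pro", "basic"],
    "city": ["NYC", "NYC", "LA", "Boise", "NYC", "Fargo"],
})

# Ordinal encoding: state the rank order explicitly instead of relying on
# alphabetical order
plan_order = ["basic", "pro", "enterprise"]
df["plan_rank"] = pd.Categorical(df["plan"], categories=plan_order, ordered=True).codes

# High-cardinality collapse: keep levels seen at least min_count times and
# pool the rest into a single 'other' level
min_count = 2
counts = df["city"].value_counts()
frequent = counts[counts >= min_count].index
df["city_grouped"] = df["city"].where(df["city"].isin(frequent), "other")

# One-hot encode the collapsed nominal feature
encoded = pd.get_dummies(df, columns=["city_grouped"], dtype=int)
```

Collapsing before one-hot encoding keeps the design matrix small, at the cost of treating all rare levels as interchangeable.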

🧪 2. Model-Specific Cues

| Model | Trigger | Action |
|---|---|---|
| Logistic Regression | Skewed numeric, regularization | Scale or log |
| Poisson Regression | Log-linear fit assumption | Log-transform predictors + offset (optional) |
| Negative Binomial | Same as Poisson, handles overdispersion | Same transforms apply |
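
To make the role of the offset concrete, the log-linear assumption can be written out as follows. The notation is introduced here for illustration only: $y_i$ is the count, $\mathbf{x}_i$ the predictors, $t_i$ the exposure, and $\alpha$ the dispersion parameter of the common NB2 form of the negative binomial.

```latex
\log \mathbb{E}[y_i] = \mathbf{x}_i^\top \boldsymbol{\beta} + \log t_i
\quad\Longleftrightarrow\quad
\frac{\mathbb{E}[y_i]}{t_i} = \exp\!\left(\mathbf{x}_i^\top \boldsymbol{\beta}\right)

\text{Poisson: } \operatorname{Var}(y_i) = \mu_i
\qquad
\text{Negative binomial (NB2): } \operatorname{Var}(y_i) = \mu_i + \alpha \mu_i^2,
\qquad \mu_i = \mathbb{E}[y_i]
```

The offset term $\log t_i$ has its coefficient fixed at 1, so the model describes a rate per unit of exposure. Both models share the same mean structure, which is why the same transforms apply; they differ only in how the variance grows with the mean.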

πŸ” 3. Common Transform Functions

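import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler

# Standard scaling (X must be all-numeric here; fit on training data only)
X_scaled = StandardScaler().fit_transform(X)

# Log transform for positively skewed count data (log1p handles zeros)
X['log_visits'] = np.log1p(X['visits'])

# One-hot encoding for categorical; drop_first avoids perfect collinearity
# with the intercept in an unregularized GLM
X = pd.get_dummies(X, columns=['device_type'], drop_first=True, dtype=float)

# Offset for Poisson: log(exposure) enters with its coefficient fixed at 1,
# so keep 'exposure' out of the design matrix
offset = np.log(X['exposure'])
design = sm.add_constant(X.drop(columns=['exposure']))
model = sm.GLM(y, design, family=sm.families.Poisson(), offset=offset)
result = model.fit()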
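
For the regularized-logistic case, scaling and fitting can be wrapped in a single scikit-learn pipeline. This is a minimal sketch on synthetic data; the feature names, coefficients, and sample size are invented for illustration, and `C=1.0` with the default L2 penalty is just a placeholder setting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): two features on very different scales
n = 500
income = rng.normal(50_000, 15_000, n)       # magnitude around 1e4
log_visits = np.log1p(rng.poisson(3, n))     # magnitude around 1e0
X = np.column_stack([income, log_visits])
logits = 0.00005 * (income - 50_000) + 0.8 * (log_visits - 1.2)
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# The pipeline fits the scaler on the training data only and reuses those
# statistics at predict time, which avoids leakage during cross-validation.
# LogisticRegression applies an L2 penalty by default; C is the inverse strength.
clf = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))
clf.fit(X, y)

scaled = clf.named_steps["standardscaler"].transform(X)
coefs = clf.named_steps["logisticregression"].coef_.ravel()
```

Without scaling, the penalty would act very unevenly on the two coefficients, because a coefficient on a feature measured in tens of thousands is naturally tiny compared with one on a log-count feature.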

✅ Checklist Before Modeling

  • [ ] Numeric features reviewed for skew or count-like structure
  • [ ] Categorical variables encoded correctly (one-hot or ordinal)
  • [ ] For Poisson/NB: offset column included (log of exposure)
  • [ ] Numeric features scaled if using regularized logistic regression (L1/L2)
  • [ ] Transforms documented and justified in EDA or notebook

💡 Tip

“When modeling probabilities or counts, log is your best friend, but only when used intentionally.”