Skip to content

Missingness Handling


🎯 Purpose

Use this card to decide whether to drop, flag, or impute missing values β€” based on quantity, cause, downstream impact, and modeling goals.


πŸ” 1. Estimate Severity

# Basic missingness check
df.isnull().sum()
df.isnull().mean() * 100
Rule of Thumb Action
< 5% missing Safe to drop or fill silently
5–30% missing Flag or impute, especially for MAR
> 30% missing Assess necessity or drop field
> 50% missing Usually drop unless critical

πŸ”Ž 2. Identify the Pattern

Pattern Type Description Common Actions
MCAR Missing Completely At Random Drop or fill with mean/median
MAR Missing At Random (based on other vars) Impute + flag or model conditionally
MNAR Not At Random (systematic loss) Flag, impute cautiously, document cause

βœ”οΈ Use groupby or visuals to detect MAR/MNAR patterns


βš–οΈ 3. Choose an Action Path

If... Then...
Field is missing randomly and <10% Drop or impute with mean/median
Missingness depends on another feature Use model-based imputation or conditional fill
Field is categorical with few values Impute with mode or add "Missing" label
Field is an ID or timestamp Do not impute β€” review or flag separately
Field is critical and >30% missing Document and impute with caution, or exclude from model

πŸ§ͺ 4. Flagging Best Practice

# Before filling
df['income_flag'] = df['income'].isnull()
df['income'] = df['income'].fillna(df['income'].median())

βœ”οΈ Always flag imputed values for auditing and regression testing


βœ… Decision Checklist

  • [ ] % missing calculated and pattern explored
  • [ ] Cause assessed (MCAR, MAR, MNAR)
  • [ ] Column role evaluated (target, input, ID)
  • [ ] Chosen imputation/drop method documented
  • [ ] Optional: flag column created for auditability

πŸ’‘ Tip

β€œNot all nulls are noise. Sometimes they’re the most informative signal in the dataset.”