# KNN Classifier
## 🎯 Purpose
This QuickRef explains how to use the K-Nearest Neighbors (KNN) algorithm for classification tasks. It covers the fitting logic, distance metrics, the importance of feature scaling, and evaluation strategies.
## 📦 1. When to Use
| Condition | Use KNN? |
|---|---|
| Small to medium dataset | ✅ Yes |
| Predictors are numeric + scale-consistent | ✅ Yes |
| Need interpretable local decisions | ✅ Yes |
| High-dimensional or noisy data | ❌ Try trees or regularized models |
## 🧮 2. Core Logic
- KNN is a lazy learner — it stores training data and makes predictions at inference time based on proximity
- Uses a majority vote among the k closest training points to assign the class (see the sketch below)
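To make the decision rule concrete, here is a minimal sketch of the prediction step. The `knn_predict` helper is a toy illustration (not sklearn's implementation), and it assumes `X_train` and `y_train` are NumPy arrays:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Toy illustration of the KNN decision rule, not sklearn's implementation."""
    # Euclidean distance from the query point to every stored training point
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels decides the predicted class
    return Counter(y_train[nearest]).most_common(1)[0][0]
```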
## 📏 3. Distance Metrics
| Metric | Use When... |
|---|---|
| Euclidean (default) | Standard numeric data |
| Manhattan | Grid-like or sparse data |
| Minkowski | Generalized form (parameter p; p=1 gives Manhattan, p=2 gives Euclidean) |
| Cosine | Text embeddings, angular similarity |
✔️ Always scale features before fitting to avoid distance bias
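As a quick sketch of both points, the snippet below scales made-up features on very different scales and picks a metric at construction time (the feature values and labels are illustrative only):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales: income (~10,000s) and age (~10s)
X = np.array([[52_000.0, 34.0], [48_000.0, 61.0], [110_000.0, 29.0]])
y = np.array([0, 1, 1])

# Without scaling, Euclidean distance is dominated almost entirely by income;
# StandardScaler puts both features on a comparable footing.
X_scaled = StandardScaler().fit_transform(X)

# The metric is chosen when the estimator is constructed.
# 'manhattan' is also accepted; 'cosine' typically requires the brute-force algorithm.
clf = KNeighborsClassifier(n_neighbors=3, metric="manhattan").fit(X_scaled, y)
```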
## 🛠️ 4. Fitting in sklearn
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# The pipeline ensures the scaler is fit only on training data (and training folds during CV)
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
```
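Once fitted, prediction follows the usual estimator API. A short usage sketch, assuming `X_test` and `y_test` come from an earlier train/test split:

```python
y_pred = model.predict(X_test)
print("Test accuracy:", model.score(X_test, y_test))
```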
## 🔧 5. Key Hyperparameters
| Param | Description |
|---|---|
| `n_neighbors` | Number of nearest neighbors to use |
| `weights` | `'uniform'` (default) or `'distance'` (closer neighbors are weighted more heavily) |
| `metric` | Distance function (Euclidean, Manhattan, etc.) |
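These hyperparameters are commonly tuned together with a grid search. A sketch using `GridSearchCV` (the grid values are illustrative, not recommendations):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Illustrative grid; adjust ranges to the dataset size
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 11, 15],
    "knn__weights": ["uniform", "distance"],
    "knn__metric": ["euclidean", "manhattan"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_)
```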
## 📊 6. Evaluation Tips
- Use cross-validation to tune k (sketched after this list)
- Use a confusion matrix, precision/recall, and AUC if needed
- Sensitive to class imbalance → consider stratified sampling
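A sketch combining these tips, reusing the `model` pipeline and train/test split assumed above:

```python
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stratified folds keep class proportions stable across splits,
# which matters when classes are imbalanced
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Confusion matrix on held-out data after a final fit
model.fit(X_train, y_train)
print(confusion_matrix(y_test, model.predict(X_test)))
```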
## ✅ Modeling Checklist
- [ ] Features scaled before training (e.g., `StandardScaler`)
- [ ] `n_neighbors` tuned with a validation set or CV
- [ ] Distance metric chosen based on feature type
- [ ] Class imbalance reviewed
- [ ] Evaluation scores visualized across multiple k values (see the sketch below)
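One way to handle the last checklist item is to plot cross-validated accuracy against k. A sketch, assuming matplotlib is available and `X_train`/`y_train` are defined:

```python
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Mean CV accuracy for a range of odd k values (odd k avoids ties in binary problems)
ks = list(range(1, 31, 2))
scores = [
    cross_val_score(
        make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k)),
        X_train, y_train, cv=5,
    ).mean()
    for k in ks
]

plt.plot(ks, scores, marker="o")
plt.xlabel("k (n_neighbors)")
plt.ylabel("Mean CV accuracy")
plt.title("KNN accuracy vs. k")
plt.show()
```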
## 💡 Tip
“KNN makes no assumptions — but gives no explanations either.”