
KNN Classifier


🎯 Purpose

This QuickRef explains how to use the K-Nearest Neighbors (KNN) algorithm for classification tasks. It covers the core fitting logic, distance metrics, why feature scaling matters, and evaluation strategies.


📦 1. When to Use

| Condition | Use KNN? |
| --- | --- |
| Small to medium dataset | ✅ Yes |
| Predictors are numeric and scale-consistent | ✅ Yes |
| Need interpretable local decisions | ✅ Yes |
| High-dimensional or noisy data | ❌ Try trees or regularized models |

🧮 2. Core Logic

  • KNN is a lazy learner: it stores the training data and defers all computation to prediction time, classifying each query by its proximity to the stored points
  • The predicted class is the majority vote among the k closest training points (see the sketch below)
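
A minimal sketch of that voting logic in plain NumPy (not the sklearn implementation; the toy arrays X_train, y_train, and the query point are made up for illustration):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    # Euclidean distance from the query point to every stored training point
    dists = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote among their class labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two features, two classes
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, query=np.array([0.95, 1.05]), k=3))  # -> 1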

📏 3. Distance Metrics

| Metric | Use when... |
| --- | --- |
| Euclidean (default) | Standard numeric data |
| Manhattan | Grid-like or sparse data |
| Minkowski | Generalized form (p = 1 gives Manhattan, p = 2 gives Euclidean) |
| Cosine | Text embeddings, angular similarity |

✔️ Always scale features before fitting; otherwise features with larger numeric ranges dominate the distance calculation
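
A quick illustration of why this matters (the numbers are made up; the point is that the income column, measured in dollars, swamps the age column):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Two people who differ a lot in age but only slightly in income
a = np.array([[25.0, 50_000.0]])   # [age in years, income in $]
b = np.array([[60.0, 50_500.0]])

print(np.linalg.norm(a - b))  # ~500, driven almost entirely by the income column

# After standardizing both features, the age difference counts again
scaler = StandardScaler().fit(np.vstack([a, b]))
print(np.linalg.norm(scaler.transform(a) - scaler.transform(b)))

To use a different metric, pass it to the estimator, e.g. KNeighborsClassifier(metric='manhattan').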


🛠️ 4. Fitting in sklearn

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Scale features, then fit a 5-nearest-neighbor classifier in one step
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
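
Typical follow-up calls on the fitted pipeline (assuming X_test and y_test hold a held-out split):

y_pred = model.predict(X_test)        # predicted class labels
proba = model.predict_proba(X_test)   # class probabilities from the neighbor vote
print(model.score(X_test, y_test))    # mean accuracy on the test set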

🔧 5. Key Hyperparameters

| Param | Description |
| --- | --- |
| n_neighbors | Number of nearest neighbors to use |
| weights | 'uniform' (default) or 'distance' (closer neighbors carry more weight) |
| metric | Distance function (Euclidean, Manhattan, etc.) |
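
These are usually tuned together. A sketch with GridSearchCV over the pipeline from section 4 (the grid values below are arbitrary starting points, not recommendations):

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {
    "kneighborsclassifier__n_neighbors": [3, 5, 11, 21],
    "kneighborsclassifier__weights": ["uniform", "distance"],
    "kneighborsclassifier__metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)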

📊 6. Evaluation Tips

  • Use cross-validation to tune k
  • Evaluate with a confusion matrix and precision/recall; add AUC if needed
  • Sensitive to class imbalance → consider stratified sampling or stratified CV (see the sketch below)
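
One way to act on these tips, assuming X_train and y_train from section 4 (the k values below are arbitrary examples):

from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stratified folds keep class proportions stable across splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for k in (1, 3, 5, 11, 21):
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="accuracy")
    print(f"k={k:2d}  mean accuracy={scores.mean():.3f} (±{scores.std():.3f})")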

✅ Modeling Checklist

  • [ ] Features scaled before training (e.g., StandardScaler)
  • [ ] n_neighbors tuned with validation set or CV
  • [ ] Distance metric chosen based on feature type
  • [ ] Class imbalance reviewed
  • [ ] Evaluation scores visualized with multiple k values

💡 Tip

“KNN makes no assumptions — but gives no explanations either.”