Statistical Summary


🎯 Purpose

This reference provides a structured overview of classification model evaluation metrics and statistical summaries. It supports model interpretation, diagnostics, and reporting across binary and multiclass classification tasks.


πŸ“ 1. Classification Metrics (Binary & Multiclass)

✅ Accuracy

Proportion of correct predictions.

from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_pred)
  • ⚠️ Misleading for imbalanced datasets
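
For example, a baseline that always predicts the majority class can look deceptively strong on an imbalanced dataset. A minimal sketch with made-up labels:

import numpy as np
from sklearn.metrics import accuracy_score

# 95 negatives and 5 positives; the "model" always predicts the majority class
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # 0.95, despite missing every positive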

✅ Precision, Recall, and F1 Score

Per-class measures of prediction quality: precision (fraction of predicted positives that are correct), recall (fraction of actual positives recovered), and F1 (their harmonic mean).

from sklearn.metrics import precision_score, recall_score, f1_score
precision_score(y_true, y_pred, average='binary')
recall_score(y_true, y_pred, average='binary')
f1_score(y_true, y_pred, average='binary')
  • average='macro', 'micro', or 'weighted' for multiclass
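
As a sketch of how the averaging choice changes the reported score on an imbalanced multiclass problem (the labels below are made up):

from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2]   # made-up imbalanced labels
y_pred = [0, 0, 0, 0, 0, 1, 1, 2, 2]

print(f1_score(y_true, y_pred, average='macro'))     # unweighted mean of per-class F1
print(f1_score(y_true, y_pred, average='weighted'))  # per-class F1 weighted by support
print(f1_score(y_true, y_pred, average='micro'))     # computed from global TP/FP/FN counts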

✅ Confusion Matrix

Breakdown of predicted vs actual labels.

from sklearn.metrics import confusion_matrix
confusion_matrix(y_true, y_pred)
  • Use with heatmaps or annotation for reports
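
One way to get the heatmap view directly is scikit-learn's ConfusionMatrixDisplay (a sketch; assumes matplotlib is installed and uses made-up labels):

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

y_true = [0, 0, 1, 1, 1, 0]   # made-up labels for illustration
y_pred = [0, 1, 1, 1, 0, 0]

# Annotated heatmap of actual vs predicted labels, suitable for a report figure
ConfusionMatrixDisplay.from_predictions(y_true, y_pred, cmap='Blues')
plt.show()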

✅ ROC AUC (Area Under the ROC Curve)

Measures the classifier's ability to rank positives above negatives, independent of the decision threshold.

from sklearn.metrics import roc_auc_score
roc_auc_score(y_true, y_proba)
  • Requires predicted probabilities (y_proba); supports binary and one-vs-rest (OvR) multiclass, as sketched below
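
For the one-vs-rest multiclass case, a runnable sketch (the iris dataset and logistic regression are stand-ins for your own data and model):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# y_proba must have shape (n_samples, n_classes); AUC is averaged one-vs-rest
y_proba = clf.predict_proba(X_test)
print(roc_auc_score(y_test, y_proba, multi_class='ovr', average='macro'))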

✅ Log Loss (Cross-Entropy)

Penalizes confident but incorrect probability estimates.

from sklearn.metrics import log_loss
log_loss(y_true, y_proba)
  • Sensitive to poorly calibrated outputs
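
A quick illustration of that sensitivity: a confidently wrong probability is penalized far more than a hedged one (made-up values):

from sklearn.metrics import log_loss

y_true = [1, 1]   # both examples are actually positive

print(log_loss(y_true, [0.9, 0.4], labels=[0, 1]))   # hedged miss on the second example: ~0.51
print(log_loss(y_true, [0.9, 0.01], labels=[0, 1]))  # confident miss on the second example: ~2.36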

✅ Matthews Correlation Coefficient (MCC)

Balanced metric for binary classification.

from sklearn.metrics import matthews_corrcoef
matthews_corrcoef(y_true, y_pred)
  • Works well for imbalanced classes
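
Continuing the imbalanced example from the accuracy section, MCC exposes the majority-class baseline that accuracy rewards:

import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = np.array([0] * 95 + [1] * 5)   # made-up imbalanced labels
y_pred = np.zeros_like(y_true)          # always predict the majority class

print(accuracy_score(y_true, y_pred))     # 0.95
print(matthews_corrcoef(y_true, y_pred))  # 0.0, i.e. no predictive skill detected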

✅ Cohen's Kappa

Measures agreement beyond chance.

from sklearn.metrics import cohen_kappa_score
cohen_kappa_score(y_true, y_pred)
  • Useful for reviewer agreement or weak supervision
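
The same call works for agreement between two annotators instead of model vs. ground truth; a sketch with made-up annotations:

from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 1, 0, 0]   # labels from reviewer A (made up)
rater_b = [1, 0, 1, 0, 0, 1, 1, 0]   # labels from reviewer B (made up)

print(cohen_kappa_score(rater_a, rater_b))  # observed agreement corrected for chance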

📊 2. Per-Class Statistics (Multiclass Models)

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))
Output includes:
  • Precision: correctness of predicted positives
  • Recall: sensitivity to true positives
  • F1-Score: balance of precision and recall
  • Support: count of true instances per class
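
For report tables, the same call can return nested dicts instead of a formatted string, which converts cleanly to a DataFrame (a sketch; pandas is assumed to be available and the labels are made up):

import pandas as pd
from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# output_dict=True yields per-class rows plus accuracy / macro avg / weighted avg
report = classification_report(y_true, y_pred, output_dict=True)
print(pd.DataFrame(report).transpose())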

📈 3. Confidence + Probability Analysis

Prediction Confidence Histogram

import matplotlib.pyplot as plt
plt.hist(y_proba, bins=20)
  • Check model certainty and threshold behavior
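
In the binary case, y_proba above is typically the positive-class column of predict_proba. A runnable sketch of producing it (the dataset and model are stand-ins for your own):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Second column of predict_proba holds the positive-class probabilities
y_proba = clf.predict_proba(X_test)[:, 1]

The same y_proba (with y_test playing the role of y_true) also feeds the Brier score and calibration curve below.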

Brier Score (probability accuracy)

from sklearn.metrics import brier_score_loss
brier_score_loss(y_true, y_proba)
  • Lower is better; complements log loss

Calibration Curve

from sklearn.calibration import calibration_curve
prob_true, prob_pred = calibration_curve(y_true, y_proba, n_bins=10)
  • Helps interpret output probability quality
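
To read it visually, plot the returned points against the diagonal, which represents perfect calibration (y_true and y_proba as in the snippets above):

import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

prob_true, prob_pred = calibration_curve(y_true, y_proba, n_bins=10)

plt.plot(prob_pred, prob_true, marker='o', label='model')
plt.plot([0, 1], [0, 1], linestyle='--', label='perfectly calibrated')
plt.xlabel('Mean predicted probability')
plt.ylabel('Observed fraction of positives')
plt.legend()
plt.show()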

🔎 4. Summary Table Elements (for Reports)

Columns to include:
  • Model: model name or type
  • Accuracy / F1: summary performance
  • ROC AUC / PR AUC: probability-based performance
  • TP / FP / FN / TN: basic confusion-matrix counts
  • Top Features: feature importance (if available)
  • Threshold Used: document it if not the default (0.5)
  • Calibration Status: calibrated or raw probabilities?
  • Version / Run Date: metadata for reproducibility
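
A sketch of how such a summary row could be assembled programmatically; the keys mirror the columns above, pandas is assumed, and the model name and calibration status are placeholders:

import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score

# y_true, y_pred, y_proba as in the binary snippets above
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

summary = pd.DataFrame([{
    'Model': 'LogisticRegression',               # placeholder model name
    'Accuracy': accuracy_score(y_true, y_pred),
    'F1': f1_score(y_true, y_pred),
    'ROC AUC': roc_auc_score(y_true, y_proba),
    'TP': tp, 'FP': fp, 'FN': fn, 'TN': tn,
    'Threshold Used': 0.5,
    'Calibration Status': 'raw',
    'Run Date': str(pd.Timestamp.today().date()),
}])
print(summary.to_string(index=False))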

🧠 Final Tip

“Use at least one probability-based metric, one confusion-based metric, and one visual output in every classification summary.”

Use this in tandem with: Visual Guide, Classifier Evaluation Checklist, and Modeling Guidebook.