When working on a machine learning application with a highly imbalanced dataset, traditional metrics like accuracy can be misleading. For example, if you're detecting a rare disease, a model with 99% accuracy might seem impressive, but if only 0.5% of patients have the disease, a model that always predicts "no disease" would still achieve 99.5% accuracy without being useful.
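A minimal sketch of this pitfall, assuming a synthetic label array with roughly 0.5% positive cases and a classifier that always predicts the negative class (the sample size, seed, and prevalence are illustrative, not taken from the text):

    import numpy as np

    # Hypothetical setup: 10,000 patients, roughly 0.5% of whom have the disease.
    rng = np.random.default_rng(0)
    y_true = (rng.random(10_000) < 0.005).astype(int)  # 1 = disease, 0 = no disease

    # A "model" that always predicts "no disease".
    y_pred = np.zeros_like(y_true)

    accuracy = (y_pred == y_true).mean()
    print(f"Accuracy: {accuracy:.3%}")  # ~99.5%, yet not a single sick patient is caught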
To better evaluate such models, we use precision and recall:
Precision: The fraction of true positive predictions among all positive predictions. It tells us how many of the predicted positive cases were actually positive.
Formula: Precision = True Positives / (True Positives + False Positives)
Recall: The fraction of true positive cases detected among all actual positive cases. It measures the model's ability to identify all relevant cases.
Formula: Recall = True Positives / (True Positives + False Negatives)
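Both formulas translate directly into code. The counts below are illustrative placeholders, not values from the text:

    # Illustrative counts for a hypothetical classifier.
    tp, fp, fn = 40, 10, 15

    precision = tp / (tp + fp)  # how many predicted positives were truly positive
    recall = tp / (tp + fn)     # how many actual positives the model found

    print(f"Precision: {precision:.2f}")  # 0.80
    print(f"Recall:    {recall:.2f}")     # 0.73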
Using a confusion matrix helps visualize these metrics (see the code sketch after the list below):
True Positives (TP): Correctly predicted positive cases.
True Negatives (TN): Correctly predicted negative cases.
False Positives (FP): Incorrectly predicted positive cases.
False Negatives (FN): Incorrectly predicted negative cases.
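As a rough sketch of how these four counts can be tallied from label and prediction arrays, and how precision and recall then fall out of them (the arrays here are made up for illustration):

    import numpy as np

    # Hypothetical ground-truth labels and model predictions (1 = positive).
    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # predicted positive, actually positive
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # predicted negative, actually negative
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicted positive, actually negative
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # predicted negative, actually positive

    print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")      # TP=3, TN=3, FP=1, FN=1
    print(f"Precision: {tp / (tp + fp):.2f}")         # 0.75
    print(f"Recall:    {tp / (tp + fn):.2f}")         # 0.75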
For a model to be useful, both precision and recall should be reasonably high. This ensures the model not only makes accurate positive predictions but also identifies a significant portion of actual positive cases.