A Beginner's Guide to Precision, Recall, F1 Score, and True Positives

 

Precision

Precision is the ratio of correctly predicted positive observations to the total predicted positives. It answers the question: "Of all the positive predictions, how many were actually correct?"

\text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}}

Example: Imagine you have a model that predicts whether an email is spam. Out of 100 emails predicted as spam, 80 are actually spam, and 20 are not (false positives).

\text{Precision} = \frac{80}{80 + 20} = \frac{80}{100} = 0.8

So, the precision is 0.8, or 80%.
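If you prefer to see this as code, here is a minimal Python sketch of the same calculation. The function name and counts are purely illustrative, not part of any library.

    def precision(tp: int, fp: int) -> float:
        """Precision = TP / (TP + FP)."""
        return tp / (tp + fp)

    # Spam example: 80 emails correctly flagged as spam, 20 flagged incorrectly.
    print(precision(tp=80, fp=20))  # 0.8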

Recall (Sensitivity)

Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. It answers the question: "Of all the actual positives, how many did we correctly predict?"

\text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}}

Example: Continuing with the spam email example, suppose there are 90 actual spam emails in the dataset, and your model correctly identified 80 of them, missing 10 (false negatives).

\text{Recall} = \frac{80}{80 + 10} = \frac{80}{90} \approx 0.89

So, the recall is approximately 0.89, or 89%.
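The same idea as a small illustrative Python sketch:

    def recall(tp: int, fn: int) -> float:
        """Recall = TP / (TP + FN)."""
        return tp / (tp + fn)

    # Spam example: 80 spam emails caught, 10 missed.
    print(recall(tp=80, fn=10))  # 0.888... ≈ 0.89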

F1 Score

The F1 score is the harmonic mean of precision and recall. It balances the two and is especially useful when you need a single metric to evaluate the model's performance.

\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

Example: Using the precision of 0.8 and recall of 0.89 from the previous examples:

\text{F1 Score} = 2 \times \frac{0.8 \times 0.89}{0.8 + 0.89} = 2 \times \frac{0.712}{1.69} \approx 0.843

So, the F1 score is approximately 0.843, or 84.3%.
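Here is the harmonic-mean calculation as a small Python sketch (the helper name is just for illustration). Note that using the exact recall of 80/90 instead of the rounded 0.89 gives roughly 0.842.

    def f1(precision: float, recall: float) -> float:
        """F1 is the harmonic mean of precision and recall."""
        return 2 * precision * recall / (precision + recall)

    print(round(f1(0.8, 0.89), 3))     # 0.843 (using the rounded recall)
    print(round(f1(0.8, 80 / 90), 3))  # 0.842 (using the exact recall of 80/90)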

True Positive (TP)

A true positive is an outcome where the model correctly predicts the positive class. In the context of binary classification (spam vs. not spam), a true positive means the model correctly identifies a spam email as spam.

Example: If your model correctly identifies 80 spam emails as spam, those 80 are your true positives.
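If you have the actual and predicted labels side by side, counting true positives is a simple comparison. The tiny label lists below are made up purely for illustration (1 = spam, 0 = not spam).

    # Count true positives by comparing predicted labels with actual labels.
    y_true = [1, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1]

    tp = sum(1 for actual, predicted in zip(y_true, y_pred)
             if actual == 1 and predicted == 1)
    print(tp)  # 2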

Summary with Example

Let's consolidate these metrics with a confusion matrix for the spam email example:

  • True Positives (TP): 80 (spam correctly identified as spam)
  • False Positives (FP): 20 (non-spam incorrectly identified as spam)
  • False Negatives (FN): 10 (spam incorrectly identified as non-spam)
  • True Negatives (TN): Suppose there are 200 actual non-spam emails, and your model correctly identifies 180 of them. So, TN = 180.

From this, we get a precision of 80 / (80 + 20) = 0.8, a recall of 80 / (80 + 10) ≈ 0.89, and an F1 score of approximately 0.843, exactly as computed above.

These metrics help you evaluate your model's performance in a more nuanced way than simply looking at overall accuracy.
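In practice you rarely compute these by hand. Assuming scikit-learn is installed, the sketch below rebuilds label arrays that match the counts above (80 TP, 20 FP, 10 FN, 180 TN) and lets the library do the work; the printed values should match what we derived manually.

    from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

    # Rebuild label arrays matching the counts above: 1 = spam, 0 = not spam.
    y_true = [1] * 80 + [0] * 20 + [1] * 10 + [0] * 180
    y_pred = [1] * 80 + [1] * 20 + [0] * 10 + [0] * 180

    print(confusion_matrix(y_true, y_pred))
    # [[180  20]
    #  [ 10  80]]   rows: actual (not spam, spam); columns: predicted (not spam, spam)

    print(precision_score(y_true, y_pred))  # 0.8
    print(recall_score(y_true, y_pred))     # 0.888...
    print(f1_score(y_true, y_pred))         # 0.842...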
