A Beginner's Guide to Precision, Recall, F1 Score, and True Positives
Precision
Precision is the ratio of correctly predicted positive observations to the total predicted positives. It answers the question: "Of all the positive predictions, how many were actually correct?"
Example: Imagine you have a model that predicts whether an email is spam. Out of 100 emails predicted as spam, 80 are actually spam, and 20 are not (false positives).
Precision = TP / (TP + FP) = 80 / (80 + 20) = 0.8
So, the precision is 0.8, or 80%.
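To make the arithmetic concrete, here's a minimal Python sketch using the counts from the example above:

```python
# Spam example: 100 emails were predicted as spam.
true_positives = 80   # predicted spam, actually spam
false_positives = 20  # predicted spam, actually not spam

precision = true_positives / (true_positives + false_positives)
print(f"Precision: {precision:.2f}")  # 0.80
```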
Recall (Sensitivity)
Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. It answers the question: "Of all the actual positives, how many did we correctly predict?"
Example: Continuing with the spam email example, suppose there are 90 actual spam emails in the dataset, and your model correctly identified 80 of them, missing 10 (false negatives).
Recall = TP / (TP + FN) = 80 / (80 + 10) ≈ 0.89
So, the recall is approximately 0.89, or 89%.
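The same idea in Python, again with the counts from this example:

```python
# Spam example: 90 emails are actually spam.
true_positives = 80   # spam the model caught
false_negatives = 10  # spam the model missed

recall = true_positives / (true_positives + false_negatives)
print(f"Recall: {recall:.2f}")  # 0.89
```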
F1 Score
The F1 score is the harmonic mean of precision and recall. It balances the two, which is especially useful when you need a single metric to evaluate the model's performance.
Example: Using the precision of 0.8 and recall of 0.89 from the previous examples:
F1 = 2 × (Precision × Recall) / (Precision + Recall) = 2 × (0.8 × 0.889) / (0.8 + 0.889) ≈ 0.842
So, the F1 score is approximately 0.842, or 84.2%.
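And the harmonic mean in Python, reusing the two values computed above:

```python
precision = 80 / 100  # 0.80, from the precision example
recall = 80 / 90      # ≈ 0.889, from the recall example

# F1 is the harmonic mean of precision and recall
f1 = 2 * (precision * recall) / (precision + recall)
print(f"F1 score: {f1:.3f}")  # 0.842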
True Positive (TP)
A true positive is an outcome where the model correctly predicts the positive class. In the context of binary classification (spam vs. not spam), a true positive means the model correctly identifies a spam email as spam.
Example: If your model correctly identifies 80 spam emails as spam, those 80 are your true positives.
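If you have the raw labels rather than the counts, true positives can be tallied directly. Here is a small sketch with hypothetical labels (1 = spam, 0 = not spam):

```python
# Hypothetical ground-truth labels and model predictions (1 = spam, 0 = not spam)
y_true = [1, 1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 1]

# A true positive is a position where both the label and the prediction are 1
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == p == 1)
print(f"True positives: {true_positives}")  # 3
```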
Summary with Example
Let's consolidate these metrics with a confusion matrix for the spam email example:
- True Positives (TP): 80 (spam correctly identified as spam)
- False Positives (FP): 20 (non-spam incorrectly identified as spam)
- False Negatives (FN): 10 (spam incorrectly identified as non-spam)
- True Negatives (TN): 180 (non-spam correctly identified as non-spam; of the 200 actual non-spam emails, the model correctly identifies 180)
From this, we get:
- Precision = TP / (TP + FP) = 80 / 100 = 0.80
- Recall = TP / (TP + FN) = 80 / 90 ≈ 0.89
- F1 score = 2 × (Precision × Recall) / (Precision + Recall) ≈ 0.84
- Accuracy = (TP + TN) / (TP + TN + FP + FN) = 260 / 290 ≈ 0.90
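In practice you'd usually let a library compute these. As a sketch, assuming scikit-learn is installed, the labels below are reconstructed to match the counts above:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Rebuild labels matching the example's counts (1 = spam, 0 = not spam):
# 80 TP, 20 FP, 10 FN, 180 TN
y_true = [1] * 80 + [0] * 20 + [1] * 10 + [0] * 180
y_pred = [1] * 80 + [1] * 20 + [0] * 10 + [0] * 180

print(confusion_matrix(y_true, y_pred))  # [[180  20]
                                         #  [ 10  80]]
print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 0.80
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # 0.89
print(f"F1 score:  {f1_score(y_true, y_pred):.3f}")         # 0.842
```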
These metrics help you evaluate your model's performance in a more nuanced way than simply looking at overall accuracy.