We use two primary metrics to measure how well a machine learning model is performing: Accuracy and F1 Score. F1 Score is in turn built from two further metrics, Precision and Recall.
To compute these metrics we first count four kinds of outcomes, described below.
Imagine a study evaluating a new test that screens people for a disease. Each person taking the test either has or does not have the disease. The test outcome can be positive (classifying the person as having the disease) or negative (classifying the person as not having the disease). The test results for each subject may or may not match the subject's actual status. In that setting:
True positive (TP): Sick people correctly identified as sick
False positive (FP): Healthy people incorrectly identified as sick
True negative (TN): Healthy people correctly identified as healthy
False negative (FN): Sick people incorrectly identified as healthy
More generally, positive = identified and negative = rejected. Therefore:
True positive = Correctly identified
False positive = Incorrectly identified
True negative = Correctly rejected
False negative = Incorrectly rejected
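The four outcomes above can be counted directly from a list of actual statuses and a list of test results. The following is a minimal sketch, assuming binary labels where True means "positive" (sick); the function name `confusion_counts` and the six-person data are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); True means positive (e.g. sick). Hypothetical helper."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted and actual:
            tp += 1  # sick, identified as sick
        elif predicted and not actual:
            fp += 1  # healthy, identified as sick
        elif not predicted and not actual:
            tn += 1  # healthy, identified as healthy
        else:
            fn += 1  # sick, identified as healthy
    return tp, fp, tn, fn


# Hypothetical screening results for six people.
actual = [True, True, False, False, False, True]
predicted = [True, False, True, False, False, True]
print(confusion_counts(actual, predicted))  # (2, 1, 2, 1)
```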
Accuracy:
Accuracy is the fraction of all predictions that are correct. In terms of TP, TN, FP, FN it is measured as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
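A minimal sketch of the Accuracy computation, assuming the four counts are already available as integers; the function name `accuracy` is hypothetical. Note that TP + TN must be summed before dividing: in Python, as in ordinary arithmetic, `/` binds tighter than `+`.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined when there are no predictions")
    return (tp + tn) / total  # parentheses matter: sum the correct outcomes first


print(accuracy(tp=2, tn=2, fp=1, fn=1))  # 0.6666666666666666
```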
Precision:
Precision is the number of correct positive results divided by the number of all results returned as positive. In terms of TP and FP it is measured as:
Precision = TP / (TP + FP)
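A minimal sketch of Precision, TP / (TP + FP). The name `precision` is hypothetical, and returning 0.0 when nothing was returned as positive is an assumed convention rather than something this document specifies.

```python
def precision(tp: int, fp: int) -> float:
    """Share of results returned as positive that really are positive: TP / (TP + FP)."""
    returned = tp + fp
    if returned == 0:
        # No positive predictions at all; 0.0 is an assumed convention here.
        return 0.0
    return tp / returned


print(precision(tp=8, fp=2))  # 0.8
```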
Recall:
Recall is the number of correct positive results divided by the number of results that should have been returned as positive. In terms of TP and FN it is measured as:
Recall = TP / (TP + FN)
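A minimal sketch of Recall, TP / (TP + FN). As with the other sketches, the name `recall` is hypothetical and returning 0.0 when there are no actual positives is an assumed convention.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were found: TP / (TP + FN)."""
    relevant = tp + fn
    if relevant == 0:
        # No actual positives exist; 0.0 is an assumed convention here.
        return 0.0
    return tp / relevant


print(recall(tp=8, fn=8))  # 0.5
```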
F1 Score:
F1 Score is the harmonic mean of Precision and Recall. In terms of TP, FP, FN it is measured as:
F1 = 2 * Recall * Precision / (Recall + Precision) = 2TP / (2TP + FP + FN)
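The two forms of the F1 formula above are equivalent, which a small numeric check can illustrate. This is a sketch with hypothetical function names and made-up counts; both functions return 0.0 on a zero denominator as an assumed convention.

```python
import math


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # 0.0 on empty input: assumed convention


def f1_from_pr(p: float, r: float) -> float:
    """F1 = 2 * Recall * Precision / (Recall + Precision), the harmonic mean."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


tp, fp, fn = 8, 2, 8  # hypothetical counts
p = tp / (tp + fp)  # precision = 0.8
r = tp / (tp + fn)  # recall = 0.5
assert math.isclose(f1_from_counts(tp, fp, fn), f1_from_pr(p, r))
print(round(f1_from_counts(tp, fp, fn), 4))  # 0.6154
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high F1 Score by maximizing only one of Precision or Recall.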
We use the F1 Score as the primary success metric because it is a better measure than Accuracy for our tasks. In information extraction tasks, TN is typically a very large number that outweighs the other counts, so Accuracy can look high even when the model misses most of the items it should find. F1 Score combines Precision and Recall and resembles Accuracy with the TN term removed, which makes it the most suitable metric for OCR tasks.
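The effect of a large TN count can be seen with made-up numbers. The sketch below assumes a hypothetical extraction task with 10,000 tokens, only 100 of which should be extracted, and compares a model that extracts nothing with one that finds 40 of the 100 items; the scenarios and function names are illustrative only.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # TN never appears in F1


# Hypothetical task: 10,000 tokens, of which 100 should be extracted.
scenarios = {
    "extracts nothing": dict(tp=0, fp=0, fn=100, tn=9900),
    "finds 40 of 100": dict(tp=40, fp=10, fn=60, tn=9890),
}
for name, c in scenarios.items():
    acc = accuracy(c["tp"], c["tn"], c["fp"], c["fn"])
    score = f1(c["tp"], c["fp"], c["fn"])
    print(f"{name}: accuracy={acc:.3f}, F1={score:.3f}")
# extracts nothing: accuracy=0.990, F1=0.000
# finds 40 of 100: accuracy=0.993, F1=0.533
```

Both scenarios score above 99% Accuracy, while F1 Score separates a useless model from a partially useful one.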