
Why Accuracy May Not Be the Best Metric to Evaluate Models (in Medical Settings)

There are several metrics we can use for evaluating a machine learning (ML) model. One of them is accuracy.


We compute the accuracy of a model by running a test set through it and looking at the proportion of samples it classified correctly.
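To make this concrete, here is a minimal sketch in Python of how accuracy can be computed by hand. The label lists are made up for illustration, and the encoding (0 = benign, 1 = malignant) is my own choice:

    # Made-up ground-truth labels and predictions (0 = benign, 1 = malignant)
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

    # Accuracy = number of correct predictions / total number of samples
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    print(f"Accuracy: {accuracy:.2f}")  # 0.80 for these made-up labels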


I will try to show the shortcomings of the accuracy metric in a medical setting, where relying on it might have a serious impact on patients' fates.


I will cover some other metrics, including sensitivity, specificity, predictive values, and the ROC curve, in another post, as I want to keep this one compact.


Let's work through the example in the figure below to illustrate how accuracy is computed.


Let's assume a test set of 10 tumor tissue samples, each labeled as benign (green) or malignant (red).


According to the ground truth, there are 3 malignant and 7 benign samples in the test set.



Let's assume we built a model (Model 1) and tested it against our test set.


Model 1 predicts every sample as benign. That is, the three malignant samples are erroneously classified as benign, while the seven benign samples are classified correctly. Model 1's accuracy is therefore 7 out of 10, or 0.7, even though the model is not really classifying anything.


It is definitely not a useful model, but it gets all the benign samples right.


Let's assume we built another model, Model 2.


Model 2 correctly classifies five samples as benign and two samples as malignant. In total, seven samples are classified correctly, which makes Model 2's accuracy 0.7 as well.


As a result, we have two ML models with an accuracy of 0.7.
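As a sanity check, here is a short sketch that reproduces both scores with scikit-learn's accuracy_score. The prediction vectors simply encode the two models as described above, again with my assumed encoding of 0 = benign and 1 = malignant:

    from sklearn.metrics import accuracy_score

    # Ground truth: 3 malignant (1) and 7 benign (0) samples
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

    model1_preds = [0] * 10                        # Model 1: everything benign
    model2_preds = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]  # Model 2: 2 malignant + 5 benign correct

    print(accuracy_score(y_true, model1_preds))  # 0.7
    print(accuracy_score(y_true, model2_preds))  # 0.7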


I think one can safely say that Model 2 is of more use than Model 1.


In summary, although both models have the same accuracy score, Model 1 would let all cancer patients go undetected, while Model 2 would catch at least two of them.
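The gap between the two models shows up as soon as we score them with a class-aware metric. As a small preview of the sensitivity discussion I am saving for the next post, here is the same setup scored with recall (sensitivity) on the malignant class:

    from sklearn.metrics import recall_score

    y_true       = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    model1_preds = [0] * 10
    model2_preds = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

    # Recall = malignant samples caught / all malignant samples
    print(recall_score(y_true, model1_preds))  # 0.0   -> misses every cancer
    print(recall_score(y_true, model2_preds))  # ~0.67 -> catches 2 of 3

Same accuracy, very different recall: that is exactly the failure mode the example above is meant to expose.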






Thank you for reading this post. If you have anything to say, object to, or correct, please drop a comment below.
