Confusion Matrix NOT Confusing Matrix

Confusion matrices are a useful tool for evaluating binary classification models; they can even be extended to multiclass problems, but we will stick to the binary case for this article. The matrix lets us compare our model's predicted classes against the actual outcomes. It is called a confusion matrix because it reveals how "confused" the model is between the two possible outcomes, highlighting the instances in which one class is mistaken for the other. If you have never seen a confusion matrix before, here is an example; don't worry, we will go over how to read it and the metrics we can derive from it later in the article.

example confusion matrix

Many of the datasets we come across will tend to have imbalanced classes. An imbalanced dataset has more data points belonging to one category than to the other. This can hurt our model's performance, since the model may simply learn to always predict the majority class. There are techniques we can use to mitigate the issues that arise from imbalanced datasets, such as upsampling and downsampling, that balance the classes in the hope of producing a better model. For now, though, let's investigate what would happen if we did not use these techniques. Suppose we build a classifier on a dataset where, say, 90 percent of the points belong to the majority class, and we find that our model gives us 91 percent accuracy. We might think the model is performing really well; in reality it is only slightly better than always choosing the majority class, which means accuracy is not a very useful metric here. We instead need to rely on other classification metrics, such as precision, recall, and F1-score, to tell us whether we have a good model. This is where the confusion matrix comes in: from the matrix we can compute these metrics and use the one that makes sense for our dataset.
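To see this concretely, here is a minimal sketch (plain Python, hypothetical made-up labels) of the accuracy trap on a 90/10 imbalanced dataset:

```python
# Hypothetical labels: 90 negatives and 10 positives, a 90/10 imbalance.
y_true = [0] * 90 + [1] * 10

# A "model" that always predicts the majority class (0).
y_pred = [0] * 100

# Accuracy = fraction of predictions that match the true labels.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.9 -- looks impressive, yet the model never finds a positive
```

Despite the 90 percent accuracy, this "model" has zero recall on the positive class, which is exactly the kind of failure the confusion matrix makes visible.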

Understanding the confusion matrix can be a bit "confusing" when you are first confronted with it, but after a bit of practice it's really not that bad! Here we will go over how to read it, using the image above to convey the parts you need to know. The columns of the confusion matrix are the true classes, the ones we know to be correct, while the rows are the predicted classes, the ones our model assigned. The cells of the matrix hold the following counts: true positives, false positives, false negatives, and true negatives (we will define these shortly). The main diagonal of the matrix, top-left to bottom-right, holds the cases where the model is correct (true positives and true negatives), and the second diagonal, top-right to bottom-left, holds the cases where the model is incorrect (false negatives and false positives). Now let's define those common terms:

  • True Positives: cases where the model correctly predicted yes
  • False Positives: cases where the model incorrectly predicted yes
  • False Negatives: cases where the model incorrectly predicted no
  • True Negatives: cases where the model correctly predicted no
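The four outcomes can be tallied directly from a list of true labels and predictions. A small sketch (the function name `confusion_counts` and the example labels are my own, for illustration):

```python
def confusion_counts(y_true, y_pred):
    """Tally the four confusion-matrix cells for binary labels (1 = yes, 0 = no)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correct yes
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # wrong yes
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # wrong no
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # correct no
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
```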

The above outcomes are important to understand, as many of our classification metrics are built from them. The most commonly used are precision, recall, and F1-score; we will go over what each one means and how to compute it from the confusion matrix.

  • Precision: This is the number of true positives out of all predicted positive values, or the accuracy of our positive predictions. We can compute this metric by dividing the True Positives by the sum of the True Positives and False Positives.
    Precision = TP / (TP + FP)
  • Recall: This metric is typically used along with precision; it may also be referred to as sensitivity or the true positive rate. Recall is the ratio of positive instances that are correctly detected by our model. We can compute this metric by dividing the True Positives by the sum of the True Positives and False Negatives.
    Recall = TP / (TP + FN)
  • F1-score: This metric relies entirely on the precision and recall metrics that we computed above. In fact, this score conveys the balance between our precision and recall scores as their harmonic mean. It can be computed a few different ways, so I will just put the formulas below.
    F1 = 2 × (Precision × Recall) / (Precision + Recall) = TP / (TP + (FN + FP) / 2)
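Putting the three metrics together, a minimal sketch that computes them from the four cell counts (the function name `precision_recall_f1` is my own; it assumes the denominators are nonzero):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix cell counts."""
    precision = tp / (tp + fp)          # accuracy of positive predictions
    recall = tp / (tp + fn)             # fraction of actual positives found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Using counts TP=3, FP=1, FN=1:
print(precision_recall_f1(tp=3, fp=1, fn=1))  # (0.75, 0.75, 0.75)
```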

I hope that you have gained some insight into what the confusion matrix is used for and how to use it to learn more about our model's capabilities. This can be a somewhat confusing topic, but a very important one to master. The video below is a great resource for learning more about confusion matrices and many other machine learning ideas.