In my previous posts we had a look at some fundamentals of machine learning and had a look at the linear regression. Today, we will look at another statistical topic: false positives and false negatives. You will come across these terms quite often when working with data, so let’s have a look at them.
The false positive
In statistics, there is one error, called the false positive error. This happens when the prediction states something to be true, but in reality it is false. To easily remember the false positive, you could describe this as a false alarm. A simple example for that is the airport security check: when you pass the security check, you have to walk through a metal detector. If you don’t wear any metal items with you (since you left them for the x-ray!), no alarm will go on. But in some rather rare cases, the alarm might still go on. Either you forgot something or the metal detector had an error – in this case, a false positive. The metal detector predicted that you have metal items somewhere with you, but in fact you don’t.
Another sample of a false positive in machine learning would be in image recognition: imagine your algorithm is trained to recognise cats. There are so many cat pictures on the web, so it is easy to train this algorithm. However, you would then feed the algorithm the image of a dog and the algorithm would call it a cat, even though it is a dog. This again is a false positive.
In a business context, your algorithm might predict that a specific customer is going to buy a certain product for sure. but in fact, this customer didn’t buy it. Again, here we have our false positive. Now, let’s have a look at the other error: the false negative.
The false negative
The other error in statistics is the false negative. Similar to the false positive, it is something that should be avoided. It is very similar to the false positive, just the other way around. Let’s look at the airport example one more time: you wear a metal item (such as a watch) and go through the metal detector. You simply forgot to take off the watch. And – the metal detector doesn’t go on this time. Now, you are a false negative: the metal detector stated that you don’t wear any metal items, but in fact you did. A condition was predicted to be true but in fact it was false.
A false positive is often useful to score your data quality. Now that you understand some of the most important basics of statistics, we will have a look at another machine learning algorithm in my next post: the logistic regression.
This tutorial is part of the Machine Learning Tutorial. You can learn more about Machine Learning by going through this tutorial. On Cloudvane, there are many more tutorials about (Big) Data, Data Science and alike, read about them in the Big Data Tutorials here. If you look for great datasets to play with, I would recommend you Kaggle.