Now we have learned how to write a Linear Regression model from hand in our last tutorial. Also, we had a look at the prediction error and standard error. Today, we want to focus on a way how to measure the performance of a model. In marketing, a common methodology for this is lift and gain curves. They can also be used for other things, but in our today’s sample we will use a marketing scenario
The marketing scenario for Lift and Gain charts
Let’s assume that you are in charge of an outbound call campaign. Basically, your goal is to increase conversions of people contacted via this campaign. Like with most campaigns, you have a certain – limited – budget and thus need to plan the campaign smart. This is where machine learning comes into play: you only want to contact those people that are most relevant to buy the product. Therefore, you contact the top X percent of customers where you rather expect a conversion and avoid contacting those customers that are very unlikely to get converted. We assume that you already built a model for that and that we now do the campaign. We will measure our results with a gain chart, but first let’s create some data.
Our sample data represents all our customers, grouped into decentiles. Basically, we group the customers into top 10%, top 20%, … until we reach all customers. We add the number of conversions to it as well:
|Decantile||# of Customers||Conversions|
As you can see in the above table, the first decentile contains most conversions and is thus our top group. The conversion rates for each group in percent are:
As you can see, 17.2% of all top 10% customers could be converted. From each group, it declines. So, the best approach is to first contact the top customers. As a next step, we add the cumulative conversions. This number is then used for our cumulative gain chart.
|Cumulative % Conversions|
Cumulative Gain Chart
With this data, we can now create the cumulative gain chart. In our case, this would look like the following:
The Lift factor
Now, let’s have a look at the lift factor. The base for the lift factor is always the lift 1. This means that there was a random sample selected and no structured approach was done. Basically, the lift factor is the ratio you get between the number of customers contacted in % and the number of conversions for the decentile in %. With our sample data, this lift data would look like the following:
Thus we would have a lift factor of 1.72 with the first percentile, decreasing towards the full customer set.
In this tutorial, we’ve learned about how to verify a machine learning model. In the next tutorial, we will have a look at false positives and some other important topics before moving on with Logistic Regression.