A linear regression model

Now we have learned how to write a Linear Regression model from hand in our last tutorial. Also, we had a look at the prediction error and standard error. Today, we want to focus on a way how to measure the performance of a model. In marketing, a common methodology for this is lift and gain curves. They can also be used for other things, but in our today’s sample we will use a marketing scenario

The marketing scenario for Lift and Gain charts

Let’s assume that you are in charge of an outbound call campaign. Basically, your goal is to increase conversions of people contacted via this campaign. Like with most campaigns, you have a certain – limited – budget and thus need to plan the campaign smart. This is where machine learning comes into play: you only want to contact those people that are most relevant to buy the product. Therefore, you contact the top X percent of customers where you rather expect a conversion and avoid contacting those customers that are very unlikely to get converted. We assume that you already built a model for that and that we now do the campaign. We will measure our results with a gain chart, but first let’s create some data.

Our sample data represents all our customers, grouped into decentiles. Basically, we group the customers into top 10%, top 20%, … until we reach all customers. We add the number of conversions to it as well:

Decantile# of CustomersConversions
120033
220030
320027
420025
520023
620019
720015
820011
92007
102002

As you can see in the above table, the first decentile contains most conversions and is thus our top group. The conversion rates for each group in percent are:

% Conversions
17,2%
15,6%
14,1%
13,0%
12,0%
9,9%
7,8%
5,7%
3,6%
1,0%

As you can see, 17.2% of all top 10% customers could be converted. From each group, it declines. So, the best approach is to first contact the top customers. As a next step, we add the cumulative conversions. This number is then used for our cumulative gain chart.

Cumulative % Conversions
17,2%
32,8%
46,9%
59,9%
71,9%
81,8%
89,6%
95,3%
99,0%
100,0%

Cumulative Gain Chart

With this data, we can now create the cumulative gain chart. In our case, this would look like the following:

A cumulative gain chart
A cumulative gain chart

The Lift factor

Now, let’s have a look at the lift factor. The base for the lift factor is always the lift 1. This means that there was a random sample selected and no structured approach was done. Basically, the lift factor is the ratio you get between the number of customers contacted in % and the number of conversions for the decentile in %. With our sample data, this lift data would look like the following:

Lift
1,72
1,64
1,56
1,50
1,44
1,36
1,28
1,19
1,10
1,00

Thus we would have a lift factor of 1.72 with the first percentile, decreasing towards the full customer set.

In this tutorial, we’ve learned about how to verify a machine learning model. In the next tutorial, we will have a look at false positives and some other important topics before moving on with Logistic Regression.

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!