In my previous posts, I explained linear regression and mentioned that its predictions contain errors. Each individual prediction has an error of prediction, and the model as a whole has a standard error, which summarizes those individual errors in a single number. A prediction is good if both the individual errors of prediction and the standard error are small. Let’s start by examining the error of prediction and then use it to calculate the standard error of our linear regression model.
Error of Prediction in Linear Regression
Let’s recall the table from the previous tutorial:
|Year||Ad Spend (X)||Revenue (Y)||Prediction (Y’)|
|2013||€ 345.126,00||€ 41.235.645,00||€ 48.538.859,48|
|2014||€ 534.678,00||€ 62.354.984,00||€ 65.813.163,80|
|2015||€ 754.738,00||€ 82.731.657,00||€ 85.867.731,47|
|2016||€ 986.453,00||€ 112.674.539,00||€ 106.984.445,76|
|2017||€ 1.348.754,00||€ 156.544.387,00||€ 140.001.758,86|
|2018||€ 1.678.943,00||€ 176.543.726,00||€ 170.092.632,46|
|2019||€ 2.165.478,00||€ 199.645.326,00||€ 214.431.672,17|
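We can see that there is a clear difference between the predictions and the actual numbers. We calculate the error of each prediction by subtracting the prediction from the real value (Y - Y’):
|Year||Revenue (Y)||Prediction (Y’)||Error (Y - Y’)|
|2013||€ 41.235.645,00||€ 48.538.859,48||-€ 7.303.214,48|
|2014||€ 62.354.984,00||€ 65.813.163,80||-€ 3.458.179,80|
|2015||€ 82.731.657,00||€ 85.867.731,47||-€ 3.136.074,47|
|2016||€ 112.674.539,00||€ 106.984.445,76||€ 5.690.093,24|
|2017||€ 156.544.387,00||€ 140.001.758,86||€ 16.542.628,14|
|2018||€ 176.543.726,00||€ 170.092.632,46||€ 6.451.093,54|
|2019||€ 199.645.326,00||€ 214.431.672,17||-€ 14.786.346,17|
The table above shows how much each prediction differs from the real value. This difference is the error of prediction for each year: a negative error means the model predicted too much revenue, a positive error means it predicted too little.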
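The subtraction described above can be sketched in a few lines of Python. This is only an illustration: the values are copied from the table, and the names `actual`, `predicted` and `prediction_errors` are made up for this example.

```python
# Actual revenue (Y) and predicted revenue (Y') from the table, in euros.
years = [2013, 2014, 2015, 2016, 2017, 2018, 2019]
actual = [41_235_645.00, 62_354_984.00, 82_731_657.00, 112_674_539.00,
          156_544_387.00, 176_543_726.00, 199_645_326.00]
predicted = [48_538_859.48, 65_813_163.80, 85_867_731.47, 106_984_445.76,
             140_001_758.86, 170_092_632.46, 214_431_672.17]


def prediction_errors(actual, predicted):
    """Return the error of prediction (Y - Y') for each observation."""
    return [y - y_pred for y, y_pred in zip(actual, predicted)]


errors = prediction_errors(actual, predicted)
for year, error in zip(years, errors):
    # Negative: the model predicted too much; positive: too little.
    print(f"{year}: {error:,.2f}")
```

Note that Python prints the numbers with a decimal point and comma separators, not in the European format used in the tables.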
Calculating the Standard Error
Now, we want to calculate the standard error. First, let’s have a look at the formula:
$$\text{Standard error} = \sqrt{\frac{\sum (Y - Y')^2}{N}}$$
In short, we square each error, add up the squares, divide the sum by the number of observations N, and take the square root of the result. We have already calculated Y - Y’, so we only need to square each value:
|Error (Y - Y’) in €||Squared error (Y - Y’)² in €²|
|-7.303.214,48||53.336.941.686.734,40|
|-3.458.179,80||11.959.007.558.032,20|
|-3.136.074,47||9.834.963.088.101,32|
|5.690.093,24||32.377.161.053.416,10|
|16.542.628,14||273.658.545.777.043,00|
|6.451.093,54||41.616.607.923.053,70|
|-14.786.346,17||218.636.033.083.835,00|
The sum of the squared errors is 641.419.260.170.216,00 €².
N is 7, since the table contains 7 observations. Dividing the sum by 7 gives 91.631.322.881.459,50 €².
The last step is to take the square root, which gives a standard error of 9.572.425,13 € for our linear regression.
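The whole calculation (square, sum, divide by N, square root) can be reproduced with a short Python sketch. The function name `standard_error` is hypothetical and chosen only for this example; the input values are the actual and predicted revenues from the table at the start of this tutorial.

```python
import math

# Actual revenue (Y) and predicted revenue (Y') from the tutorial's table, in euros.
actual = [41_235_645.00, 62_354_984.00, 82_731_657.00, 112_674_539.00,
          156_544_387.00, 176_543_726.00, 199_645_326.00]
predicted = [48_538_859.48, 65_813_163.80, 85_867_731.47, 106_984_445.76,
             140_001_758.86, 170_092_632.46, 214_431_672.17]


def standard_error(actual, predicted):
    """Square root of the mean squared error of prediction (divides by N)."""
    squared_errors = [(y - y_pred) ** 2 for y, y_pred in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


se = standard_error(actual, predicted)
print(f"Standard error: {se:,.2f}")
```

The sketch divides by N, as in this tutorial. Some textbooks divide by N - 2 when the data is a sample, which gives a somewhat larger value.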
We have now covered the main building blocks of linear regression and can move on to logistic regression in the next tutorial.
This tutorial is part of the Machine Learning Tutorial. You can learn more about Machine Learning by going through this tutorial. On Cloudvane, there are many more tutorials about (Big) Data, Data Science and the like; you can find them in the Big Data Tutorials. If you are looking for great datasets to play with, I recommend Kaggle.