In the last two posts, we introduced the core concepts of Deep Learning as well as the Feedforward Neural Network and the Convolutional Neural Network. In this post, we will have a look at two other popular deep learning techniques: the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM).

Recurrent Neural Network

The main difference to the previously introduced networks is that the Recurrent Neural Network has a feedback loop: the output of a neuron at one step is fed back in as an additional input at the next step. This architecture makes it possible to remember important information about the inputs the network has already received and to take it into consideration along with the next input. RNNs work very well with sequential data such as sound, time series (sensor) data or written natural language.

The advantage of an RNN over a feedforward network is that the RNN can remember its output and use it to predict the next element in a series, while a feedforward network is not able to feed the output back into the network. Real-time gesture tracking in videos is another important use-case for RNNs.
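To make the feedback loop a bit more tangible, here is a minimal sketch of a recurrent step with a single neuron. The weights `w_x`, `w_h` and the bias `b` are made-up values for illustration, not trained ones, and the `tanh` activation is one common choice rather than the only one.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: mix the new input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence element by element, carrying the hidden state along."""
    h = 0.0  # the "memory" starts empty
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)  # feedback loop: the new h depends on the old h
        states.append(h)
    return states

# Only the first element is non-zero, yet the later states are still non-zero:
# the network "remembers" the first input through its hidden state.
states = run_rnn([1.0, 0.0, 0.0])
```

A feedforward network would produce exactly the same output for the second and third element (both are 0), because it has no hidden state to carry over.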

Long Short-Term Memory

A plain RNN has a short-term memory, which is already sufficient for some tasks. However, there are requirements for more advanced memory functionality. Long Short-Term Memory (LSTM) addresses this problem. The researchers Sepp Hochreiter and Jürgen Schmidhuber introduced LSTM in 1997. LSTMs enable RNNs to remember inputs over a long period of time. Therefore, LSTMs are used in RNNs for sequential data with long time lags between the relevant events.

An LSTM learns over time which information is relevant and which isn’t. This is done by learning weights for three different gates within the LSTM: the input gate, which decides what new information is stored; the “forget” gate, which decides what stored information is discarded; and the output gate, which decides what is passed on.
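The interplay of the three gates can be sketched with a scalar LSTM cell. This follows the commonly used LSTM formulation; the parameter values in `params` are hand-picked assumptions to show the effect of a forget gate that is biased towards keeping its memory, not trained weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step for scalar input and state; p holds (w_x, w_h, b) per gate."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # the new candidate information
    c = f * c_prev + i * g            # updated long-term cell state
    h = o * math.tanh(c)              # updated output / short-term state
    return h, c

# Hypothetical parameters: keep almost everything, write almost nothing.
params = {
    "forget": (0.0, 0.0, 5.0),
    "input": (0.0, 0.0, -5.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

# Start with a stored value of 1.0 and feed an "empty" input.
h, c = lstm_step(x=0.0, h_prev=0.0, c_prev=1.0, p=params)
```

With these values the cell state stays very close to 1.0 after the step, which is the long-term memory effect described above.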

This tutorial is part of the Machine Learning Tutorial. You can learn more about Machine Learning by going through this tutorial. On Cloudvane, there are many more tutorials about (Big) Data, Data Science and the like; read about them in the Big Data Tutorials. If you are looking for great datasets to play with, I would recommend Kaggle.

In the last couple of posts, we’ve learned about various aspects of Machine Learning. Now, we will focus on another aspect of Machine Learning: Deep Learning. After introducing the key concepts of Deep Learning in the previous post, we will have a look at two concepts: the Convolutional Neural Network (CNN) and the Feedforward Neural Network.

The Feedforward Neural Network

Feedforward neural networks are the most general-purpose neural networks. A feedforward network consists of an input layer as the entry point, several hidden layers and an output layer. Each layer is connected to the previous layer. The connections are one-way only, so the nodes can’t form a cycle. The information in a feedforward network only moves in one direction: from the input layer, through the hidden layers, to the output layer. It is the simplest version of a neural network. The below image illustrates the Feedforward Neural Network.
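As a rough sketch of this one-way flow, the following code pushes two input values through one hidden layer and one output layer. The layer sizes, weights and the sigmoid activation are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: each neuron sees every input value."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]

def feedforward(inputs, layers):
    """Pass the data through each layer in order; nothing flows backwards."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation

# Made-up weights: (weights per neuron, bias per neuron) for each layer.
layers = [
    ([[0.5, -0.5], [0.3, 0.8]], [0.0, 0.1]),  # hidden layer: 2 inputs -> 2 neurons
    ([[1.0, -1.0]], [0.0]),                   # output layer: 2 neurons -> 1 output
]
output = feedforward([1.0, 2.0], layers)
```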

Convolutional Neural Networks (CNN)

The Convolutional Neural Network is very effective in image recognition and similar tasks. For that reason, it is also well suited for video processing. The difference to the feedforward neural network is that the neurons of a CNN are arranged in three dimensions: width, height and depth. Not all neurons in one layer are connected to all neurons in the next layer. There are three different types of layers in a Convolutional Neural Network, which also distinguish it from feedforward neural networks:

Convolution Layer

The convolution layer puts the input image through several convolutional filters. Each filter activates on certain features, such as edges, colors or objects. The results form the feature maps. The deeper the network goes, the more sophisticated those filters become. The convolutional layer automatically learns which features are most important to extract for a specific task.
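Here is a small sketch of what a single filter does. The 4x4 image and the hand-written vertical edge filter are assumptions for illustration; in a real CNN the filter values are learned during training. Like most deep learning libraries, the sketch slides the filter without flipping it.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1) and sum the products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# A 4x4 grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written filter that reacts to a dark-to-bright transition from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
```

The resulting feature map is only non-zero in the middle column, exactly where the edge sits in the image.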

Rectified linear units (ReLU)

The goal of this layer is to make training faster and more effective. Negative values in the feature maps are set to zero, while positive values pass through unchanged.

Pooling/Subsampling

Pooling simplifies the output by performing nonlinear downsampling. The number of parameters that the network needs to learn about gets reduced. In convolutional neural networks, the operation is useful since the outgoing connections usually receive similar information.
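The ReLU and pooling steps can be sketched in a few lines. The feature map values are made up, and 2x2 max pooling is assumed here as one common form of downsampling.

```python
def relu(feature_map):
    """Set every negative value to zero and keep positive values as they are."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep only the largest value of each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# Made-up 4x4 feature map
fm = [
    [1, -2, 3, 0],
    [-1, 5, -3, 2],
    [0, -4, 2, -1],
    [7, 1, -6, 4],
]

activated = relu(fm)
pooled = max_pool(activated)  # 4x4 -> 2x2
```

The 16 values are reduced to 4, so the following layers have far fewer values to deal with.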


In the last couple of posts, we’ve learned about various aspects of Machine Learning. Now, we will focus on another aspect of Machine Learning: Deep Learning. In this post, I will give an introduction to deep learning. Over the last couple of years, Deep Learning has been at the centre of the hype around AI. But what is so exciting about Deep Learning? First, let’s have a look at the concepts of Deep Learning.

A brief introduction to Deep Learning

Basically, Deep Learning is meant to function similarly to the human brain. Everything is built around neurons, which work in networks (neural networks). The smallest element in a neural network is the neuron, which takes input values and creates an output value based on its weights and bias. The following image shows the Neuron in Deep Learning:
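In code, a single neuron is little more than a weighted sum plus a bias, passed through an activation function. The sigmoid activation and the numbers below are assumptions for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, squashed into the range 0..1."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Two inputs, two made-up weights and a made-up bias
out = neuron([1.0, 0.5], [0.4, -0.2], 0.1)
```

Training a network means adjusting the weights and the bias of every neuron until the outputs match the expected results.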

Next, there are layers in the network, each of which consists of several neurons. Each layer applies some transformations that eventually lead to an end result. Each layer gets closer to the target result. If your Deep Learning model is built to recognise handwriting, the first layer would probably recognise gray-scales, the second layer connections between different pixels, the third layer simple figures and the fourth layer the letter. The following image shows a typical neural net:

A typical workflow in a neural net calculation for image recognition could look like this:

All images are split into batches

Each batch is sent to the GPU for calculation

The model starts the analysis with random weights

A cost function is specified that compares the results with the truth

Back propagation of the result happens

Once a model calculation is finished, the result is merged and returned
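The steps above can be sketched with a tiny model. To keep the example self-contained, it learns a straight line instead of recognising images, runs on the CPU, and computes the gradients by hand; the toy data, the learning rate and the number of epochs are all assumptions.

```python
import random

random.seed(0)

# Toy data following y = 2x + 1 (a hypothetical stand-in for images and labels).
data = [(x / 10, 2 * (x / 10) + 1) for x in range(20)]

def batches(items, size):
    """Split the data into batches of the given size."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def cost(w, b, samples):
    """Cost function: mean squared error between predictions and the truth."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)

# The model starts with random weights.
w, b = random.uniform(-1, 1), random.uniform(-1, 1)
initial_cost = cost(w, b, data)

learning_rate = 0.1
for epoch in range(200):
    for batch in batches(data, 5):
        # Back propagation for this tiny model: gradients of the cost for w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        # Move the weights a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

final_cost = cost(w, b, data)
```

After training, `w` and `b` end up close to 2 and 1, and the cost is far below its initial value.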

How is it different to Machine Learning?

Although Deep Learning is often considered to be a “subset” of Machine Learning, it is quite different. In many areas, Deep Learning achieves better results than “traditional” machine learning models. The following table should provide an overview of these differences:

Machine Learning | Deep Learning
Feature extraction happens manually | Feature extraction is done automatically
Features are used to create a model that categorises elements | Performs “end-to-end learning”
Shallow learning | Deep learning algorithms scale with data

This is only a basic overview of Deep Learning. Deep Learning comprises several different methods. In the next tutorial, we will have a look at different types of Deep Learning networks.


An introduction to deep learning, by Mario Meir-Huber (published 2019-10-24 14:06, last modified 2021-06-20 14:27)

In the first posts, I introduced different types of Machine Learning concepts. One of them is classification. Basically, classification is about identifying to which of a set of categories a certain observation belongs. Classification is normally a supervised learning technique. A typical classification task is spam detection in e-mails: the two possible classes in this case are “spam” and “no spam”. Two of the most common classification algorithms are the naive bayes classification and the random forest classification.

What classification algorithms are there?

Basically, there are a lot of classification algorithms available, and when working in the field of Machine Learning, you will keep discovering new ones. In this tutorial, we will only focus on the two most important ones (Random Forest, Naive Bayes) and the basic one (Decision Tree).

The Decision Tree classifier

The basic classifier is the Decision Tree classifier. It builds classification models in the form of a tree structure. The dataset is broken down into smaller and smaller subsets, one for each branch, until a leaf is reached. It could be compared to a survey, where each answer determines the next question. Let’s assume the following case: Tom was captured by the police and is a suspect in robbing a bank. The questions could represent the following tree structure:

Basically, by going from one node to the next, you get closer to the result of either “guilty” or “not guilty”. Also, each leaf has a weight.
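Such a tree can be written down as a nested structure and walked from the root to a leaf. The three questions below are hypothetical and only serve as an illustration of the principle; they are not the questions of the original tree.

```python
# Hypothetical questions; each inner node asks one yes/no question.
tree = {
    "question": "has_alibi",
    "yes": "not guilty",
    "no": {
        "question": "seen_at_bank",
        "yes": {
            "question": "has_stolen_money",
            "yes": "guilty",
            "no": "not guilty",
        },
        "no": "not guilty",
    },
}

def classify(node, facts):
    """Walk from the root to a leaf, answering one question per node."""
    while isinstance(node, dict):
        answer = "yes" if facts[node["question"]] else "no"
        node = node[answer]
    return node  # a leaf is simply the final label

tom = {"has_alibi": False, "seen_at_bank": True, "has_stolen_money": True}
verdict = classify(tree, tom)
```

In practice you would not write the questions by hand; a training algorithm picks the questions that split the data best.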

The Random Forest classification

Random forest is a really great classifier, often used and also often very efficient. It is an ensemble classifier built from many decision tree models, whose individual results are combined into one prediction. The random forest model can be used for both regression and classification.

Basically, it divides the data set into random subsets and trains one tree on each of them. Random forest models run efficiently on large datasets, since the trees are independent of each other and the computation can thus easily be run in parallel. It can handle thousands of input variables without variable deletion. It computes proximities between pairs of cases that can be used in clustering, locating outliers or (by scaling) giving interesting views of the data.
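The following sketch shows the two core ideas: each tree is trained on its own random sample of the data, and the forest combines the trees by majority vote. It is heavily simplified and rests on several assumptions: the trees are one-question “stumps”, the credit data is invented, and the random selection of features that real random forests also use is left out.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def train_stump(samples):
    """Train a one-question tree: the best single threshold on a single feature."""
    best_errors, best_rule = None, None
    for feature in range(len(samples[0][0])):
        for row, _ in samples:
            threshold = row[feature]
            left = [label for r, label in samples if r[feature] <= threshold]
            right = [label for r, label in samples if r[feature] > threshold]
            if not right:
                continue  # this threshold does not split the data
            rule = (feature, threshold, majority(left), majority(right))
            errors = sum(l != rule[2] for l in left) + sum(l != rule[3] for l in right)
            if best_errors is None or errors < best_errors:
                best_errors, best_rule = errors, rule
    if best_rule is None:  # all rows identical: always predict the majority class
        return (0, float("inf"), majority([label for _, label in samples]), None)
    return best_rule

def predict_stump(rule, row):
    feature, threshold, left_label, right_label = rule
    return left_label if row[feature] <= threshold else right_label

def train_forest(samples, n_trees, rng):
    """Train each tree on its own bootstrap sample (drawn with replacement)."""
    return [
        train_stump([rng.choice(samples) for _ in samples])
        for _ in range(n_trees)
    ]

def predict_forest(forest, row):
    """Combine the individual trees by majority vote."""
    return majority([predict_stump(rule, row) for rule in forest])

# Invented credit data: (yearly income in k, existing debt in k) -> decision
data = [
    ((15, 5), "reject"), ((20, 12), "reject"), ((25, 3), "reject"), ((30, 9), "reject"),
    ((70, 4), "approve"), ((75, 11), "approve"), ((80, 6), "approve"), ((90, 10), "approve"),
]
forest = train_forest(data, n_trees=5, rng=random.Random(42))
low_income = predict_forest(forest, (10, 8))
high_income = predict_forest(forest, (95, 8))
```

Since every tree only depends on its own sample, the loop in `train_forest` is the part that could be distributed across several machines.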

There are also some disadvantages with the random forest classifier: the main problem is its complexity. Working with random forest is more challenging than classic decision trees and thus needs skilled people. Also, the complexity creates large demands for compute power.

Random Forest is often used by financial institutions. A typical use-case is credit risk prediction. If you have ever applied for a loan, you might know the questions being asked by banks. The answers are often fed into random forest models.

The Naive Bayes classifier

The Naive Bayes classifier is based on prior knowledge of conditions that might relate to an event. It is based on Bayes’ Theorem. It assumes strong independence between the features (this is the “naive” part). It uses categorical data to calculate ratios between events.

Naive Bayes has several benefits. It can predict the classes of data sets easily and fast. Also, it can predict multiple classes. When the independence assumption holds, Naive Bayes can perform better than models such as logistic regression, and it needs a lot less training data.

A key challenge is that if a categorical variable has a category which was not observed in the training data set, the model will assign a probability of 0 (zero) to it and is then unable to make a prediction. Also, it is known to be a rather bad estimator, so its probability outputs should not be taken too literally. Also, it is rather complex to use.
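The zero-probability problem is easy to reproduce. The sketch below trains a tiny categorical Naive Bayes on invented e-mail data and scores a mail containing a word that never occurred in training. The smoothing parameter `alpha` (Laplace smoothing) is a common remedy that is not covered in the text above; it is included here as an assumption to show one way around the problem.

```python
from collections import Counter, defaultdict

def train(rows, labels, alpha):
    """Count how often each feature value occurs per class; alpha is the smoothing strength."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    vocab = defaultdict(set)             # feature index -> all values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab, alpha

def score(model, row, label):
    """Unnormalised P(class) multiplied by P(value | class) for every feature."""
    class_counts, value_counts, vocab, alpha = model
    p = class_counts[label] / sum(class_counts.values())
    for i, value in enumerate(row):
        count = value_counts[(label, i)][value]
        n_values = len(vocab[i] | {value})  # also count the unseen value
        p *= (count + alpha) / (class_counts[label] + alpha * n_values)
    return p

# Invented training data: (first word, sender) -> class
rows = [("offer", "unknown"), ("offer", "unknown"), ("hello", "known"), ("hello", "unknown")]
labels = ["spam", "spam", "ham", "ham"]

mail = ("prize", "unknown")  # "prize" never occurred in the training data

unsmoothed = train(rows, labels, alpha=0)
smoothed = train(rows, labels, alpha=1)

zero_spam = score(unsmoothed, mail, "spam")  # 0.0: no prediction possible
zero_ham = score(unsmoothed, mail, "ham")    # 0.0 as well
spam_score = score(smoothed, mail, "spam")
ham_score = score(smoothed, mail, "ham")
```

Without smoothing both classes get a score of zero, so the model cannot decide. With smoothing the unseen word only lowers both scores, and the sender still tips the decision towards “spam”.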

As stated, there are many more algorithms available. In the next tutorial, we will have a look at Deep Learning.


In the previous tutorial posts, we looked at the Linear Regression and discussed some basics of statistics such as the Standard Deviation and the Standard Error. Today, we will look at the Logistic Regression. It is similar in name to the linear regression, but different in usage. Let’s have a look.

The Logistic Regression explained

One of the main differences between the Linear Regression and the Logistic Regression is that the logistic regression is binary: it calculates values between 0 and 1 and thus states whether something is rather true or false. This means that the result of a prediction could be “fail” or “succeed” for a test. In a churn model, this would mean that a customer either stays with the company or leaves the company.

Another key difference to the Linear Regression is that the regression curve can’t be calculated directly. Therefore, in the Logistic Regression, the regression curve is “estimated” and optimised. There is a mathematical method to do this estimation, called the “Maximum Likelihood Method”. Normally, these parameters are calculated by Machine Learning tools so that you don’t have to do it yourself.

Another aspect is the concept of “odds”. Basically, the odds of a certain event happening or not happening are calculated. This could be a certain team winning a soccer game: let’s assume that Team X wins 7 out of 10 games (thus losing 3; we don’t consider draws). The probability of winning is 7 out of 10, so the odds in this case would be 7:3 on winning or 3:7 on losing.
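Odds are the probability of an event divided by the probability of it not happening, so 7 wins in 10 games give odds of 7:3. The short sketch below shows the conversion and how the logarithm of the odds links back to a value between 0 and 1 via the sigmoid function, which is the curve the logistic regression estimates. The function names are chosen for this example.

```python
import math

def odds(p):
    """Probability of the event divided by the probability of the opposite."""
    return p / (1 - p)

def sigmoid(z):
    """Turns any number (the log-odds) back into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

p_win = 7 / 10
win_odds = odds(p_win)           # 7:3
lose_odds = odds(1 - p_win)      # 3:7
log_odds = math.log(win_odds)    # the scale the logistic regression works on
back = sigmoid(log_odds)         # and back to the probability of winning
```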

This time we won’t calculate the Logistic Regression, since it is way too long. In the next tutorial, I will focus on classifiers such as Random Forest and Naive Bayes.


In my previous posts we had a look at some fundamentals of machine learning and had a look at the linear regression. Today, we will look at another statistical topic: false positives and false negatives. You will come across these terms quite often when working with data, so let’s have a look at them.

The false positive

In statistics, there is one error called the false positive error. This happens when the prediction states something to be true, but in reality it is false. To easily remember the false positive, you could describe it as a false alarm. A simple example is the airport security check: when you pass the security check, you have to walk through a metal detector. If you don’t carry any metal items (since you left them for the x-ray!), no alarm will go off. But in some rather rare cases, the alarm might still go off. Either you forgot something or the metal detector had an error, in this case a false positive. The metal detector predicted that you have metal items somewhere on you, but in fact you don’t.

Another example of a false positive in machine learning would be in image recognition: imagine your algorithm is trained to recognise cats. There are so many cat pictures on the web, so it is easy to train this algorithm. However, you would then feed the algorithm the image of a dog and the algorithm would call it a cat, even though it is a dog. This again is a false positive.

In a business context, your algorithm might predict that a specific customer is going to buy a certain product for sure, but in fact, this customer didn’t buy it. Again, here we have our false positive. Now, let’s have a look at the other error: the false negative.

The false negative

The other error in statistics is the false negative. Just like the false positive, it is something that should be avoided. It is very similar to the false positive, just the other way around. Let’s look at the airport example one more time: you wear a metal item (such as a watch) and go through the metal detector. You simply forgot to take off the watch. And the metal detector doesn’t go off this time. Now, you are a false negative: the metal detector stated that you don’t wear any metal items, but in fact you did. A condition was predicted to be false, but in fact it was true.

The number of false positives and false negatives is often useful to score the quality of your predictions. Now that you understand some of the most important basics of statistics, we will have a look at another machine learning algorithm in my next post: the logistic regression.
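Counting the four possible outcomes is straightforward. The sketch below uses six invented passengers of the metal detector, where `True` means “carries metal” in the actual values and “alarm goes off” in the predictions.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["TP"] += 1   # alarm, and there really was metal
        elif guess and not truth:
            counts["FP"] += 1   # false alarm
        elif not guess and truth:
            counts["FN"] += 1   # metal went through unnoticed
        else:
            counts["TN"] += 1   # no alarm, no metal
    return counts

# Invented data for six passengers
actual = [True, True, False, False, False, True]
predicted = [True, False, True, False, False, True]

counts = confusion_counts(actual, predicted)
```

Here the second passenger is the false negative (metal, but no alarm) and the third passenger is the false positive (alarm, but no metal).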


In our last tutorial, we learned how to write a Linear Regression model by hand. Also, we had a look at the prediction error and standard error. Today, we want to focus on how to measure the performance of a model. In marketing, a common methodology for this is lift and gain charts. They can also be used for other things, but in today’s example we will use a marketing scenario.

The marketing scenario for Lift and Gain charts

Let’s assume that you are in charge of an outbound call campaign. Basically, your goal is to increase conversions of people contacted via this campaign. Like with most campaigns, you have a certain, limited budget and thus need to plan the campaign smartly. This is where machine learning comes into play: you only want to contact those people that are most likely to buy the product. Therefore, you contact the top X percent of customers where you expect a conversion and avoid contacting those customers that are very unlikely to get converted. We assume that you already built a model for that and that we now run the campaign. We will measure our results with a gain chart, but first let’s create some data.

Our sample data represents all our customers, grouped into deciles. Basically, we split the customers, ranked by the model, into ten equally sized groups: the top 10%, the next 10%, … until we reach all customers. We add the number of conversions to it as well:

Decile | # of Customers | Conversions
1 | 200 | 33
2 | 200 | 30
3 | 200 | 27
4 | 200 | 25
5 | 200 | 23
6 | 200 | 19
7 | 200 | 15
8 | 200 | 11
9 | 200 | 7
10 | 200 | 2

As you can see in the above table, the first decile contains the most conversions and is thus our top group. The share of all 192 conversions that falls into each group is:

Decile | % of all conversions
1 | 17.2%
2 | 15.6%
3 | 14.1%
4 | 13.0%
5 | 12.0%
6 | 9.9%
7 | 7.8%
8 | 5.7%
9 | 3.6%
10 | 1.0%

As you can see, 17.2% of all conversions come from the top 10% of customers. With each following group, the share declines. So, the best approach is to first contact the top customers. As a next step, we add the cumulative conversions. This number is then used for our cumulative gain chart.

Decile | Cumulative % of conversions
1 | 17.2%
2 | 32.8%
3 | 46.9%
4 | 59.9%
5 | 71.9%
6 | 81.8%
7 | 89.6%
8 | 95.3%
9 | 99.0%
10 | 100.0%
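Both percentage columns can be recalculated from the conversions per decile. This is a small sketch using the numbers from the first table; the variable names are chosen for this example.

```python
# Conversions per decile, taken from the first table
conversions = [33, 30, 27, 25, 23, 19, 15, 11, 7, 2]
total = sum(conversions)

# Share of all conversions that falls into each decile, in percent
share = [100 * c / total for c in conversions]

# Cumulative share: the values plotted in the cumulative gain chart
cumulative_gain = []
running = 0
for c in conversions:
    running += c
    cumulative_gain.append(100 * running / total)
```

Rounded to one decimal, `share` starts with 17.2 and `cumulative_gain` runs from 17.2 up to 100.0.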

Cumulative Gain Chart

With this data, we can now create the cumulative gain chart. In our case, this would look like the following:

The Lift factor

Now, let’s have a look at the lift factor. The baseline for the lift factor is always a lift of 1. This corresponds to contacting a random sample of customers without any structured approach. Basically, the lift factor is the ratio between the cumulative share of conversions in % and the share of customers contacted in %. With our sample data, the lift would look like the following:

Decile | Lift
1 | 1.72
2 | 1.64
3 | 1.56
4 | 1.50
5 | 1.44
6 | 1.36
7 | 1.28
8 | 1.19
9 | 1.10
10 | 1.00

Thus we would have a lift factor of 1.72 with the first decile, decreasing towards 1 for the full customer set.

In this tutorial, we’ve learned how to evaluate a machine learning model. In the next tutorial, we will have a look at false positives and some other important topics before moving on with Logistic Regression.
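The lift values can be reproduced by dividing the cumulative share of conversions by the share of customers contacted so far. Again a small sketch based on the conversions per decile from the sample data:

```python
# Conversions per decile, taken from the sample data
conversions = [33, 30, 27, 25, 23, 19, 15, 11, 7, 2]
total = sum(conversions)

lift = []
running = 0
for decile, c in enumerate(conversions, start=1):
    running += c
    gain_pct = 100 * running / total   # cumulative % of conversions
    contacted_pct = 10 * decile        # % of customers contacted so far
    lift.append(gain_pct / contacted_pct)
```

A lift of 1.72 for the first decile means that contacting the top 10% yields 1.72 times as many conversions as contacting a random 10% of the customers.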


Lift and Gain charts to measure the performance of a model, by Mario Meir-Huber (published 2019-09-26 22:42, last modified 2021-06-20 14:26)

In my previous posts, I explained the Linear Regression and stated that there are some errors in it. There is the error of prediction (for individual predictions) and there is also the standard error. A prediction is good if the individual errors of prediction and the standard error are small. Let’s start by examining the error of prediction and then calculate the standard error of our linear regression model.

Error of prediction in Linear regression

Let’s recall the table from the previous tutorial:

Year | Ad Spend (X) | Revenue (Y) | Prediction (Y’)
2013 | € 345.126,00 | € 41.235.645,00 | € 48.538.859,48
2014 | € 534.678,00 | € 62.354.984,00 | € 65.813.163,80
2015 | € 754.738,00 | € 82.731.657,00 | € 85.867.731,47
2016 | € 986.453,00 | € 112.674.539,00 | € 106.984.445,76
2017 | € 1.348.754,00 | € 156.544.387,00 | € 140.001.758,86
2018 | € 1.678.943,00 | € 176.543.726,00 | € 170.092.632,46
2019 | € 2.165.478,00 | € 199.645.326,00 | € 214.431.672,17

We can see that there is a clear difference between the prediction and the actual numbers. We calculate the error in each prediction by taking the real value minus the prediction:

Year | Y-Y’
2013 | -€ 7.303.214,48
2014 | -€ 3.458.179,80
2015 | -€ 3.136.074,47
2016 | € 5.690.093,24
2017 | € 16.542.628,14
2018 | € 6.451.093,54
2019 | -€ 14.786.346,17

In the above table, we can see how each prediction differs from the real value. Thus it is our prediction error on the actual values.

Calculating the Standard Error

Now, we want to calculate the standard error. First, let’s have a look at the formula:

Standard error = square root of ( sum of (Y-Y’)^2 / N )

Basically, we take the sum of all squared errors, divide it by the number of observations and take the square root of it. We already have Y-Y’ calculated, so we only need to square it:

Year | Y-Y’ | (Y-Y’)^2
2013 | -€ 7.303.214,48 | € 53.336.941.686.734,40
2014 | -€ 3.458.179,80 | € 11.959.007.558.032,20
2015 | -€ 3.136.074,47 | € 9.834.963.088.101,32
2016 | € 5.690.093,24 | € 32.377.161.053.416,10
2017 | € 16.542.628,14 | € 273.658.545.777.043,00
2018 | € 6.451.093,54 | € 41.616.607.923.053,70
2019 | -€ 14.786.346,17 | € 218.636.033.083.835,00

The sum of it is 641.419.260.170.216,00 €

And N is 7, since it contains 7 Elements. Divided by 7, it is: 91.631.322.881.459,50 €

The last step is to take the square root, which results in the standard error of 9.572.425,13 € for our linear regression.
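The whole calculation fits into a few lines of code. The sketch below uses the revenue and prediction columns from the table above; the variable names are chosen for this example.

```python
import math

# Real revenue (Y) and predicted revenue (Y') for 2013 to 2019
actual = [41235645.00, 62354984.00, 82731657.00, 112674539.00,
          156544387.00, 176543726.00, 199645326.00]
predicted = [48538859.48, 65813163.80, 85867731.47, 106984445.76,
             140001758.86, 170092632.46, 214431672.17]

# Error of prediction for each year: real value minus prediction
errors = [y - y_hat for y, y_hat in zip(actual, predicted)]

# Square the errors, average them over N and take the square root
squared = [e ** 2 for e in errors]
standard_error = math.sqrt(sum(squared) / len(squared))
```

Note that dividing by N, as done here, gives what is often called the root mean squared error; some textbooks divide by N minus 2 for the standard error of a simple linear regression, which would result in a slightly larger value.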

Now, we have most items cleared for our linear regression and can move on to the logistic regression in our next tutorial.


Machine Learning 101 – The Standard Error in a Linear Regression, by Mario Meir-Huber (published 2019-09-19 09:25, last modified 2021-06-20 14:26)

In my previous posts, I introduced the basics of machine learning. Today, I want to focus on two elementary algorithms: linear and logistic regression. Basically, you learn them at the very beginning of your machine learning journey, but you might not use them much later on. Still, they are helpful for understanding the underlying concepts.

Linear Regression

Linear Regression is one of the simplest models in Data Science. It is a supervised learning technique and is used in trend analysis, time-series analysis, risk assessment in banking and many more areas.

In a linear regression, the relationship between a dependent variable y and one or more independent variables x_{n} is assumed to be linear. This basically means that if the data follows a specific trend, a future trend can be predicted. Let’s assume that there is a significant relation between ad spending and sales. We would have the following table:

Year | Ad Spend | Revenue
2013 | € 345.126,00 | € 41.235.645,00
2014 | € 534.678,00 | € 62.354.984,00
2015 | € 754.738,00 | € 82.731.657,00
2016 | € 986.453,00 | € 112.674.539,00
2017 | € 1.348.754,00 | € 156.544.387,00
2018 | € 1.678.943,00 | € 176.543.726,00
2019 | € 2.165.478,00 | € 199.645.326,00

If you look at the data, it is very easy to figure out that there is some kind of relation between how much money you spend on the ads and the revenue you get. Basically, the ratio is 1:92 to 1:119. Please note that I totally made up the numbers. However, based on these numbers, you could basically predict what revenues to obtain when spending X amount of money. The relation between them is therefore linear and we can easily plot it on a line chart:

As you can see, some of the values are above the line and others below. Let’s now manually calculate the linear function. There are some steps necessary that should eventually lead to the prediction values. Let’s assume we want to know what revenue we can expect if we spend a specific amount of money on ads, in this case 1 Million. The linear regression function for this is:

predicted score (Y') = bX + intercept (A)

This means that we now need to calculate two values: the slope (our “b”) and the intercept (our “A”). X is the only value we know: our 1 Million spend. Let’s first calculate the slope.

Calculating the Slope

The first thing we need to do is to calculate the slope. For this, we need two sums: the squared deviations of X from its average and the products of the deviations of X and Y. Let’s first start with X, our ad spend. The squared deviation is calculated for each ad spend individually. There are some steps involved:

Creating the average of the ad spend

Subtracting the individual ad spend

Building the square

The first step is to create the average of both values. The average for the revenues should be: € 118.818.609,14 and the average for the spend should be: € 1.116.310,00.

Next, we need to calculate the squared deviation for each item. For the ad spend, we do this by subtracting each individual ad spend from the average and building the square. The table for this should look like the following:

The formula is: (Average of Ad spend – ad spend) ^ 2

Year | Ad spend | (Average of Ad spend – ad spend) ^ 2
2013 | € 345.126,00 | € 594.724.761.856,00
2014 | € 534.678,00 | € 338.295.783.424,00
2015 | € 754.738,00 | € 130.734.311.184,00
2016 | € 986.453,00 | € 16.862.840.449,00
2017 | € 1.348.754,00 | € 54.030.213.136,00
2018 | € 1.678.943,00 | € 316.555.892.689,00
2019 | € 2.165.478,00 | € 1.100.753.492.224,00

Quite huge numbers already, right? Now, let’s bring in the revenues. For each year, we take (average of the ad spend – ad spend) and multiply it by (average of the revenues – revenue). This should result in:

Year | Revenue | (Average of Ad spend – ad spend) * (Average of Revenue – revenue)
2013 | € 41.235.645,00 | € 59.830.740.619.545,10
2014 | € 62.354.984,00 | € 32.841.051.219.090,30
2015 | € 82.731.657,00 | € 13.048.031.460.197,10
2016 | € 112.674.539,00 | € 797.850.516.541,00
2017 | € 156.544.387,00 | € 8.769.130.708.225,71
2018 | € 176.543.726,00 | € 32.478.055.672.684,90
2019 | € 199.645.326,00 | € 84.800.804.871.574,80

Now, we only need to sum up the two calculated columns. The sums should be:

€ 2.551.957.294.962,00 for the X column (squared deviations of the ad spend)
€ 232.565.665.067.859,00 for the XY column (products of the deviations)

Now, we need to divide the XY sum by the X sum and get the following slope: 91,1322715
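The slope calculation can be double-checked with a few lines of code. This is a sketch using the ad spend and revenue values from the table at the beginning; the variable names are chosen for this example.

```python
ad_spend = [345126, 534678, 754738, 986453, 1348754, 1678943, 2165478]
revenue = [41235645, 62354984, 82731657, 112674539, 156544387, 176543726, 199645326]

# Averages of both columns
mean_x = sum(ad_spend) / len(ad_spend)
mean_y = sum(revenue) / len(revenue)

# Sum of the squared deviations of the ad spend
sum_xx = sum((x - mean_x) ** 2 for x in ad_spend)

# Sum of the products of the deviations of ad spend and revenue
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, revenue))

slope = sum_xy / sum_xx
```

Whether you subtract the value from the average or the average from the value does not matter here, as long as you do it the same way for both columns: the signs cancel out in the square and in the product.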

Calculating the Intercept

The intercept is somewhat easier. The formula for it is: average(y) – Slope * average(x). We already have all relevant variables calculated in our previous step. Our intercept should equal: € 17.086.743,14.

Predicting the value with the Linear Regression

Now, we can build our function. This is: Y = 91,1322715X + 17.086.743,14

As stated in the beginning, our X should be 1 Million and we want to know our revenue: € 108.219.014,64
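The intercept and the prediction can be verified the same way. The sketch starts from the averages and the slope calculated in the previous steps; the function name `predict` is chosen for this example.

```python
# Values calculated in the previous steps
mean_x = 1116310.00      # average ad spend
mean_y = 118818609.14    # average revenue
slope = 91.1322715

# Intercept: average(y) - slope * average(x)
intercept = mean_y - slope * mean_x

def predict(ad_spend):
    """Predicted revenue (Y') = slope * X + intercept."""
    return slope * ad_spend + intercept

revenue_for_1m = predict(1_000_000)
```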

The prediction is actually lower than the closest actual values (the 2016 and 2017 values). If you change the values to 2 Million or 400k, it will again get closer. Predictions always produce some errors, and these are normally reported alongside the prediction. Therefore, the error table would look like the following:

Year | Ad spend | Real revenue (Y) | Prediction (Y’) | Error
2013 | € 345.126,00 | € 41.235.645,00 | € 48.538.859,48 | -€ 7.303.214,48
2014 | € 534.678,00 | € 62.354.984,00 | € 65.813.163,80 | -€ 3.458.179,80
2015 | € 754.738,00 | € 82.731.657,00 | € 85.867.731,47 | -€ 3.136.074,47
2016 | € 986.453,00 | € 112.674.539,00 | € 106.984.445,76 | € 5.690.093,24
2017 | € 1.348.754,00 | € 156.544.387,00 | € 140.001.758,86 | € 16.542.628,14
2018 | € 1.678.943,00 | € 176.543.726,00 | € 170.092.632,46 | € 6.451.093,54
2019 | € 2.165.478,00 | € 199.645.326,00 | € 214.431.672,17 | -€ 14.786.346,17

The error calculation is done by taking the real value and deducting the predicted value from it. And voilà, you have your error. A common goal in machine learning is to reduce the error and make predictions more accurate.


A current trend in AI is not so much a technical one, it is rather a societal one. Basically, technologies around AI in Machine Learning and Deep Learning are getting more and more complex. This makes it ever harder for humans to understand what is happening and why a prediction turns out the way it does. The current approach of “throwing data in, getting a prediction out” is not necessarily working for that. It is somewhat dangerous to build knowledge and make decisions based on algorithms that we don’t understand. To solve this problem, we need explainable AI.

What is explainable AI?

Explainable AI is getting even more important with new developments in the AI space such as AutoML. With AutoML, the system takes over most of the data scientist’s work. It needs to be ensured that everyone understands what’s going on with the algorithms and why a prediction turns out exactly the way it does. So far (and without AutoML), Data Scientists were basically in charge of the algorithms. At least there was someone who could explain an algorithm. Note that this didn’t protect us from bias, nor will AutoML. With AutoML, when the tuning and algorithm selection is done more or less automatically, we need to ensure that some vital and relevant documentation of the predictions is available.

And one last note: this isn’t an argument against AutoML and similar tools. I believe that democratisation of AI is an absolute must and a good thing. However, we need to ensure that it stays explainable!

Explainable AI: Why we need to explain AI for Data Science, by Mario Meir-Huber (published 2019-06-12 20:50)