Machine Learning 101 – Clustering, Regression and Classification


In my last post of this series, I explained the concepts of supervised, unsupervised and semi-supervised machine learning. In this post, we will go a bit deeper into machine learning (but don’t worry, it won’t be that deep yet!) and look at more concrete topics. But first of all, we have to define two terms, which basically derive from statistics or mathematics: features and labels. Features are known values that are used to calculate results; they are the variables that have an impact on a prediction. If we talk about manufacturing, we might want to reduce junk in our production line. Known features from a machine could then be temperature, humidity, operator and time since last service. Based on these features, we can later calculate the quality of the machine output. Labels are the values we want to predict, such as that output quality. In training data, labels are usually known, but for the prediction they are not. When we …
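To make the split between features and labels concrete, here is a minimal Python sketch of the manufacturing example. The class name `MachineRun`, the helper `split_features_labels`, the field names and all values are hypothetical and only illustrate the idea: each record carries its features, and the label (whether the output quality was OK) is filled in for training data but left empty for the run we want to predict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MachineRun:
    # Features: known values measured on the machine (hypothetical example)
    temperature_c: float
    humidity_pct: float
    operator: str
    hours_since_service: float
    # Label: the value we want to predict; known in training data, None when predicting
    quality_ok: Optional[bool] = None


def split_features_labels(runs):
    """Return (X, y): feature tuples and their labels, for labelled runs only."""
    X, y = [], []
    for run in runs:
        if run.quality_ok is None:
            continue  # unlabelled rows cannot be used to train a supervised model
        X.append((run.temperature_c, run.humidity_pct, run.operator, run.hours_since_service))
        y.append(run.quality_ok)
    return X, y


training = [
    MachineRun(71.5, 40.0, "Anna", 12.0, quality_ok=True),
    MachineRun(84.0, 55.0, "Ben", 310.0, quality_ok=False),
    MachineRun(69.0, 38.5, "Ben", 30.0, quality_ok=True),
]
# Label unknown: this is the run whose quality we would like to predict
new_run = MachineRun(80.0, 50.0, "Anna", 250.0)

X, y = split_features_labels(training + [new_run])
print(len(X), y)  # 3 [True, False, True]
```

Only the three labelled runs end up in the training data; the new run keeps its features but has no label yet, which is exactly the situation a prediction has to handle.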


Machine Learning 101 – Supervised and Unsupervised Learning


I teach Big Data & Data Science at several universities and also work in that field. Since I have written a lot here about Big Data itself, and many young professionals are now deciding whether to go into data science, I decided to write a short introductory series on machine learning. After this intro, you should be able to dig deeper into the topic and know where to start. To kick off the series, we’ll go over some basics of machine learning. One of its main ideas is to find patterns in data and make predictions on that data without having to develop each and every use case from scratch. For this, a number of algorithms are available. These algorithms can be “classified” by how they work. The two main principles (which can also be split further) are supervised learning and unsupervised learning, plus semi-supervised learning, which combines labelled and unlabelled data. With supervised learning, the algorithm basically learns from existing data and …
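To show the difference between supervised and unsupervised learning on the same data, here is a minimal sketch in plain Python (standard library only, Python 3.8+ for `math.dist`). The points, labels and the function names `nearest_neighbour_predict` and `k_means` are hypothetical. The supervised part uses a 1-nearest-neighbour classifier and the unsupervised part uses k-means clustering; both are simply well-known representatives of each family, not necessarily the algorithms this series goes on to cover.

```python
import math

# Hypothetical 2-D data points (e.g. two sensor readings per produced item)
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
# Known labels: only the supervised approach is allowed to use them
labels = ["good", "good", "good", "bad", "bad", "bad"]


def nearest_neighbour_predict(train_x, train_y, query):
    """Supervised: return the label of the closest labelled training point."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def k_means(data, k, iterations=10):
    """Unsupervised: group points into k clusters without looking at any labels."""
    # Farthest-first initialisation: start with the first point, then repeatedly
    # add the point that is farthest from all centroids chosen so far
    centroids = [data[0]]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    # Final cluster index for every input point
    return [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in data]


print(nearest_neighbour_predict(points, labels, (1.0, 1.1)))  # good
print(k_means(points, k=2))  # [0, 0, 0, 1, 1, 1]
```

The supervised function can only answer with labels it has seen in the training data, while `k_means` discovers the same two groups on its own but can only number them (0 and 1); it cannot know that one group means "good" and the other "bad".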
