Data is the key to Marketing Automation


One topic every company is currently discussing at a high level is marketing automation. It is a key factor in the digitalisation of a company's marketing approach. With Marketing Automation, marketing has the chance to become much more precise and to the point. No more unnecessary marketing spend, every cent spent wisely – and no advertisement overload. So far, this is the vendors' promise, assuming we all lived in a perfect world. But what does it take to live in this perfect marketing world? DATA. One disclaimer upfront: I am not a marketing expert. Among other tasks, I try to enable marketing to achieve these goals through the use of our data. Data is the weak point in Marketing Automation. If you have bad data, you will end up with bad Marketing Automation. Data is the engine or the oil of Marketing Automation. But why is it so crucial to get the


Hadoop, where are you heading to?


One of my 5 predictions for 2019 is about Hadoop. Basically, I expect that a lot of projects won't take Hadoop as a full-blown solution anymore. Why is that? One of the most exciting pieces of news in 2018 was the merger between Hortonworks and Cloudera. The two main competitors now joining forces? How can this happen? I believe this didn't come from a position of strength, or because the two somehow started to "love" each other, but rather out of economic calculation. The competition isn't between Hortonworks and Cloudera anymore (it wasn't even before the merger); it is rather Hadoop vs. new solutions. These solutions are highly diversified – Apache Spark is one of the top competitors. But there are also other platforms such as Apache Kafka and some NoSQL databases such as MongoDB, plus TensorFlow emerging. One could now argue that all of that is included in a Cloudera


My top 5 Analytics and AI predictions for 2019


This is the last post of my series about the topics I care most about. This time, I will focus on Analytics and AI. AI in particular has been a major buzzword this year, so it is interesting to see what might happen in 2019. Therefore, my predictions for 2019 are: 1. Governance will be seen as a major enabler – or blocker – for self-service analytics. Self-service analytics will become a key goal for most companies. Let's stay on the ground: a "deal-breaker" for Advanced Analytics and Data Science is often the inability to access data (fast) or bad data quality. Both can be handled well if enterprises make major investments in data governance. I often see data scientists waiting for days or weeks to access data, only to find out that the quality is very bad. Let's face it: data governance was neither important nor attractive in enterprises. Nobody


My top 5 Cloud predictions for 2019


Another year has passed, and 2018 has been a really great year for the cloud. We can now really say that cloud is becoming a commodity and common sense. After years of arguing why the cloud is useful, this discussion is now gone. Nobody doubts the benefits of the cloud anymore. Next year, most developments that already started in 2018 will continue and intensify. My predictions for 2019 won't be revolutionary, but rather the trends we will see over the course of the year. Therefore, my 5 predictions for 2019 are: 1. Strong growth in the cloud will continue, but it won't be hyper-growth anymore. In past years, companies such as Amazon or Microsoft saw significant growth rates in their cloud business. These numbers will still go up at large, double-digit growth rates for all major cloud providers (not just those two). However, overall growth will be slower than in previous years as the


Cloud is not the future


Now you probably think: is Mario crazy? In fact, during this post, I will explain why cloud is not the future. First, let's have a look at the economic facts of the cloud. If we look at the share prices of companies providing cloud services, it is rather easy to say: those shares are skyrocketing! (Not mentioning recent drops in some shares, but these are rather market dynamics than real valuations.) The same goes for overall company performance: the income of companies providing cloud services has increased a lot. Have a look at the major cloud providers such as AWS, Google, Oracle or Microsoft: they now make quite a lot of their revenue with cloud services. So, obviously, my initial statement seems to be wrong here. So why did I make it? Still crazy? Let's look at another explanation: it might be all about technology, right? I was recently playing with AWS API Gateway and AWS Lambda.


The Datalake as driver for digital transformation & data centricity


Everyone (or at least most companies) today talks about digital transformation and treats data as a main asset for it. The question is where to store this data. In a traditional database? In a DWH? I think we should take a step back to answer this question. First of all, a Datalake is not a single piece of software. It consists of a large variety of platforms, where Hadoop is a central one, but not the only one – it includes other tools such as Spark, Kafka, … and many more. It also includes relational databases – such as PostgreSQL, for instance. If we look at how truly digital companies such as Facebook, Google or Amazon solve these problems, then the technology stack is also clear; in fact, they heavily contribute to and use Hadoop and similar technologies. So the answer is clear: you don't need overly expensive DWHs anymore. However, many C-level executives might now say: "but we've


How to: Start and Stop Cloudera on Azure with the Azure CLI


The Azure CLI is my favorite tool to manage Hadoop clusters on Azure. Why? Because I can now use the tools I am used to from Linux on my Windows PC. On Windows 10, I am using the Ubuntu Bash for that, which gives me all the major tools for managing remote Hadoop clusters. One thing I do frequently is starting and stopping Hadoop clusters based on Cloudera. If you are coming from PowerShell, this might be rather painful for you, since you can only start each VM in the cluster sequentially, meaning that a cluster consisting of 10 or more nodes is rather slow to start and might take hours! With the Azure CLI I can easily do this by specifying "--no-wait", and everything runs in parallel. The only disadvantage is that I won't get any notification of when the cluster is ready. But I solve this with a simple hack: ssh'ing into the cluster (since I
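
As a rough sketch of how this can look with the Azure CLI (the resource group name "cloudera-rg" is just a placeholder, not from the original post):

# Start all VMs of the cluster in parallel; --no-wait returns immediately for each VM
az vm start --ids $(az vm list -g cloudera-rg --query "[].id" -o tsv) --no-wait

# Stopping (deallocating) the cluster works the same way
az vm deallocate --ids $(az vm list -g cloudera-rg --query "[].id" -o tsv) --no-wait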


Why building Hadoop on your own doesn’t make sense


There are several things people discuss when it comes to Hadoop, and some of these discussions are misguided. First, there is a small number of people who believe that Hadoop is a hype that will end at some point. They often come from a strong DWH background and won't accept (or simply ignore) the new normal. But there are also two major claims being made: the first group of people states that Hadoop is cheap because it is open source, and the second group states that Hadoop is expensive because it is very complicated. (Info: by Hadoop, I also include Spark and the like.) Neither the one nor the other is true. Yes, you can download it for free and install it on your system. This makes it basically free in terms of licenses, but not in terms of running it. When you get a vanilla Hadoop, you will have to think about hotfixes, updates, services,


RACEing to agile Big Data Analytics


I am happy to announce the development we did over the last months within Teradata. We developed a lightweight process model for Big Data Analytics projects, which is called "RACE". The model is agile and resembles the know-how of more than 25 consultants who have worked in over 50 Big Data Analytics projects in recent months. Teradata also co-developed CRISP-DM, the industry-leading process for data mining. Now we have created a new process for agile projects that addresses the new challenges of Big Data Analytics. Where does the ROI come from? This was one of the key questions we addressed when developing RACE. The economics of Big Data Discovery Analytics are different from traditional Integrated Data Warehousing economics. ROI comes from discovering insights in highly iterative projects run over very short time periods (usually 4 to 8 weeks). Each meaningful insight or successful use case that can be actioned generates ROI. The total ROI is the sum of all the
