One topic every company is currently discussing at a high level is marketing automation and marketing data. It is a key factor in digitalising a company's marketing approach. With Marketing Automation, marketing can become much more precise and to the point: no more unnecessary marketing spend, every cent spent wisely, and no advertisement overload. At least, that is what vendors promise, assuming we all lived in a perfect world. But what does it take to live in this perfect marketing world? DATA.

What is so hot about marketing data?

One disclaimer upfront: I am not a marketing expert. Among other tasks, I try to enable marketing to achieve these goals by utilising our data. Data is the weak point in Marketing Automation: if you have bad data, you will end up with bad Marketing Automation. Data is the engine, or the oil, of Marketing Automation. But why is it so crucial to get the data right?

Until now, data was never seen as a strategic asset within companies. It was rather treated as something you simply have to store somewhere, so it ended up in silos within different departments, making access hard and connections difficult. Governance, too, was and still is neglected. When data scientists start working with data, they often struggle with governance issues: what is inside the data, why is it structured in a specific way, and what should it tell us? Overcoming this often takes weeks and is expensive.

Some industries (e.g. banking) are more mature, but they struggle with this as well. Over the last years, many companies built data warehouses to consolidate their view of the data. Nowadays, data warehouses are heavily outdated and overly expensive, and most of them are still poorly structured. More recently, companies started to shift data to data lakes (initially Hadoop) to get a 360° view. Economically, this makes perfect sense, but building a holistic customer model is a challenge there too, and it takes considerable time and resources.

The newest hype in marketing is Customer Data Platforms (CDPs). Their value hasn’t been proven yet, but most of them act as an abstraction layer that makes data handling easier for marketers. However, integrating the data into a CDP is challenging in itself, and there is a high risk of creating yet another data silo.

To enable Marketing Automation with data, the following steps are necessary:

  • Get your data house in order. Build your data assets on open standards so you can change technology and vendor if necessary. Don’t lock your data in to one vendor.
  • Take the first steps in small chunks, closely aligned with Marketing and in an agile way. Customer journeys often depend on specific data sources, so a full-blown model isn’t necessary up front. However, make sure the model stays extensible and the big picture remains visible. I recommend using a NoSQL store, such as a document store, for the model.
  • Keep the data processing on the data lake. The abstraction layer (I call it Customer 360) interacts with the data lake and uses the tools it provides.
  • Address governance in the first steps; it is too difficult to add at a later stage. Establish a data catalog for easy retrieval, search, and data quality metrics/scoring.
  • Establish central identity management and household management. A 360° view of the customer helps a lot.
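
To make the document-model and identity/household steps more concrete, here is a minimal sketch of how a Customer 360 model could be assembled from siloed sources. It assumes customers are matched on a normalised e-mail address and households on a normalised postal address. The record fields, source names and matching rules are hypothetical, and real identity resolution usually needs fuzzier matching. Each resulting dict corresponds to one document in a document store, so new fields (for example KPIs from analytical models) can be added later without a schema migration.

```python
from collections import defaultdict

# Hypothetical records from two siloed systems (CRM and web shop).
# Field names and the matching rules below are illustrative assumptions.
SOURCE_RECORDS = [
    {"source": "crm", "id": "C-1", "email": "Anna.Maier@example.com",
     "name": "Anna Maier", "address": "Hauptstrasse 1, 1010 Wien"},
    {"source": "shop", "id": "S-77", "email": "anna.maier@example.com ",
     "name": "A. Maier", "address": "hauptstrasse 1,  1010 wien"},
    {"source": "crm", "id": "C-2", "email": "peter.maier@example.com",
     "name": "Peter Maier", "address": "Hauptstrasse 1, 1010 Wien"},
]


def identity_key(record):
    """Assumed central identity rule: one customer per normalised e-mail."""
    return record["email"].strip().lower()


def household_key(record):
    """Assumed household rule: customers sharing a normalised address."""
    return " ".join(record["address"].lower().split())


def build_c360(records):
    """Merge source records into one extensible document per customer."""
    customers = {}
    for rec in records:
        key = identity_key(rec)
        doc = customers.setdefault(key, {
            "customer_id": key,
            "household_id": household_key(rec),
            "source_ids": [],
            "names": [],
        })
        # Keep lineage back to every source system the customer appears in.
        doc["source_ids"].append(f'{rec["source"]}:{rec["id"]}')
        if rec["name"] not in doc["names"]:
            doc["names"].append(rec["name"])
    return customers


def households(customers):
    """Group customer documents by household for household-level targeting."""
    groups = defaultdict(list)
    for doc in customers.values():
        groups[doc["household_id"]].append(doc["customer_id"])
    return dict(groups)


if __name__ == "__main__":
    c360 = build_c360(SOURCE_RECORDS)
    print(c360["anna.maier@example.com"])
    print(households(c360))
```

Here the CRM and web-shop entries for Anna collapse into a single document, and Anna and Peter end up in the same household.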

With Marketing Automation, we basically distinguish two types of data (which is why I recommend a Lambda Architecture):

  • Batch data. This kind of data doesn’t change frequently, such as customer details. It also includes the outputs of models that run on larger datasets and thus require time-series data. The results of analytical models run on that data are promoted to the C360 model as KPIs or fields.
  • Event data. Data that needs to be fed into Marketing Automation platforms fast. Once such an event has happened (for example, a purchase), ads that have become unnecessary should be removed; otherwise, you lose money.
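
The split between batch and event data can be sketched as a tiny Lambda Architecture: a batch view that is recomputed periodically, a speed layer that holds events arriving in between, and a serving function that merges both at query time. This is a minimal in-memory illustration, not a production design. The batch view contents, the `SpeedLayer` class and the `ads_to_show` function are hypothetical, and the rule "don't advertise what was just bought" is an assumed example of removing unnecessary ads.

```python
from dataclasses import dataclass, field

# Batch layer output (assumed): recomputed e.g. nightly on the data lake,
# promoting model results such as segments and ad candidates to the C360 model.
BATCH_VIEW = {
    "cust-1": {"segment": "outdoor", "ad_candidates": ["tent", "boots", "jacket"]},
    "cust-2": {"segment": "city", "ad_candidates": ["jacket", "umbrella"]},
}


@dataclass
class SpeedLayer:
    """Holds events that arrived after the last batch run (hypothetical)."""
    purchases: dict = field(default_factory=dict)

    def on_event(self, event):
        # Only purchase events matter for this sketch; others are ignored.
        if event["type"] == "purchase":
            self.purchases.setdefault(event["customer"], set()).add(event["product"])


def ads_to_show(customer, batch_view, speed):
    """Serving layer: merge the batch view with fresh events at query time."""
    candidates = batch_view.get(customer, {}).get("ad_candidates", [])
    bought = speed.purchases.get(customer, set())
    # Drop ads for products the customer has just bought.
    return [product for product in candidates if product not in bought]
```

For example, after `speed.on_event({"type": "purchase", "customer": "cust-1", "product": "tent"})`, the tent ad is no longer served to `cust-1`, even though the next batch run has not happened yet.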

What’s next?

This is just a high-level view, but handling data correctly for marketing is becoming more and more important. And you need to get your own data in order: you can’t outsource this task.

Let me know what challenges you have faced with this so far. As always, I am looking forward to discussing this with you 🙂

This post is part of the “Big Data for Business” tutorial. In this tutorial, I explain various aspects of handling data right within a company. If you want to learn more about Marketing Automation, I recommend reading this article.

One of my 5 predictions for 2019 is about Hadoop: I expect that many projects won’t adopt Hadoop as a full-blown solution anymore. Why is that? What is the future of Hadoop?

What happened to the future of Hadoop?

One of the most exciting pieces of news in 2018 was the merger between Hortonworks and Cloudera. The two main competitors joining forces? How can this happen? I believe it didn’t come out of the strength of the two, or because they somehow started to “love” each other, but rather out of economic calculation. The competition is no longer Hortonworks vs. Cloudera (it already wasn’t before the merger); it is rather Hadoop vs. new solutions.

These solutions are highly diversified, with Apache Spark being one of Hadoop’s top competitors. But there are also other platforms, such as Apache Kafka and NoSQL databases like MongoDB, plus the emerging TensorFlow. One could argue that all of this is included in a Cloudera or Hortonworks distribution, but it isn’t as simple as that. The founders of Spark and Kafka provide their own distributions of their stacks, which are more lightweight than the complex Hadoop stack. In several use cases, a full-blown solution is simply not necessary, and a lightweight one is the better choice.

The Cloud is the real threat to Hadoop

But the real threat comes from something else: the Cloud. Hadoop has always run better on bare metal, and both pre-merger companies still argue that it does. Other solutions such as Spark are built for the Cloud and perform better there. This is the real threat to Hadoop, since the Cloud simply won’t go away, with most companies switching to it.

Object stores provide a great and cheap alternative to HDFS, and managing them is much easier. I only call them an alternative here because object stores still lack several enterprise features. However, I expect the large cloud providers such as AWS and Microsoft to invest significantly in this space and add great features to their object stores this year. Object stores in the cloud will catch up fast this year and will probably surpass HDFS functionality by 2020. If this happens and their cost advantage over bare-metal Hadoop remains, there is really no need for Hadoop anymore.

On the analytics layer, the cloud is also far superior. Running dynamic Spark jobs against data in object stores (or managed NoSQL databases) is impressive. You don’t have to manage clusters anymore, which takes a lot of pain and headache away from large IT departments and will increase the performance and speed of development. Another disadvantage I see for the leading Hadoop vendors is their sales force: they are better compensated for on-premise solutions, so they tell companies to keep out of the cloud, which isn’t the best strategy in 2019.

What about enterprise adoption of Hadoop?

However, there is still some hope in enterprise integration, which Hadoop distributions often handle better. And even though the entire world is moving to the Cloud, many legacy systems still run on-premise. Also, after the HWX/Cloudera merger, their mission statement became being the leading company for big data in the cloud. If they fully execute on this, I am sure there will be a huge market share ahead of them, and the threats described above could even be averted. Let’s see what 2019 and 2020 bring in this respect and for the future of Hadoop.


This is the last post in my series about the topics I care most about. This time, I will focus on Analytics and AI. The latter in particular has been a major buzz-word this year, so it is interesting to see what might happen in 2019. Here are my predictions for 2019:

1. Governance will be seen as a major enabler (or blocker) for self-service analytics. Self-service analytics will become a key goal for most companies

Let’s stay grounded: a “deal-breaker” for advanced analytics and data science is often the inability to access data (quickly) or bad data quality. Both can be handled well if enterprises make major investments in data governance. I often see data scientists waiting days or weeks for access to data, only to find out that its quality is very bad. Let’s face it: data governance was neither important nor attractive in enterprises. Nobody I know has boasted about applying a great data governance strategy; other topics are more interesting to talk about. Nevertheless, if an enterprise continues to treat data governance as it has until now, it will keep data science from succeeding. Many consulting companies currently market the term “self-service analytics”, but this is simply not achievable without data governance in place. Next year, more and more companies will figure this out and either apply a data governance strategy or risk failing with their data-driven efforts.
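
One practical way to avoid discovering bad quality only after weeks of waiting is to measure it up front and publish the numbers alongside each data set, for example in a data catalog. The sketch below computes two simple metrics per column in plain Python: completeness (share of filled values) and validity (share of values that pass a check). The table, column names and validation rules are hypothetical, and the e-mail check is deliberately naive; real governance tooling would use far richer rules.

```python
from datetime import date

# Hypothetical customer table as it might arrive from a source system.
ROWS = [
    {"customer_id": "1", "email": "a@example.com", "birth_date": "1980-02-01"},
    {"customer_id": "2", "email": "", "birth_date": None},
    {"customer_id": "3", "email": "c@example.com", "birth_date": "1975-13-40"},
    {"customer_id": "4", "email": "d@example.com", "birth_date": "1990-07-15"},
]


def is_filled(value):
    """A value counts as filled unless it is missing or an empty string."""
    return value not in (None, "")


def is_valid_date(value):
    """Accept only real calendar dates in ISO format (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# Assumed per-column rules; the e-mail rule is intentionally simplistic.
VALIDATORS = {
    "customer_id": str.isdigit,
    "email": lambda value: "@" in value,
    "birth_date": is_valid_date,
}


def quality_profile(rows, validators):
    """Share of filled and of valid values per column, each from 0.0 to 1.0."""
    profile = {}
    for column, validator in validators.items():
        values = [row.get(column) for row in rows]
        filled = [v for v in values if is_filled(v)]
        valid = [v for v in filled if validator(v)]
        profile[column] = {
            "completeness": len(filled) / len(values),
            "validity": len(valid) / len(values),
        }
    return profile
```

For this sample, `birth_date` is 75% complete but only 50% valid, because "1975-13-40" is filled yet not a real date: exactly the kind of finding a data scientist should see before requesting access.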

2. AI will continue to be a buzz-word, creating even more confusion in 2019 than it did before

I don’t know how you felt about AI over the past year, but I had some really great “aha” moments. A lot of vendors approached me wanting to talk about their great AI solutions. When I started asking questions, the answer from the sales staff was: “don’t worry, our AI takes care of it”. Looking under the hood, it was often just a simple rules engine, no smart AI! I started calling this “rules-based AI”, as there was no magic involved. When I asked some vendors how they would explain AI, they simply said: “don’t worry, only the smartest people understand it”. I found this somewhat offensive, as they apparently considered neither themselves nor me smart enough :). I even asked whether their AI was rules-based, and they said yes. So, one thing is very clear: AI is a buzz-word. Everyone is talking about it, but hardly anyone understands it. It is the same story as with the cloud a few years ago. This trend will continue, and finding real AI solutions (no, I won’t mention which ones I consider real; no ad placement here) will be tricky. Many companies will buy “AI” solutions because it is trendy and they want to be part of it, or simply because they don’t want to lose out in this growing market. However, many of them will find out that their AI isn’t as smart as they had hoped.

3. Google will use its advantage in AI to catch up in the Cloud

This basically reflects what I wrote some days ago in my post on the Cloud. When it comes to the cloud, the #3 in the market is definitely Google. They entered the market somewhat later than AWS and Microsoft. However, they offer a very interesting portfolio and competitive pricing. A key strength of Google is its AI and analytics services, as the company itself is very data-driven. Google knows how to handle and analyse data much better than the other two, so it is very likely that Google will use this advantage to gain market share from its competitors. I am excited about the next Google I/O and what will be shown there in terms of analytics and AI.

4. Voice is the new Bacon

One of the many things AI should solve is voice recognition. Speech is one of the strengths of mankind and a key factor in our development into what we are. With AI, we already see significant advances in intent recognition for written text (e.g. chat, e-mail, …). We recently carried out a project in which we could classify intent in e-mails with very little effort. However, voice is still an issue, especially if you operate in a market with only 4-6 million native speakers. To significantly automate customer care, voice recognition is inevitable. But will it work? Ask yourself: do you have Alexa or Google Home in your flat? Yes? Then I think you can answer this immediately: it works only poorly. However, next year we will see significant improvements in this space, mainly driven by business demand. What Google presented at Google I/O shows the way to go. I believe that this year we will see many more of these services. Expect a lot to come in 2019 around voice.

5. Rise of the python

Python is already the most popular language when it comes to data science. However, other languages like R are still present in this space. Since many new data scientists come fresh from university with an IT background or major, Python will continue to grow. This will also be reflected in new packages and add-ons for Python. Other languages won’t see as much effort, and new and exciting tools will be available for Python only. Python still lacks data visualisation capabilities compared to R, but this will also change during 2019, and Python will continue to grow in this area as well.

So, these are my 5 predictions for Data Science and AI. What do you think? Where do you agree or disagree? I am looking forward to our discussion!

Big Data has been a buzz-word over the last years, but it has also started to prove its value. Major companies started to develop Big Data strategies and prepared their organisations to become “data-driven”. However, there is still some way to go. Here are my 5 predictions for 2019:

1. Big Data services will become more popular in the Cloud, impacting the business model of on-premise Hadoop providers

One obvious development in the last year was that object stores (such as Amazon S3) became more popular for processing large amounts of data. This heavily threatens the “traditional” HDFS-based solutions, which are fully built on Hadoop. With object stores, HDFS, and thus Hadoop, becomes obsolete; processing is now done with Spark. Also, most established cloud providers started to offer managed Spark services, which gives customers more flexibility than traditional solutions. However, traditional HDFS still has some advantages over object stores, such as fine-grained security and data governance. We will see improvements in cloud-based object stores over the next year(s) to overcome those obstacles. But anyhow: in my opinion, the Hortonworks/Cloudera merger this year didn’t come out of a position of strength but rather from the threats arising from the cloud. And running a full Hadoop distribution in the cloud isn’t smart from an economic point of view.

2. Traditional database providers will see shrinking revenues for their proprietary solutions

Data warehouse providers such as Teradata are struggling with decreasing revenues from their core business. We could see this over the last years in shrinking revenues and declining market capitalisation. This trend will continue in 2019 and accelerate, although the decline hasn’t reached its fastest pace yet. As companies become more mature with data, they will increasingly see that overly expensive data warehouses aren’t necessary anymore. However, data warehousing will always exist, also in relational form, but it might move to economically more attractive platforms. Either way, data warehouse providers need to change their business models. They have huge potential to become major players in the Big Data and analytics ecosystem.

3. Big Data will become faster

This is another trend that emerged over the last year. With the discussion of Kappa vs. Lambda architectures for the Big Data technology stack, this trend has recently been getting more attention. Real-time platforms are becoming more and more economical, making it easier to move towards faster execution. Also, customers expect fast results, and internal (business) departments don’t want to wait forever for them.
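
The core idea behind Kappa, as opposed to Lambda, can be shown in a few lines: there is only one streaming code path, and when the processing logic changes, the retained event log is simply replayed through the new version instead of maintaining a separate batch layer. In this minimal sketch a Python list stands in for a durable, replayable log such as a Kafka topic (an assumption for illustration), and the event data and handler functions are hypothetical.

```python
from collections import Counter

# Append-only event log; a stand-in for a durable, replayable stream (assumed data).
EVENT_LOG = [
    {"customer": "cust-1", "type": "page_view"},
    {"customer": "cust-1", "type": "purchase"},
    {"customer": "cust-2", "type": "page_view"},
    {"customer": "cust-2", "type": "page_view"},
]


def count_views_v1(state, event):
    """Version 1 of the streaming job: count page views per customer."""
    if event["type"] == "page_view":
        state[event["customer"]] += 1
    return state


def count_all_events_v2(state, event):
    """Version 2: the business now wants every interaction counted."""
    state[event["customer"]] += 1
    return state


def run_stream(log, handler, from_offset=0):
    """Kappa-style processing: one code path, reprocessing = replaying the log."""
    state = Counter()
    for event in log[from_offset:]:
        state = handler(state, event)
    return state
```

Switching from `count_views_v1` to `count_all_events_v2` needs no batch job: `run_stream(EVENT_LOG, count_all_events_v2)` recomputes the full history from offset 0 with the new logic, while live processing would continue from the latest offset.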

4. Big Data (and Analytics) isn’t a buzz-word anymore and sees significant investment within companies

As already mentioned in my opening statement, Big Data isn’t a buzzword anymore, and companies are putting significant investments into it. This is now also recognised at C-level. Executives see the need for their companies to become data-driven in all aspects. The backbone of digitalisation is data, and if they want to succeed in digitalisation, they have to master the data aspects first. Banking and telecommunications already started this journey in the last years and have thus gathered significant knowledge; other industries, including very traditional ones, will follow in 2019. Initiatives will now turn into programs with organisational alignment.

5. Governance is now perceived as a key challenge for Big Data

Data governance was always something nobody wanted to care about. It didn’t deliver visible benefits to business functions (you don’t see governance), and it wasn’t put into the bigger context. Now, with big data being put into production in large enterprises, data governance is coming up as an important topic again. Companies should start with data governance at the very beginning, since it is too hard to add afterwards. Also, a good data governance strategy enables a company to be faster with analytics. The aim should be self-service analytics, which can only be achieved with a great data governance strategy.