Big Data has been a buzzword for years, but over the last few years it has also started to prove its value. Major companies have developed Big Data strategies and reshaped their organisations to become “data driven”. However, there is still some way to go. With that in mind, here are my five predictions for 2019:
1. Big Data Services will become more popular on the Cloud, impacting the business model for On-Premise Hadoop providers
One obvious development over the last year was that object stores (such as Amazon S3) became more popular for processing large amounts of data. This heavily threatens the “traditional” HDFS-based solutions, which are fully built on Hadoop. With object stores, HDFS – and thus much of Hadoop – becomes obsolete; processing is now typically done with Spark. Most established cloud providers have also started to offer managed Spark services, which give customers more flexibility than traditional solutions. However, traditional HDFS still has some advantages over object stores, such as fine-grained security and data governance. We will see improvements in cloud-based object stores over the next year(s) to overcome those obstacles. In my opinion, the Hortonworks/Cloudera merger this year didn’t come from a position of strength but rather from the threats that lie ahead – from the cloud. And running a full Hadoop distribution in the cloud isn’t smart from an economic point of view.
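The point that object stores make HDFS optional can be sketched in a few lines: the Spark processing code stays the same, and only the storage URI it reads from changes. This is a plain-Python illustration, not a real deployment – the helper function, bucket name, and namenode address are all hypothetical.

```python
# Sketch: why object stores make HDFS optional for Spark jobs.
# The processing code is identical; only the storage URI scheme changes.
# All names/endpoints below are hypothetical.

def storage_uri(backend: str, path: str) -> str:
    """Build the input URI a Spark job would read from."""
    schemes = {
        "hdfs": "hdfs://namenode:8020",  # traditional Hadoop cluster storage
        "s3": "s3a://my-bucket",         # Amazon S3 via the s3a connector
    }
    return f"{schemes[backend]}/{path.lstrip('/')}"

# In a real job, the only difference would be the argument to spark.read, e.g.:
#   spark.read.parquet(storage_uri("s3", "data/events"))
#   spark.read.parquet(storage_uri("hdfs", "data/events"))
print(storage_uri("s3", "data/events"))    # s3a://my-bucket/data/events
print(storage_uri("hdfs", "data/events"))  # hdfs://namenode:8020/data/events
```

Because the compute layer (Spark) no longer cares where the bytes live, the cluster that exists only to host HDFS stops paying its way – which is the economic pressure on full on-premise distributions described above.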
2. Traditional database providers will see shrinking revenues for their proprietary solutions
Data warehouse providers such as Teradata are struggling with decreasing revenues in their core business, as their shrinking revenue statements and declining market capitalisation over the last years show. This trend will continue and accelerate in 2019 – but it hasn’t reached its fastest pace yet. As companies become more mature with data, they will increasingly realise that overly expensive data warehouses are no longer necessary. Data warehousing itself will always exist – including in relational form – but it may well move to more economical platforms. Either way, data warehouse providers need to change their business models. They have huge potential to become major players in the Big Data and analytics ecosystem.
3. Big Data will become faster
This is another trend that emerged over the last year. With the discussion of Kappa vs. Lambda architectures for the Big Data technology stack, it has recently been gaining attention. Real-time platforms are becoming more and more economical, making it easier to move towards faster execution. Customers also expect fast results, and internal (business) departments don’t want to wait forever for them.
4. Big Data (and Analytics) isn’t a buzzword anymore and sees significant investment within companies
As mentioned in my opening statement, Big Data isn’t a buzzword anymore, and companies are putting significant investment into it. This is now also perceived at C-level: executives see the need for their companies to become data driven in all aspects. The backbone of digitalisation is data, and companies that want to succeed in digitalisation have to master the data aspect first. Banking and telecommunications already started this journey in recent years and have gathered significant knowledge; other industries – including very traditional ones – will follow in 2019. Initiatives will now turn into programs with organisational alignment.
5. Governance is now perceived as a key challenge for Big Data
Data governance was always something nobody wanted to care about. It didn’t deliver visible benefits to business functions (you don’t see governance) and it wasn’t put into the bigger context. Now, with Big Data being put into production in large enterprises, data governance is coming up as an important topic again. Ideally, companies start with data governance at the very beginning, since it is much harder to retrofit afterwards. A good data governance strategy also enables a company to be faster with analytics. The aim should be self-service analytics, which can only be achieved with a solid data governance strategy.
Hi Mario, thanks for sharing this article, I like it. Why do you think it wouldn’t be smart to run a full Hadoop distribution in the cloud? I believe the collection of tools and frameworks meanwhile provides everything you need to implement a high-performance Big Data solution. From my own experience, I would even say that cloud services (PaaS/SaaS) sometimes lack transparency and flexibility. Patrick
Hi Patrick, it depends on the use case. A lot of use cases only need Spark and some data – without a full-blown Hadoop environment. Other use cases need advanced features from distributions like Cloudera or Hortonworks, so it highly depends. But at least there is a choice, and an optimum can be found by having different options. Also, I would argue that standalone Spark is more efficient than Spark within a distribution. So one always has to evaluate what is needed for a specific case.
And yes, I agree with the PaaS/SaaS transparency issues. With Big Data PaaS I was rather thinking of Spark as a Service – e.g. Azure Databricks, where you basically get open-source Spark in a fully managed environment. I would say that in the coming years we will move away from managing our own Spark/Hadoop/Kafka environments (also due to containers).
Hi Mario, thanks for your response. I agree, this is very much depending on the use case.
However, with complete Big Data solutions in mind, I still believe self-managed environments are here to stay. Patrick
At least for the next years. I guess there will rather be a hybrid approach