Big Data has been a buzzword over the last years, but it has also started to prove its value. Major companies have developed Big Data strategies and made their organisations fit to become “data driven”. However, there is still some way to go. Therefore, my 5 predictions for 2019 are:

1. Big Data services will become more popular in the Cloud, impacting the business model of on-premise Hadoop providers

One obvious development in the last year was that object stores (such as Amazon S3) became more popular for processing large amounts of data. This seriously threatens the “traditional” HDFS-based solutions, which are fully built on Hadoop. With object stores, HDFS and thus Hadoop become obsolete; processing is now done with Spark. Also, most established Cloud providers have started to create automated Spark services, which gives customers more flexibility than traditional solutions offer. However, traditional HDFS still brings some advantages over object stores, such as fine-grained security
In my last post of this series, I explained the concepts of supervised, unsupervised and semi-supervised machine learning. In this post, we will go a bit deeper into machine learning (but don’t worry, it won’t be that deep yet!) and look at more concrete topics. But first of all, we have to define some terms, which basically derive from statistics or mathematics. These are:

- Features
- Labels

Features are known values, which are often used to calculate results. These are the variables that have an impact on a prediction. If we talk about manufacturing, we might want to reduce junk in our production line. Known features of a machine could then be: temperature, humidity, operator, time since last service. Based on these features, we can later calculate the quality of the machine output.

Labels are the values we want to predict. In training data, labels are mostly known, but for the prediction they are not known. When we
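To make the distinction between features and labels concrete, here is a minimal sketch in plain Python. The feature names follow the manufacturing example above, but the values and the quality rule are made up purely for illustration; a real model would learn such a rule from the training data instead of having it hard-coded.

```python
# Each training example pairs known features with a known label ("ok" / "junk").
# All values here are hypothetical, for illustration only.
training_data = [
    ({"temperature": 71.0, "humidity": 0.35, "operator": "A", "service_hours": 120}, "ok"),
    ({"temperature": 94.5, "humidity": 0.60, "operator": "B", "service_hours": 900}, "junk"),
    ({"temperature": 68.2, "humidity": 0.40, "operator": "A", "service_hours": 300}, "ok"),
]

def predict_quality(features):
    """Toy stand-in for a trained model: derives the label from the features."""
    # In practice, a learning algorithm would infer this decision rule.
    if features["temperature"] > 90 or features["service_hours"] > 800:
        return "junk"
    return "ok"

# For new, unseen data the label is unknown -- we predict it from the features.
new_features = {"temperature": 92.1, "humidity": 0.5, "operator": "C", "service_hours": 50}
print(predict_quality(new_features))
```

The key point is the asymmetry: features are available both at training time and at prediction time, while the label is only available in the training data.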
As 2016 is around the corner, the question is what this year will bring for Big Data. Here are my top assumptions for the year to come:

- The growth of relational databases will slow down, as more companies evaluate Hadoop as an alternative to the classic RDBMS
- The Hadoop stack will get more complicated as more and more projects are added. It will almost take a team to understand what each of these projects does
- Spark will lead the market for handling data. It will change the entire ecosystem again
- Cloud vendors will add more and more capabilities to their solutions to deal with the increasing demand for workloads in the cloud
- We will see a dramatic increase in successful use cases with Hadoop, as the first projects come to a successful end

What do you think about my predictions? Do you agree or disagree?
Two Big Data and Hadoop e-books are available at a special promotion. The reduced price is only valid for one week, so make sure to order soon! The offer expires on the 21st of December, and both books are available on the Kindle store. The two e-books are:

- Big Data (Introduction); $0.99 instead of $5: Get it here
- Hadoop (Introduction); $0.99 instead of $5: Get it here

Have fun reading them!
2016 is around the corner and the question is what the next year might bring. Here are my top 5 predictions that could become relevant for 2016:

- The Cloud war will intensify. Amazon and Azure will lead the space, followed (at quite some distance) by IBM. Google and Oracle will stay far behind the leading 2+1 Cloud providers. Both Microsoft and Amazon will see significant growth, with Microsoft’s growth being higher, meaning that Microsoft will continue to catch up with Amazon
- More PaaS solutions will arrive. All major vendors will provide PaaS solutions on their platforms for different use cases (e.g. the Internet of Things). These solutions will become more industry-specific (e.g. a solution specific to manufacturing workflows, …)
- Vendors currently not in the cloud will see declines in their income, as more and more companies move to the cloud
- Cloud data centers will more often be outsourced by the leading providers to local companies, in order to comply with local legislation
- Big
On the 15th of December, a Big Data Meetup will take place in Vienna, with leading experts from Fraunhofer, RapidMiner, Teradata and others. About the Meetup: The growing digitization and networking within our society has a large influence on all aspects of everyday life. Large amounts of data are being produced constantly, and when these are analyzed and interlinked, they have the potential to create new knowledge and intelligent solutions for the economy and society. Big Data can make important contributions to technical progress in our societal key sectors and help shape business. What is needed are innovative technologies, strategies and competencies for the beneficial use of Big Data to address societal needs. Climate, energy, food, health, transport, security, and the social sciences are the most important societal challenges tackled by the European Union within the new research and innovation framework programme “Horizon 2020”. In every one of these fields, the processing, analysis and integration of large amounts of
I am happy to announce a partnership between the Data Natives conference and Cloudvane. Once again, one lucky person can get a free ticket to the conference, which takes place from the 19th to the 20th of November in Berlin. What you need to do to get the ticket:

- Share the blog post (Twitter, LinkedIn, Facebook) and send me proof of that via mail
- Write a review (ideally with some pictures)

Data Natives focuses on three key areas of innovation: Big Data, IoT and FinTech. The intersection of these product categories is home to the most exciting technology innovation happening today. Whether it’s for individual consumers or multi-billion dollar industries, the opportunity is immense. Come and learn more from leading scientists, founders, analysts, investors and economists coming from Google, SAP, Rocket Internet, Gartner, Forrester and others. Two days full of interesting talks, sharing knowledge from 50+ speakers and engaging with the community of a data-driven generation of more
I have seen so many Big Data “initiatives” in companies over the last months. And guess what? Most of them either failed completely or simply didn’t deliver the expected results. A recent Gartner study even mentioned that only 20% of Hadoop projects are put “live”. But why do these projects fail? What is everyone doing wrong? Whenever customers come to me, they have “heard” what Big Data can help them with. So they looked at 1-3 use cases and now want to put them into production. However, this is where the problem starts: they are not aware of the fact that Big Data also needs a strategic approach. To get this right, it is necessary to understand the industry (e.g. telco, banking, …) and the associated opportunities. To achieve that, a Big Data roadmap has to be built, normally in a couple of workshops with the business. This roadmap will then outline what projects are done in
Everyone is doing Big Data these days. If you don’t work on Big Data projects within your company, you are simply not up to date and don’t know how things work. Big Data solves all of your problems, really! Well, in reality it is different: Big Data doesn’t solve all your problems. It actually creates more problems than you think! Most companies I saw recently working on Big Data projects failed. They started a Big Data project and successfully wasted thousands of dollars on it. But what exactly went wrong? First of all, Big Data is often equated with Hadoop. We live with the misperception that Hadoop alone can solve all Big Data topics. This simply isn’t true. Hadoop can do many things, but real data science is often not done with the core of Hadoop. Ever talked to someone doing the analytics (e.g. someone skilled in math or statistics)? They are not ok with writing Java
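To illustrate the point: an analyst who wants a quick statistical look at some data typically reaches for a few lines of Python rather than a Java MapReduce job. A minimal sketch using only the standard library (the sample numbers are made up for illustration):

```python
import statistics

# Hypothetical measurements an analyst might want to summarise quickly.
response_times = [120, 135, 128, 410, 131, 126, 133]

mean = statistics.mean(response_times)
stdev = statistics.stdev(response_times)

# Flag values more than two standard deviations away from the mean.
outliers = [x for x in response_times if abs(x - mean) > 2 * stdev]

print(f"mean={mean:.1f}, stdev={stdev:.1f}, outliers={outliers}")
```

Expressing the same summary as a hand-written MapReduce job would take pages of boilerplate, which is exactly why analysts gravitate towards higher-level tools on top of (or beside) Hadoop.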
When working with the main Hadoop services, it is not necessary to work on the console all the time (even though this is the most powerful way of doing so). Most Hadoop distributions also come with a user interface called “Apache Hue”, a web-based interface running on top of a distribution. Apache Hue integrates major Hadoop projects such as Hive, Pig and HCatalog into the UI. The nice thing about Apache Hue is that it makes the management of your Hadoop installation pretty easy via a great web-based UI. The following screenshot shows Apache Hue on the Cloudera distribution. Apache Hue