With 2016 around the corner, the question is what the new year will bring for Big Data. Here are my top predictions for the year to come:

  • Growth for relational databases will slow down, as more companies evaluate Hadoop as an alternative to the classic RDBMS
  • The Hadoop stack will become more complicated as more and more projects are added; it will almost take a dedicated team just to understand what each of these projects does
  • Spark will lead the market for data processing and change the entire ecosystem once again
  • Cloud vendors will add more and more capabilities to their solutions to handle the increasing demand for workloads in the cloud
  • We will see a dramatic increase in successful Hadoop use cases, as the first projects come to a successful end

What do you think about my predictions? Do you agree or disagree?

Two Big Data and Hadoop e-books are available at a special promotional price. The reduced price is only valid for one week, so make sure to order soon! The offer expires on the 21st of December, and both e-books are available on the Kindle store. The two e-books are:

  • Big Data (Introduction); $0.99 instead of $5: Get it here
  • Hadoop (Introduction); $0.99 instead of $5: Get it here

Have fun reading them!

2016 is around the corner, and the question is what the next year might bring. Here are my top 5 predictions for 2016:

  • The cloud war will intensify. Amazon and Azure will lead the space, followed (at quite some distance) by IBM; Google and Oracle will stay far behind the leading 2+1 cloud providers. Both Microsoft and Amazon will see significant growth, with Microsoft growing faster, meaning that Microsoft will continue to catch up with Amazon
  • More PaaS solutions will arrive. All major vendors will provide PaaS solutions on their platforms for different use cases (e.g. the Internet of Things), and these solutions will become more industry-specific (e.g. a solution tailored to manufacturing workflows)
  • Vendors not yet in the cloud will see their revenues decline, as more and more companies move to the cloud
  • Leading providers will more often outsource cloud data centers to local companies in order to comply with local legislation
  • Big Data in the cloud will grow significantly in 2016, as more companies move workloads for these kinds of applications to the cloud

What do you think? What are your predictions?

On the 15th of December, a Big Data Meetup will take place in Vienna, with leading figures from Fraunhofer, RapidMiner, Teradata, and others.
About the Meetup:

The growing digitization and networking of our society has a large influence on all aspects of everyday life. Large amounts of data are being produced permanently, and when these are analyzed and interlinked, they have the potential to create new knowledge and intelligent solutions for the economy and society. Big Data can make important contributions to technical progress in our key societal sectors and help shape business. What is needed are innovative technologies, strategies, and competencies for the beneficial use of Big Data to address societal needs.

Climate, Energy, Food, Health, Transport, Security, and Social Sciences are the most important societal challenges tackled by the European Union within the new research and innovation framework programme "Horizon 2020". In every one of these fields, the processing, analysis, and integration of large amounts of data play a growing role, for example in the analysis of medical data, the decentralized supply of renewable energy, or the optimization of traffic flow in large cities.

Big Data Europe (BDE, http://www.big-data-europe.eu) will undertake the foundational work for enabling European companies to build innovative multilingual products and services based on semantically interoperable, large-scale, multilingual data assets and knowledge, available under a variety of licenses and business models.

On 14-15 December 2015, the whole BDE team is meeting in Vienna for a project plenary. As a result, around 35 experts in the field will participate in the Big Data Europe MeetUp on 15 December 2015 at the Impact Hub Vienna, discussing challenges, requirements, and proven solutions for big data management together with the audience.

Agenda
16:00 – 16:10, Welcome & the BDE MeetUp, Vienna – Martin Kaltenböck (SWC)
16:10 – 16:30, The Big Data Europe Project – Sören Auer (Fraunhofer IAIS, BDE Project Lead)
16:30 – 16:45, Big Data Management Models (e.g. RACE) – Mario Meir-Huber (Big Data Lead CEE, Teradata, Vienna – Austria)
16:45 – 17:00, Selected Big Data Projects in Budapest & beyond – Zoltan C. Toth (Senior Big Data Engineer, RapidMiner Inc., Budapest – Hungary)
17:00 – 17:30, Open Discussion with the Panel on Big Data Requirements, Challenges and Solutions
17:30 – 19:00, Networking & Drinks
Remark: the event ends at around 19:00–19:30.

Register here or here.

I am happy to announce a partnership between the Data Natives conference and Cloudvane. Once again, one lucky person can get a free ticket to the conference, which takes place from the 19th to the 20th of November in Berlin.

What’s necessary for you to get the ticket:

  • Share this blog post (Twitter, LinkedIn, Facebook) and send me proof of it via e-mail
  • Write a review (ideally with some pictures)

Data Natives focuses on three key areas of innovation: Big Data, IoT, and FinTech. The intersection of these product categories is home to the most exciting technology innovation happening today. Whether it's for individual consumers or multi-billion-dollar industries, the opportunity is immense. Come and learn more from leading scientists, founders, analysts, investors, and economists from Google, SAP, Rocket Internet, Gartner, and Forrester, among others. Two days full of interesting talks, with knowledge shared by 50+ speakers and the chance to engage with a data-driven community of more than 500 people.

More information on www.datanatives.io 

Thursday, November 19, 8:30 AM to Friday, November 20, 7:00 PM

NHow Hotel Berlin

Stralauer Allee 3

10245 Berlin

Germany

I am happy to announce the conference Big Data Week. I managed to get one free ticket, which I will give away to a reader of my blog. What's necessary for you to get the ticket:

  • Share this blog post (Twitter, LinkedIn, Facebook) and send me proof of it via e-mail
  • Write a review (ideally with some pictures)

About the conference:

You are invited to attend the Big Data Conference, which will take place in London on November 25.

This year's conference theme is Big Data in Use: innovative use cases from the retail, advertising, publishing, IoT, and gaming domains. Companies that have implemented such projects will showcase their impact on the business, the benefits, and the challenges, both technical and business-wise.

Get your ticket now and learn from industry experts, put your existing knowledge to work and forge lasting relationships within one of the most exciting big data communities!

Why should you attend?

Confirmed speakers and themes for the 2015 lineup include:

  • New business models:  Exterion, Honest Caffe, Copenhagen City Exchange
  • Big Data in Retail: Shop Direct, Dunnhumby, EBI Solutions
  • Grow your business with machine learning: Yandex Data Factory
  • How to value data: Dunnhumby, The Economist, Skimlinks, Exterion
  • Data Models and Architectures: Excelian, ShopDirect, Skimlinks
  • 3 Panels: Big Data in Retail, How to become a data-driven company, Data Scientists & the Business
  • 1 Workshop: How to become a data scientist? (Technical Track)


Your VIP ticket extra-benefits include:

  • 4 training sessions – Big Data in Retail and Real-Time Data Processing – on 23, 24, 26, and 27 November
  • 70% discount on a second conference ticket – One Day Pass
  • VIP Lounge and after conference networking party access

*** A little special something for our community: the organizers are offering you an exclusive 20% off! Just use this code: CloudVane_20_Off***

Super Early Bird Tickets on sale until October 16th!

Want to find out more? Check out the Conference Website.

I am happy to announce what we have developed over the last months within Teradata: a lightweight process model for Big Data analytics projects called "RACE". The model is agile and distills the know-how of more than 25 consultants who have worked on over 50 Big Data analytics projects in recent months. Teradata also co-developed CRISP-DM, the industry-leading process for data mining. Now we have created a new process for agile projects that addresses the new challenges of Big Data analytics.
Where does the ROI come from?
This was one of the key questions we addressed when developing RACE. The economics of Big Data Discovery Analytics are different from traditional Integrated Data Warehousing economics. ROI comes from discovering insights in highly iterative projects run over very short time periods (usually 4 to 8 weeks). Each meaningful insight or successful use case that can be actioned generates ROI; the total ROI is the sum of all successful use cases. Competitive advantage is therefore driven by the capability to produce both a high volume of insights and creative insights that generate a high ROI.
What is the purpose of RACE?
RACE is built to deliver a high volume of use cases, focusing on speed and efficiency of production. It fuses data science, business knowledge, and creativity to produce high-ROI insights.
What does the process look like?

Figure: RACE – an agile process for Big Data Analytics projects


The process itself is divided into several short phases:

  • Roadmap. An optional (but highly recommended) first step to build a roadmap of where the customer wants to go in terms of Big Data.
  • Align. Use cases are detailed and data is confirmed.
  • Create. Data is loaded, prepared, and analyzed; models are developed.
  • Evaluate. Recommendations for the business are given.

In the next couple of weeks we will publish much more on RACE, so stay tuned!

What is necessary to achieve interoperability in the Cloud?

As described in the previous sections, three major interoperability approaches arise: the standardisation approach, the middleware approach, and the API approach. This is also supported by [Hof09] and [Gov10]. In addition, [Gov10] suggests building abstraction layers in order to achieve interoperability and portability.
There are two main levels where interoperability is necessary. The first is the management level, which deals with handling virtual machines, load balancing, DNS settings, auto-scaling, and other tasks that come with IaaS solutions. However, this level is mainly relevant for IaaS, as PaaS solutions already take care of most of it. The second is the services level, which covers everything that comes with application services such as messaging, data storage, and databases.
Figure: Cloud interoperability approaches
These requirements are described in several relevant papers such as [End10], [Mel09], and [Jha09].
Parameswaran et al. describe similar challenges for Cloud interoperability. They see two different approaches: the first via a unified cloud interface (UCI) and the second via Enterprise Cloud Orchestration [Par09].
A unified cloud interface is basically an API written "around" the vendor-specific APIs of other clouds. It requires some rewriting and integration work. This is similar to the approach taken by Apache jClouds and Apache libcloud.
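To illustrate the unified-interface idea, here is a minimal sketch using Apache libcloud's compute API in Python; the credentials are placeholders, and the exact constructor arguments vary slightly per driver:

```python
# A minimal sketch of the unified-API approach with Apache libcloud.
# Credentials below are placeholders; some drivers need extra arguments
# (e.g. a region), so check the driver documentation.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

def list_node_names(provider, key, secret):
    """List node names through the same interface, regardless of vendor."""
    driver_cls = get_driver(provider)
    driver = driver_cls(key, secret)
    return [node.name for node in driver.list_nodes()]

# The same call works against different vendors; only the driver changes.
print(list_node_names(Provider.EC2, 'my-key', 'my-secret'))
```

The vendor-specific details live inside each driver, which is exactly the "API written around other APIs" idea described above.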
Enterprise Cloud Orchestration is a layer where different cloud providers register their services. The platform then offers these services to users through a discovery mechanism, very similar to UDDI. The downside is that the orchestration layer still needs to integrate all the different services (and build wrappers around them). However, this is transparent to the end user.
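The registry idea can be sketched in a few lines; note that the class and method names below are purely illustrative and not part of any existing orchestration product:

```python
# A purely illustrative sketch of an orchestration-layer service registry;
# the names are hypothetical, not an existing API.
class ServiceRegistry:
    """Providers register service wrappers; users discover them by type."""

    def __init__(self):
        self._services = {}  # service type -> {provider name: wrapper}

    def register(self, service_type, provider, wrapper):
        self._services.setdefault(service_type, {})[provider] = wrapper

    def discover(self, service_type):
        # Users see a uniform catalogue instead of vendor-specific APIs.
        return self._services.get(service_type, {})

registry = ServiceRegistry()
registry.register("blob-storage", "aws", "wrapper around S3")
registry.register("blob-storage", "azure", "wrapper around Azure Blob")
print(list(registry.discover("blob-storage")))  # ['aws', 'azure']
```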
Figure: Enterprise Orchestration Layer [Par09]
This post is part of a work done on Cloud interoperability. You can access the full work here and the list of references here.

As discussed in the previous sections, there are several standards and interoperability frameworks available. Most of them are infrastructure-related. The standards and frameworks can generally be clustered into three groups.
The first group is the "Standards" group, which consists of OCCI and the DMTF standards. The second group is the "Middleware" group; it contains mOSAIC, the PaaS Semantic Interoperability Framework, and Frascati. The third group is the "Library" group, consisting of concrete implementations that provide a common API for several cloud platforms. The two projects here are Apache jClouds and Apache libcloud.

Figure: Interoperability Solutions for the Cloud
OCCI provides great capabilities for infrastructure solutions, but nothing is done for individual services. The same applies to the standards proposed by the Distributed Management Task Force (DMTF).
The libraries and frameworks paint a similar picture. Apache jClouds and Apache libcloud provide some interoperability features for infrastructure services; as for platform services, only blob storage is available. When developers build their applications, they still run into interoperability challenges.
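As a concrete illustration of that blob-storage support, here is a minimal sketch using Apache libcloud's storage API; the credentials and container name are placeholders:

```python
# A minimal sketch of vendor-neutral blob storage with Apache libcloud.
# Credentials and the container name are placeholders.
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

driver_cls = get_driver(Provider.S3)   # swap for e.g. Provider.AZURE_BLOBS
driver = driver_cls('my-key', 'my-secret')

container = driver.get_container('my-container')
# Upload a local file through the provider-independent interface.
driver.upload_object('/tmp/report.csv', container, 'report.csv')
```

Beyond this storage abstraction, though, a developer targeting messaging or database services is back to vendor-specific APIs.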
mOSAIC offers a large number of languages and services; however, it is necessary to build a layer on top of an existing platform. This is not a lightweight solution, as a developer has to maintain both the developed application and the middleware installed on top of the provider. The developer eliminates the vendor lock-in but takes on the operational management of the mOSAIC platform. This eliminates a problem on one side but might create a new one on the other.
The same problem exists with the PaaS Semantic Interoperability Framework: a user has to install software on top of the cloud platform, which then has to be maintained by the user. The goal of platform as a service, however, is to relieve the user of the kind of operational management found in IaaS platforms. Frascati is likewise a middleware that needs to be maintained.
All of the libraries and frameworks described work for IaaS services; none of them fully supports the PaaS paradigm and its related services. Apache libcloud and Apache jClouds offer only very basic support for the storage service, and other services such as messaging and key/value storage are not supported.
This post is part of a work done on Cloud interoperability. You can access the full work here and the list of references here.

PaaS Semantic Interoperability Framework (PSIF)

Loutas et al. define semantic interoperability as "the ability of heterogeneous Cloud PaaS systems and their offerings to overcome the semantic incompatibilities and communicate" [Lou11]. The goal of the framework is to give developers the ability to move their applications and data seamlessly from one provider to another. Loutas et al. propose a three-dimensional model addressing semantic interoperability for public cloud solutions [Lou11].
Fundamental PaaS Entities
The fundamental PaaS entities consist of several models: the PaaS System, the PaaS Offering, an IaaS Offering, Software Components, and an Application [Lou11].
Levels of Semantic Conflicts
Loutas et al. [Lou11] assume that there are three major semantic conflicts that can arise for PaaS offerings. The first is an interoperability problem between metadata definitions, which occurs when different data models describe the same PaaS offering. The second arises when the same data is interpreted differently, and the third when different pieces of data have similar meaning. Therefore, [Lou11] uses a two-level approach to resolving semantic conflicts. The first level is the information model, which refers to differences in data and data structures/models. The other is the data level, which refers to differences in the data itself due to varying representations.
Types of Semantics
Three different types of semantics are defined [Lou11]. The first type is the functional semantic, which is basically a representation of everything that a PaaS solution can offer. The second type, the non-functional semantic, covers elements such as pricing or Quality of Service. The third is the execution semantic, which describes runtime behavior.
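To make the three types more concrete, here is a small illustrative sketch of how a PaaS offering could be annotated along these dimensions; the class and field names are mine, not taken from [Lou11]:

```python
# Illustrative only: the fields are hypothetical, not from [Lou11].
from dataclasses import dataclass, field

@dataclass
class PaaSOffering:
    name: str
    # Functional semantics: what the platform can offer.
    services: list = field(default_factory=list)   # e.g. ["queue", "blob-storage"]
    # Non-functional semantics: pricing, Quality of Service, ...
    price_per_hour_eur: float = 0.0
    availability_sla: float = 0.999
    # Execution semantics: runtime characteristics.
    runtimes: list = field(default_factory=list)   # e.g. ["java-8", "python-3"]

offering = PaaSOffering("ExamplePaaS", ["queue"], 0.05, 0.9995, ["java-8"])
print(offering)
```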
This post is part of a work done on Cloud interoperability. You can access the full work here and the list of references here.