
When Kappa first appeared as an architecture style (introduced by Jay Kreps), I was really fond of this new approach. I carried out several projects that used Kafka as the central building block and avoided the trade-offs of Lambda. But the more complex the projects got, the more I figured out that it isn’t the answer to everything, and that we ended up with Lambda again … somehow.

Kappa vs. Lambda Architecture

First of all, what is the benefit of Kappa, and what is the trade-off with Lambda? It all started with Jay Kreps in his blog post, in which he questioned the Lambda Architecture. Basically, with the different layers of the Lambda Architecture (Speed Layer, Batch Layer and Serving Layer), you need to use different tools and programming languages. This leads to code complexity and the risk of ending up with inconsistent versions of your processing logic: a change to the logic on one layer requires changes on the other layer as well. Complexity is something we want to remove from our architecture wherever possible, and data processing is no exception.

The Kappa Architecture came with the promise to put everything into one system: Apache Kafka. The speed at which data can be processed is tremendous, and so is the simplicity: you only need to change code once, not two or three times as with Lambda. This also lowers labour costs, as fewer people are needed to write and maintain the code. And all our data is available at our fingertips, without the major delays of batch processing, which is a great benefit for business units that no longer have to wait for processing runs.

So what is the problem with the Kappa Architecture?

However, my initial statement was about something else – that I mistrust the Kappa Architecture. I implemented this architecture style in several IoT projects, where we had to deal with sensor data. There was no question whether Kappa was the right choice, as we were in a rather isolated use-case. But as soon as you look at a Big Data architecture for a large enterprise (and not only at isolated use-cases), you end up with one major issue around Kappa: cost.

In use-cases where data doesn’t need to be available within minutes, Kappa seems to be overkill. Especially in the cloud, Lambda brings major cost benefits when object storage is combined with on-demand processing capabilities such as Azure Databricks. In enterprise environments cost does matter, and an architecture should also be cost-efficient. This also holds true when it comes to the half-life of data, which I was recently writing about: data that loses its value fast should be stored on cheap storage systems from the very beginning.

Cost of Kappa Architecture

An easy way to compare Kappa to Lambda is a comparison per terabyte stored and processed. Let’s use a scenario where we store 32 TB. With a Kappa Architecture running 24/7, we would spend an estimated $16,000 per month (no discounts, no reserved instances, pure pay-as-you-go pricing; E64 nodes with 64 cores each, 432 GB of RAM and E80 SSDs attached at 32 TB per disk). With Lambda, processing only once per day, we would need 32 TB on a blob store, which costs $680 per month. Add the cluster above for one hour of Spark processing per day: $544. Summed up, this equals $1,224 per month – a cost ratio of roughly 1:13.
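For transparency, here is the same calculation as a small sketch. The dollar figures are the pay-as-you-go assumptions from above, not authoritative prices, and will vary by region and over time:

public class KappaVsLambdaCost {
    public static void main(String[] args) {
        // Kappa: the E64 streaming cluster with 32 TB SSD running 24/7
        double kappaPerMonth = 16000.0;

        // Lambda: 32 TB on a blob store plus the same cluster for 1 hour per day
        double blobStorePerMonth = 680.0;
        double sparkOneHourPerDay = 544.0;
        double lambdaPerMonth = blobStorePerMonth + sparkOneHourPerDay; // $1,224

        // Prints a ratio of roughly 1:13 in favour of Lambda
        System.out.printf("Cost ratio Lambda:Kappa = 1:%.0f%n",
                kappaPerMonth / lambdaPerMonth);
    }
}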

However, this is a very simple calculation, and it can still be optimised on both sides. In the broader enterprise context, Kappa is only a specialisation of Lambda and won’t exist all on its own at all times. Choosing between Kappa and Lambda depends on the use-case, and that is what I recommend you do.

This post is part of the “Big Data for Business” tutorial, in which I explain various aspects of handling data right within a company.

2016 is around the corner, and the question is what the next year might bring. Here are my top 5 predictions that could become relevant for 2016:

  • The Cloud war will intensify. Amazon and Azure will lead the space, followed (at quite some distance) by IBM. Google and Oracle will stay far behind the leading 2+1 Cloud providers. Both Microsoft and Amazon will see significant growth, with Microsoft’s growth being higher, meaning that Microsoft will continue to catch up with Amazon.
  • More PaaS solutions will arrive. All major vendors will provide PaaS solutions on their platforms for different use-cases (e.g. the Internet of Things). These solutions will become more industry-specific (e.g. a solution tailored to manufacturing workflows).
  • Vendors currently not offering cloud services will see their income decline, as more and more companies move to the cloud.
  • Cloud datacenters will more often be outsourced from the leading providers to local companies, in order to overcome local legislation.
  • Big Data in the Cloud will grow significantly in 2016, as more companies move workloads for these kinds of applications to the Cloud.

What do you think? What are your predictions?

Amazon announced details about their Q2 earnings yesterday. Their cloud business grew by an incredible 81%. This is massive, given that Amazon is already the number one company in that area. This quarter, they earned 1.8 billion USD from cloud computing.
Extrapolating this number, their revenue would reach some 7 billion USD this year. However, if this growth keeps accelerating this fast, I guess they could even reach double digits by the end of the year. Will Amazon reach 10 billion in 2015? If so, this would be incredible! Microsoft stated that their growth was well above the 100% mark, so I am curious where Microsoft will stand by the end of the year.
But what does this tell us? Both Microsoft and Amazon are growing fast in this business, and we can expect many more interesting cloud services in the coming months and years. My opinion is that the market has already consolidated around Microsoft and Amazon; other companies such as Google and Oracle are rather niche players in the Cloud market.

Amazon Web Services today announced their new datacenter in Frankfurt, Germany. This is AWS region number 11 and the second in Europe. AWS will support a large number of services from this datacenter.
Here is the original press release:

SEATTLE—Oct. 23, 2014—(NASDAQ: AMZN)—Amazon Web Services, Inc. (AWS, Inc.), an Amazon.com company, today announced the launch of its new AWS EU (Frankfurt) region, which is the 11th technology infrastructure region globally for AWS and the second region in the European Union (EU), joining the AWS EU (Ireland) region. All customers can now leverage AWS to build their businesses and run applications on infrastructure located in Germany. As with every AWS region, customers can do this knowing that their content will stay within the region they choose. The newly launched AWS EU (Frankfurt) region comes as a result of the rapid growth AWS has been experiencing and is available now for any business, organization or software developer to sign up and get started at: http://aws.amazon.com.

All AWS infrastructure regions around the world are designed, built, and regularly audited to meet rigorous compliance standards including, ISO 27001, SOC 1 (Formerly SAS 70), PCI DSS Level 1, and many more, providing high levels of security for all AWS customers. AWS is fully compliant with all applicable EU Data Protection laws, and for customers that require it, AWS provides data processing agreements to help customers comply with EU data protection requirements. More information on how customers using AWS can meet EU data protection requirements and local certifications such as BSI IT Grundschutz, can be found on the AWS Data Protection webpage at: aws.amazon.com/de/data-protection. A full list of compliance certifications can be found on the AWS compliance webpage at: http://aws.amazon.com/compliance/.

The new AWS EU (Frankfurt) region consists of two separate Availability Zones at launch. Availability Zones refer to datacenters in separate, distinct locations within a single region that are engineered to be operationally independent of other Availability Zones, with independent power, cooling, and physical security, and are connected via a low latency network. AWS customers focused on high availability can architect their applications to run in multiple Availability Zones to achieve even higher fault-tolerance. For customers looking for inter-region redundancy, the new AWS EU (Frankfurt) region, in conjunction with the AWS EU (Ireland) region, gives them flexibility to architect across multiple AWS regions within the EU.

“Our European business continues to grow dramatically,” said Andy Jassy, Senior Vice President, Amazon Web Services. “By opening a second European region, and situating it in Germany, we’re enabling German customers to move more workloads to AWS, allowing European customers to architect across multiple EU regions, and better balancing our substantial European growth.”

Many German customers are already using AWS including Talanx, in the highly regulated insurance sector. Talanx is one of the top three largest insurers in Germany and one of the largest insurance companies in the world with over €28 billion in premium income in 2013. “For Talanx, like many companies that hold sensitive customer data, data privacy is paramount,” says Achim Heidebrecht, Head of Group IT, Talanx AG. “Using AWS we are already seeing a 75% reduction in calculation time, and €8 million in annual savings, when running our Solvency II simulations while still complying with our very strict data policies. With the launch of the AWS region on German soil, we will now move even more of our sensitive and mission critical workloads to AWS.”

Hubert Burda Media is one of the largest media companies in Europe with over 400 brands and revenues in excess of $3.6 billion. JP Schmetz, Chief Scientist of Hubert Burda Media said of the announcement, “Now that AWS is available in Germany it gives our subsidiaries the option to move certain assets to the cloud. We have long had policies preventing data to be hosted outside of German soil and this new German region gives us the option to use AWS more meaningfully.”

Academics in Germany were also quick to welcome the new region, “The arrival of an Amazon Web Services Region in Germany marks an important occasion for the German business and technology community,” said Prof. Dr Helmut Krcmar, Vice Dean of the Computer Science Faculty, and Chair of Information Systems at the Technical University of Munich. “We work with a number of DAX listed companies in Germany. Many have been holding off moving sensitive workloads to the cloud until they had computing and service facilities on German soil as this could help them comply with their internal processes. This new region from AWS answers this and we expect to see innovation amongst Germany, and Europe’s, companies flourish as a result.”

The Header Image was published by Martin aka Maha under the Creative Commons License.

The AWS Java SDK version 1.8.10 ships with a critical bug affecting uploads. A fix has been provided by AWS, and normally the SDK updates itself automatically, so you don’t need to worry.
However, if automatic updates are disabled in your Eclipse installation, you might lose data when uploading via SDK version 1.8.10. Here is what AWS has to say about the bug:

AWS Message

Users of AWS SDK for Java 1.8.10 are urged to immediately update to the latest version of the SDK, version 1.8.11.
If you’ve already upgraded to 1.8.11, you can safely ignore this message.
Version 1.8.10 has a potential for data loss when uploading data to Amazon S3 under certain conditions. Data loss can occur if an upload request using an InputStream with no user-specified content-length fails and is automatically retried by the SDK.
The latest version of the AWS SDK for Java can be downloaded here:
http://aws.amazon.com/sdk-for-java/
And is also available through Maven central:
http://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.8.11/


The bug itself has been fixed. In case you didn’t update and are still on SDK version 1.8.10, you should do so now; normally, the AWS SDK updates itself automatically in Eclipse.
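Independent of the fix, the bug description suggests a defensive pattern: when uploading from an InputStream, set the content length explicitly so the SDK knows exactly how many bytes to expect, also when a failed request is retried. A minimal sketch, with bucket and key names as placeholders:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class SafeS3Upload {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = new AmazonS3Client(); // credentials from the default provider chain

        File file = new File("data.bin");
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(file.length()); // explicit, user-specified content length

        try (InputStream in = new FileInputStream(file)) {
            s3.putObject("my-bucket", "my-key", in, metadata);
        }
    }
}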

I will post some developer content from now on, with a focus on small but helpful tasks when working with various cloud platforms. These tips will be named after the service they cover (e.g. “Amazon Web Services” for AWS, …).
The first tip shows how to retrieve the full queue URL when you already have the queue name:

sqs.getQueueUrl(new GetQueueUrlRequest().withQueueName("myqueue")).getQueueUrl();

The function “getQueueUrl()” on the result already returns the URL as a String and not as a URI object (which is what I would rather have expected in this case).
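For completeness, here is a self-contained version of the snippet above; the queue name is a placeholder:

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClient;
import com.amazonaws.services.sqs.model.GetQueueUrlRequest;

public class QueueUrlTip {
    public static void main(String[] args) {
        AmazonSQS sqs = new AmazonSQSClient(); // credentials from the default provider chain

        // The result's getQueueUrl() returns the queue URL as a plain String
        String queueUrl = sqs.getQueueUrl(
                new GetQueueUrlRequest().withQueueName("myqueue")).getQueueUrl();

        System.out.println(queueUrl);
    }
}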

Since there are many cloud providers out there, and I often run into the problem of switching between different platforms (such as Google AppEngine, Amazon S3, …), I have decided to write a single client that works with all the different platforms – or at least with as many as possible. I’ve created a project on Google Code here, and I will start with a first draft of the interfaces (a rough sketch follows below). As a first step, I will include Amazon S3. I hope that more people will join this project and help me make it a great one 😉
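To give an idea of the direction, here is a rough first draft of such an interface. The names and the method set are only a sketch of mine, not a final API:

import java.io.InputStream;
import java.util.List;

// Draft of a provider-neutral storage client. Concrete implementations
// (e.g. for Amazon S3 or Google AppEngine) would map these calls onto
// the vendor-specific APIs.
public interface CloudStorageClient {

    // Lists the containers (buckets) visible to the account.
    List<String> listContainers();

    // Uploads the stream as an object into the given container.
    void put(String container, String key, InputStream data, long length);

    // Opens the object for reading; the caller is responsible for closing.
    InputStream get(String container, String key);

    // Deletes a single object.
    void delete(String container, String key);
}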

I am happy to announce that my new e-book is in stores now! The book is 85 pages in length and aims to provide an overview of Amazon Web Services for .NET developers. The e-book series by “developer.press” is called “Shortcut” and is designed so that the topic can be read in one or two evenings right after work.

Amazon Web Services for .NET Developers


The book starts with a description of the service categories offered by Amazon, along with a brief description of the available services. Since new services are released frequently and the authoring timeframe of a book is about 3-6 months, services such as Amazon Glacier are not yet included. The remaining six chapters focus entirely on building an application with AWS: Amazon Elastic Beanstalk is used together with ASP.NET MVC, followed by a focus on S3, SQS, DynamoDB and Amazon EC2.
Subscribe to this channel to get updates about the book.
You can download the Source Code here.
The book is currently available only in German; an English version is planned.
The E-Book is available in the iTunes Store and in the Kindle Store.

Still can’t get enough information about Cloud Computing? Here is the weekend reading list with interesting topics about Cloud Computing. Have a nice (and cloudy) weekend 🙂

  • Google search to include your private Gmail soon? – In this blog post, Computerworld discusses the possibility of Google including Gmail data in its search results. There is some interesting feedback from various authors. Read more here.
  • Only 16% of people know what the Cloud is. Do you agree? Is Cloud Computing not well known enough? Most respondents believed it was about drugs, pillows, the weather, or toilet paper. Read more about it on Forbes.
  • 5 considerations when moving to the cloud. IBM explains what to consider when moving to the cloud: monitoring, security, performance, vendor lock-in and migration. Read more about it here.
  • Amazon Web Services boss Andy Jassy on competition, price wars, and getting big. Read the interview with AWS boss Andy Jassy about his thoughts on Cloud Computing and what is currently going on in this sector. The interview can be found here.
  • The Role of Open Source in Cloud Computing Innovation. An interesting article about the role of Open Source in Cloud Computing. The article can be read here.

