News about what is going on in the world of Cloud Computing and Big Data


The Apache Software Foundation has announced that Apache Storm is now a top-level Apache project. But what is Apache Storm about? Basically, Apache Storm is a project for analysing data streams in near real time: Storm works with messages and analyses what is going on in them. Storm originates from Twitter, which uses it for its streaming API. Storm is about processing time-critical data, and it guarantees that your data gets processed. It is fault tolerant and scalable.

Apache Storm is useful for fraud protection in gambling, banking and financial services, but not only there: Storm can be used wherever real-time or time-critical applications are necessary. At the moment, Storm can process 1 million tuples per second per node. This is massive, especially since Storm is built to scale out. Imagine adding 100 nodes!

Apache Storm works with tuples that come from spouts. A spout is a source of tuples and typically reads from a messaging system such as Apache Kafka. Storm supports many more messaging systems and can easily be extended through its abstraction layer.

Storm consists of a few major components, illustrated in the following image:

Apache Storm

Nimbus is the master node, similar to Hadoop's JobTracker. ZooKeeper is used for cluster coordination, and the Supervisor runs the worker processes. Each worker process is made up of smaller units: executors, which are threads spawned by the worker, and the tasks themselves.
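The runtime hierarchy described above (Nimbus, supervisors, worker processes, executors, tasks) can be pictured with a small Python model. This is a hypothetical, simplified illustration, not Storm's API: the class names, `build_cluster` and `nimbus_assign` are made up here, ZooKeeper coordination is left out entirely, and the round-robin placement of tasks is an assumption standing in for Storm's real scheduler.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Executor:
    """A thread spawned by a worker process; it runs one or more tasks."""
    executor_id: int
    tasks: List[int] = field(default_factory=list)


@dataclass
class Worker:
    """A worker process on a node; it owns a set of executor threads."""
    slot: int
    executors: List[Executor] = field(default_factory=list)


@dataclass
class Supervisor:
    """Runs on every node and starts the worker processes there."""
    node: str
    workers: List[Worker] = field(default_factory=list)


def build_cluster(nodes: int, workers_per_node: int,
                  executors_per_worker: int) -> List[Supervisor]:
    """Create a hypothetical cluster with one supervisor per node."""
    supervisors: List[Supervisor] = []
    next_id = 0
    for n in range(nodes):
        supervisor = Supervisor(node=f"node-{n}")
        for slot in range(workers_per_node):
            worker = Worker(slot=slot)
            for _ in range(executors_per_worker):
                worker.executors.append(Executor(executor_id=next_id))
                next_id += 1
            supervisor.workers.append(worker)
        supervisors.append(supervisor)
    return supervisors


def nimbus_assign(num_tasks: int, supervisors: List[Supervisor]) -> List[Executor]:
    """Stand-in for Nimbus: spread tasks round-robin over all executors.

    Round-robin is a simplifying assumption, not Storm's actual scheduler.
    """
    executors = [e for s in supervisors for w in s.workers for e in w.executors]
    if not executors:
        raise ValueError("the cluster has no executors")
    for task_id in range(num_tasks):
        executors[task_id % len(executors)].tasks.append(task_id)
    return executors


if __name__ == "__main__":
    cluster = build_cluster(nodes=2, workers_per_node=1, executors_per_worker=2)
    for executor in nimbus_assign(num_tasks=8, supervisors=cluster):
        print(executor.executor_id, executor.tasks)
```

With two nodes, one worker per node and two executors per worker, the eight tasks end up two per executor (executor 0 runs tasks 0 and 4, executor 1 runs tasks 1 and 5, and so on). Scaling out in this picture simply means adding supervisors, which gives the tasks more executors to spread over.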


The four major concepts in Apache Storm are streams, spouts, bolts and topologies.

Tuples in Apache Storm

A stream is an unbounded sequence of tuples, a spout is a source of streams, bolts process input streams and produce new output streams, and a topology is a network of spouts and bolts.

The header image is provided under a Creative Commons license by MattysFlicks.
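To make the four concepts concrete, here is a minimal in-process sketch of the classic word-count example. It is only an analogy built from Python generators, not Storm's actual API: `sentence_spout`, `split_bolt`, `count_bolt` and `word_count_topology` are hypothetical names, each tuple is a plain Python tuple, and a finite list stands in for an unbounded stream.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout: the source of a stream. A real spout would pull from a system
    such as Kafka; here a finite list stands in for an unbounded stream."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: consumes sentence tuples and emits a new stream of word tuples."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    """Bolt: keeps running counts and emits a (word, count) tuple per input."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def word_count_topology(sentences: Iterable[str]) -> Dict[str, int]:
    """Topology: wires spout -> split bolt -> count bolt, keeping the latest count."""
    latest: Dict[str, int] = {}
    for word, count in count_bolt(split_bolt(sentence_spout(sentences))):
        latest[word] = count
    return latest


print(word_count_topology(["Storm processes streams", "Storm scales out"]))
# {'storm': 2, 'processes': 1, 'streams': 1, 'scales': 1, 'out': 1}
```

Because each bolt only consumes one stream and emits another, stages can be chained or swapped independently, which is the property a topology relies on when Storm spreads its spouts and bolts across a cluster.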

Data Scientist is considered to be the Big Data job you simply have to go for. Some call it sexy, some call it the best job of the future. But what exactly is a Data Scientist? Is it someone you can simply hire straight from university, or is it more complicated? Definitely the latter.

When we think about a Data Scientist, we often say that the perfect Data Scientist is a kind of hybrid between a statistician and a computer scientist. I think this needs to be redefined, since much more knowledge is necessary. A Data Scientist should also be good at analysing business cases and at talking to line executives to understand the problem and model an ideal solution. Furthermore, extensive knowledge of current (international) law is necessary. In a recent study, we defined 5 major challenges:


The 5 topics are:

  • Big Data Business Developer: The person needs to know what questions to ask and how to cooperate with line of business (LOB) decision makers, and must have good social skills to work with all of them.
  • Big Data Technologist: If your company isn’t using the cloud for Big Data analytics, you also need to be into infrastructure. The person must know a lot about system infrastructure, distributed systems, datacenter design and operating systems. Furthermore, it is important to know how to run your software: Hadoop doesn’t install itself, and it needs ongoing maintenance.
  • Big Data Analyst: This is the fun part; here it is all about writing your queries, running Hadoop jobs, doing fancy MapReduce queries and so on! However, the person should know what to analyse and how to implement such algorithms. It is also about machine learning and more advanced topics.
  • Big Data Developer: Here it is more about writing extensions, add-ons and similar components. It also involves distributed programming, which isn’t easy in itself.
  • Big Data Artist: Got the hardware/datacenter right? Know what to analyse? Wrote the algorithms? What about presenting the results to your management? Exactly! This is necessary too, and you simply shouldn’t forget about it. The best data is worth nothing if nobody is interested in it because of poor presentation, so you also need to know how to present your data.

As you can see, it is very hard to become a data scientist; things are not as easy as they might seem. The Data Scientist should be a nerd in each of these fields, so the person should be some kind of “super nerd”. This might be the superhero of the future.

Most likely, you won’t find one person who is good at all of these fields. Therefore, it is necessary to build an effective team.

Header Image Copyright: Chase Elliott Clark

Big Data is definitely a very complex “thing”. Why do I call it “a thing” here? Because it is simply not a technology in itself! Hadoop is a technology, Lucene is a technology, but Big Data is more of a concept, since it is nothing you can touch. Ever tried installing Big Data on your machine? Or said “I need this Big Data software”? When you talk about software or a technology, you talk about a very concrete product or open source tool.

The concept of Big Data is rather complicated when it comes to implementing it. There are several major dimensions you have to be aware of.

Big Data Dimensions

The dimensions are:

  • Legal dimension: What is necessary in terms of data protection legislation? What do you need to know about legal impacts, and what kind of data are you allowed to collect, store or process?
  • Social dimension: What social impacts will you generate with your application? How will your users react to that?
  • Business dimension: What business model do you want to build with your Big Data platform? How can your Big Data platform support your business? What pricing model do you want to apply?
  • Technology dimension: How can you achieve your targets? What technology would you use to get there? What scalable software can you use?
  • Application dimension: What industry solutions are available for your needs? How can you enable decision support based on data for your company?

If you want to address all of these questions, you need a team that is capable of answering them. In the next posts, I will talk about the Big Data technology stack and what it takes to be a data scientist.

Header Image copyright:  Michael Coghlan. Distributed under the Creative Commons license 2.0 by Creative Commons Australia Pool.

CloudVane.com is a popular platform where articles about cloud computing and big data are published. It is listed among the Top 100 Blogs on Cloud Computing by the Cloud Computing Journal, and I often get free passes for global events.

If you know a lot about the cloud and you want to reach a broader audience, feel free to contact me and start blogging about Cloud Computing today at CloudVane.com.

As CloudVane is a non-commercial site, I can’t pay anything for your efforts; however, you will reach a large audience and increase your visibility within the cloud area. Whenever I get event invites, I will do my best to pass them on to authors writing at CloudVane.

Interested? Send me an e-mail: mario.mh@cloudvane.com. Please indicate your experience with Cloud Computing. Good English skills are mandatory.

Looking forward to hearing from you,

Mario

I am happy to announce that CloudVane.com is now a Media Partner for the IE Big Data Innovation Summit! The summit takes place in London (30th April and 1st May) and in San Francisco (11th and 12th April).

Big Data Innovation Summit

Driving Business Success Through Big Data Science

  • 80+ industry expert keynote presentations
  • 1000+ Big Data &  Leaders attending
  • Interactive workshops with industry leaders
  • Over 25 hours of networking opportunities included
  • Access to online presentations on-demand post-summit
  • 50+ case studies presented from Fortune 500 companies

Speakers are from companies such as:

  • NASA
  • Twitter
  • Facebook
  • eBay

You can get more information about the event on the event website:

Plus, there is the chance of winning a ticket for each of the events. I will post more about this next week. So stay tuned and make sure to subscribe to blog updates 🙂

Cloud adoption in CEE is approaching a turning point. Innovative companies and CIOs are embracing the cloud in increasing numbers, challenging the “wait and see” approach of their competitors.

After several years of investment in the cloud by service providers, the cloud has moved beyond the hype in CEE and has become a viable option for enhancing business agility and efficiency and for driving innovation in enterprise IT. Perceptions of the advantages and risks of the cloud are also shifting. The business value of the cloud is rapidly gaining recognition through the best practices provided by a growing number of CEE implementations, while traditional concerns around security and control are becoming less prevalent.

It is no surprise that the outlook for the cloud market remains outstandingly positive: the cloud is the most dynamic segment of the ICT market in CEE, with an average growth rate of more than 50% for the coming years. However, growth will vary among cloud service types, depending on how well each service matches the specifics of the CEE market. IDC expects the most dynamic uptake in virtual private cloud, as it best addresses customers' need for control and security while delivering the lower costs of public cloud infrastructure.

The IDC Cloud Leadership Forum will give you insight into the latest cloud trends and the most recent experiences of enterprise cloud users through a host of industry expert, vendor and case study presentations, roundtable discussions and coffee break talks.

Key Topics

  • Beyond the Hype: The Big Picture of Cloud Computing
  • The cloud as the foundation of the next generation IT
  • Measuring ROI: Cloud Computing, Efficiency, and Sustainability
  • What would suit you best? – The Private, Public, and Hybrid Cloud Models
  • Cloud Compliance Challenges and Solutions
  • Cloud Security Concerns: Vendor Solutions and Customer Experience
  • Private cloud: the life after virtualization
  • Infrastructure Trends in the Cloud
  • Cloud-Based Applications
  • Is cloud diminishing or transforming the role of the CIO?
  • How to get the green light for cloud from the business decision makers?

Who will attend

  • CIOs, IT Directors
  • IT Managers and Heads of Departments
  • Facility & Operation Managers
  • IT Infrastructure Managers
  • Network Administrators
  • Purchase Managers

Event Dates and Locations:

  • SARAJEVO, Bosnia & Herzegovina – February 21
  • MALTA, Malta – March 07
  • ALMATY, Kazakhstan – March 12
  • TIRANA, Albania – March 21
  • VIENNA, Austria – April 09
  • SALZBURG, Austria – April 11
  • SKOPJE, Macedonia – April 18
  • SOFIA, Bulgaria – May 16
  • NICOSIA, Cyprus – May 20
  • MOSCOW, Russia – May 30
  • LJUBLJANA, Slovenia – June 05
  • PRAGUE, Czech Republic – September 12
  • BRATISLAVA, Slovakia – September 17
  • ATHENS, Greece – September 20
  • KIEV, Ukraine – September 25
  • BUCHAREST, Romania – September 26
  • BELGRADE, Serbia – October 10
  • WARSAW, Poland – October 15
  • ZAGREB, Croatia – October 16
  • BUDAPEST, Hungary – October 17

Hope to see you there!

I often discuss the different cloud platforms with people, and we frequently end up talking about prices. People keep asking me how much money they have to invest for a specific number of instances, and answering that means looking up several different websites. So I came up with the idea of creating a simple cost comparison calculator in JavaScript on my blog. And now I can say: here it is!

Click here to go to the Calculator.

The cost comparison calculator gives you 3 different options:

  • Calculate the best price for CPU-intensive applications
  • Calculate the best price for memory-intensive applications
  • Find the cheapest instances available
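All three options come down to the same selection step: filter the offerings by category and pick the lowest price. The post does not show the calculator's JavaScript, so this is a minimal Python sketch under stated assumptions, not its actual code; the vendors ("Vendor A", "Vendor B"), instance names, category labels and prices are hypothetical placeholders, not real list prices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offering:
    vendor: str
    instance: str
    category: str          # hypothetical labels: "cpu", "memory" or "micro"
    price_per_hour: float  # placeholder figures, not real list prices


def cheapest(offerings: List[Offering], category: str) -> Optional[Offering]:
    """Return the lowest-priced offering in a category, or None if nobody offers one."""
    candidates = [o for o in offerings if o.category == category]
    return min(candidates, key=lambda o: o.price_per_hour, default=None)


offers = [
    Offering("Vendor A", "small", "cpu", 0.08),
    Offering("Vendor B", "small", "cpu", 0.06),
    Offering("Vendor A", "micro", "micro", 0.02),
    Offering("Vendor A", "highmem", "memory", 1.80),
    Offering("Vendor B", "highmem", "memory", 1.64),
]

for category in ("cpu", "memory", "micro"):
    best = cheapest(offers, category)
    print(category, best.vendor, best.instance, best.price_per_hour)
```

A vendor that has no offering in a category (here Vendor B has no micro instance) simply drops out of that comparison, and a category nobody offers returns `None` instead of raising.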

The cost comparison calculator currently compares the following platforms:

  • Amazon EC2
  • Google Compute Engine
  • Rackspace Cloud Servers
  • Windows Azure Virtual Machines

One challenge is deciding what to compare. Each vendor has different instance types, so a comparison is often not straightforward. Therefore, I have decided to split the comparison into the different categories described above. For the first calculation (CPU-intensive applications), I've used the smallest available instance. Basically, the instances of all 4 vendors matched, except Google's, which offers less memory than the others. The cheapest-instance calculation targets micro instances; all vendors except Google offer micro instances.

The memory comparison targets high-memory instances, and the calculation is based on available memory. The required memory determines the number of instances necessary: for instance, if we select 68 GB of RAM, we need one instance at Amazon EC2 but multiple instances with Windows Azure. However, the prices are currently based on the total memory rather than on whole instances. This leads to cheaper prices for EC2 at lower memory sizes. I will try to adjust this in the future.
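The difference between pricing by total memory and paying for whole instances can be shown with a short Python sketch. It is a hypothetical illustration, not the calculator's code: `MemoryInstance`, the two vendors and all sizes and prices are made-up placeholders, loosely shaped like the 68 GB example (one large instance versus several smaller ones).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryInstance:
    vendor: str
    memory_gb: float
    price_per_hour: float  # placeholder figure, not a real list price


def instances_needed(required_gb: float, inst: MemoryInstance) -> int:
    """Whole instances required to provide at least `required_gb` of RAM."""
    return math.ceil(required_gb / inst.memory_gb)


def cost_whole_instances(required_gb: float, inst: MemoryInstance) -> float:
    """Cost when every started instance is paid for in full."""
    return instances_needed(required_gb, inst) * inst.price_per_hour


def cost_pro_rated(required_gb: float, inst: MemoryInstance) -> float:
    """Cost proportional to the memory sum; ignores the unused part of the last instance."""
    return required_gb / inst.memory_gb * inst.price_per_hour


large = MemoryInstance("Vendor A", memory_gb=68.0, price_per_hour=1.60)
small = MemoryInstance("Vendor B", memory_gb=14.0, price_per_hour=0.48)

print(instances_needed(68, large), instances_needed(68, small))  # 1 5
for inst in (large, small):
    print(inst.vendor,
          round(cost_pro_rated(16, inst), 2),
          round(cost_whole_instances(16, inst), 2))
```

With these made-up numbers, pro-rating makes the single large instance look cheaper for a 16 GB requirement (about 0.38 vs 0.55 per hour), while paying for whole instances reverses the ranking (1.60 vs 0.96). This appears to be the kind of distortion the post describes for EC2 at lower memory sizes.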

To get updates on the calculator, you can subscribe to the CloudVane newsletter.

 


 

Jeremy Geelan from Cloud Computing Journal / Cloud Computing Expo listed Mario Meir-Huber from Cloudvane as one of the Top 100 Bloggers on Cloud Computing! Thanks a lot! This is great news for the growing Cloudvane platform!

Link: Top 100 Blogs on Cloud Computing.

CloudVane thanks you for 1,000 Facebook likes! CloudVane has seen tremendous growth over the last 3 months since we started, doubling its user base every month! Stay tuned to receive more news and articles about Cloud Computing!

1000 Facebook Likes Celebration Balloons

To receive regular updates, please also subscribe to our newsletter.

 


 

I am happy to announce that my new E-Book is in stores now! The book is 85 pages long, and its goal is to provide an overview of Amazon Web Services for .NET developers. The E-Book format by “developer.press” is called “Shortcut” and aims to cover the topic in one or two evenings of reading right after work.

Amazon Web Services for .NET Developers

The book starts with an overview of the service categories offered by Amazon, with a brief description of the available services. Because new services are released often and writing a book takes about 3-6 months, services such as Amazon Glacier are not yet included. The other 6 chapters focus entirely on building an application with AWS: AWS Elastic Beanstalk is used with ASP.NET MVC, followed by a focus on S3, SQS, DynamoDB and Amazon EC2.

Subscribe to this channel to get updates about the book.

You can download the Source Code here.

The book is currently available only in German; an English version is planned.

The E-Book is available in the iTunes Store and in the Kindle Store.