Agility is almost everywhere and it also starts to get more into other hyped domains – such as Data Science. One thing which I like in this respect is the combination with DevOps – as this eases up the process and creates end-to-end responsibility. However, I strongly believe that it doesn’t make much sense to exclude the business. In case of Analytics, I would argue that it is BizDevOps.

There is a huge demand for Data DevOps nowadays. Basically, Data Science needs a lot of business integration and works throughout different domains and functions. I outlined several times and in different posts here, that Data Science isn’t a job that is done by Data Scientists. It is more of a team work, and thus needs different people. With the concept of BizDevOps, this can be easily explained; let’s have a look at the following picture and I will afterwards outline the interdependencies on it.

The process for Data Science: BizDevOps is the answer

BizDevOps for Data Science

Basically, there must be exactly one person that takes the end-to-end responsibility – ranging from business alignments to translation into an algorithm and finally in making it productive by operating it. This is basically the typical workflow for BizDevOps. This one person taking the end-to-end responsibility is typically a project or program manager working in the data domain. The three steps were outlined in the above figure, let’s now have a look at each of them.

Data DevOps: Biz

The program manager for Data (or – you could also call this person the “Analytics Translator”) works closely with the business – either marketing, fraud, risk, shop floor, … – on getting their business requirements and needs. This person has a great understanding of what is feasible with their internal data as well in order to be capable of “translating a business problem to an algorithm”. In here, it is mainly about the Use-Case and not so much about tools and technologies. This happens in the next step. Until here, Data Scientists aren’t necessarily involved yet.

Data DevOps: Dev

In this phase, it is all about implementing the algorithm and working with the Data. The program manager mentioned above already aligned with the business and did a detailed description. Also, Data Scientists and Data Engineers are integrated now. Data Engineers start to prepare and fetch the data. Also, they work with Data Scientists in finding and retrieving the answer for the business question. There are several iterations and feedback loops back to the business, once more and more answers arrive. Anyway, this process should only take a few weeks – ideally 3-6 weeks. Once the results are satisfying, it goes over to the next phase – bringing it into operation.

Data DevOps: Ops

This phase is now about operating the algorithms that were developed. Basically, the data engineer is in charge of integrating this into the live systems. Basically, the business unit wants to see it as (continuously) calculated KPI or any other action that could result in some sort of impact. Also, continuous improvement of the models is happening there, since business might come up with new ideas on it. In this phase, the data scientist isn’t involved anymore. It is the data engineer or a dedicated devops engineer alongside the program manager.

Eventually, once the project is done (I dislike “done” because in my opinion a project is never done), this entire process moves into a CI process.

This post is part of the “Big Data for Business” tutorial. Our focus was on Data DevOps and BizDevOps. In this tutorial, I explain various aspects of handling data right within a company. I also recommend you to read about the concept of DevOps.

Everyone is doing Big Data these days. If you don’t work on Big Data projects within your company, you are simply not up to date and don’t know how things work. Big Data solves all of your problems, really!
Well, in reality this is different. It doesn’t solve all your problems. It actually creates more problems then you think! Most companies I saw recently working on Big Data projects failed. They started a Big Data project and successfully wasted thousands of dollars on Big Data projects. But what exactly went wrong?
First of all, Big Data is often only seen as Hadoop. We live with the mis-perception that only Hadoop can solve all Big Data topics. This simply isn’t true. Hadoop can do many things – but real data science is often not done with the core of Hadoop. Ever talked to someone doing the analytics (e.g someone good in math or statistics)?. They are not ok with writing Java Map/Reduce queries or Pig/Hive scripts. They want to work with other tools that are ways more interactive.
The other thing is that most Big Data initiatives are often handled wrong. Most initiatives often simply don’t include someone being good in analytics. One simply doesn’t find this type of person in an IT team – the person has to be found somewhere else. Failing to include someone with this skills often leads to finding “nothing” in the data – because IT staff is good in writing queries – but not in doing complex analytics. These skills are actually not thought in IT classes – it requires a totally different study field to reach this skill set.
Hadoop as the solution to everything for many IT departments. However, projects often stop with implementing Hadoop. Most Hadoop implementations never leave the pilot phase. This is often due to the fact that IT departments see Hadoop as a fun thing to play with – but getting this into production requires a different approach. There are actually more solutions out there that can be done when delivering a Big Data project.
A key to ruining your Big Data project is not involving the LoB. The IT department often doesn’t know what questions to ask. So how can they know the answer and try to find the question? The LoB sees that different. They see an answer – and know what question it would be.
The key to kill your Big Data initiative is exactly one thing: go with the hype. Implement Hadoop and don’t think about what you actually want to achieve with it. Forget the use-case, just go and play with the fancy technology. NOT
As long as companies will stich to that, I am sure I will have enough work to do. I “inherited” several failed projects and turned them into success. So, please continue.

As already mentioned in the last post, Application deployment in the Cloud is not an easy task. It requires a lot of work and knowledge, unless you use Platform as a Service. The later one might reduce this complexity significantly. In order to deploy your Applications to an Infrastructure as a Service Layer, you might need additional knowledge. Let us now built upon what we have heard from the last post by looking at several strategies and how they are implemented by different enterprises.
Some years ago before joining IDC, I worked for a local Software Company. We had some large-scale deployments in place and we used Cloud Services to run our Systems. In this company, my task was to work on the deployment and iteration strategy (but not only on that ;)). We developed in an agile iteration model (basically it was Scrum). This means that our Team had a new “Release” every 2 weeks. During these two weeks, new Features were developed and bugs fixed. The Testing department checked for Bugs and the development Team had to fix them. We enforced daily check-ins and gated them. A gated check-in means that the check-in is only accepted if the Source Code can be compiled on the Build Server(s). The benefit of this was to have a working Version every morning as some started early (e.g. 6am) and others rather late (e.g. at 10am). Before starting with that, our Teams often ran into the error of having to debug bad code that was checked in by someone the day before. Different teams worked on different Features and if the team decided a feature to be final, it was merged to the “stable” branch. Usually on Thursdays (we figured out that Fridays are not good for deployment for various reasons) we deployed the stable branch. The features initially get deployed to a staging system. The Operations Team tests the features. If a feature isn’t working, it has to be disabled and removed from the stable release. Once there was a final version with working features, we deployed them to the live system. This usually happened on Friday evening or Saturdays. Over the years, the Development and Operations Team got a real “Team” and less features had to be removed from the staging system due to errors/bugs. My task was to keep the process running since it was now “easy” to forget about the rules once everything seems to run good.
Want to hear more about Application Deployment? Subscribe to the Newsletter:


A very advanced environment is Windows Azure. Windows Azure is Microsoft’s Cloud Platform and was usually designed as a PaaS-Platform. However, they added several IaaS-Features so far. In this post, I will only focus on the PaaS-Capapilities of Windows Azure. There are several steps involved:

  1. Full Package with Assemblies gets uploaded
  2. If a new Package is added, a Staging Environment is started. There is now the possibility to Test on that Staging Environment
  3. Once the Administrator decides the Package to be „safe“, Staging is switched to Production
  4. New Roles with the new Binaries are started
  5. Old Roles that still have Sessions are up and running until all Sessions are closed
Deployment for Windows Azure PaaS

Deployment for Windows Azure PaaS

This is a very mature deployment process and solves a lot of problems that are normally associated with deployment for large-scale Cloud Systems.
Facebook described their process for deployment very clear:
  1. Facebook takes advantage of BitTorrent
  2. New Versions of the Software gets added to a Tracker
  3. Each Server listens to the Trackers
  4. Once a new Version is available, it is downloaded to the Server
  5. The Server then unloads the current Version and Loads the new one
  6. Old Versions always stay on the Server
  7. Easy Rollback
Any Questions/Issues on deployment? Post your comments below.

More and more companies start Cloud Computing Projects, with a lot of Startups also among them. But what do you have to take into account when you start a new Cloud Computing Project? Is it save to just start or should you consider some best practices in project management? And if so, what do you need to do regarding Cloud Computing? In this article we will give a brief overview on what is necessary to get started with Project Management for Cloud Computing.
Project Management itself consists of the following 5 Iterations:

1.Project Initialisation
2.Project Creation
3.Project Planing

Each Iteration contains some sub-processes which we will describe more detailed now.

Project Management in the cloud

Project Management in the cloud

The first iteration, Project Initialisation, basically starts when someone realizes that there is a need for a project or someone has a business idea. Normally, you start building some KO-Criterias, like feasability or already existing products/platforms. If your project “passes” these KO-criterias, you might move on on evaluate Project Risks. This task often comes with a SWOT-Analysis. In a SWOT-Analysis you check the project for Strength, Weaknesses, Opportunities and Threads. Another important factor to analyse are the Stakeholders, since they can take affect on the project. Stakeholders are people or organizations that have a special interest in the project or product, such as the top-management, customers, owners or co-workers.
Once you are done with the basics, it is necessary to do a less corse-grained evaluation of the project. Initially, you would analyse what platform will be used. This is a very important part for Cloud Computing as we have different platforms that server different technology and offer various possibilities. A major thread is the risk of a vendor lock-in, what shouldn’t happen at all. Therefore, it is necessary to look at the platform and figure out how easy it is to move to other platforms or services. Once you decide to use a specific platform, it is necessary to evaluate what knowledge about this platform is available within the team/company. Other important factors are the platform costs and the interoperability with external services.

In the next phase, Project Creation you start with planing the project and setting some variables for it. You would also create the initial project plan and select the project organisation. The project organisation heavily depends on team size and knowledge of team members. If you are in a very agile environment, you might not want to have a very strict project organisation, whereas other environments require a rather “fixed” organisation form.

With Project Planing, you start detailing your project with costs and detailed delivery dates. In Project Planing, you would also select the Iteration model for the project, like in “Project Creation”, you need to select the iteration model based on your environment. If it is heavily agile, you might not use the “V-Model” but rather xP or Scrum-like techniques.

The phase Execution requires detailled monitoring about what is going on in the project. Some key indicators are the project milestones and the budget. If these indicators are out-of-bound, adjustment might be necessary.

The last phase, Introduction is an often underestimated phase, since it requires additional knowledge of go-to-market strategies and very good marketing. Often, there is no more budget available after the project has finished. Now imagine you got a great product but no money to tell it to possible customers? A clear go-to-market strategy is necessary in order to complete a project successful.