
Agility is everywhere in the enterprise nowadays. Most companies want to become more agile, and at C-level there are huge expectations around agility. However, I’ve seen many analytics (and Big Data) projects that were the complete opposite: neither agile nor successful. The reasons varied: setting up the data lake on expensive hardware took years, not months, and operating and maintaining these systems turned out to be very inefficient. What can be done for agile data science projects?

The demand for agile data science projects

A lot of companies have expressed their demand for agile analytics. But in fact, with analytics (and big data), we moved away from agility and towards a complex, waterfall-like approach. Even worse is the approach of starting with agile analytics and then not sticking to it, ending up somewhere in between.

However, a lot of companies have also realised that agility can only be achieved with (Biz)DevOps, the cloud, and close cooperation between data engineering and data science; there is hardly any way around this. One important question for agile data science projects is the methodology: Kanban or Scrum?

I would say that this question is a “luxury” problem: if a company has to answer it, it is already at a very high level of data maturity. My thoughts on this topic (which, again, is an “it depends” thing) are:

When to select Kanban or Scrum for Data projects

  • Complexity: the more complex the data project, the more likely Scrum is the better choice. A lot of data science projects are one-person projects (with support from data engineers and DevOps at some stages), run for only a few weeks, and are not always full-time. In this case (lower complexity), Kanban is the most suitable approach. Often, the data scientist even works on several projects in parallel, as the load per project isn’t high at all. For projects with higher complexity, I would recommend Scrum.
  • Integration/Productization: if the integration effort is high (e.g. integrating into existing processes, systems and the like), I would recommend going with Scrum. More people are involved and the complexity is immediately higher. If the focus is on data engineering, or at least that part is substantial, the project is often delivered with Scrum. A minimal decision sketch based on these two criteria follows this list.
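
To make these heuristics concrete, here is a minimal sketch in Python. The inputs and thresholds (team size, project duration, integration effort) are my own illustrative assumptions, not a formal rule from either methodology:

```python
def pick_methodology(team_size: int, duration_weeks: int,
                     high_integration_effort: bool) -> str:
    """Toy heuristic for choosing Kanban or Scrum for a data project.

    The thresholds below are illustrative assumptions only.
    """
    # High integration effort (existing processes, systems and the like)
    # usually means more people and higher complexity -> Scrum.
    if high_integration_effort:
        return "Scrum"
    # Small, short, often part-time discovery projects -> Kanban.
    if team_size <= 2 and duration_weeks <= 8:
        return "Kanban"
    # Everything else is complex enough to benefit from Scrum.
    return "Scrum"


# Example: a one-person discovery project running for six weeks.
print(pick_methodology(team_size=1, duration_weeks=6,
                       high_integration_effort=False))  # -> Kanban
```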

I guess there could be many more indicators, so I am looking forward to your comments 🙂

This post is part of the “Big Data for Business” tutorial. In this tutorial, I explain various aspects of handling data right within a company. You might also read this discussion about Scrum for Data Scientists.

I am happy to announce the work we have done over the last months within Teradata. We developed a light-weight process model for Big Data Analytics projects called “RACE”. The model is agile and captures the know-how of more than 25 consultants who have worked on over 50 Big Data Analytics projects in recent months. Teradata also co-developed CRISP-DM, the industry-leading process for data mining. Now we have created a new process for agile projects that addresses the new challenges of Big Data Analytics.
Where does the ROI come from?
This was one of the key questions we addressed when developing RACE. The economics of Big Data Discovery Analytics are different from traditional Integrated Data Warehousing economics. ROI comes from discovering insights in highly iterative projects run over very short time periods (usually 4 to 8 weeks). Each meaningful insight or successful use case that can be actioned generates ROI, and the total ROI is the sum over all successful use cases. Competitive advantage is therefore driven by the capability to produce both a high volume of insights and creative insights that generate a high ROI.
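Expressed as a simple formula (the notation is mine, just to make the summation explicit): if each of the n successful, actioned use cases yields a return of ROI_i, then

```latex
\mathrm{ROI}_{\mathrm{total}} = \sum_{i=1}^{n} \mathrm{ROI}_i
```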
What is the purpose of RACE?
RACE is built to deliver a high volume of use cases, focusing on speed and efficiency of production. It fuses data science, business knowledge and creativity to produce high-ROI insights.
What does the process look like?

RACE – an agile process for Big Data Analytics projects

The process itself is divided into several short phases:

  • Roadmap. This is an optional first step (but highly recommended) to build a roadmap of where the customer wants to go in terms of Big Data.
  • Align. Use cases are detailed and the required data is confirmed.
  • Create. Data is loaded, prepared and analyzed, and models are developed.
  • Evaluate. Recommendations for the business are given.

In the next couple of weeks we will publish much more on RACE, so stay tuned!