
The three types of data users

To get the most out of an enterprise data strategy, it is necessary to cluster the different user types that arise within the company. All of them are users of data, but with different needs and demands on it. In my opinion, they differ mainly in their level of expertise. Basically, I see three different user types for data access within a company.

Three degrees of Data Access

Basically, the user types differ in how they use data and in how many of them there are. Let’s start with the lower part of the pyramid – the business users.

Business Users

The first layer are the business users. These are users that need data for their daily decisions but are primarily consumers of data. They look at different reports to make decisions on their business topics. They could sit in Marketing, Sales or Technology – depending on the company itself. Typically, these users start with pre-defined reports, but in the long run move towards customized reports; self-service BI is a great fit for that. These users are experienced in interpreting data for their business goals and in asking questions of their data – for example, reviewing the performance of a campaign or checking weekly and monthly sales reports. They create a huge load on the underlying systems without understanding the implementation and complexity underneath – and they don’t have to. From time to time, they start digging deeper into their data and thus become power users – our next level.

Power Users

Power users often emerge from business users. A power user is typically a person that is close to the business and understands its needs and processes. However, they also have a strong technical understanding (or gained it on the way to becoming a power user). They have some level of SQL know-how or know the basics of other scripting tools. They often work with the business users (sometimes in the same department) on solving business questions, and they work closely with data engineers on accessing data sources and integrating new ones. They also use self-service analytics tools to do a basic level of data science. However, they aren’t data scientists, though they might move in this direction if they invest significant time into it. This brings us to the next level – the data scientists.
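To make the SQL know-how tangible, here is a minimal sketch of the kind of query a power user might run against a reporting database. The `sales` table, its columns and the SQLite file are purely hypothetical and only serve the example:

```python
import sqlite3

import pandas as pd

# Connect to an example database (any SQL source works the same way).
conn = sqlite3.connect("sales.db")

# A typical power-user question: weekly revenue per region,
# aggregated from a hypothetical raw `sales` table.
query = """
SELECT strftime('%Y-%W', order_date) AS week,
       region,
       SUM(revenue) AS total_revenue
FROM sales
GROUP BY week, region
ORDER BY week, region;
"""

weekly_report = pd.read_sql_query(query, conn)
print(weekly_report.head())
```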

Data access for Data Scientists

This is the top level of our pyramid. People working as data scientists aren’t in the majority – business users and power users are far more numerous. However, data scientists work on more challenging topics than the previous two, and they work closely with power users and business users. They might still be in the same department, but not necessarily. They work with advanced tools such as R and Python and fine-tune the models the power users built with self-service analytics tools, or translate the business questions raised by the business users into algorithms.
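To illustrate what “fine-tuning” a model built with a self-service tool can look like in Python, here is a minimal scikit-learn sketch. The toy dataset and the hyperparameter grid are assumptions for the example, not a recipe:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy stand-in for a feature matrix and labels prepared by a power user.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Tune the hyperparameters a self-service tool would leave at their defaults.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, 10, None]},
    cv=5,
)
search.fit(X_train, y_train)

print(search.best_params_, search.score(X_test, y_test))
```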

Often, these three user types develop in different directions. However, it is necessary that all of them work together – as a team – in order to make data projects a success. With data access, it is also necessary to incorporate role-based access controls.
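As a sketch of what role-based access control across these three user types could look like, here is a small Python example – the role names and permissions are purely illustrative:

```python
# Illustrative permission sets for the three user types of the pyramid.
ROLE_PERMISSIONS = {
    "business_user": {"read_reports"},
    "power_user": {"read_reports", "create_reports", "run_sql"},
    "data_scientist": {"read_reports", "create_reports", "run_sql", "read_raw_data"},
}

def is_allowed(role: str, action: str) -> bool:
    """Check whether a role may perform a given data-access action."""
    return action in ROLE_PERMISSIONS.get(role, set())

assert is_allowed("power_user", "run_sql")
assert not is_allowed("business_user", "read_raw_data")
```

Each level inherits the permissions of the level below it and adds more direct access to the raw data – mirroring the pyramid.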

This post is part of the “Big Data for Business” tutorial. In this tutorial, I explain various aspects of handling data right within a company.

Agility is everywhere in the enterprise nowadays. Most companies want to become more agile, and at C-level there are huge expectations on agility. However, I’ve seen many analytics (and big data) projects being the complete opposite: neither agile nor successful. The reasons for this varied: the setup of the data lake with expensive hardware took years, not months, and operating and maintaining these systems turned out to be very inefficient. What can be done for agile data science projects?

The demand for agile data science projects

A lot of companies have expressed their demand for agile analytics. But in fact, with analytics (and big data), we moved away from agility towards a complex, waterfall-like approach. Even worse is the approach of starting agile analytics and then not sticking to it (and rather doing something in between).

However, a lot of companies have also realised that agility can only be achieved with (Biz)DevOps, the cloud and close cooperation between data engineering and data science. There is hardly any way around this. One important question for agile data science projects is the methodology: is it Kanban or Scrum?

I would say that this question is a “luxury” problem. If a company has to answer it, it is already at a very high maturity state on data. My ideas on this topic (which, again, is an “it depends” thing) are:

When to select Kanban or Scrum for Data projects

  • Complexity: if the data project is more complex, Scrum might be the better choice. A lot of data science projects are one-person projects (with support from data engineers and DevOps at some stages), run for only some weeks, and not always full-time. In this case (lower complexity), Kanban is the most suitable approach; often, the data scientist even works on several projects in parallel, as the load per project isn’t high at all. For projects with higher complexity, I would recommend Scrum.
  • Integration/Productization: if the integration effort is high (e.g. into existing processes, systems and the like), I would rather recommend going with Scrum. More people are involved and the complexity is immediately higher. If the focus is on data engineering, or at least that part is very large, it is often delivered with Scrum.

I guess there could be many more indicators, so I am looking forward to your comments on it 🙂

This post is part of the “Big Data for Business” tutorial. In this tutorial, I explain various aspects of handling data right within a company. You might also read this discussion about Scrum for Data Scientists.

During my professional career, I have managed several IT projects, mainly in the distributed systems environment. Initially, these were cloud projects, which were rather easy: I worked with IT departments in different domains/industries, and we all shared the same “vocabulary”. When talking with IT staff, it is clear that everyone uses the same terms to describe “things”; no special explanation is needed.
I soon realized that Big Data projects are VERY different. I have written several posts on Big Data challenges in the last months, and on the requirements for data scientists and the like. What I keep coming across when managing Big Data projects is the different approach one has to take when (successfully) managing these kinds of projects.
Let me first explain what I am doing. First of all, I don’t code, implement or create any kind of infrastructure. I work with senior (IT) staff on ideas which will eventually be transformed into Big Data projects (either directly or indirectly). My task is to work with them on what Big Data can achieve for their organization and/or problem. I am not discussing what their Hadoop solution will look like; I am working on use-cases and challenges/opportunities for their problems, independent of a concrete technology. Plus, I am not focused on any specific industry or domain.
However, all strategic Big Data projects follow a common schema. The most challenging part is to understand the problem. In the last months, I faced challenges in different industries; whenever I run these kinds of projects, it is mainly about cooperating with the domain experts. They often have no idea about the possibilities of Big Data – and they don’t have to. I, in contrast, have no idea about the domain itself. This is challenging on the one side – but very helpful on the other. The more experience a person gains within a specific domain, the more that person thinks and acts in the methodology of that domain. They often don’t see the solution because they work on an “I’ve made this experience before and it has to be very similar” basis. The same applies to me as a Big Data expert. All workshops I ran were mainly about mixing the concrete domain with the possibilities of Big Data.
I had a number of interesting projects lately. One of the projects was in the geriatric care domain. We worked on how data can make the lives of the elderly better and what type of data is needed. It was very interesting to work with domain experts and see what challenges they actually face. An almost funny discussion arose around Open Data – we looked at several data sources provided by the state, and I remarked: “sorry, but we can’t use these data sources. They aren’t big, and they are about the locations of toilets within our capital city”. However, their opinion was different, because the location of toilets is very important to them – and data doesn’t always need to be big, it needs to be valuable. Another project was in the utilities domain, where we improved the supply chain by optimizing it with data. Another project, for a company providing devices, was about improving the reliability of those devices by analyzing large amounts of log data. When a device has an outage, a service technician has to travel to the city of the outage, which takes several days to a week. I worked on reducing this time and included a data scientist. By finding patterns weeks before an outage occurs, we could reduce the downtime to just a few hours for the three major error codes. However, there is still much work to be done in that area. Further projects were in the utilities and government sectors.
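To illustrate the pattern-finding idea (this is not the actual project code), here is a minimal pandas sketch that counts error codes in a rolling seven-day window; the log data, error codes and alert threshold are made up for the example:

```python
import pandas as pd

# Hypothetical device log: one row per logged error event.
logs = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-02", "2023-01-05", "2023-01-06",
        "2023-01-08", "2023-01-09", "2023-01-10",
    ]),
    "error_code": ["E17", "E17", "E42", "E17", "E42", "E17"],
}).set_index("timestamp").sort_index()

# Rolling 7-day count of events per error code; a spike in one of the
# major codes can flag a likely outage well before it happens.
logs["event"] = 1
counts = logs.groupby("error_code")["event"].rolling("7D").sum()

ALERT_THRESHOLD = 3  # would be tuned from historical pre-outage data
print(counts[counts >= ALERT_THRESHOLD])
```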
All of these projects shared a common iteration phase but were otherwise very different – each project had its own challenges. The key success factor for me was how to deal with people: it was very important to work with different people from different domains with different mindsets – improving my knowledge and broadening my horizon as well. That’s challenging on the one hand, but very exciting on the other.