Data Science is often seen as a mystical thing: only a few understand it, and people who can do it are very hard to find. The skill gap is everywhere, and companies struggle to staff their projects. However, most companies want to become “data driven” and thus need these skills to be available. In this context, we talk about data science democratisation.

What is data science democratisation?

I think that we are currently doing it somewhat wrong: we need to enable more people to do what data scientists are doing, without requiring them to do complex algorithmic work. This is where “self-service analytics” comes into play: giving business users more power and enabling them to do “data science” with easy tools.

In an ideal world, each business user would have some basic data capabilities and would be fully capable of generating her own insights, driving all decisions with data rather than gut feelings. There are already some tools out there that enable exactly that: self-service analytics. This would also mean that the processes in companies have to shift a lot, away from traditional processes and data ownership. Self-service analytics has several goals:

  • Reducing the FTE (full-time equivalent) input for Data Science. At the moment, we need data scientists to do the job. However, those people aren’t available on the market at large scale and are very hard to find. This leads to several issues in doing data science.
  • Reducing the TTM (time to market). If we needed the help of a data scientist for every business question, every question would become a project that takes weeks. Decisions often need to be made fast; otherwise they might no longer be relevant.

What needs to be done to achieve it?

In the previous section, I wrote about the “ideal world”. Now you might ask what the business reality looks like and what needs to be done to achieve this ideal. Well, it is easier said than done. Basically, there are some organisational and technical measures that need to be applied:

  • No Silos. People can only work with data if they have a full view of all available data. There should be no “hidden” data, and everyone in the company should be capable of checking data for integrity. Knowledge means power, and if one unit possesses all the data, it is very powerful. Therefore, data should be “free” within the company.
  • Self-service data access. People and business units in the company should be capable of accessing data in an easy (and self-service) manner. It must be easy for them to find, search, retrieve and visualise data.
  • Data thinking and mindset. Everyone in the company, from top managers to business users, needs to have a data-thinking mindset. This means that they should use data for all of their daily decisions rather than “gut feelings”. They should challenge their decisions with evidence and data.

Technical enablers for data democratisation

  • Governance, Metadata Management and Data Catalogs: I keep repeating myself, but as long as these elementary things aren’t solved, the measures above are impossible to achieve. Most companies only do governance to the extent of legal and regulatory requirements, but they should do much more than that: enable a self-service environment.
  • Data Abstraction / Virtualisation: This is one of the key things to enable easy data access at some level. An easy interface, ideally with a SQL-like feel, should be available for all data sources. This gives business users an easy tool to access all data, not just parts of it.
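
To make the idea of a SQL-like interface over several data sources a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: two in-memory SQLite databases stand in for two separate source systems, and the names crm, sales, customers, orders and customer_revenue are all made up for this example. A real virtualisation layer would sit on top of actual source systems, but the principle is the same: the business user queries one view and does not need to know where the data lives.

```python
import sqlite3


def build_virtual_layer() -> sqlite3.Connection:
    """Expose two separate 'source systems' through one SQL view.

    Assumption: the attached in-memory databases 'crm' and 'sales' are
    stand-ins for real source systems; all table and view names are
    hypothetical.
    """
    conn = sqlite3.connect(":memory:")

    # Two independent databases simulate two data silos.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS sales")

    conn.execute(
        "CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT, segment TEXT)"
    )
    conn.execute(
        "CREATE TABLE sales.orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?, ?)",
        [(1, "Acme", "enterprise"), (2, "Bolt", "smb")],
    )
    conn.executemany(
        "INSERT INTO sales.orders VALUES (?, ?, ?)",
        [(1, 1, 100.0), (2, 1, 50.0), (3, 2, 20.0)],
    )

    # The view is the abstraction: one name, joined across both sources.
    # A TEMP view is used because only temporary views may reference
    # tables in attached databases.
    conn.execute(
        """
        CREATE TEMP VIEW customer_revenue AS
        SELECT c.name AS customer, c.segment AS segment, SUM(o.amount) AS revenue
        FROM crm.customers AS c
        JOIN sales.orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name, c.segment
        """
    )
    return conn


def revenue_by_segment(conn: sqlite3.Connection) -> dict:
    """A typical business question, asked against the view only."""
    rows = conn.execute(
        "SELECT segment, SUM(revenue) FROM customer_revenue "
        "GROUP BY segment ORDER BY segment"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(revenue_by_segment(build_virtual_layer()))
```

The business question in revenue_by_segment never mentions crm or sales; it only touches the view. That is the kind of easy tool the bullet above asks for.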

You might now think that data scientists will become jobless. I would argue the contrary. Self-service analytics isn’t made to handle the complex things; it is made for quick insights and for proving that a business hypothesis might work. Based on this, many more questions will arise and thus create more work for data scientists. Also, achieving self-service analytics will lead to a lot of work for data engineers, who ultimately have to integrate that data.

I hope you enjoyed this post about data science democratisation. If you want to learn more about how to deal with Big Data and Data Science in Business, read this tutorial about Big Data Business.

The three data sources

To get the most out of your data strategy in an enterprise, it is necessary to cluster the different user types that might arise in an enterprise. All of them are users of data, but with different needs and demands on it. In my opinion, they span different levels of expertise. Basically, I see three different user types for data access within a company.

[Figure: Three degrees of Data Access (pyramid showing data access on 3 different levels)]

Basically, the user types differ in how they use data and in the number of users. Let’s start with the lower part of the pyramid: business users.

Business Users

The first layer is the business users. These are users who need data for their daily decisions, but are rather consumers of the data. These people look at different reports to make decisions on their business topics. They could be in Marketing, Sales or Technology, depending on the company itself. Basically, these users would use pre-defined reports, but in the long run would rather go for customised reports. One great thing for that is self-service BI. These users are experienced in interpreting data for their business goals and in asking questions about their data. This could be about reviewing the performance of a campaign, weekly or monthly sales reports, and so on. They create a huge load on the underlying systems without understanding the implementation and complexity underneath, and they don’t have to. From time to time, they start digging deeper into their data and thus become power users, our next level.

Power Users

Power Users often emerge from Business Users. A power user is typically a person who is close to the business and understands the needs and processes around it. However, they also have a great technical understanding (or gained this understanding during the process of becoming power users). They have some level of SQL know-how or know the basics of other scripting tools. They often work with the business users (even in the same department) on solving business questions. They also work closely with Data Engineers on accessing data sources and integrating new ones, and they use self-service analytics tools to get a basic level of data science done. However, they aren’t data scientists, although they might move in this direction if they invest significant time in it. This now brings us to the next level: the data scientists.

Data access for Data Scientists

This is the top level of our pyramid. People working as data scientists aren’t in the majority; business users and power users are far more numerous. However, they work on more challenging topics than the previous two. They also work closely with power users and business users. They might still be in the same department, but not necessarily. They work with advanced tools such as R and Python, and they either fine-tune the models the power users built with self-service analytics tools or translate the business questions raised by the business users into algorithms.

Often, those three develop in different directions. However, it is necessary that all of them work together, as a team, in order to make projects with data a success. For data access, it is also necessary to incorporate role-based access controls.
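
As a rough illustration of how role-based access controls could map onto the three user types, here is a minimal Python sketch. Everything in it is an assumption made for the example: the role names simply mirror the pyramid, and the data levels reports, curated_tables and raw_data, as well as which role may see which level, are hypothetical. Every company has to define this mapping itself.

```python
from enum import Enum


class Role(Enum):
    """The three user types from the pyramid."""
    BUSINESS_USER = "business_user"
    POWER_USER = "power_user"
    DATA_SCIENTIST = "data_scientist"


# Hypothetical mapping of roles to data levels. This is an assumption
# for illustration only, not a recommendation for any specific company.
PERMISSIONS = {
    Role.BUSINESS_USER: {"reports"},
    Role.POWER_USER: {"reports", "curated_tables"},
    Role.DATA_SCIENTIST: {"reports", "curated_tables", "raw_data"},
}


def can_access(role: Role, data_level: str) -> bool:
    """Return True if the given role is allowed to read the data level."""
    return data_level in PERMISSIONS[role]


if __name__ == "__main__":
    for role in Role:
        allowed = sorted(PERMISSIONS[role])
        print(f"{role.value}: {', '.join(allowed)}")
```

Keeping the mapping in one place makes it easy to review who can see what, which also helps with the governance topics discussed earlier.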

This post is part of the “Big Data for Business” tutorial. In this tutorial, I explain various aspects of handling data right within a company.