

Honestly, data scientists are doing a great job. They are practically saving entire industries from steep decline. And these heroes do all of that alone. Alone? Not quite.

The Data Scientist needs the Data Engineer

There are some unsung people who make their success possible: the Data Engineers. These guys (and girls) carry out the vast majority of the tasks, yet hardly anyone talks about them. All the fame seems to go to the data scientists, while the data engineers receive hardly any credit.

I remember one of the many meetings I had with C-level executives. When I explained the structure of a team dealing with data, everyone in the boardroom agreed: "We need data scientists." Then one of the executives asked: "But what are these data engineers about? Do we really need them, or could we have more data scientists instead?"

I kept explaining, and they accepted it. Still, I had the feeling that in the end they wanted more Data Scientists than Engineers. This is largely a consequence of the current trend and hype around data scientists. Everyone knows that they are important. But data-driven projects only succeed when a team with mixed skills and know-how comes together.

A Data Science team needs at least as many Data Engineers as Data Scientists

In all the data-driven projects I have seen so far, nothing would have worked without data engineers. They are relevant for many different things, but mainly, and ideally, they work in close cooperation with data scientists. If a company's data maturity is high, the data engineer prepares the data for the data scientist and then works with the data scientist again to put the algorithm into production. I saw a lot of projects where the latter never happened: the first step (data preparation) was successful, but the later step (automation) was never done.

But there are more roles involved. One of them, essentially a specialization of the data engineer, is the data system engineer. This is often not a dedicated role; its tasks are carried out by data engineers. It covers preparing and setting up the infrastructure for the data scientists and engineers. Further roles are the data architect, who ensures a company-wide approach to data, and of course the data owners and data stewards.

I have stated it several times, but it is worth repeating over and over again: data science isn't a one-(wo)man show; it is ALWAYS a team effort.

This post is part of the “Big Data for Business” tutorial. In this tutorial, I explain various aspects of handling data properly within a company. Another interesting article about the data science team setup can be found here.