In some of my previous posts, I shared my thoughts on the data mesh architecture. The data mesh was originally introduced by Zhamak Dehghani in 2019 and is now enjoying huge popularity in the community. Because one of the main ideas of the data mesh architecture is the distributed nature of data, it also leads to a domain-driven design of the data itself. A data circle enables this design.

What is a data circle?

A data circle is a data model that is tailored to a use-case domain. It should follow the idea of the architectural quantum known from microservice architectures. The domain model should contain only the information relevant to the purpose it is built for, and no additional data. Also, each circle could, or even should, run within its own environment (e.g. its own database). The technology should be selected to make the best use of the data. A circle might easily be confused with a data mart that is built within the data warehouse. However, several data circles do not necessarily “live” within one (physical) data warehouse; they can use different technologies and be highly distributed.

Each company will have several data circles in place, each tailored to the specific needs of its use cases. When modelling data with data circles, information that is not needed for the use case is left out, since it can later be connected with other data circles in the company. Where we previously built our data models in a very comprehensive way (e.g. via the data warehouse), we now build them in a distributed way.

Examples from the telco and insurance industries

If we take a telco company as an example, data circles might be:

  • The customer data circle: containing the most important customer data
  • The network data circle: containing information about the network
  • The CDR data circle: containing information about calls conducted

If we look at the insurance industry, data circles might be:

  • The customer data circle: containing the most important customer data
  • The claims data circle: containing the data about past claims
  • The health data circle: containing health-related information

If we go back to the telco company, the data about the customer might be stored in a relational model within an RDBMS. However, network data might be stored in a graph for better spatial analysis. CDR data might be stored in a file-based setup. For each domain, the best technology is selected and the best model is designed. The same holds true for other industries.
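To make the telco example a bit more concrete, here is a minimal sketch of how the three circles and their storage choices could be written down. The class names, the `Storage` categories and all attribute names (`customer_id`, `cell_id` and so on) are assumptions made up for illustration; they are not part of any data mesh standard or product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Storage(Enum):
    """Storage styles mentioned in the text (labels are illustrative)."""
    RELATIONAL = "relational database"
    GRAPH = "graph database"
    FILES = "file-based store"


@dataclass(frozen=True)
class DataCircle:
    """A small, domain-specific data model with its own technology."""
    name: str
    storage: Storage
    attributes: Tuple[str, ...]  # hypothetical attribute names


# The three telco circles from the example above.
telco_circles: List[DataCircle] = [
    DataCircle("customer", Storage.RELATIONAL, ("customer_id", "name", "tariff")),
    DataCircle("network", Storage.GRAPH, ("cell_id", "neighbour_cell_id")),
    DataCircle("cdr", Storage.FILES, ("call_id", "customer_id", "cell_id", "duration_s")),
]


def storage_by_circle(circles: List[DataCircle]) -> Dict[str, str]:
    """Return which storage technology each circle uses."""
    return {circle.name: circle.storage.value for circle in circles}


print(storage_by_circle(telco_circles))
```

Each circle only lists the attributes its own domain needs, and the storage choice is made per circle rather than once for the whole company.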

Several data circles make up the design

Different business units will build their own data circles to fit their demands. This, however, makes it necessary to create a central repository that ties it all together: a hub connecting all the circles. The hub stores information about the connectivity of different circles. Imagine the network data model again: you might want to connect the network data with customer data. There must be a way to connect this data while still keeping its distributed aspects. The hub serves as a central data asset management tool and a one-stop shop for employees within the company to find the data they need.
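As a rough sketch of what "storing information about connectivity" could look like, the hub below keeps only metadata: which circles exist, which attributes they expose, and which shared attribute links two circles. The `DataHub` class, its methods and the attribute names are hypothetical and only serve as an illustration; the data itself stays in the circles.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class DataHub:
    """Central registry of circles and the links between them (metadata only)."""
    circles: Dict[str, Set[str]] = field(default_factory=dict)
    links: List[Tuple[str, str, str]] = field(default_factory=list)

    def register(self, name: str, attributes: Iterable[str]) -> None:
        """Make a circle and its attributes known to the hub."""
        self.circles[name] = set(attributes)

    def link(self, left: str, right: str, key: str) -> None:
        """Record that two circles can be connected via a shared attribute."""
        for circle in (left, right):
            if key not in self.circles.get(circle, set()):
                raise ValueError(f"circle '{circle}' has no attribute '{key}'")
        self.links.append((left, right, key))

    def connections(self, name: str) -> List[Tuple[str, str]]:
        """List (other circle, shared key) pairs for the given circle."""
        found = []
        for left, right, key in self.links:
            if left == name:
                found.append((right, key))
            elif right == name:
                found.append((left, key))
        return found


hub = DataHub()
hub.register("customer", ["customer_id", "name"])
hub.register("cdr", ["call_id", "customer_id", "cell_id"])
hub.register("network", ["cell_id", "region"])

# Assumption: customer and network data meet via the CDR circle.
hub.link("customer", "cdr", "customer_id")
hub.link("cdr", "network", "cell_id")

print(hub.connections("cdr"))
```

In this sketch, network and customer data are not merged into one model. The hub only knows that they can be joined through the CDR circle when a use case needs it.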

Figure: A data hub connecting different data circles

The data hub also allows users to connect to and analyse the data they want to access, for example with tools such as Jupyter. The hub takes care of the connectivity to the data and thus provides an API for all users. A data hub is all about data governance.
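One possible shape for such an API, sketched under the assumption that every circle registers a small reader function with the hub: users then ask the hub for a circle by name and do not need to know which technology sits behind it. `HubApi`, `register_reader`, `read` and the sample rows are invented for this illustration.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class HubApi:
    """Uniform read access to circles, regardless of where they are stored."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register_reader(self, circle: str, reader: Callable[[], Iterable[Row]]) -> None:
        """Attach a circle-specific reader (SQL query, graph query, file scan, ...)."""
        self._readers[circle] = reader

    def read(self, circle: str) -> List[Row]:
        """Return the rows of a circle through its registered reader."""
        if circle not in self._readers:
            raise KeyError(f"unknown data circle: {circle}")
        return list(self._readers[circle]())


def read_customers() -> Iterable[Row]:
    # Stand-in for a query against the customer circle's database;
    # the rows are made-up sample data.
    return [
        {"customer_id": 1, "name": "Alice"},
        {"customer_id": 2, "name": "Bob"},
    ]


api = HubApi()
api.register_reader("customer", read_customers)

rows = api.read("customer")
print(rows)
```

From a Jupyter notebook, a user would only call `api.read("customer")`. A real hub would additionally enforce access rights at this point, which is where the governance aspect comes in.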

What’s next?

I recommend reading the other articles I’ve written about the data mesh architecture. It is fairly easy to get started with this architectural style, and data circles contribute to this.