… this is at least what I hear often. A lot of people working in the data domain describe this as “false but true”. Business units often push for quick-and-dirty data delivery and thus force IT units to deliver data in an ad-hoc manner, without governance and in bad quality. As a result, business projects are carried out inefficiently and without a 360-degree view of the data. Business units often cause the inefficiency in data themselves, and their projects fail as a consequence: they are more or less digging their own hole.
The issue with data governance is simple: you hardly see it in the P&L if you did it right. At least, you don’t see it directly. If your data is in bad shape, you might see it indirectly, in failing projects and in bad results from projects that use data. Often the business is blamed for bad results, even though the data was the weak point. It is therefore very important to apply a comprehensive data governance strategy across the entire company (and not just in one division or business unit). Governance consists of several topics that need to be addressed:
What is data governance about?
- Data Security and Access: data needs to stay secure, and storage systems need to implement a high level of security. Access should be easy but secure. Data governance should enable self-service analytics and not block it.
- One common data storage: data should be stored under the same standards across the company. A limited number of storage systems should cover all needs, and different storage techniques should be connected. No silos should exist.
- Data Catalog: It should be possible to see what data is available in the company and how to access it. A data catalog should make it possible to browse different data sources and see what is inside (as long as one is allowed to access this data)
- Systems/Processes using data: There is a clear tracking of data access. If there are changes to data, it should be possible to see what systems and processes might be affected by it.
- Auditing: An audit log should be available, especially to see who accessed data when
- Data quality tracking: it should be possible to track the quality of datasets along specific dimensions. These could be: accuracy, timeliness, correctness, …
- Metadata about your data: metadata about the data itself should be available. You should know what can be inside your data, and your metadata should describe your data precisely.
- Master data: you should have a golden record for all your data. This is challenging and difficult, but should be the target.
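To make the data quality tracking item a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function `assess_quality`, the `QualityReport` class, the field names and the sample records are all hypothetical, and the two measures shown (completeness and timeliness) are just one possible way to put numbers on quality.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QualityReport:
    completeness: float  # share of required fields that are filled
    timeliness: float    # share of records updated within the allowed age


def assess_quality(records, required_fields, max_age, now):
    """Compute two simple quality scores for a list of dict records.

    Assumption: every record carries an 'updated_at' datetime.
    """
    total_fields = len(records) * len(required_fields)
    # A field counts as filled if it is present and neither None nor empty.
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    # A record counts as fresh if its last update is recent enough.
    fresh = sum(1 for record in records if now - record["updated_at"] <= max_age)
    return QualityReport(
        completeness=filled / total_fields if total_fields else 1.0,
        timeliness=fresh / len(records) if records else 1.0,
    )


# Hypothetical sample data: four customer records, one without an email
# and one that has not been updated for more than a week.
now = datetime(2024, 1, 10)
records = [
    {"customer_id": 1, "email": "a@example.com", "updated_at": datetime(2024, 1, 9)},
    {"customer_id": 2, "email": None, "updated_at": datetime(2024, 1, 9)},
    {"customer_id": 3, "email": "c@example.com", "updated_at": datetime(2023, 12, 1)},
    {"customer_id": 4, "email": "d@example.com", "updated_at": datetime(2024, 1, 8)},
]

report = assess_quality(
    records,
    required_fields=["customer_id", "email"],
    max_age=timedelta(days=7),
    now=now,
)
print(report)  # QualityReport(completeness=0.875, timeliness=0.75)
```

Scores like these could be stored per dataset and per day, so that a drop in quality becomes visible before a project that depends on the data runs into trouble.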
Achieving this is very complex, but it is possible if the company implements a good data strategy. Data governance brings many benefits.
This post is part of the “Big Data for Business” tutorial. In this tutorial, I explain various aspects of handling data right within a company.