Published by Domino Data Lab
Data governance refers to the process of managing enterprise data with the aim of making data more accessible, reliable, usable, secure, and compliant across an organization. It is a critical feature of organizational data management and promotes better data quality and data democratization.
A well-planned data-governance framework is fundamental for any data-driven organization that aims to harness the business value of its data and downstream capabilities that drive robust decision-making. It covers and details best practices for data processes, roles, policies, standards, and metrics.
Naturally, data-governance frameworks vary from one organization to the next. Here are a few examples of strong data-governance frameworks recommended at companies like PWC, Hubspot, and ING.
However, there are a set of commonly accepted best practices, as listed below:
In this article, you’ll learn more about data-governance frameworks and their essential components, exploring use cases and best practices for choosing a data-governance framework for your organization.
The Importance of Data Governance
Without effective data governance, organization's data science teams fail to extract the full value of their data. Weak data governance leads to poor data practices and decision-making, causing organizations to lose their competitive edge in an increasingly data-driven corporate environment.
Crucially, poor governance can also impair compliance with regulatory standards like GDPR, HIPAA, SOX, CCPA, and the like, which can have massive consequences for businesses. Hefty fines for violating such laws can cause a dent in a company’s bottom line. For instance, in 2021, Amazon received a whopping GDPR fine of $877 million.
Strong data governance ensures that data is readily available, reliable, consistent, and of high quality to empower businesses to create value from the data. It encompasses processes, people, organizational bodies, and a set of policies that all work together to determine best practices of managing data.
The Benefits of Data Governance
The benefits of data governance are manifold, some of which include the following:
Investing in a robust data-governance framework yields significant returns and helps accelerate an organization’s digital- and data-transformation journey.
How to Choose a Data Governance Framework
There are several foundations of a modern data-governance framework. The primary focus areas include data quality, accessibility, security, and compliance. However, the success of a data-governance framework cannot be realized until people, processes, and technology are combined effectively.
Designing an effective data-governance framework also includes creating a clear set of organizational roles, responsibilities, and data stakeholders for enforcing and managing the governance policy. In this section, you’ll read about the core aspects of building an optimal data-governance framework.
Data accuracy is a central pillar of data quality and refers to error-free and reliable information. Inaccurate data is often the result of poor data entry practices, poor regulation of data accessibility, and poor data quality standards. It is critical to improve data accuracy standards as clean, consolidated, and accurate data can lead to a two times improvement on the return on investment.
There are many ways to test and improve existing data accuracy standards. A real-time data quality audit can identify issues like duplicates, missing values, incomplete information, data being stored in multiple locations, and so on. Some of these common issues can be fixed through tools that automate data quality and accuracy checks, while other issues require manual intervention by data teams.
Data relevance refers to whether the data is pertinent and applicable for deriving business value (i.e., whether it is fit for the purpose it was originally intended for). For instance, information about customers’ sexual orientation and military records are often asked on forms alongside relevant data fields like customer name, email, and other contact details. However, in many cases, data fields like sexual orientation and military records have no material consequence for the business.
In terms of data governance, every data set needs to be rigorously assessed for its relevance and potential value to the organization. As organizations accumulate more and more data, the cost of data storage, maintenance, privacy, and security increases. Therefore, not having robust data-relevance checks can have a significant financial impact as well.
Scope of Policy
In the context of modern data-driven organizations, data governance has a broad scope and covers everything from metadata and data storage to accessibility, quality, security, ownership, and organizational roles and policies for various processes, people, and technology.
Formulating a comprehensive data-governance policy that covers such a wide scope requires concerted efforts from a variety of data stakeholders as well as the executive leadership. As this requires significant investment in talent and technology—where ROI may not be evident in the short term—buy-in and support from leadership is critical.
Data Compliance Standards
Adherence to data compliance standards like the GDPR, HIPAA, SOX, and the like is another crucial element of data governance. As organizations store a lot of confidential user and business data—including personally identifiable information like names, contact details and addresses, passwords, credit card details, and so on—failure to adequately secure this data from adversarial attacks and data breaches, or even mistaken internal access, has massive business consequences.
Lack of compliance to data privacy and security regulations can result in tremendous fines. Perhaps even more significant is the reputational damage to an organization that fails to protect its customer and enterprise data, leading to greater expenses and the possibility of significant loss of revenue and future business.
Data Access Policy
A data access policy sets out which employees can access an organization’s data so that they do not face much friction and interference in their daily work that relies on data. Under a data-access policy, only a certain set of employees are authorized to access and use data for specific use cases.
This policy is applicable to all the data assets across business units and encompasses data of all types and formats. Clear guidelines on who has access and who does not also helps improve data security, promote better adherence to data compliance regulations, and reduce wastage of organizational time and resources.
Organizational Roles for Managing a Data Governance Policy
Data governance is the mandate of several stakeholders, such as the following:
Stages of Implementing Policy
Implementing a well-planned and comprehensive data-governance framework takes time and resources. It involves multiple steps, typically including the following:
It is important to start small and slowly build traction, gaining the confidence of the leadership as well as the various data stakeholders with every step in the process. Involving a variety of stakeholders as discussed above and testing the governance framework in a small-sized team helps to identify key practices and standards that can then be scaled to the level of the whole organization.
In this article, you learned about data-governance frameworks for organizations. Data governance is a fundamental requirement for data-driven companies that helps them manage in-house data assets effectively and create consistent business value.The importance of data governance for modern organizations is great, and the negative consequences of poor governance practices are significant. Choosing a data-governance framework is not straightforward, but exploring the core pillars of strong data governance, including data accuracy, relevance, compliance, and security is an important first step towards modernizing your organization's journey in digital transformation.
In the context of data science and machine learning, a robust data-governance framework is essential to maintain the quality, relevance, timeliness, completeness, accuracy, and validity of the data used to train machine learning models.
Copyright © 2022, Sundeep Teki
All rights reserved. No part of these articles may be reproduced, distributed, or transmitted in any form or by any means, including electronic or mechanical methods, without the prior written permission of the author.
This is a personal blog. Any views or opinions represented in this blog are personal and belong solely to the blog owner and do not represent those of people, institutions or organizations that the owner may or may not be associated with in professional or personal capacity, unless explicitly stated.