Data governance is a vast topic, so it is worth stating the scope of this article up front. It skips the legal aspects of data governance: if you want to understand what GDPR is or how to adhere to it, this article will not help. It also does not cover the geographic restrictions on data storage imposed by various governments. Finally, the article is written with marketing data at its centre. The concepts may apply to other types of data such as financial, retail, or supply chain data, but the author does not vouch for their use in other verticals.
Why do organisations need Data Governance?
Today’s organizations usually hold a large swath of data covering most marketing activities, such as CRM, Digital Analytics, Media, Search, and Social. To encourage usage and extract maximum value, organizations try to democratize data by allowing analysts to use it without any guidelines. In the hurry to reach the nirvana of customer value realization, they often reach a point where the underlying data used in analysis becomes hard to fathom. The situation often resembles rush hour on the roads, where covering a distance takes an hour when it should ideally take half that time. As with rush hour, it is the absence of guidelines, a lack of awareness of them, or ineffective monitoring that makes the availability of actionable data a challenge across the organization.
A lack of proper data governance creates several kinds of challenges. In some organizations, employees left years ago but their login credentials to datasets still work. On an e-commerce website, to capture the product name and quantity on the ‘Add to Cart’ action, the Tagging Team uses DOM (the HTML of the webpage) scraping for certain categories of product pages while using a Data Layer for others. In other cases, marketing tags remain in the tag manager long after the campaigns have stopped. The definition of geography to use in an analysis can also be perpetually debatable: is it the location where the user was at the time of the transaction, or the location from which the user usually logs in? Some organizations believe that tools with out-of-the-box data governance capabilities can help them overcome such situations. That belief stems from a lack of understanding of the data governance landscape.
Data Governance on two-dimensional landscape
Every organization would ideally fall somewhere on a two-dimensional landscape. The first axis is ‘Data Control’, which shows the level of control the organization has over its underlying data: are there strict controls on the capture, transformation, distribution, or usage of data, or is the organization happy with just monitoring it? The second axis is ‘Data Democratization’, which shows the range of data accessibility, from who gets access to data to who should be excluded from it. This can be defined at the level of the organization or, more efficiently, at the level of each dataset.
Although the image above tries to show the positioning of various industries on the landscape, these are high-level, generic groupings and do not bind the industry verticals. For example, there are organizations in the banking industry that are way ahead in data accessibility. Where an organization falls will largely depend on three aspects:
- The competing interests between various groups that use the underlying data will determine whether to regulate the data or monitor the data.
- The quantum of underlying data that the organization is working with would determine who could access what type of data.
- The availability of resources for data regulation and the number of groups involved in activities of data capture, transform, distribution or usage also influence the positioning.
Levers that enable categorization
A natural objection is that all of this looks good in theory but is rarely feasible to implement, because no single person in the organization has the authority to determine where the organization is placed in the above quadrants. To overcome this challenge, it is better to work with four levers that govern the positioning within these quadrants: three working levers and a fourth that arbitrates between them.
Data Definition Lever (or Group): ‘Why’ the data has to be collected, ‘What’ data has to be collected and at ‘Which’ point in the user journey it has to be collected. Defines the data in terms of:
- Granularity, e.g. should bounces be counted when calculating site exit rates?
- Consistency, e.g. is there a standard definition of site exit rate across the organization?
- Data taxonomy, e.g. where should the data be stored and for how long?
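To illustrate why the granularity and consistency questions matter, here is a minimal Python sketch. It assumes one possible definition of exit rate (the share of sessions that viewed a page and also ended on it) and treats a bounce as a single-page session. The `Session` type, the `exit_rate` function and the sample sessions are hypothetical and are not taken from any analytics tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Session:
    """One visit, as the ordered list of pages viewed (hypothetical model)."""
    pages: Tuple[str, ...]


def exit_rate(sessions: List[Session], page: str, include_bounces: bool = True) -> float:
    """Share of sessions that viewed `page` and also ended on it.

    Assumed definition for illustration only; a bounce is a one-page session.
    """
    viewed = 0
    exited = 0
    for session in sessions:
        if not session.pages:
            continue  # ignore empty sessions
        is_bounce = len(session.pages) == 1
        if is_bounce and not include_bounces:
            continue  # this definition leaves bounces out entirely
        if page in session.pages:
            viewed += 1
            if session.pages[-1] == page:
                exited += 1
    return exited / viewed if viewed else 0.0


# Made-up sample data, purely for illustration.
sessions = [
    Session(("home",)),                     # a bounce
    Session(("home", "product")),
    Session(("product", "home")),
    Session(("home", "cart", "checkout")),
]

with_bounces = exit_rate(sessions, "home", include_bounces=True)
without_bounces = exit_rate(sessions, "home", include_bounces=False)
```

On this sample, `with_bounces` is 0.5 while `without_bounces` is one third. Two analysts using the two definitions would report different exit rates from the same data, which is the kind of disagreement a standard definition is meant to prevent.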
Ideally, this lever should categorize each identified data attribute, dimension, or metric into classes based on its criticality to the business and the risks related to data leakage.
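Classifying attributes by business criticality and leakage risk could be sketched as follows. The three levels, the class names (`restricted`, `controlled`, `open`) and the rules in `classify` are assumptions made for illustration. A real scheme would be agreed by the Data Definition group.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class DataAttribute:
    name: str
    business_criticality: Level
    leakage_risk: Level


def classify(attr: DataAttribute) -> str:
    """Return a hypothetical handling class for one attribute."""
    # Anything that would be damaging if leaked gets the tightest class.
    if attr.leakage_risk == Level.HIGH:
        return "restricted"
    # Business-critical or moderately sensitive data is controlled.
    if attr.business_criticality == Level.HIGH or attr.leakage_risk == Level.MEDIUM:
        return "controlled"
    return "open"


# Made-up catalogue entries, purely for illustration.
catalog = [
    DataAttribute("email_address", Level.HIGH, Level.HIGH),
    DataAttribute("revenue", Level.HIGH, Level.MEDIUM),
    DataAttribute("page_views", Level.LOW, Level.LOW),
]

classes = {attr.name: classify(attr) for attr in catalog}
```

Keeping the rules in one function means every team assigns classes the same way, and a change to the policy is made in a single place.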
Data Usage Lever (or group): ‘Who’ gets to work with a certain set of data and ‘What’ authority the user has in using the data. Defines user groups in terms of:
- Level of data access i.e. will a user group get read-only or read/write access to a dataset?
- User management i.e. ensuring that the relevant datasets are accessible to new users and revoking access when it is no longer required.
- Nomenclature i.e. what should the naming convention for data be so that every user can find the data easily without creating redundancies in the ecosystem?
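The access-level and user-management points above can be sketched with a small in-memory registry. The `AccessRegistry` class, the 90-day idle threshold, and the user and dataset names are hypothetical. A real organization would apply the same idea through its identity or data platform tooling.

```python
from datetime import date, timedelta
from enum import Enum


class Access(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read/write"


class AccessRegistry:
    """Hypothetical record of who can use which dataset, and how."""

    def __init__(self):
        # (user, dataset) -> (access level, date the grant was last used)
        self._grants = {}

    def grant(self, user, dataset, access, last_used):
        self._grants[(user, dataset)] = (access, last_used)

    def revoke(self, user, dataset):
        self._grants.pop((user, dataset), None)

    def can_read(self, user, dataset):
        return (user, dataset) in self._grants

    def can_write(self, user, dataset):
        grant = self._grants.get((user, dataset))
        return grant is not None and grant[0] is Access.READ_WRITE

    def stale_grants(self, today, max_idle_days=90):
        """Grants unused for longer than max_idle_days (assumed threshold)."""
        cutoff = today - timedelta(days=max_idle_days)
        return sorted(
            key for key, (_, last_used) in self._grants.items() if last_used < cutoff
        )


registry = AccessRegistry()
registry.grant("analyst_a", "crm_contacts", Access.READ_ONLY, date(2019, 1, 10))
registry.grant("engineer_b", "crm_contacts", Access.READ_WRITE, date(2019, 9, 1))

# Periodic review: find grants nobody has used recently and revoke them.
stale = registry.stale_grants(today=date(2019, 10, 1))
for user, dataset in stale:
    registry.revoke(user, dataset)
```

A periodic review like this is one way to catch the earlier example of credentials that keep working years after an employee has left.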
Data Processing Lever (or group): ‘How’ the data is collected and processed. This lever also frames guidelines around using the data for analysis or segmentation. Defines processes related to data in terms of:
- Collection Ownership: Which team owns the collection, accuracy and concurrency for a particular dataset
- Processing Ownership: Which team owns the reporting, analysis and processing of a particular dataset
- Distribution Ownership: Which team is responsible for ensuring that the appropriate business stakeholders get relevant reports in time to take action
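One lightweight way to keep these three ownership roles explicit is to record them per dataset and check for gaps. The sketch below assumes a plain dictionary as the ownership record. The dataset names, team names and the `missing_owners` helper are hypothetical.

```python
REQUIRED_ROLES = ("collection", "processing", "distribution")


def missing_owners(ownership):
    """Map each dataset to the ownership roles that have no team assigned."""
    gaps = {}
    for dataset, owners in ownership.items():
        missing = [role for role in REQUIRED_ROLES if not owners.get(role)]
        if missing:
            gaps[dataset] = missing
    return gaps


# Made-up ownership record, purely for illustration.
ownership = {
    "web_analytics": {
        "collection": "tagging_team",
        "processing": "insights_team",
        "distribution": "bi_team",
    },
    "paid_media": {
        "collection": "media_agency",
        "processing": "insights_team",
        # no distribution owner recorded yet
    },
}

gaps = missing_owners(ownership)
```

Here `gaps` flags that `paid_media` has no distribution owner, which is the kind of finding the arbitration and steering committee could then resolve.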
All three levers are equally important for the smooth functioning of any data governance initiative. To ensure that these three groups work together smoothly, it is necessary to have an arbitration and steering lever (or committee) that oversees the functioning of all three and helps resolve any conflicts that may arise between them.
Most of the request and information flow from one lever to another is captured in the diagram below. However, close collaboration is imperative in all cases.
When an organization can clearly define the data and user categories covered by levers 1 and 2, it can find its place in the data governance landscape shown in the previous section. This would also help address the numerous data sharing issues that large organizations can have with the partners and agencies they work with.
Gaurav is a part of the AIM Writers Programme. He is a data consultant with over 12 years of experience working with digital marketing data. His passion is to work with large data sets generated by various customer touchpoints, assimilate and generate meaningful insights. He usually likes to use these insights and act upon them to optimise user journey and personalise customer experiences. Gaurav is also an Adobe certified expert in Analytics & Audience Manager.