Intelligence and unified data governance in the age of multi-cloud

Data mesh is a type of data architecture that makes data accessible, available, discoverable, secure and interoperable.

Today, organisations must adapt to an increasingly data-driven world and build analytic agility. That is easier said than done, given the varied sources of information organisations handle and the complexity of data handling, which includes data movement, data discovery, cleansing and preparing trusted data for analytics. The challenge is magnified two-fold when you are unsure where your data is coming from and what it means. At the Data Engineering Summit 2022, Kirthi Ganapathy, customer engineering manager at Google Cloud, shared insights, key learnings and best practices around intelligent management of metadata, security and governance in a diverse and largely distributed data environment.

What is data governance?

Data governance, at its most basic level, is the practice of enhancing an organisation’s data to make it discoverable, understandable, protected and trusted. Every enterprise should think about the entire data lifecycle, starting with data intake and ingestion and continuing through cataloguing, persistence, retention, storage, management, sharing, archiving, backup, recovery, disposition, and data removal and deletion.

A data governance framework has four main pillars:

  1. Data discoverability: Data classification, data lineage, metadata and catalogue, and data quality
  2. Data management: Lifecycle and records management, reference data, master data and SRE
  3. Data protection: Masking, encryption, access management, audit and compliance, residency and recoverability
  4. Data accountability: Ownership, policies and standards, domain-based governance and ethics
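
As a rough illustration of how classification (pillar 1) can feed protection (pillar 3), the sketch below masks any column whose label a reader is not cleared to see. The `CLASSIFICATION` mapping, the label names and the `mask_record` helper are hypothetical and were not part of the talk; the hashing shown is for illustration only.

```python
import hashlib

# Hypothetical column classifications; in practice a data catalogue would supply these.
CLASSIFICATION = {
    "email": "pii",
    "salary": "confidential",
    "department": "public",
}


def mask_value(value: str) -> str:
    """Replace a value with a truncated SHA-256 digest (illustration only;
    production masking would normally use a salted or keyed scheme)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def mask_record(record: dict, allowed: set) -> dict:
    """Return a copy of `record` in which every column whose classification
    is not in `allowed` has been masked."""
    masked = {}
    for column, value in record.items():
        # Columns missing from the catalogue are treated as sensitive by default.
        label = CLASSIFICATION.get(column, "pii")
        masked[column] = value if label in allowed else mask_value(str(value))
    return masked


row = {"email": "a@example.com", "salary": 1000, "department": "finance"}
public_view = mask_record(row, allowed={"public"})
print(public_view["department"])  # the public column passes through unchanged
```

Treating unclassified columns as sensitive is a design choice made here so that gaps in the catalogue fail closed rather than open.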

“Data governance encompasses the ways that people, processes and technology can work together to enable auditable compliance with defined and agreed upon policies across different technical solutions and different infrastructure boundaries,” Kirthi said.

Data priorities

“What organisations really want is to be able to derive insights from the data they have, without any restrictions, without necessarily moving it and in a way that makes sense to them,” Kirthi said.

An intelligent data fabric enables organisations to centrally manage, monitor and govern data across data lakes, data warehouses and data marts with consistent controls, providing access to trusted data and powering analytics at scale. It offers unified, metadata-led data management through a single pane of glass; centralised security and governance that enables distributed ownership with global control; built-in intelligence to unify distributed data without data movement; and an open platform with support for open source tools and a robust partner ecosystem.

What is a data mesh?

Data mesh is a type of data architecture that makes data accessible, available, discoverable, secure and interoperable. It combines two principles: domain-driven decentralisation and data as a product.

In domain-driven decentralisation, data is owned by the people who understand it best. For example, the finance team owns the finance data, and the HR team owns the HR and employee data. So no single centralised entity owns the whole organisation’s data. 

Under the second principle, data is treated as a product. A team owns its data just as it would own its services and its line of business. In other words, you treat other teams as internal customers of your data.
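
The two principles can be pictured with a small model, offered here as a sketch rather than anything shown in the talk. The `DataProduct` class, its fields and the example product names are all hypothetical; the only rule it encodes is that the owning domain, not a central team, decides who becomes an internal customer.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A dataset published by the domain team that understands it best."""
    name: str
    owner_domain: str
    description: str
    consumers: set = field(default_factory=set)

    def grant_access(self, consumer_domain: str, approved_by: str) -> None:
        """Only the owning domain may approve a new internal customer."""
        if approved_by != self.owner_domain:
            raise PermissionError(
                f"{approved_by} does not own {self.name}; ask {self.owner_domain}"
            )
        self.consumers.add(consumer_domain)


# No central team owns everything: each domain registers its own products.
catalogue = {
    "ledger": DataProduct("ledger", "finance", "General ledger entries"),
    "headcount": DataProduct("headcount", "hr", "Employee headcount by team"),
}

# Finance approves HR as an internal customer of the ledger product.
catalogue["ledger"].grant_access("hr", approved_by="finance")
```

If finance tried to approve access to the HR-owned `headcount` product, `grant_access` would raise `PermissionError`.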

Building a data mesh architecture involves three steps:

  1. Organising data to map to your business: Logically organise data based on how it is used instead of where it is stored.
  2. Uniformly managing and governing data: Set up standardised policies for access control, data quality, classification and lifecycle management.
  3. Accessing data from a variety of tools: Access distributed data from Google Cloud-native and open source tools with automatic metadata propagation and a unified experience.
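
The first two steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Asset` type, the made-up storage URIs and the retention limits in `MAX_RETENTION_DAYS` are invented for the example, and tool access (step 3) is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    location: str        # where the data physically lives (made-up URIs)
    domain: str          # business domain it is mapped to
    classification: str
    retention_days: int


# One standard policy for every domain: maximum retention per classification.
MAX_RETENTION_DAYS = {"pii": 365, "internal": 1825}


def group_by_domain(assets):
    """Organise assets by business domain rather than by storage system."""
    grouped = {}
    for asset in assets:
        grouped.setdefault(asset.domain, []).append(asset.location)
    return grouped


def policy_violations(assets):
    """Apply the same retention rule to every asset, wherever it is stored."""
    problems = []
    for asset in assets:
        limit = MAX_RETENTION_DAYS.get(asset.classification)
        if limit is None:
            problems.append((asset.location, "unclassified"))
        elif asset.retention_days > limit:
            problems.append((asset.location, "retention too long"))
    return problems


assets = [
    Asset("bucket://raw/orders", "sales", "internal", 400),
    Asset("warehouse://crm.contacts", "customer", "pii", 730),
    Asset("warehouse://sales.invoices", "sales", "internal", 900),
]
domains = group_by_domain(assets)
problems = policy_violations(assets)
```

Here the two sales assets sit in different storage systems yet land in the same domain, and the single retention rule flags only the customer contacts table.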

The Google Cloud way

“We have three data domains here, sales data, CRM data or customer data and product data, each of which can be implemented as a different data lake, with its respective data pipelines, enabling the respective product teams to set up a very fine-grained permission control, including at a sub-lake or zone level on each of these data lakes independently, as defined by the organisation best practices,” said Kirthi.
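
The layout described in the quote, one lake per domain with permissions set at lake or zone level, can be modelled as follows. This is a toy sketch that does not call any Google Cloud API; the `Lake` class, the zone names `raw` and `curated`, and the principal names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Lake:
    """One data lake per domain, split into named zones."""
    domain: str
    lake_readers: set = field(default_factory=set)    # may read every zone
    zone_readers: dict = field(default_factory=dict)  # zone name -> set of principals

    def add_zone(self, zone: str) -> None:
        self.zone_readers.setdefault(zone, set())

    def grant(self, principal: str, zone: Optional[str] = None) -> None:
        """Grant read access to the whole lake, or to one zone if given."""
        if zone is None:
            self.lake_readers.add(principal)
        else:
            self.zone_readers[zone].add(principal)  # KeyError for unknown zones

    def can_read(self, principal: str, zone: str) -> bool:
        return principal in self.lake_readers or principal in self.zone_readers[zone]


# Three independent lakes, matching the three domains in the quote.
lakes = {name: Lake(name) for name in ("sales", "customer", "product")}
for lake in lakes.values():
    lake.add_zone("raw")
    lake.add_zone("curated")

lakes["sales"].grant("sales-engineers")                   # lake-wide grant
lakes["sales"].grant("finance-analysts", zone="curated")  # zone-level grant only
```

Because each lake keeps its own grants, a permission given on the sales lake has no effect on the customer or product lakes.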

Kirthi further stated that this architecture:

  1. Gives your organisation the freedom to store data where you want, choose the best analytics tools, and have flexibility in pricing and consumption models to meet financial governance needs.
  2. Provides built-in data intelligence that leverages best-in-class AI/ML capabilities to automate data management and reduce manual toil.
  3. Enables standardisation and unification of metadata, security policies and data classification.

Kartik Wali
A writer by passion, Kartik strives to get a deep understanding of AI, data analytics and their implementation in all walks of life. As a Senior Technology Journalist, Kartik looks forward to writing about the latest technological trends that transform the way of life!
