Data Mesh: Moving Away From Monolithic & Centralised Data Lakes

Data mesh marks an architectural and organisational shift in the way enterprises manage big data.

“My ask before reading on is to momentarily suspend the deep assumptions and biases that the current paradigm of traditional data platform architecture has established; Be open to the possibility of moving beyond the monolithic and centralised data lakes to an intentionally distributed data mesh architecture; Embrace the reality of ever-present, ubiquitous and distributed nature of data,” said Zhamak Dehghani, currently the director of emerging technologies at Thoughtworks.

Data mesh, a decentralised data architecture, marks an architectural and organisational shift in the way enterprises manage big data.


According to Zhamak Dehghani, the planning and building of data and intelligence platforms can be divided into three generations.

  • In the first generation, organisations employed proprietary enterprise data warehouses and business intelligence platforms. This was a costly approach that often left companies saddled with technical debt.
  • The second generation centred on big data ecosystems: central teams of data engineers built data lakes and operated long-running batch jobs.
  • Industries are currently building the third generation of data platforms, which resembles the previous generation but addresses gaps such as the lack of real-time data analytics and the high cost of managing big data infrastructure.

Zhamak suggested the next enterprise data platform architecture should be built to converge distributed domain-driven architecture, product thinking with data, and self-serve platform design. This gives way to data mesh.

The shift to data mesh is founded on four principles:

  • Decentralisation of data ownership and architecture
  • Domain-oriented data presented as a product
  • Self-serve data infrastructure as a platform, enabling autonomous domain-oriented data teams
  • Enabling interoperability through federated governance
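The fourth principle, federated governance, is often enforced through global, automated policies that every domain's data product must satisfy while the domain retains local control. A minimal sketch, with all names and the required-metadata set invented for illustration:

```python
# Hypothetical sketch of federated computational governance: a small set of
# global rules is applied uniformly to every domain's data product metadata.
# The field names below are illustrative assumptions, not a real standard.

REQUIRED_FIELDS = {"domain", "owner", "schema", "pii_classification"}

def passes_global_policy(product_metadata: dict) -> bool:
    """Interoperability check every domain-owned product must pass
    before it is published to the mesh."""
    # dict.keys() membership: all globally required fields must be present
    return REQUIRED_FIELDS.issubset(product_metadata)

payments_product = {
    "domain": "payments",
    "owner": "payments-team",
    "schema": {"amount": "decimal", "currency": "str"},
    "pii_classification": "none",
}

print(passes_global_policy(payments_product))  # True
print(passes_global_policy({"domain": "orders"}))  # False: missing metadata
```

The point of the sketch is that governance is federated, not centralised: the policy is global and automated, but each domain decides for itself what goes into its schema and data.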

Data mesh is a highly decentralised data architecture designed to address challenges such as lack of data ownership, poor data quality, and the organisational bottlenecks that hinder scaling.

The goal of data mesh is to treat data as a product: each source has a data product owner, who would ideally be part of a cross-functional team of data engineers. Even with separate owners, the data remains domain-focused and is offered autonomously, leading to a domain-driven distributed architecture.
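The "data as a product" idea above can be sketched in code: each domain registers a discoverable product with an explicit owner and a published contract (schema, freshness guarantee), rather than handing raw data to a central team. Everything here, from the class names to the catalogue interface, is an illustrative assumption, not a real library:

```python
from dataclasses import dataclass

# Hypothetical sketch: a domain publishes its data as a "data product"
# with an accountable owner and a consumer-facing contract.

@dataclass
class DataProduct:
    domain: str            # owning business domain, e.g. "orders"
    name: str              # product name within that domain
    owner: str             # accountable data product owner
    schema: dict           # output schema consumers can rely on
    sla_freshness_mins: int = 60  # freshness guarantee, part of the contract

class MeshCatalogue:
    """Minimal registry that makes domain-owned data products discoverable."""
    def __init__(self):
        self._products = {}

    def register(self, product: DataProduct) -> None:
        self._products[(product.domain, product.name)] = product

    def discover(self, domain: str) -> list:
        return [p for (d, _), p in self._products.items() if d == domain]

catalogue = MeshCatalogue()
catalogue.register(DataProduct(
    domain="orders",
    name="daily_order_totals",
    owner="orders-data-team",
    schema={"order_date": "date", "total": "decimal"},
))

print([p.name for p in catalogue.discover("orders")])  # ['daily_order_totals']
```

The design choice worth noting is that ownership and the contract travel with the product itself, so consumers discover data by domain instead of asking a central platform team.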

When to consider it?

Current data platform architectures are primarily built on a data lake or a data warehouse. Contrary to popular belief, the goal of data mesh is not to replace them entirely. A centralised data platform with a specialised team generally works well for small and medium-sized enterprises.

However, as the organisation grows, its data domains become more diverse and new data sources are introduced. At that point, the existing architecture starts creating unnecessary friction and may slow down processes.

However, it is difficult to tell when an organisation becomes big enough to render existing approaches ineffective, and even large organisations can remain effective with a centralised data platform. A better method is to consider the size of the IT team and evaluate whether the data platform slows the cycle of innovation and turns into a bottleneck. The symptoms include continuously longer lead times, data solutions appearing outside the centralised data platform, and a recurring need for temporary solutions to integrate new data sources.

Wrapping up

Data mesh is not a plug-and-play solution. It comes with its own set of challenges, including:

  • Need for domain specialisations: Domain-specific ETL pipelines, data lakes and tooling require teams with expertise in complex data systems such as Kafka and Spark.
  • Governance of duplicated data: Creating more copies of data can prove to be a governance challenge, a problem further compounded by multi-cloud and hybrid-cloud infrastructure.
  • If the company has not covered all its bases before transitioning to a decentralised approach, its success could be short-lived.
Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
