“My ask before reading on is to momentarily suspend the deep assumptions and biases that the current paradigm of traditional data platform architecture has established; Be open to the possibility of moving beyond the monolithic and centralised data lakes to an intentionally distributed data mesh architecture; Embrace the reality of ever-present, ubiquitous and distributed nature of data,” said Zhamak Dehghani, currently the director of emerging technologies at Thoughtworks.
Data mesh, a decentralised data architecture, marks an architectural and organisational shift in the way enterprises manage big data.
According to Dehghani, the planning and building of data and intelligence platforms can be divided into three generations:
- In the first generation, organisations employed proprietary enterprise data warehouses and business intelligence platforms. It was a costly approach that often left companies saddled with technical debt.
- The second generation introduced a big data ecosystem built around data lakes, fed by long-running batch jobs and operated by a central team of data engineers.
- Organisations are currently building the third generation of data platforms, which resembles the second but addresses some of its gaps, such as real-time data analytics and the cost of managing big data infrastructure.
Dehghani suggested that the next enterprise data platform architecture should bring together distributed domain-driven architecture, product thinking applied to data, and self-serve platform design. This convergence is what gives rise to data mesh.
The shift to data mesh is founded on four principles:
- Decentralisation of data ownership and architecture
- Domain-oriented data presented as a product
- Using self-serve data infrastructure as a platform to enable autonomous, domain-oriented data teams
- Enabling interoperability through federated governance
Data mesh is a highly decentralised data architecture that aims to solve challenges such as unclear data ownership and poor data quality, and to remove the bottlenecks that hold back organisational scaling.
The goal of data mesh is to treat data as a product, with each source having a data product owner, who would ideally be part of a cross-functional team of data engineers. Although each product has its own owner, the data should remain domain-focused and be offered autonomously, which leads to a domain-driven distributed architecture.
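To make the idea of data as a product with a named owner more concrete, here is a minimal Python sketch. It is an illustration only and not part of Dehghani's proposal or any specific tool: the `DataProduct` and `MeshCatalog` classes, their fields, and the example domains and owners are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for one domain-owned data product."""
    name: str
    domain: str          # business domain that owns the data
    owner: str           # data product owner accountable for the product
    output_schema: dict  # column name -> type, the contract exposed to consumers


@dataclass
class MeshCatalog:
    """Hypothetical catalogue that lets consumers discover products by domain."""
    products: dict = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # A product is identified by its domain and name together.
        key = (product.domain, product.name)
        if key in self.products:
            raise ValueError(f"{product.domain}/{product.name} is already registered")
        self.products[key] = product

    def by_domain(self, domain: str) -> list:
        # Return every product published by the given domain.
        return [p for (d, _), p in self.products.items() if d == domain]


# Example data (assumed, for illustration only).
catalog = MeshCatalog()
catalog.register(
    DataProduct("orders", "sales", "sales-data-owner",
                {"order_id": "int", "amount": "float"})
)
catalog.register(
    DataProduct("shipments", "logistics", "logistics-data-owner",
                {"shipment_id": "int"})
)

for product in catalog.by_domain("sales"):
    print(product.name, "is owned by", product.owner)
```

In this sketch each domain registers and owns its own products, while the shared catalogue only provides discovery, which loosely mirrors the split between domain ownership and a common self-serve platform.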
When to consider it?
Current data platform architectures are primarily built on a data lake or data warehouse. Contrary to popular belief, the goal of data mesh is not to replace them completely. A centralised data platform with a specialised team generally works well for small and medium-sized enterprises.
However, as the organisation grows, its data domains become more diverse and new data sources are introduced. In such cases, the existing architecture starts creating unnecessary friction and may slow processes down.
That said, it is difficult to tell when an organisation has become big enough to render existing approaches ineffective, and even large organisations can remain effective with a centralised data platform. A better method could be to consider the size of the IT team and evaluate whether the size of the data platform slows the cycle of innovation and turns into a bottleneck. The symptoms include steadily growing lead times, the appearance of data solutions separate from the centralised data platform, and a need for temporary solutions to integrate new data sources.
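The three symptoms above can be turned into a rough checklist. The Python sketch below is only one possible way to do that: the `PlatformSignals` fields, the rule that lead times count as growing when each one is longer than the last, and the sample numbers are all assumptions made for illustration, not thresholds taken from the source.

```python
from dataclasses import dataclass


@dataclass
class PlatformSignals:
    """Hypothetical measurements an organisation might collect."""
    lead_times_days: list        # lead time per recent delivery, oldest first
    shadow_solutions: int        # data solutions built outside the central platform
    temporary_integrations: int  # stopgap integrations for new data sources


def lead_times_growing(lead_times: list) -> bool:
    """True when every lead time is longer than the one before it (assumed rule)."""
    return len(lead_times) >= 2 and all(
        later > earlier for earlier, later in zip(lead_times, lead_times[1:])
    )


def bottleneck_symptoms(signals: PlatformSignals) -> list:
    """List which of the three symptoms are present in the given signals."""
    symptoms = []
    if lead_times_growing(signals.lead_times_days):
        symptoms.append("lead times keep growing")
    if signals.shadow_solutions > 0:
        symptoms.append("data solutions outside the central platform")
    if signals.temporary_integrations > 0:
        symptoms.append("temporary integrations for new sources")
    return symptoms


# Sample values are invented for the example.
signals = PlatformSignals(lead_times_days=[5, 8, 13],
                          shadow_solutions=2,
                          temporary_integrations=0)
print(bottleneck_symptoms(signals))
```

A checklist like this cannot decide the question on its own; it only makes the symptoms explicit so they can be tracked over time.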
Data mesh is not a plug-and-play solution. It comes with its own set of challenges, including:
- Need for domain specialisation: Domain-specific ETL, data lakes and tools require teams with expertise in complex data systems such as Kafka and Spark.
- Creating more copies of data can become a governance challenge, a problem that multi-cloud and hybrid-cloud infrastructure compounds further.
- If the company has not prepared every part of the organisation for the transition to a decentralised approach, any success could be short-lived.