The emergence of new data architectures has made enterprises rethink what an ideal solution for reaching their business goals looks like. Data mesh and data lakehouse are the most prominent of these, addressing the bottlenecks of traditional architectures and keeping up with the increasing complexity of data.
However, the big data and cloud space is currently filled with buzzwords, with little clarity on what the individual architectures actually mean.
One Twitter user, for example, poked fun at the terminology used in the industry.
So, what really are these terms?
Data Lakehouse
Arockia Liborious, Principal Consultant at Clariant, defines the lakehouse as a two-tier architecture that combines features of traditional architectures such as the data warehouse and the data lake. According to him, companies today have products that use AI in the form of computer vision, voice models, text mining, and others, whereas earlier architectures relied only on structured data to make business decisions.
S&P Global’s Matt Aslett similarly writes that data lakehouse “blurs the lines between data lakes and data warehousing by maintaining the cost and flexibility advantages of persisting data in cloud storage while enabling schema to be enforced for curated subsets of data in specific conceptual zones of the data lake, or an associated analytic database, in order to accelerate analysis and business decision-making.”
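To make Aslett's description a little more concrete, here is a minimal sketch in Python of a lake that accepts any record in a raw zone and enforces a schema only for a curated subset. It is an illustration under stated assumptions, not how any particular lakehouse product works: the `Lakehouse` class, the `conforms` helper, and the `ORDERS_SCHEMA` columns are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical schema for one curated table: column name -> required Python type.
ORDERS_SCHEMA = {"order_id": int, "amount": float, "currency": str}


def conforms(record: dict[str, Any], schema: dict[str, type]) -> bool:
    """Return True when the record has exactly the schema's columns and types."""
    return record.keys() == schema.keys() and all(
        isinstance(record[col], typ) for col, typ in schema.items()
    )


@dataclass
class Lakehouse:
    """Toy model: a raw zone that stores everything, a curated zone with a schema."""
    raw_zone: list = field(default_factory=list)      # any record lands here
    curated_zone: list = field(default_factory=list)  # schema-enforced subset

    def ingest(self, record: dict[str, Any]) -> None:
        # Lake-style storage: nothing is rejected on write.
        self.raw_zone.append(record)

    def curate(self, schema: dict[str, type]) -> list:
        # Warehouse-style guarantee: only conforming records are promoted.
        self.curated_zone = [r for r in self.raw_zone if conforms(r, schema)]
        return self.curated_zone


lake = Lakehouse()
lake.ingest({"order_id": 1, "amount": 9.5, "currency": "EUR"})
lake.ingest({"order_id": "2", "amount": "n/a"})   # malformed order
lake.ingest({"image_path": "scan.png"})          # unstructured-style record
curated = lake.curate(ORDERS_SCHEMA)
```

All three records stay in the raw zone, while only the first one is promoted to the curated zone, which is the part analysts would query.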
Data Mesh
Data mesh, on the other hand, is considered a “paradigm shift” in the data industry. Under the cleverly put title ‘From data mess to data mesh’, Jarvin Mutatiina and Ernst Blaauw from Deloitte explained that the growing number of data sources, together with the need for agility, calls for a more effective data platform than the traditional ones. According to them, data mesh is a “democratized approach of managing data where different business domains operationalize their own data, backed by a central and self-service data infrastructure”. It is believed to be more of an organisational approach than a technical one.
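The idea of domains operating their own data on top of shared infrastructure can be sketched in a few lines of Python. This is only a toy model under assumptions: `DataProduct`, `SelfServeCatalog`, and the `sales` and `logistics` domains are hypothetical names made up for illustration, and a real data mesh involves governance and tooling that this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataProduct:
    """A dataset owned and served by a single business domain."""
    name: str
    owner_domain: str
    read: Callable[[], list]  # the owning domain decides how the data is served


@dataclass
class SelfServeCatalog:
    """Stand-in for the central, self-service infrastructure shared by all domains."""
    products: dict = field(default_factory=dict)

    def publish(self, product: DataProduct) -> str:
        key = f"{product.owner_domain}.{product.name}"
        if key in self.products:
            raise ValueError(f"{key} is already published")
        self.products[key] = product
        return key

    def consume(self, key: str) -> list:
        # Consumers discover data through the catalog, not through a central team.
        return self.products[key].read()


catalog = SelfServeCatalog()

# Each domain publishes and maintains its own data product.
catalog.publish(DataProduct("orders", "sales", lambda: [{"order_id": 1}]))
catalog.publish(DataProduct("shipments", "logistics", lambda: [{"shipment_id": 7}]))

orders = catalog.consume("sales.orders")
```

The catalog only registers and routes; what each product contains, and how it is produced, remains the responsibility of the domain that owns it.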
Data Mesh versus Data Lakehouse
Zhamak Dehghani, who originated the data mesh concept, said while speaking at the Data+AI Summit, “I don’t really see them [Lakehouse and Mesh] exclusive, I see them as complementary”.
In the same vein, organisations such as Cloudera are adopting a hybrid, multi-cloud model in their modern data architectures. For example, Luke Roquet of Cloudera writes, “Modern data platforms deliver an elastic, flexible, and cost-effective environment for analytic applications by leveraging a hybrid, multi-cloud architecture to support data fabric, data mesh, data lakehouse and, most recently, data observability.”
However, several others have called the buzz around data mesh merely a marketing gimmick.
Bill Schmarzo likewise asserts that one of the biggest issues he finds with the data mesh architecture is that it requires making everyone a data management and data governance expert. And as many have pointed out, placing data management and governance in the different business units of an organisation also costs considerable money and time.
Sawyer Nyquist at Microsoft, in an attempt to “clear out the noise and marketing hype” around these architectures, says that for more than 95% of companies, a data warehouse or data lakehouse is the right solution, and only the top 5% of the largest companies in the world need to worry about data mesh.
As of now, experts have offered different answers to the question of ‘lakehouse versus mesh’. Some propose a hybrid model, whereas others are more cautious about using mesh as the data platform in small- and medium-scale organisations.