Over the past decade, new enterprise technologies for better data management have emerged, such as data lakes, cloud computing and data mesh. But the importance of a semantic layer got lost somewhere along the way. Enterprises still see a gap between their descriptive analytics and data science teams, whose people and technology remain in two separate places. Descriptive analytics is one thing and ML-based analytics another entirely, but both are just as essential to companies.
Need for a semantic layer
The semantic layer is one of the underlying platforms in data architecture; it lets people access data on their own, in simpler language. Its value does not lie solely in keeping language and perspectives consistent throughout: customers can also define their own business metrics and keep reusing them. The semantic layer keeps critical metrics the same across workstreams within an organisation and cuts down the time data scientists spend coordinating between teams.
When an organisation has its semantic layer in place, it no longer matters where the raw data is stored or which tools different teams use to consume the data output. All the teams can simply feed from the semantic layer, and any new information found by data scientists can be fed back into it so that future decisions are based on it too. This liberates data science, which can otherwise get tied to particular platforms or tools, and removes the cost of reworking things. It is also far easier to maintain one semantic layer of definitions than hundreds of scattered reports.
While the concept of semantic layers dates back to the early 90s, they are now seeing a resurgence in the industry. In 1991, BusinessObjects (later acquired by SAP) introduced the semantic layer. But it wasn't until 2021 that the 'modern' semantic layer, as we know it, returned, with tools like MetriQL, Airbnb's Minerva, MetricFlow and Cube.js.
Why were semantic layers abandoned?
Despite all these upsides, enterprises deliberately stayed away from semantic layers because of how cumbersome centralised semantic layers were. Semantic layers took time to build and maintain. Besides, the layer had to be kept continuously in sync with the database and any changes made to it.
The data for semantic layers also usually lived in multiple backend systems or operational data stores (ODS), so organisations had to manage several semantic layers simultaneously, with one maintained for each system or tool.
But logic within enterprises had become dispersed everywhere: data was distributed and duplicated, and varied combinations of it were formed. Companies indubitably needed data, and there was no getting away from it. Organisations had their own complicated data models, and data management was becoming a significant challenge.
Rise in new BI tools
With the intent to remain agile, companies started using the new, seemingly fancy Business Intelligence (BI) tools that came along, like Tableau and Qlik, and heavy, centralised semantic layers were done away with completely. The idea was that these low-code, no-code BI tools would simplify the process and democratise the data. IT teams were slowly being pushed to appease more sophisticated clients. But the more these offerings cropped up and the more companies adopted them, the more confusing things became.
There were multiple BI tools for multiple teams and no single semantic layer: one team used Tableau, another used Power BI and yet another used Excel, and there was no single source of truth.
A new universal semantic layer
The realisation came much later that while these data discovery tools were great at what they had been designed for, they weren’t necessarily appropriate for core BI. What this calls for is a new and improved universal semantic layer.
Cloud businesses like Google, Snowflake and even unicorns like dbt Labs are now speaking up about the indispensability of a universal semantic layer. The core idea behind dbt Labs’ new semantic layer is that users should define the universal metrics only once and use them anywhere.
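The 'define once, use anywhere' idea can be illustrated with a minimal sketch. This is not dbt Labs' actual API; the class and function names here are purely hypothetical, but they show the principle: a metric is declared a single time, and every consumer compiles its query from that one shared definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    """A single, shared definition of a business metric."""
    name: str        # logical name analysts use
    table: str       # physical table backing the metric
    expression: str  # SQL aggregation expression

# Defined once, in one place...
REVENUE = Metric(name="revenue", table="orders", expression="SUM(amount)")

def to_sql(metric: Metric, group_by: str) -> str:
    """Every consumer (dashboard, notebook, report) compiles the
    same definition, so the numbers always agree."""
    return (f"SELECT {group_by}, {metric.expression} AS {metric.name} "
            f"FROM {metric.table} GROUP BY {group_by}")

# ...and reused by any consumer:
print(to_sql(REVENUE, "region"))  # a dashboard slicing revenue by region
print(to_sql(REVENUE, "month"))   # a notebook slicing revenue by month
```

Because both queries are generated from the same `REVENUE` definition, a change to the metric's expression propagates everywhere at once instead of being patched in hundreds of scattered reports.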
This is not to say that data discovery tools should be discarded, as discovery is a legitimate function in BI, but tools for data discovery and a semantic layer are not interchangeable.
The key thing to remember about semantic layers is that they follow an 'All or Nothing' principle. A semantic layer is only useful when it is truly universal and completely misses the target when it isn't. This means that it must support a wide range of use cases and roles, such as data scientists, business analysts and developers. A universal semantic layer should also work with a variety of query languages and interfaces like SQL, MDX, DAX, Python, REST, JDBC and ODBC.
An ideal semantic layer is defined by a few core features: semantic modelling, which maps logical elements like metrics and KPIs to the physical entities in the database; a scalable multidimensional calculation engine; performance optimisation focused on speed; and analytics governance.
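Two of these features, semantic modelling and analytics governance, can be sketched together in a few lines. This is an illustrative toy, not any vendor's implementation, and all table and metric names are hypothetical: logical names that analysts use are resolved against physical tables, and anything outside the vetted model is rejected.

```python
# Semantic modelling: logical metrics analysts reference by name,
# mapped onto physical tables and SQL expressions (names hypothetical).
SEMANTIC_MODEL = {
    "total_revenue": ("warehouse.fct_orders", "SUM(amount)"),
    "active_users":  ("warehouse.fct_logins", "COUNT(DISTINCT user_id)"),
}

def compile_metric(logical_name: str) -> str:
    """Resolve a business-language metric to a physical query."""
    if logical_name not in SEMANTIC_MODEL:
        # Analytics governance: only vetted, shared definitions may be queried.
        raise KeyError(f"'{logical_name}' is not a governed metric")
    table, expression = SEMANTIC_MODEL[logical_name]
    return f"SELECT {expression} FROM {table}"

print(compile_metric("active_users"))
# SELECT COUNT(DISTINCT user_id) FROM warehouse.fct_logins
```

In a real product the mapping would also carry dimensions, joins and access policies, but the shape is the same: business language in, governed physical queries out.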
If any of these requirements is missing, the semantic layer essentially becomes unusable.