Building a resilient, scalable modern data platform

The first generation platforms were based on a data warehouse model.

Published on May 2, 2022
by Kartik Wali

Sumit Jindal, director of data engineering, and Rashmi Purbey, manager of data engineering, both from Publicis Sapient, spoke in detail about the evolution of modern data architecture and its applications at the Data Engineering Summit 2022. In an information-packed session, the duo unpacked the data buzzwords doing the rounds in the world of AI and data science.

Sumit started off with an example of an online and offline multiformat retail and financial giant with a presence in multiple countries. The client needed multi-language support. “We had to build data for multi-business units spanning across multiple business domains, and also account for country dimensions,” he said.

“We enable the data platform, which works seamlessly on diverse data sets from different business units and countries. And the outcome of this system helps our clients become digitally integrated enterprises,” he added.

A modular view

“This is a logical view of a modern data platform. What we are seeing here is that data comes in different formats, such as structured, unstructured and semi-structured data. The data can be integrated through APIs, on demand or as batch loads, or through direct integration with databases, and we can have real-time streaming data as a prerequisite for the system,” Sumit said.

The data collection layer should be able to consume and combine data from different sources. It should also have a data provenance layer, so that you can go back and trace where a record came from if something is wrong. Apart from this, storage is a fundamental part of a data platform.
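A minimal sketch of what such a collection layer could look like, assuming each incoming payload is wrapped with provenance metadata at ingestion time. The source names (`pos_api`, `warehouse_db`, `web_stream`), the `IngestedRecord` type and the `collect` and `trace` helpers are hypothetical and only illustrate the idea; they were not described in the session.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib
import json


@dataclass
class IngestedRecord:
    payload: dict      # the raw data, unchanged
    source: str        # hypothetical source name, e.g. "pos_api"
    mode: str          # how it arrived: "api", "batch" or "stream"
    ingested_at: str   # UTC timestamp of ingestion
    checksum: str      # SHA-256 of the payload, to detect later changes


def collect(payload: dict, source: str, mode: str) -> IngestedRecord:
    """Wrap a raw payload with provenance metadata so it can be traced later."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return IngestedRecord(
        payload=payload,
        source=source,
        mode=mode,
        ingested_at=datetime.now(timezone.utc).isoformat(),
        checksum=hashlib.sha256(body).hexdigest(),
    )


def trace(records: list, source: str) -> list:
    """Go back and list everything that was ingested from one source."""
    return [r for r in records if r.source == source]


# Three integration styles feeding the same collection layer.
records = [
    collect({"order_id": 1, "amount": 120.0}, source="pos_api", mode="api"),
    collect({"order_id": 2, "amount": 75.5}, source="warehouse_db", mode="batch"),
    collect({"event": "click", "sku": "A1"}, source="web_stream", mode="stream"),
]
```

If a downstream report looks wrong, `trace(records, "pos_api")` returns only the records from that source, together with when and how each one arrived.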

Evolution of data architecture

Sumit said the first-generation platforms were based on a data warehouse model. Data was integrated with ETL tools such as Informatica and DataStage, and databases were integrated in a batch fashion. The data warehouse could be a SQL-based system such as Oracle or Teradata, which were somewhat more performant for ad hoc BI queries. This style of data integration was limited by storage and compute power. Normally, a top-down approach is used while building such a data warehouse.
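The batch ETL pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: Python's built-in `sqlite3` stands in for a SQL warehouse, and the `fact_sales` table, the sample rows and the `transform` and `run_batch` helpers are hypothetical.

```python
import sqlite3

# Hypothetical extract: rows as they might arrive from a source system,
# with everything typed as text and inconsistent country codes.
raw_rows = [
    {"order_id": "1", "country": "in", "amount": "120.50"},
    {"order_id": "2", "country": "IN", "amount": "75.00"},
    {"order_id": "3", "country": "sg", "amount": "210.25"},
]


def transform(row: dict) -> tuple:
    """Transform step: cast types and normalise the country code."""
    return (int(row["order_id"]), row["country"].upper(), float(row["amount"]))


def run_batch(conn: sqlite3.Connection, rows: list) -> None:
    """Load step: the schema is defined up front, then the batch is inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales ("
        "order_id INTEGER PRIMARY KEY, country TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [transform(r) for r in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
run_batch(conn, raw_rows)

# An ad hoc BI-style query against the loaded table.
totals = conn.execute(
    "SELECT country, ROUND(SUM(amount), 2) FROM fact_sales "
    "GROUP BY country ORDER BY country"
).fetchall()
```

Because the schema is fixed before any data is loaded, the sketch also hints at why this model struggles with unstructured data.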

Two-tier architecture

In a data warehouse, storing unstructured data is particularly challenging, and frequent data updates or ingestion pose another bottleneck. With the advent of systems like Hadoop and Spark, a data lake based model emerged. The resulting two-tier architecture is a more modern approach to data warehousing.

“The two sections of a two-tier architecture are: First, a data lake layer, where you are first loading data from multiple sources and processing it. And the second is data transformation, where you are transforming data and making it available for ML as well as analytics use cases and for a lot of ad hoc analytics,” he said.
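The two tiers can be sketched as follows. This is a simplified illustration, not the speakers' implementation: a temporary directory stands in for the data lake, and the source names, field names and the `land_raw` and `transform` helpers are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, source: str, records: list) -> Path:
    """Tier 1: write records to the lake unchanged, one JSON object per line."""
    path = lake_dir / f"{source}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


def transform(lake_dir: Path) -> list:
    """Tier 2: read every raw file and emit rows with one shared schema."""
    curated = []
    for path in sorted(lake_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                # Sources disagree on the customer field name; reconcile it here.
                cust = rec["customer_id"] if "customer_id" in rec else rec.get("cust")
                curated.append({
                    "source": path.stem,
                    "customer_id": str(cust),
                    "amount": float(rec.get("amount", 0)),
                })
    return curated


with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    # Two hypothetical sources with different shapes land in the lake as-is.
    land_raw(lake, "store_pos", [{"customer_id": 7, "amount": "19.99"}])
    land_raw(lake, "web_orders", [{"cust": "c-42", "amount": 5}])
    # The transformation tier makes them available in a single schema.
    curated_rows = transform(lake)
```

The raw files keep whatever shape the source produced, while analytics and ML consumers only read the output of `transform`.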

The advantage is you can have multi-modal data available in all formats. However, the inconsistency or staleness of data is an issue.
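One way to make the staleness problem visible is to compare when the raw layer was last loaded with when the transformed layer was last refreshed. The sketch below assumes both timestamps are tracked somewhere; the `is_stale` helper and the one-hour tolerance are illustrative choices, not values from the session.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_raw_load: datetime,
             last_curated_refresh: datetime,
             tolerance: timedelta = timedelta(hours=1)) -> bool:
    """True if the transformed layer lags the raw layer by more than the tolerance."""
    return last_raw_load - last_curated_refresh > tolerance


# Hypothetical timestamps for illustration.
raw_loaded = datetime(2022, 5, 2, 12, 0, tzinfo=timezone.utc)
refreshed_recently = datetime(2022, 5, 2, 11, 30, tzinfo=timezone.utc)
refreshed_yesterday = datetime(2022, 5, 1, 12, 0, tzinfo=timezone.utc)

recent_is_stale = is_stale(raw_loaded, refreshed_recently)
yesterday_is_stale = is_stale(raw_loaded, refreshed_yesterday)
```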

Data lakehouse

Rashmi Purbey spoke about the applications of the data lakehouse on various cloud systems:

Lakehouse on Databricks (Azure as cloud platform)
Lakehouse on AWS
BigLake – Lakehouse on GCP
Lakehouse using Snowflake

Databricks combines the best of both worlds: data warehousing and data lakes. The lakehouse on AWS provides a stable interface layer to query data from both the data warehouse and the data lake. BigLake is a storage engine that allows organisations to unify data warehouses and lakes, apply uniform fine-grained access control, and accelerate query performance across multi-cloud storage and open formats. Snowflake is a data warehouse built for the cloud. It offers instant elasticity, secure data sharing and per-second pricing, combining the power of data warehousing, the flexibility of big data platforms and the elasticity of the cloud at a fraction of the cost of traditional solutions.

“When building a resilient, scalable data platform, businesses normally focus on the platform that they are building, rather than concentrating on the analytics that goes behind building such a platform. Apart from that, one has to even consider the data being generated as a product in itself as there is a demand for such data in the market. One needs to keep enhancing and improving it to keep its quality up. Having the right quality checks and monitoring the output is of paramount importance in building a robust data product,” said Sumit.
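As a rough illustration of the kind of quality checks mentioned above, the sketch below validates a batch of rows before it is published. The rules (a maximum null rate per required column and a uniqueness check on a key column), the `check_quality` helper and the sample rows are all assumptions made for the example.

```python
def check_quality(rows: list, required: list, key: str,
                  max_null_rate: float = 0.1) -> list:
    """Return a list of human-readable failures; an empty list means the batch passed."""
    if not rows:
        return ["dataset is empty"]

    failures = []

    # Completeness: each required column may be missing in only a few rows.
    for col in required:
        nulls = sum(1 for r in rows if r.get(col) is None)
        rate = nulls / len(rows)
        if rate > max_null_rate:
            failures.append(
                f"{col}: null rate {rate:.0%} exceeds {max_null_rate:.0%}"
            )

    # Uniqueness: the key column must not contain duplicates.
    keys = [r.get(key) for r in rows if r.get(key) is not None]
    if len(keys) != len(set(keys)):
        failures.append(f"{key}: duplicate values found")

    return failures


# Hypothetical batch with one missing amount and one repeated order_id.
batch = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 5.0},
]
failures = check_quality(batch, required=["order_id", "amount"], key="order_id")
```

Monitoring then amounts to running such checks on every output and alerting whenever the returned list is not empty.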

Kartik Wali

A writer by passion, Kartik strives to get a deep understanding of AI, Data analytics and its implementation on all walks of life. As a Senior Technology Journalist, Kartik looks forward to writing about the latest technological trends that transform the way of life!
