The age of black box is gone

The key focus for any data-driven business is to ensure that the underlying data can be trusted. Additionally, with an increasing number of ML and AI-driven applications, Ops has become a critical component in stabilising pipelines. “Building and maintaining trust in the modern data stack is a challenging yet interesting problem to solve,” said Varun Saraogi, principal data engineer at TheMathCompany, at The Data Engineering Summit 2022. In his session, titled “Building Trust in the data & All Ops”, he emphasised the change in mindset needed to build a data-centric thought process.

Data trust is the confidence that data is healthy and ready to act on. However, this confidence cannot be taken on faith and has to be quantified. 

Varun listed a few criteria to build data trust:

  • Data quality: Accuracy, completeness, consistency, etc.
  • Data pipelines: Timeliness, alerting, resolution
  • Data cataloguing and lineage: Discoverability
  • Data privacy and security: PHI/PII/others, encryption, etc.
  • Automation and reusability: All Ops
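
Since this confidence has to be quantified, a criterion such as completeness can be turned into a number. The following is a minimal Python sketch and was not part of the session; the function names `completeness` and `uniqueness` and the sample `orders` records are hypothetical, and uniqueness is used here only as one simple proxy for consistency.

```python
from typing import Any


def completeness(records: list[dict[str, Any]], field: str) -> float:
    """Share of records in which `field` is present and not None or empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records: list[dict[str, Any]], field: str) -> float:
    """Share of distinct values among the non-null values of `field`."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical sample data: two missing emails, one duplicated order_id.
orders = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 2, "email": "c@example.com"},
    {"order_id": 4, "email": ""},
]

print(completeness(orders, "email"))    # 0.5
print(uniqueness(orders, "order_id"))   # 0.75
```

Scores like these can be tracked over time, so that a drop in completeness becomes a measurable event rather than a matter of opinion.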

The age of black box is gone

Varun said businesses are sceptical of data at some level. Technological advancements, the need for faster decisions and the growing number of stakeholders in the data pipeline complicate matters further. Data can get distorted at various stages:

  • From ingestion to consumption
  • Across multiple layers of data transformation
  • Across multiple teams managing the data

Varun outlined strategies to build trust in data:


Get everyone involved in the data lifecycle: “We have to decentralise the data process and ensure that everyone who touches data feels equally involved and responsible. There is a need for collaboration where everyone in the life cycle is involved,” he said.

Build systems and culture around data quality: A unified approach to data requires transparent data management processes and documented, communal data quality standards.

Share data quality rules across the organisation: This includes automated checks embedded in data systems and policies that set clear expectations for how people interact with and maintain data.
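
One way to picture shared rules with automated checks is a single rule list that every team imports and runs against its own records. This is an illustrative Python sketch under that assumption, not an approach described in the session; `Rule`, `SHARED_RULES`, `validate` and the sample records are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A named data quality rule; `check` returns True when a record passes."""
    name: str
    check: Callable[[Record], bool]


# Hypothetical organisation-wide rules, defined once and reused by every team.
SHARED_RULES = [
    Rule("patient_id is present", lambda r: r.get("patient_id") is not None),
    Rule(
        "age is an integer between 0 and 120",
        lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    ),
]


def validate(records: list[Record], rules: list[Rule]) -> dict[str, int]:
    """Count how many records fail each rule."""
    failures = {rule.name: 0 for rule in rules}
    for record in records:
        for rule in rules:
            if not rule.check(record):
                failures[rule.name] += 1
    return failures


records = [
    {"patient_id": 1, "age": 34},
    {"patient_id": None, "age": 150},  # fails both rules
    {"patient_id": 3},                 # age missing, fails the age rule
]

print(validate(records, SHARED_RULES))
```

Keeping the rules in one place means a change to a standard is made once and picked up by every pipeline that runs `validate`.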

DataOps

“DevOps has solved many challenges in software engineering and now in data platform systems as well. DevOps principles widely adopted in the organisation have given us a clear view on the significance of looking deep down. It has given us a better understanding of the implementation and management of such systems,” said Varun. 

DataOps automates the end-to-end data flow and enables teams to work independently. It reduces error rates and increases quality while offering clear measurement, monitoring and transparency of results. Data observability leads to reliable data pipelines and brings transparency to monitoring, alerting, tracking and triaging incidents.
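
A common observability signal is freshness: how long ago a table was last loaded. The sketch below shows one possible check in Python, assuming a pipeline records a load timestamp per table; `freshness_alert`, the table name `sales_daily` and the 24-hour threshold are hypothetical and not taken from the session.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def freshness_alert(
    table: str,
    last_loaded_at: datetime,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> Optional[str]:
    """Return an alert message if the table is staler than `max_age`, else None."""
    current = now if now is not None else datetime.now(timezone.utc)
    age = current - last_loaded_at
    if age > max_age:
        return f"ALERT: {table} last loaded {age} ago, allowed {max_age}"
    return None


# A fixed reference time keeps the example deterministic.
now = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
stale = freshness_alert("sales_daily", now - timedelta(hours=30), timedelta(hours=24), now=now)
fresh = freshness_alert("sales_daily", now - timedelta(hours=2), timedelta(hours=24), now=now)

print(stale)  # an alert string, since 30 hours exceeds the 24-hour limit
print(fresh)  # None
```

In practice the returned message would be routed to whatever alerting channel the team already uses, which is outside the scope of this sketch.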

DataOps helps reduce the turnaround time of projects, increase automated testing, and improve data quality and visibility into data pipelines, all of which contribute to building data trust.

“To build data trust, we need to help business and tech team understand what data looks like. We also need to ensure that data is discoverable and available for the end customer analytics reach. Apart from DataOps, MLOps also plays a critical role in building the trust in the system,” Varun said. He concluded the session with guiding principles to build data trust:

  • Ops-first approach
  • Reusability in data pipelines
  • Testing the data pipelines
  • Data catalogue and data lineage
  • Collaborative data quality management
  • Data privacy and security
  • Alerting and monitoring

Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com
