The age of the black box is gone


The key focus for any data-driven business is to ensure that its underlying data can be trusted. And with an increasing number of ML- and AI-driven applications, Ops (operational practices such as DataOps and MLOps) has become a critical component in stabilising pipelines. “Building and maintaining trust in the modern data stack is a challenging yet interesting problem to solve,” said Varun Saraogi, principal data engineer at TheMathCompany, at The Data Engineering Summit 2022. In his session, titled “Building Trust in the data & All Ops”, he emphasised the need to change mindsets and adopt a data-centric thought process.

Data trust is the confidence that data is healthy and ready to act on. However, this confidence cannot be taken on faith and has to be quantified. 
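The talk did not prescribe a formula for quantifying trust. One simple, hypothetical approach is a weighted pass rate over automated data checks, so that "healthy" becomes a number that can be tracked over time. The check names and weights below are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one data check (hypothetical structure for illustration)."""
    name: str
    passed: bool
    weight: float = 1.0  # assumed relative importance of the check


def trust_score(results: list[CheckResult]) -> float:
    """Return the weighted fraction of passing checks, between 0 and 1."""
    total = sum(r.weight for r in results)
    if total <= 0:
        raise ValueError("at least one check with positive weight is required")
    passed = sum(r.weight for r in results if r.passed)
    return passed / total


if __name__ == "__main__":
    results = [
        CheckResult("no_null_customer_ids", passed=True, weight=3.0),
        CheckResult("orders_loaded_within_1h", passed=False, weight=2.0),
        CheckResult("amounts_non_negative", passed=True, weight=1.0),
    ]
    print(f"trust score: {trust_score(results):.2f}")  # 4 of 6 weight units pass -> 0.67
```

Weighting lets a failed critical check (for example, missing primary keys) pull the score down more than a cosmetic one; the weights themselves would have to be agreed on by the teams consuming the data.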


Varun listed a few criteria to build data trust:

  • Data quality: Accuracy, completeness, consistency, etc.
  • Data pipelines: Timeliness, alerting, resolution
  • Data cataloguing and lineage: Discoverability
  • Data privacy and security: PHI, PII and other sensitive data, encryption, etc.
  • Automation and reusability: All Ops
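For the privacy criterion, one common technique is to pseudonymise PII fields with a keyed hash (HMAC-SHA256) before data leaves a restricted zone, so records can still be joined on the hashed value without exposing the raw one. This is a minimal sketch: the field names are hypothetical, key management is out of scope, and keyed hashing is pseudonymisation rather than encryption.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as PII in this example.
PII_FIELDS = {"email", "phone"}


def pseudonymise(record: dict, key: bytes) -> dict:
    """Replace non-null PII values with an HMAC-SHA256 hex digest; copy other fields as-is."""
    out = {}
    for field, value in record.items():
        if field in PII_FIELDS and value is not None:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()
        else:
            out[field] = value
    return out


if __name__ == "__main__":
    # In practice the key would come from a secrets manager, never from source code.
    key = b"example-secret-key"
    row = {"customer_id": 42, "email": "a@example.com", "phone": None}
    print(pseudonymise(row, key))
```

Because the same input and key always produce the same digest, downstream joins keep working; rotating the key deliberately breaks that linkability.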

The age of the black box is gone

Varun said businesses are, at some level, sceptical of data. Technological advancements, the need for faster decisions and the growing number of stakeholders in the data pipeline complicate matters further. Data can get distorted at various stages:

  • Anywhere from ingestion to consumption
  • Across multiple transformation layers
  • Between the multiple teams that manage the data
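A simple control against distortion between stages is to reconcile each hop: compare row counts and an order-independent fingerprint of a key column between, say, the ingested batch and what reaches consumers. The sketch below assumes both stages expose rows as dictionaries sharing a hypothetical `order_id` key; it checks completeness of keys, not the correctness of every value.

```python
import hashlib


def fingerprint(rows: list[dict], key: str) -> str:
    """Order-independent fingerprint: SHA-256 over the sorted key values."""
    values = sorted(str(r[key]) for r in rows)
    return hashlib.sha256("\n".join(values).encode("utf-8")).hexdigest()


def reconcile(source: list[dict], target: list[dict], key: str) -> list[str]:
    """Return human-readable discrepancies between two pipeline stages (empty if consistent)."""
    issues = []
    if len(source) != len(target):
        issues.append(f"row count mismatch: {len(source)} vs {len(target)}")
    if fingerprint(source, key) != fingerprint(target, key):
        source_keys = {r[key] for r in source}
        target_keys = {r[key] for r in target}
        missing = sorted(source_keys - target_keys)
        extra = sorted(target_keys - source_keys)
        issues.append(f"key mismatch: missing={missing}, extra={extra}")
    return issues


if __name__ == "__main__":
    ingested = [{"order_id": 1}, {"order_id": 2}, {"order_id": 3}]
    consumed = [{"order_id": 1}, {"order_id": 3}]
    print(reconcile(ingested, consumed, "order_id"))
    # ['row count mismatch: 3 vs 2', 'key mismatch: missing=[2], extra=[]']
```

Running the same reconciliation after every transformation layer narrows down where a record was dropped or duplicated, which is exactly the question that arises when several teams own different hops.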

Varun outlined strategies to build trust in data:

Get everyone involved in the data lifecycle: “We have to decentralise the data process and ensure that everyone who touches data feels equally involved and responsible. There is a need for collaboration where everyone in the life cycle is involved,” he said.

Build systems and culture around data quality: A unified approach to data requires transparent data management processes and documented, shared data quality standards.

Share data quality rules across the organisation: This includes automated checks embedded in data systems and policies that set clear expectations for how people interact with and maintain data.
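Shared rules work best when they are defined once and reused by every pipeline rather than re-implemented per team. Below is a minimal sketch of an organisation-wide rule set applied as an automated check; the rule names, columns and conditions are hypothetical.

```python
from typing import Callable

# A rule maps a row to True (pass) or False (fail).
Rule = Callable[[dict], bool]

# Hypothetical organisation-wide rules, defined once and imported by every pipeline.
SHARED_RULES: dict[str, Rule] = {
    "customer_id_present": lambda row: row.get("customer_id") is not None,
    "amount_non_negative": lambda row: isinstance(row.get("amount"), (int, float)) and row["amount"] >= 0,
    "country_is_iso2": lambda row: isinstance(row.get("country"), str) and len(row["country"]) == 2,
}


def validate(rows: list[dict], rules: dict[str, Rule] = SHARED_RULES) -> dict[str, list[int]]:
    """Return, for each rule, the indices of the rows that violate it."""
    failures: dict[str, list[int]] = {name: [] for name in rules}
    for i, row in enumerate(rows):
        for name, rule in rules.items():
            if not rule(row):
                failures[name].append(i)
    return failures


if __name__ == "__main__":
    rows = [
        {"customer_id": 1, "amount": 10.0, "country": "IN"},
        {"customer_id": None, "amount": -5, "country": "India"},
    ]
    print(validate(rows))
    # {'customer_id_present': [1], 'amount_non_negative': [1], 'country_is_iso2': [1]}
```

Reporting failing row indices per rule, rather than a single pass/fail flag, gives the owning team something concrete to triage.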

DataOps

“DevOps has solved many challenges in software engineering, and now in data platform systems as well. DevOps principles, widely adopted in the organisation, have given us a clear view of the significance of looking deep down. They have given us a better understanding of the implementation and management of such systems,” said Varun.

DataOps is the practice of automating the end-to-end data flow, enabling teams to work independently. It reduces error rates and increases quality while offering clear measurement, monitoring and transparency of results. Data observability leads to reliable data pipelines and brings transparency to monitoring, alerting, tracking and triaging incidents.
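Two of the most common observability signals are freshness (was the table loaded recently enough?) and volume (is the row count in the expected range?). The sketch below assumes table metadata is already available as a hypothetical `TableStats` record; the thresholds are illustrative, and routing alerts to a pager or chat tool is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TableStats:
    """Observed metadata for one table (hypothetical; how it is collected is out of scope)."""
    name: str
    last_loaded: datetime  # timezone-aware timestamp of the latest successful load
    row_count: int


def check_table(stats: TableStats, max_age: timedelta, expected_rows: int,
                tolerance: float = 0.5, now: Optional[datetime] = None) -> list[str]:
    """Return alert messages for stale data (freshness) or unusual row counts (volume)."""
    if now is None:
        now = datetime.now(timezone.utc)
    alerts: list[str] = []
    age = now - stats.last_loaded
    if age > max_age:
        alerts.append(f"{stats.name}: stale, last loaded {age} ago (limit {max_age})")
    low = expected_rows * (1 - tolerance)
    high = expected_rows * (1 + tolerance)
    if not low <= stats.row_count <= high:
        alerts.append(f"{stats.name}: row count {stats.row_count} outside [{low:.0f}, {high:.0f}]")
    return alerts


if __name__ == "__main__":
    now = datetime(2022, 6, 1, 12, 0, tzinfo=timezone.utc)
    stats = TableStats("orders", last_loaded=now - timedelta(hours=3), row_count=400)
    for alert in check_table(stats, max_age=timedelta(hours=1), expected_rows=1000, now=now):
        # A real pipeline would route this to a pager, chat channel or incident tracker.
        print(alert)
```

Passing `now` explicitly keeps the check deterministic and testable; in production it defaults to the current UTC time.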

DataOps helps reduce project turnaround times, increase automated test coverage, and improve data quality and visibility into data pipelines, all of which contribute to building data trust.
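Automated pipeline tests can start as ordinary unit tests around each transformation. The sketch below uses Python's built-in `unittest`; the `dedupe_latest` transformation is a hypothetical example, not something described in the talk.

```python
import unittest


def dedupe_latest(rows: list[dict]) -> list[dict]:
    """Keep only the most recent record per id (hypothetical transformation).

    Assumes `updated_at` is an ISO-8601 date string, so string comparison orders dates correctly.
    """
    latest: dict = {}
    for row in rows:
        current = latest.get(row["id"])
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["id"]] = row
    return sorted(latest.values(), key=lambda r: r["id"])


class DedupeLatestTest(unittest.TestCase):
    def test_keeps_most_recent_version(self):
        rows = [
            {"id": 1, "updated_at": "2022-01-01", "status": "open"},
            {"id": 1, "updated_at": "2022-02-01", "status": "closed"},
            {"id": 2, "updated_at": "2022-01-15", "status": "open"},
        ]
        result = dedupe_latest(rows)
        self.assertEqual([r["status"] for r in result], ["closed", "open"])

    def test_output_ids_are_unique(self):
        rows = [{"id": 7, "updated_at": "2022-01-01"}] * 3
        ids = [r["id"] for r in dedupe_latest(rows)]
        self.assertEqual(len(ids), len(set(ids)))

    def test_empty_input(self):
        self.assertEqual(dedupe_latest([]), [])


if __name__ == "__main__":
    # exit=False lets the script continue after the test run instead of calling sys.exit().
    unittest.main(exit=False)
```

Running such tests in CI on every pipeline change is one concrete way to "increase automated tests" rather than relying on manual spot checks.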

“To build data trust, we need to help business and tech teams understand what data looks like. We also need to ensure that data is discoverable and available to end customers for analytics. Apart from DataOps, MLOps also plays a critical role in building trust in the system,” Varun said. He concluded the session with guiding principles for building data trust:

  • Ops-first approach
  • Reusability in data pipelines
  • Testing the data pipelines
  • Data catalogue and data lineage
  • Collaborative data quality management
  • Data privacy and security
  • Alerting and monitoring
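Data lineage is, at its core, a directed graph from source datasets to the datasets derived from them. The minimal sketch below answers two everyday questions: "where did this table come from?" and "what is affected if this table is wrong?". The dataset names are hypothetical, and real catalogues keep this graph in a metadata service rather than in memory.

```python
from collections import defaultdict, deque


class LineageGraph:
    """In-memory lineage: each edge points from an upstream dataset to a downstream one."""

    def __init__(self) -> None:
        self.downstream: dict[str, set[str]] = defaultdict(set)
        self.upstream: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        """Record that `target` is derived (at least partly) from `source`."""
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    @staticmethod
    def _walk(start: str, edges: dict[str, set[str]]) -> set[str]:
        """Breadth-first traversal collecting every dataset reachable from `start`."""
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def sources_of(self, dataset: str) -> set[str]:
        """All datasets that `dataset` depends on, directly or indirectly."""
        return self._walk(dataset, self.upstream)

    def impact_of(self, dataset: str) -> set[str]:
        """All datasets affected if `dataset` is wrong."""
        return self._walk(dataset, self.downstream)


if __name__ == "__main__":
    g = LineageGraph()
    g.add_edge("raw.orders", "staging.orders")
    g.add_edge("raw.customers", "staging.customers")
    g.add_edge("staging.orders", "marts.revenue")
    g.add_edge("staging.customers", "marts.revenue")
    print(sorted(g.sources_of("marts.revenue")))
    # ['raw.customers', 'raw.orders', 'staging.customers', 'staging.orders']
    print(sorted(g.impact_of("raw.orders")))
    # ['marts.revenue', 'staging.orders']
```

Combined with alerting, `impact_of` tells a team which downstream consumers to notify when an upstream check fails.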


Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com
