Why Data Pipeline is Important for High ROI AI Products

Salesforce, Snowflake, Databricks, and other data and AI companies are expanding their hold on data management companies.

Illustration by Diksha Mishra

Salesforce is said to be in the final stages of negotiations to acquire data-management software provider Informatica for $11 billion. To many, the deal is reminiscent of Snowflake’s acquisition of Neeva in May last year. 

“Salesforce potentially acquiring Informatica seems like a push to compete more with Snowflake,” wrote Astasia Myers, partner at Felicis. 

These acquisitions are in step with big-tech moves such as Google acquiring Looker, the data analytics startup, and Microsoft acquiring ADRM Software in 2020 and investing in Rubrik in 2021. Rubrik, a data management company, is now targeting an IPO with a $5.4 billion valuation.

Databricks’ acquisitions of Arcion, MosaicML, and Okera follow similar lines, aimed at strengthening its data pipeline and expanding its generative AI capabilities.

Similarly, Salesforce’s possible acquisition of Informatica is targeted at greatly enhancing its data capabilities, especially in fields like data integration, quality assurance, and customer insights. This also points towards the importance of building a data strategy and ensuring a smooth pipeline when it comes to building high-ROI AI products.

Contextualised Data is King

James Wu, partner at M12, highlighted in a recent post the importance of building a strong data pipeline and data-centric AI. That is why the venture fund invested in Unstructured.io, with another data curation company in the pipeline. “Big data will continue to be the foundation, but contextualised data is king,” he said.

“We’re interested in the ‘AI-data feedback loop’ – we think better AI can analyse data to identify errors and inconsistencies, improving data quality for future models,” he explained, adding that cleaner data in turn helps train superior AI models, closing the loop.
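One pass of the feedback loop Wu describes can be pictured with a small sketch. This is only an illustration under stated assumptions: a simple statistical check (a modified z-score based on the median absolute deviation) stands in for the "better AI" that inspects the data, and the function names, the threshold, and the sample readings are all hypothetical rather than anything from Wu or M12.

```python
from statistics import median


def flag_suspect(values, threshold=3.5):
    """Return indices of values that look inconsistent with the rest.

    Uses the modified z-score (0.6745 * |x - median| / MAD) as a
    stand-in for a learned data-quality model. The threshold of 3.5
    is a common rule of thumb, assumed here for illustration.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to measure against, so nothing can be flagged.
        return []
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


def clean(values, threshold=3.5):
    """Drop flagged values so the next model trains on cleaner data."""
    suspect = set(flag_suspect(values, threshold))
    return [v for i, v in enumerate(values) if i not in suspect]


# Hypothetical sensor readings with one obvious entry error.
readings = [10, 11, 9, 10, 12, 500, 10]
cleaned = clean(readings)
```

In a real pipeline the cleaned data would feed the next training run, and the resulting model would take over the flagging step in the following pass.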

Naveen Rao, VP of generative AI at Databricks, also shared similar thoughts. “We at Databricks are very much about the lifecycle of data and GenAI working synergistically together. We demonstrated the power of our training platform by building DBRX with it and we used all the tools in Databricks. We believe in the power of all the components around the model that comprise the full system,” he said.

This points to the need for a sound data strategy if companies expect high ROI from their AI products. Matthew Blasa, AI strategist and lead data scientist consultant, emphasises that since data is AI’s lifeblood, AI products need a continuous pipeline of clean data.

Source: Matthew Blasa

“It’s important to ensure that your data is reliable, relevant to the large needs, and collected from multiple sources,” Blasa explained. “Without a clear data strategy, creating a model with enduring value is challenging. Relying solely on retraining and monitoring won’t close the gap. It may even make it harder.”

Crawling, walking, and running with AI

AI advisor Vin Vashishta offers a staged plan for companies building AI products. “One thing that I’ve learned after a decade of building data and AI products is that businesses must crawl-walk-run with AI,” he wrote in a post.

Crawling is about collecting data, walking involves using the data to create descriptive models, and running uses more advanced models such as predictive, prescriptive, and diagnostic ones. He explains how starting with crawling and walking makes running less expensive and faster in the long run. 
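The three phases can be sketched in a few lines of code. This is a toy illustration, not Vashishta's method: the monthly sales figures are invented, the function names are hypothetical, and a least-squares line is assumed as the simplest possible "predictive" model.

```python
from statistics import mean


def collect():
    """Crawl: gather the raw data (hypothetical month, sales pairs)."""
    return [(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0)]


def describe(rows):
    """Walk: a descriptive summary of what has already happened."""
    sales = [y for _, y in rows]
    return {
        "count": len(sales),
        "mean": mean(sales),
        "min": min(sales),
        "max": max(sales),
    }


def fit_line(rows):
    """Run: fit a least-squares line to predict future values."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new month."""
    slope, intercept = model
    return slope * x + intercept


rows = collect()              # crawl
summary = describe(rows)      # walk
model = fit_line(rows)        # run
forecast = predict(model, 5)  # projected sales for month 5
```

Each step reuses the output of the one before it, which is the point of the sequence: the predictive step costs little once collection and description are already in place.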

Each phase offers immediate benefits and builds on the previous phase, creating a solid foundation. “Walk and run handle about 90% of use cases, reducing time to value,” Vashishta explained.

In another post, Vashishta explained how high-quality data can bring quick results, and descriptive models trained on it yield quarterly gains. These efforts lay the foundation for AI products and potentially larger returns. “Trash data trains trash models, but the business needs tangible returns in months, not years. Fixing the data doesn’t deliver them unless data teams and leaders take a product-first approach,” he added.

The Data Pipeline Strategy

Data availability is clearly important for building the best generative AI products. This is why Salesforce, Snowflake, Databricks, and other data and AI providers are expanding their hold on data companies. These deals should ultimately give them high-quality, streamlined data with which to improve their AI products.

AI products are data products. “Without a solid data strategy, it’s tough to trust the decisions made by our AI-driven products and keep them profitable,” said Blasa.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.