AI Startups Need to Learn Stability Tricks from Databricks 

There is a good reason for the inclusion of Databricks in the Forbes list: stability. And there is much that the new flash-in-the-pan AI startups can learn from it.

The annual Forbes AI 50 list released this year was telling. There were mainstays like Scale AI, some obvious choices like OpenAI, and a few swerves that nobody saw coming. For instance, Emad Mostaque’s Stability AI, whose open-source image generator Stable Diffusion became wildly popular after its release last year, was conspicuously missing from the list. In contrast, Databricks, a company working in a decidedly ‘unsexy’ segment of tech, won a spot on the list.

Databricks over Stability?

The decision surprised many, as was evident on Twitter, but it served to show the glaring differences between the two companies. On the one hand, Stability AI, a startup in the fresh and hot generative AI segment, had spent tons of money yet was lagging in revenue generation. The company was, in fact, hunting for senior executives to help it save money. Databricks, on the other hand, seemed to be cruising just fine through the rough economic tides.

Founded in 2013, Databricks popularised the concept of the data ‘lakehouse’, which combines raw data lakes with structured data warehouses. By 2021, the company had turned into one of the 10 most valuable startups in the world, with a valuation of USD 38 billion and star financiers like Andreessen Horowitz (a16z), Alphabet’s independent fund CapitalG and asset manager T Rowe Price.

It was hard to ignore the work that Databricks had been doing. The company essentially acts as the data infrastructure layer for enterprises – its cloud-based platform helps their data teams store data safely, generate analytics and insights, and develop ML tools that can eventually be used across the organisation. Databricks sold this as ‘democratising data’, and that is what its lakehouse set out to do.

Source: Free Dolly, Databricks blog

Open-sourcing Dolly 2.0

But maybe the company’s true strength lies in its forward thinking. Recently, it gave its 5,000-odd employees an unusual task: to write fictional prose, question-and-answer dialogues, text summaries and other assorted pieces of information. The end goal was to gather enough data to train an open-source ML model that could rival OpenAI’s ChatGPT and that other enterprises could use.

On March 24, the company got into the LLM game with the first version of Dolly, a 6-billion-parameter model that is much slimmer than, say, GPT-3, let alone the massive new GPT-4. Databricks intended the model to be ‘cheap to build’ for enterprises, bucking the trend of LLMs like the ones made by OpenAI and Google, which are too expensive for most companies to train in practice.

Source: Business Insider

The blog post announcing the model showed that the company has stuck to its principle of ‘democratisation’. Execs Ali Ghodsi, Matei Zaharia and others wrote, “We’re in the earliest days of the democratisation of AI for the enterprise, and much work remains to be done, but we believe the technology underlying Dolly represents an exciting new opportunity for companies that want to cheaply build their own instruction-following models.”

This is in addition to the work that Databricks normally does, which can easily fly under the radar. But it isn’t shocking that the company has more than 9,000 clients – Shell uses Databricks to run inventory simulations, while AT&T uses it to train and deploy its models and detect fraud.

Source: Replit blog

Vital to training models

A recent blog post by Replit, a software company that has partnered with Google Cloud to make a coding assistant, said it relies heavily on Databricks to build its data pipelines. The post named three main vendors that make up its modern LLM stack: Databricks, whose Apache Spark is used to parallelise the dataset builder process across each programming language, Hugging Face and MosaicML.

A combination of these factors may explain why Databricks is faring better than its rivals. Snowflake and MongoDB both trade at more than 41% below where they were in August 2021, right around the time Databricks was raising funds from the market.

Meanwhile, some of the world’s most-valued private companies aren’t what they used to be. Stripe, which was valued at USD 95 billion two years ago, is now valued at between USD 55 billion and USD 60 billion. Instacart has chopped its own valuation to about USD 10 billion, down from USD 39 billion in 2021. So, there is a good reason for Databricks’ inclusion in the Forbes list – stability. And there is much that the new, potentially flash-in-the-pan AI startups can learn from it.


Poulomi Chatterjee

Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.