AI Startups Need to Learn Stability Tricks from Databricks 

There is a good reason for the inclusion of Databricks in the Forbes list: stability. It is something a lot of the new flash-in-the-pan AI startups can learn from.

The Forbes annual AI 50 list released this year was telling. There were mainstays like Scale AI, some obvious choices like OpenAI, and a few swerves that nobody saw coming. For instance, Emad Mostaque’s Stability AI, whose open-source image generator Stable Diffusion became wildly popular after its release last year, was conspicuously missing from the list. In contrast, Databricks, a company working in a decidedly ‘unsexy’ segment of tech, won a spot on the list.

Databricks over Stability?

The decision surprised many, as was evident on Twitter, but it served to show the glaring differences between the two. On the one hand, Stability AI, a startup in the fresh and hot generative AI segment, had spent tons of money and was still lagging in revenue generation. The company was, in fact, hunting for senior executives to help it save money. Databricks, on the other hand, seemed to be cruising just fine through rough economic tides.

Founded in 2013, Databricks popularised the concept of the data ‘lakehouse’, which combines raw data lakes with structured data warehouses. By 2021, the company had turned into one of the 10 most valuable startups in the world, with a valuation of USD 38 billion and star financiers like Andreessen Horowitz (a16z), Alphabet’s independent fund CapitalG and asset manager T. Rowe Price.

It was hard to ignore the work that Databricks had been doing. The company essentially acts as the data infrastructure layer for enterprises: its cloud-based platform helps their data teams store data safely, generate analytics and insights, and develop ML tools that can eventually be used across the business. Databricks sold this as ‘democratising data’, and that is what its lakehouse set out to do.

Source: Free Dolly, Databricks blog

Open-sourcing Dolly 2.0

But maybe the company’s true strength lies in its forward thinking. Recently, it gave its 5,000-odd employees an unusual task: to write fictional prose, question-and-answer dialogues, text summaries and other pieces of information. The end goal was to gather enough data to train an open-source ML model that could rival OpenAI’s ChatGPT and that other enterprises could use.

On March 24, the company got into the LLM game with Dolly, a 6-billion-parameter model, and followed it in April with the 12-billion-parameter Dolly 2.0. Both are much slimmer than, say, GPT-3, let alone the massive new GPT-4. Databricks intended the model to be ‘cheap to build’ for enterprises, bucking the trend of LLMs like the ones made by OpenAI and Google, which are too expensive for most companies to train in practice.

Source: Business Insider

The blog post released by the company showed that it has stuck to its principle of ‘democratisation’. Execs Ali Ghodsi, Matei Zaharia and others wrote, “We’re in the earliest days of the democratisation of AI for the enterprise, and much work remains to be done, but we believe the technology underlying Dolly represents an exciting new opportunity for companies that want to cheaply build their own instruction-following models.”

All this is on top of the work that Databricks normally does, which can easily fly under the radar. It isn’t shocking that the company has more than 9,000 clients: Shell uses Databricks to run inventory simulations, while AT&T uses it to train and deploy its models and detect fraud.

Source: Replit blog

Vital to training models

A recent post on the blog of Replit, a software company that has partnered with Google Cloud to make a coding assistant, said the company relies heavily on Databricks to build its data pipelines. The post named three main vendors that make up the modern LLM stack: Databricks, where Apache Spark is used to parallelise the dataset builder process across each programming language, along with Hugging Face and MosaicML.
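To make the idea of running a dataset builder per programming language more concrete, here is a minimal sketch. It is not Replit's pipeline and it does not use Spark: a thread pool from Python's standard library stands in for the cluster, and the toy corpus, the cleaning rule (drop blank files and exact duplicates) and every function name are assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus: (language, source snippet) pairs.
CORPUS = [
    ("python", "print('hello')"),
    ("python", "print('hello')"),   # exact duplicate, should be dropped
    ("python", ""),                 # empty file, should be dropped
    ("go", "fmt.Println(\"hi\")"),
    ("rust", "println!(\"hi\");"),
    ("rust", "   "),                # whitespace only, should be dropped
]


def group_by_language(corpus):
    """Partition the corpus so each language can be processed independently."""
    groups = {}
    for language, snippet in corpus:
        groups.setdefault(language, []).append(snippet)
    return groups


def build_language_dataset(snippets):
    """Hypothetical per-language builder: drop blank files and exact duplicates."""
    seen = set()
    cleaned = []
    for snippet in snippets:
        text = snippet.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def build_all(corpus, max_workers=4):
    """Run the builder for every language concurrently and collect the results."""
    groups = group_by_language(corpus)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            language: pool.submit(build_language_dataset, snippets)
            for language, snippets in groups.items()
        }
        return {language: future.result() for language, future in futures.items()}


datasets = build_all(CORPUS)
```

Because each language's files are cleaned without reference to any other language, the work splits naturally into independent tasks. That independence is what lets an engine like Spark spread the same job across many machines.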

A combination of these factors may explain why Databricks is faring better than its rivals, even as Snowflake and MongoDB both trade more than 41% below where they were in August 2021 (right around the time Databricks was raising funds).

Meanwhile, some of the world’s most valuable private companies aren’t what they used to be. Stripe, which was valued at USD 95 billion two years ago, is now worth between USD 55 billion and USD 60 billion. Instacart has chopped its own valuation to about USD 10 billion, down from USD 39 billion in 2021. So, there is a good reason for Databricks’ inclusion in the Forbes list: stability. It is something a lot of the potentially flash-in-the-pan new AI startups can learn from.


Poulomi Chatterjee
Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.
