The annual Forbes AI 50 list released this year was telling. There were mainstays like Scale AI, some obvious choices like OpenAI, and a few swerves that nobody saw coming. For instance, Emad Mostaque’s Stability AI, whose open-source image generator Stable Diffusion became wildly popular after its release last year, was conspicuously missing from the list. In contrast, Databricks, a company working in a decidedly ‘unsexy’ segment of tech, won a spot on the list.
Really great to see Databricks featured on the AI50 list, recognizing both lakehouse as an important data management system for AI, and the Dolly open source model. https://t.co/kywoI4prZf
— Matei Zaharia (@matei_zaharia) April 11, 2023
Databricks over Stability?
The decision surprised many, as was evident on Twitter, but it served to show the glaring differences between the two. On the one hand, Stability AI, a startup in the fresh and hot generative AI segment, had spent tons of money and was still lagging in revenue generation. The company was, in fact, hunting for senior executives to help it save money. Databricks, on the other hand, seemed to be cruising just fine through the rough economic tides.
Founded in 2013, Databricks popularised the concept of the data ‘lakehouse’, which combines raw data lakes with structured data warehouses. By 2021, the company had become one of the 10 most valuable startups in the world, with a valuation of USD 38 billion and star financiers like Andreessen Horowitz (a16z), Alphabet’s independent fund CapitalG and asset manager T Rowe Price.
It was hard to ignore the work that Databricks had been doing. The company essentially acts as the data infrastructure layer for enterprises: its cloud-based platform helps their data teams store data safely, generate analytics and insights, and develop ML tools that can eventually be used across the organisation. Databricks sold this as ‘democratising data’, and that is what its lakehouse set out to do.

Open-sourcing Dolly 2.0
But maybe the company’s true strength lies in its forward thinking. Recently, it gave its 5,000-odd employees an unusual task: to write fictional prose, bits of question-and-answer dialogue, text summaries and other assorted pieces of information. The end goal was to gather enough data to train an open-source ML model that could rival OpenAI’s ChatGPT and that other enterprises could use.
On March 24, the company got into the LLM game with the first version of Dolly, a 6-billion-parameter model that is much slimmer than, say, GPT-3, let alone the massive new GPT-4. Databricks intended the model to be ‘cheap to build’ for enterprises, bucking the trend of LLMs like the ones made by OpenAI and Google, which are too expensive for most companies to train themselves.

The blog post released by the company showed that it has stuck to its principle of ‘democratisation’. Execs Ali Ghodsi, Matei Zaharia and others wrote, “We’re in the earliest days of the democratisation of AI for the enterprise, and much work remains to be done, but we believe the technology underlying Dolly represents an exciting new opportunity for companies that want to cheaply build their own instruction-following models.”
This is in addition to the work that Databricks normally does, which can easily fly under the radar. But it isn’t shocking that the company has more than 9,000 clients: Shell uses Databricks to run inventory simulations, while AT&T uses it to train and deploy its models and detect fraud.

Vital to training models
A recent blog post from Replit, a software company that has partnered with Google Cloud to make a coding assistant, said it relies heavily on Databricks to build its data pipelines. The post named three main vendors that make up its modern LLM stack: Databricks, where Apache Spark is used to parallelise the dataset builder process across each programming language, along with Hugging Face and MosaicML.
A combination of these factors may explain why Databricks is faring better than its rivals, even as Snowflake and MongoDB both trade more than 41% below where they were in August 2021 (right around the time Databricks was raising funds).
Meanwhile, some of the world’s most-valued private companies aren’t what they used to be. Stripe, which was valued at USD 95 billion two years ago, is now worth between USD 55 billion and USD 60 billion. Instacart has chopped its own valuation to about USD 10 billion, down from USD 39 billion in 2021. So, there is a good reason for Databricks’ inclusion in the Forbes list: stability. And there is much that the new, potentially flash-in-the-pan AI startups can learn from it.