The world of databases has been through many different phases. The most established are SQL, or relational, databases, in which all data fits into structured, rectangular tables. The growing needs of Web 2.0 companies triggered the NoSQL revolution, when databases became more flexible and capable of handling far larger troves of data. Now, as the industry readies itself for AI-hungry workloads, another kind has emerged: vector databases.
In simple words, unlike traditional databases, vector databases are proficient at finding meaning in unstructured data. They use vector embeddings to represent data as lists of numbers and arrange those vectors so that similar items sit close to one another.
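To make "similar items sit close to one another" concrete, here is a minimal sketch that measures closeness with cosine similarity, one common choice of metric. The three-dimensional vectors below are hypothetical and hand-picked for illustration; real embedding models produce vectors with hundreds or thousands of dimensions, and individual databases may use other metrics such as Euclidean distance or dot product.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; values near 1.0 mean 'very similar'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", hand-picked for illustration only.
# Related concepts (cat, kitten) point in roughly the same direction.
embeddings = {
    "cat": [0.90, 0.80, 0.10],
    "kitten": [0.85, 0.75, 0.20],
    "car": [0.10, 0.20, 0.95],
}

for word in ("kitten", "car"):
    score = cosine_similarity(embeddings["cat"], embeddings[word])
    print(f"cat vs {word}: {score:.3f}")
```

With these toy values, "cat" scores far higher against "kitten" than against "car", which is the property a vector database relies on when it groups and retrieves data.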
While vector databases can be used to support text generation, search engines and even recommendation systems, their most common use is search. Vector, or similarity, search lets users query the database with an example object and retrieve the stored items most similar to it. Another advantage of vector search is that these queries can return results with low latency, which suits generative AI applications.
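A similarity query can be sketched as "score every stored vector against the query and keep the top k". The `ToyVectorStore` class, its `add`/`search` methods and the vectors below are hypothetical, written only to illustrate the idea; they do not mirror the API of Pinecone, Weaviate, Chroma, Qdrant or pgvector.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyVectorStore:
    """Hypothetical in-memory store doing exact (brute-force) similarity search."""

    def __init__(self) -> None:
        self._items: dict[str, list[float]] = {}

    def add(self, key: str, vector: list[float]) -> None:
        self._items[key] = vector

    def search(self, query: list[float], k: int = 2) -> list[tuple[str, float]]:
        # Score every stored vector against the query: O(n) work per query.
        scored = ((key, cosine_similarity(query, vec)) for key, vec in self._items.items())
        # Keep only the k highest-scoring matches, best first.
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


store = ToyVectorStore()
store.add("cat", [0.90, 0.80, 0.10])
store.add("kitten", [0.85, 0.75, 0.20])
store.add("car", [0.10, 0.20, 0.95])
store.add("truck", [0.15, 0.10, 0.90])

# Query with a vector near "cat": the two animal entries should rank first.
print(store.search([0.88, 0.82, 0.12], k=2))
```

This exact scan grows linearly with the number of stored vectors. Production vector databases typically keep latency low by building approximate nearest neighbour indexes, such as HNSW graphs or IVF-style clustering, which examine only a small fraction of the vectors per query at the cost of occasionally missing an exact match.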
Competition is heating up
This does not knock traditional databases like Postgres, or the NoSQL database Redis, out of the competition entirely. Postgres, too, offers vector similarity search through an extension called pgvector.
Not to be left behind, older database companies are fortifying themselves with AI-related services. Oracle, for instance, offers a collection of AI algorithms that it pitches as running at “the speed of in-database learning”. IBM’s old-school Db2 has been rebranded as the “AI database” and now uses ML to boost query performance and offer “confidence-based querying”.
But if AI is front-and-centre for most companies now, it is only natural that they will be demanding AI-first infrastructure.
Bob van Luijt, the CEO of SeMI Technologies, told VentureBeat why something like Weaviate differs from relational database companies. “It’s really AI-first infrastructure that we have here. For the first time, this bridge is being built between all that stuff that’s being done in data science and people seeing the promise and need for their companies. We’re making that bridge,” van Luijt explained.
Avyukt Aggarwal, the founder of software services startup Heltar, spoke about how closely vector databases have become tied to generative AI tools. “Every gold rush has people selling shovels. For generative AI, what are the shovels? Vector databases. Almost every single LLM-powered app uses them or will use them soon. LLMs will be integrated into almost all major apps. Investing in a basket of companies that provide managed vector databases seems like the way to profit off of shovels in the modern gold rush,” he said.
Calling vector databases shovels is no overstatement. As businesses deploy more AI applications into production by the day, a good vector database is becoming as vital to them as SQL databases are to running cloud services.
New money for a new kind of database
Investors bitten by the generative AI bug are already flocking to these interconnected parts of the ecosystem. But is all the money being thrown at these companies worth it? Vectors themselves are valuable because they can represent rich data, from text and audio to images and video, which suits the many use cases that generative AI has opened up.
Around the end of March, a handful of startups, including Pinecone, Chroma and Weaviate, raised millions from VCs such as Andreessen Horowitz and Index Ventures.
SeMI Technologies, the developer of the open-source search engine Weaviate, announced that it had raised USD 16 million in a Series A round led by New Enterprise Associates and Cortical Ventures. Pinecone bagged USD 28 million in Series A funding in March last year from Menlo Ventures, Tiger Global and Wing Venture Capital. Vector database startup Chroma scored USD 18 million in seed funding at a valuation of USD 75 million. (It is also worth noting that Chroma has just 1.2k stars on GitHub.)
Most recently, the German startup Qdrant, which has also built an open-source vector search engine and database for unstructured data, announced yesterday that it had raised USD 7.5 million in seed funding led by Unusual Ventures, with participation from 42cap, IBB Ventures and angel investors such as Cloudera co-founder Amr Awadallah.
For now, as with any new technology that works, it can be difficult to separate the hype from the genuinely good. Jeff Delaney, a Google Developer Expert and the creator of the YouTube channel Fireship, described how he launched an impromptu vector database project called Rektor with no revenue, business plan or code to show for it. Within a short time, the company's valuation shot up to USD 420 million.