You Can Do Better than Having Just a Vector Database

SingleStore’s vector database subsystem, first made available in 2017 and subsequently enhanced, enables extremely fast nearest-neighbour search for semantically similar objects using plain SQL.

Published on May 6, 2023

by Eric Hanson

Recently, the database market has seen a proliferation of specialty vector databases. Companies that buy these database management systems and plug them into their data architectures may initially be hopeful about their ability to query for vector similarity. But the short-lived excitement eventually turns into regret about bringing yet another component into their application environment.

Vectors and vector search are a data type and a query processing approach, not a foundation for a new way of processing data. Using a specialty vector database (SVDB) leads to the usual problems we see, and keep solving, with customers who use multiple specialty systems: redundant data, excessive data movement, lack of agreement on data values among distributed components, extra labour expense for specialized skills, extra licensing costs, limited query language power, limited programmability and extensibility, limited tool integration, and poor data integrity and availability compared with a true database management system (DBMS).

Instead of using an SVDB, we believe that application developers who need vector similarity search will be better served by building their applications on a general, modern data platform that meets all their database requirements, not just one.

Case Study

The Story of a Generative AI Startup: Getting Accurate Results from a GPT-powered Chatbot

This story is inspired by real events.

Once upon a time, there was a startup building finely tuned bots to aid developers with highly technical content. It required a system that could perform the following tasks:

Rapidly process and convert semi-structured data into vectors
Employ similarity matching to find locally indexed documents that match user inquiries
Enhance the matching results with additional context and re-sort them
Transmit the context to GPT-4, receive the generated response and present it to the user
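
The four steps above can be sketched end to end in a few lines of Python. This is a minimal sketch, not the startup's implementation: the `embed` function is a toy hashed bag-of-words stand-in for a real embedding model, the documents and their `views` field are invented, and the GPT-4 call itself is left out, so the sketch stops at building the prompt.

```python
import math
import zlib

DIM = 16  # toy embedding size (assumption; real models use far more dimensions)

def embed(text):
    """Toy stand-in for an embedding model: hashed bag of words, unit length."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Step 1: convert (hypothetical) documents into vectors.
docs = [
    {"id": 1, "text": "how to create a table", "views": 120},
    {"id": 2, "text": "how to drop a table", "views": 40},
    {"id": 3, "text": "tuning memory settings", "views": 300},
]
for d in docs:
    d["vec"] = embed(d["text"])

def answer_context(question, k=2):
    q = embed(question)
    # Step 2: similarity matching against the locally indexed documents.
    matches = sorted(docs, key=lambda d: dot(q, d["vec"]), reverse=True)[:k]
    # Step 3: enrich with extra context and re-sort (here: by view count).
    matches.sort(key=lambda d: d["views"], reverse=True)
    # Step 4: build the prompt that would be sent to the language model.
    context = "\n".join(d["text"] for d in matches)
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = answer_context("how to create a table")
```

In a real system each step would be backed by a model or a database; the point of the sketch is only that matching, enrichment and re-sorting are separate stages that all need access to the same data.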

The basic operational flows for this application are listed in the diagram below. However, the key factor that distinguishes vector databases is not merely the similarity matching capability, but also the ability to enrich the matching results with supplementary information while ultimately re-sorting the outcomes and obtaining the most accurate answer from GPT.

Challenges with a Specialty Vector Database

Initially, the startup used an SVDB but soon ran into its limitations. The SVDB could only return similar results for a specific text or question, and each embedding could store only a very small number of tags, whereas the startup’s approach required iterating at scale and re-ranking frequently. For instance, ranking based on a user’s specific context (such as a question about a particular version of the software) was a crucial feature for providing personalized support to developers.
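
Ranking on a user's context can be pictured as a second pass over the similarity matches. The sketch below is a hypothetical illustration in Python: the candidate records, the `version` tag and the fixed `boost` value are all assumptions made for the example, not details of the startup's system.

```python
def rerank(candidates, user_version, boost=0.2):
    """Re-sort similarity matches, favoring documents for the user's version.

    candidates: list of dicts with 'id', 'score' (similarity) and 'version'.
    boost: hypothetical bonus added when the versions match.
    """
    def adjusted(c):
        bonus = boost if c["version"] == user_version else 0.0
        return c["score"] + bonus
    return sorted(candidates, key=adjusted, reverse=True)

# Hypothetical matches returned by a vector similarity search.
candidates = [
    {"id": "doc-a", "score": 0.91, "version": "7.8"},
    {"id": "doc-b", "score": 0.88, "version": "8.1"},
    {"id": "doc-c", "score": 0.60, "version": "8.1"},
]

ranked = rerank(candidates, user_version="8.1")
top_ids = [c["id"] for c in ranked]
```

Here `doc-b` overtakes `doc-a` because it matches the user's version, while the weak match `doc-c` stays last. A re-ranking step like this needs the version tag stored alongside each embedding, which is the kind of metadata the paragraph above describes as hard to keep in the SVDB.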

As their data architecture became more complex, they had to supplement the SVDB with an Elasticsearch database. User feedback and events were stored in PostgreSQL and fed into Elasticsearch to refine the ranking. Essentially, the SVDB became an (expensive) feature of a database.

The issue with relying solely on the SVDB’s vector similarity capability was that it could only provide a record ID and a score, which was insufficient for delivering accurate results. The records had to be combined with data from Elasticsearch to produce a more precise version with context, which could then be sent to GPT.

About SingleStoreDB

SingleStoreDB is a high-performance, scalable, modern SQL DBMS and cloud service that supports multiple data models including structured data, semi-structured data based on JSON, time-series, full text, spatial, key-value and vector data. Our vector database subsystem, first made available in 2017 and subsequently enhanced, enables extremely fast nearest-neighbour search for semantically similar objects using plain SQL. Moreover, the so-called “metadata filtering” function (which is billed as a virtue by SVDB providers) is far more powerful and general in SingleStoreDB than in the alternatives, simply because it uses SQL filters, joins and other capabilities.
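
To make the point about SQL filters and joins concrete, here is a minimal sketch in Python. It uses SQLite from the standard library purely as a stand-in so the example can run anywhere; it is not SingleStoreDB. The `dot_product` function below is a small Python helper registered for this sketch, with vectors stored as JSON text, and the table and column names are hypothetical.

```python
import json
import sqlite3

def dot_product(a_json, b_json):
    """Dot product of two vectors stored as JSON arrays (helper for this sketch)."""
    a, b = json.loads(a_json), json.loads(b_json)
    return sum(x * y for x, y in zip(a, b))

conn = sqlite3.connect(":memory:")
conn.create_function("dot_product", 2, dot_product)

conn.executescript("""
CREATE TABLE docs (id INTEGER PRIMARY KEY, product_id INTEGER, title TEXT, embedding TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
INSERT INTO products VALUES (1, 'engine', '8.1'), (2, 'engine', '7.8');
INSERT INTO docs VALUES
  (10, 1, 'Create a table (8.1)', '[0.9, 0.1, 0.0]'),
  (11, 2, 'Create a table (7.8)', '[1.0, 0.0, 0.0]'),
  (12, 1, 'Tune memory (8.1)',    '[0.0, 0.2, 0.9]');
""")

query_vec = json.dumps([1.0, 0.0, 0.0])

# One statement: similarity score, a join to product metadata, and a filter.
rows = conn.execute(
    """
    SELECT d.title, dot_product(d.embedding, ?) AS score
    FROM docs AS d
    JOIN products AS p ON p.id = d.product_id
    WHERE p.version = ?
    ORDER BY score DESC
    LIMIT 2
    """,
    (query_vec, "8.1"),
).fetchall()

titles = [title for title, _ in rows]
```

The document for version 7.8 is the closest vector match, but the `WHERE` clause on the joined table excludes it, so the query returns only version 8.1 documents ordered by similarity. The filter here is an ordinary SQL predicate on a joined table, not a fixed set of tags attached to each embedding.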

The beauty of SingleStoreDB for vector database management is that it excels at vector-based operations and it is truly a modern database management system. It has all the benefits one expects from a DBMS including ANSI SQL, ACID transactions, high availability, disaster recovery, point-in-time recovery, programmability, extensibility and more. Plus, it is fast and scalable, supporting both high-performance transaction processing and analytics in a single distributed system.

SingleStoreDB Support for Vectors

SingleStoreDB supports vectors and vector similarity search through the dot_product (for cosine similarity) and euclidean_distance functions. Our customers use these functions for applications including face recognition, visual product photo search and text-based semantic search [Aur23]. With the explosion of generative AI technology, these capabilities form a firm foundation for text-based AI chatbots, like our very own SQrL.
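
The relationship between the two measures is easy to check numerically. The sketch below is plain Python, independent of SingleStoreDB, with made-up vectors. It shows that the dot product of two unit-length vectors equals their cosine similarity, and computes the Euclidean distance for comparison.

```python
import math

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(dot_product(v, v))
    return [x / n for x in v]

def cosine_similarity(a, b):
    return dot_product(a, b) / math.sqrt(dot_product(a, a) * dot_product(b, b))

a, b = [3.0, 4.0], [4.0, 3.0]  # made-up example vectors
ua, ub = normalize(a), normalize(b)

cos_ab = cosine_similarity(a, b)    # 24 / 25
dot_unit = dot_product(ua, ub)      # same value, since ua and ub have length 1
dist_ab = euclidean_distance(a, b)  # sqrt(2)
```

This is why a dot product can serve as cosine similarity when embeddings are normalized before they are stored. For unit-length vectors, ranking by largest dot product and by smallest Euclidean distance also gives the same order, because the squared distance equals 2 minus twice the dot product.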

The SingleStore vector database engine uses Intel SIMD instructions to implement vector similarity matching extremely efficiently.

Why SingleStoreDB Is the Ultimate Vector Database Solution

To deliver more accurate results at a lower cost per question answered, the startup required a streamlined architecture that supported semantic search and matching together with the analytics needed for re-ranking and refinement.

SingleStoreDB offered the optimal solution as it provided superior performance for processing and analyzing semi-structured data such as JSON. SingleStoreDB can also index text, store and match vectors, and re-rank and refine matching results based on additional context.

SingleStoreDB can already do exact nearest-neighbor search incredibly fast via efficient, indexed metadata filtering, distributed parallel scans and single instruction, multiple data (SIMD) processing. You can also do approximate nearest-neighbor (ANN) search in a way that does not require searching all vectors, with a little extra work, by creating clusters and examining only the vectors in clusters near a query vector. Also, most partner integrations can easily be built by the customer on top of SingleStoreDB because they are client application-side integrations that use partner services and then just interact with the DBMS via SQL.
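
The cluster-based approach can be sketched in a few lines of Python. This is an illustrative assumption about how such a scheme might look, not SingleStoreDB's implementation: the centroids are fixed by hand instead of being learned (for example with k-means), the vectors are invented, and `n_probe` is a hypothetical parameter for how many nearby clusters to examine.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hand-picked centroids (assumption; normally learned, e.g. with k-means).
centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

vectors = {
    "v1": (0.5, 0.2), "v2": (1.0, 1.5),
    "v3": (9.0, 9.5), "v4": (10.5, 10.0),
    "v5": (0.3, 9.0), "v6": (1.0, 11.0),
}

# Build step: assign every vector to its nearest centroid.
clusters = {i: [] for i in range(len(centroids))}
for vid, v in vectors.items():
    nearest = min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))
    clusters[nearest].append(vid)

def ann_search(query, k=2, n_probe=1):
    """Examine only the n_probe clusters whose centroids are closest to query."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: dist(query, centroids[i]))[:n_probe]
    candidates = [vid for i in probe for vid in clusters[i]]
    return sorted(candidates, key=lambda vid: dist(query, vectors[vid]))[:k]

result = ann_search((9.5, 9.5))
```

With `n_probe=1` the search compares the query against two of the six vectors. Because only some clusters are examined, a true nearest neighbor that falls in an unexamined cluster can be missed; raising `n_probe` trades speed for recall.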

What can SingleStoreDB do to enable your vector database applications? Try it for free in the cloud or self-hosted today, and find out.

AUTHOR DETAILS:

Eric Hanson: Eric Hanson is a Director of Product Management at SingleStore, responsible for query processing, storage and extensibility feature areas. He joined the SingleStore product management team in 2016.

Arnaud Comet: Arnaud Comet is a Director of Product Management at SingleStore. He joined the SingleStore product management team in 2022, and has 10+ years of experience driving cloud services growth.
