Glad RAGs - To vector database or not to vector database, that is the question
There has been a torrent of interest in generative AI since the release of ChatGPT in November 2022. Corporations have been eager to explore the potential of large language models (LLMs) in many industries and applications, from customer service chatbots to programming assistants and more. However, for many use cases an LLM is only useful if it has intimate knowledge of a company's business. For example, a customer service chatbot that does not know a customer's history or orders will be of little use in resolving a query about an order. For this reason, companies have started to augment the knowledge of LLMs by exposing them to company-specific datasets: customer databases, product specifications or engineering drawings, for example. A company might take an LLM such as Llama or Mistral and supplement it with such company-specific information, a technique known as retrieval-augmented generation (RAG).
However, an LLM cannot simply look at a few existing documents and immediately become an expert in a subject. It needs data in the form of ordered lists of numbers called vectors. A document such as a PDF or the text of a website can be converted into vectors by splitting the text into manageable chunks and then turning each chunk into a numerical representation: a vector. Vectors are rather like co-ordinates on a kind of semantic map: similar concepts are assigned numbers that sit near each other on the map, whereas completely different concepts sit farther apart. A query to a vector database then returns answers that relate to a particular concept, drawing on the data that has been ingested into the database.
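To make this concrete, here is a minimal sketch in Python of the chunk-and-embed step just described, using the open-source sentence-transformers library. The model name, chunk size, file name and query are illustrative assumptions rather than recommendations.

```python
# A minimal sketch of chunking a document and embedding each chunk as a
# vector (model name, chunk size, file name and query are illustrative).
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, max_words: int = 100) -> list[str]:
    """Split a document into manageable, roughly equal-sized chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # maps text to 384-dimensional vectors

document = open("product_manual.txt").read()     # hypothetical company document
chunks = chunk_text(document)
chunk_vectors = model.encode(chunks)             # one vector per chunk

# The vectors behave like co-ordinates on a semantic map: embed the query the
# same way, then rank chunks by cosine similarity to find related passages.
query_vector = model.encode(["How do I return a faulty order?"])[0]
scores = (chunk_vectors @ query_vector) / (
    np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(query_vector)
)
for i in np.argsort(scores)[::-1][:3]:           # three most similar chunks
    print(f"{scores[i]:.3f}  {chunks[i][:80]}")
```

In a real system the chunk vectors would be stored in a database with a similarity index rather than compared in memory, which is where the choice of database comes in.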
One way of implementing RAG is to license a specialist vector database (such as Pinecone or Weaviate), load documents into that, and then feed the results of a vector search to the LLM. However, this means dealing with a brand-new database, with the learning curve that implies. Another approach is to utilise a database that you may already have and that happens to support vector storage and search. You will of course want to compare the performance of these approaches: just because a database notionally supports vector search does not mean that it does so well, quickly and at scale. That said, the high level of interest in this area has given database companies considerable incentive to add such functionality to their products, since they would clearly prefer customers to store data in their product rather than somewhere else. Consequently, established databases such as Oracle, Snowflake, Redshift, Teradata and Yellowbrick now offer varying degrees of vector support, and even open-source SQL databases like Postgres have extensions that enable vector search, as sketched below. As with any such choice, you need to weigh many factors, such as the quality of documentation, usability and administration tools, as well as raw performance. Clearly, if you possibly can, it is better to use a database that you already own and know how to work with than to introduce a brand-new one.
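As a flavour of that second approach, below is a hedged sketch of vector search inside Postgres using the pgvector extension and its Python client helpers. The connection details, table name and vector width are hypothetical placeholders, and the zero-filled vector merely stands in for a real embedding produced by a model like the one above.

```python
# A sketch of vector search in an existing Postgres database via the
# open-source pgvector extension (connection details, table and column
# names are hypothetical placeholders).
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect(dbname="appdb", autocommit=True)  # placeholder database
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # teaches psycopg to send and receive the vector type

conn.execute(
    """CREATE TABLE IF NOT EXISTS doc_chunks (
           id bigserial PRIMARY KEY,
           body text,
           embedding vector(384)  -- width must match the embedding model
       )"""
)

embedding = np.zeros(384)  # stand-in for a real chunk embedding
conn.execute(
    "INSERT INTO doc_chunks (body, embedding) VALUES (%s, %s)",
    ("An example passage from a company document.", embedding),
)

# '<->' is pgvector's distance operator: the nearest rows are the chunks
# most semantically similar to the query vector.
rows = conn.execute(
    "SELECT body FROM doc_chunks ORDER BY embedding <-> %s LIMIT 3",
    (embedding,),
).fetchall()
```

For larger collections, pgvector also supports approximate indexes (such as HNSW) so that similarity queries do not have to scan every row, which speaks to the earlier point about performance at scale.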
This is a rapidly moving area in which a lot of money is being invested, so it will continue to change quickly. Vector databases did not exist until around 2010 and remained quite niche until the last few years, so you need to do careful research and test alternatives to make sure that you end up with a solution that meets your needs and also fits well within your current overall infrastructure.