Understanding vector databases - Part 1

Today we begin a short series on vector databases. The goal is to give a comprehensive overview: what they are, why they emerged, and how some of their internals work.

Vector databases rose to prominence alongside the widespread adoption of large language models such as ChatGPT. Their emergence predates that wave, however: they address a problem traditional databases struggle with, namely similarity and semantic search across diverse data types such as images, videos, and text.
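The core idea behind that similarity search is simple: each item (image, video, text) is turned into a numerical vector, an embedding, and "similar" means "nearby" under a distance or similarity measure such as cosine similarity. As a minimal sketch of the principle, here is a brute-force nearest-neighbor lookup over a tiny hypothetical corpus; the 4-dimensional vectors are invented for illustration, whereas real embedding models produce hundreds or thousands of dimensions, and real vector databases replace the linear scan with approximate indexes:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, items):
    """Return the stored (label, vector) pair most similar to the query
    vector, by exhaustive comparison (a linear scan, not a real index)."""
    return max(items, key=lambda item: cosine_similarity(query, item[1]))

# Hypothetical embeddings, invented for this example.
corpus = [
    ("cat photo", [0.9, 0.1, 0.0, 0.2]),
    ("dog photo", [0.7, 0.5, 0.1, 0.1]),
    ("tax report", [0.0, 0.1, 0.9, 0.7]),
]
query = [0.85, 0.2, 0.05, 0.15]  # imagined embedding of "picture of a kitten"
print(nearest(query, corpus)[0])  # → cat photo
```

This exhaustive scan is exact but costs O(n) per query; the algorithms we will cover later in the series exist precisely to avoid that scan at scale.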

In this series, we will look at why vector databases are more than hype and are genuinely indispensable in certain scenarios. We will explore the algorithms they employ and the challenges they face, and then put theory into practice by building a recommendation system with this technology.

Because the topic is still relatively new, authoritative textbooks on vector databases are scarce. One notable resource, focused on vector databases for NLP applications, serves as a good introduction to the subject:

Vector Databases for Natural Language Processing: Building Intelligent NLP Applications with High-Dimensional Text Representations (Allen)

Without further ado, and as usual, let's begin with a few prerequisites needed to properly understand the underlying concepts. Continue here.