Turn PostgreSQL Into A Vector Store
Written by Nikos Vaggalis   
Tuesday, 26 September 2023

pgvector is an extension for PostgreSQL that makes it a viable alternative to the specialized vector stores used with LLMs. We show you how to use it and report on pgvecto.rs, a newer and faster incarnation written in Rust.

In "Learn To Chat with Your Data For Free", we explored a LangChain course on exactly that subject, going through the necessary steps.

One of those steps was storing the data in a vector store. As a refresher, here is what a vector, or embedding, is:

embeddings take a piece of text and create a numerical representation of that text. Text with similar content will have similar vectors in this numeric space. 

What that means is that we can then compare those vectors and find pieces of text that are similar. A library such as NumPy can be used to compare the embeddings produced by an embedding model, which are then loaded into the Chroma vector store.
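To make "similar vectors" concrete, here is a minimal sketch in plain Python that compares vectors by cosine similarity. The three-dimensional vectors and their names are made up for illustration; a real embedding model produces vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", invented for this example only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1: similar direction
print(cosine_similarity(cat, invoice))  # close to 0: nearly unrelated
```

The closer the value is to 1, the more the two vectors point in the same direction, which is what "similar content" amounts to numerically.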

pgvector allows you to replace Chroma, or any other specialized vector engine, with Postgres, so that your embeddings are stored alongside your JSON or relational data under the same roof.

That capability makes Postgres usable in an AI or ML setting. At the same time, simply by hosting embeddings and without even touching the LLM, we can still run useful similarity searches, such as exact k-nearest-neighbor (KNN) and approximate nearest-neighbor (ANN) searches, from within Postgres.

Using the extension is as simple as:

Enable the extension (do this once in each database where you want to use it):

CREATE EXTENSION vector;

-- Create a vector column with 3 dimensions
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));

-- Insert vectors
INSERT INTO items (embedding) VALUES ('[1, 2, 3]'), ('[4, 5, 6]');

-- Get the nearest neighbors by L2 distance
SELECT * FROM items ORDER BY embedding <-> '[3, 1, 2]' LIMIT 5;
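As a rough illustration of what the ORDER BY query computes, the following sketch reproduces an exact nearest-neighbor search by L2 distance in plain Python. The helper names are hypothetical, and the ids assume a fresh table where bigserial starts at 1; it mirrors the result of the query, not how pgvector executes it.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(rows, query, limit):
    """Return up to `limit` (id, embedding) rows, closest to `query` first."""
    return sorted(rows, key=lambda row: l2_distance(row[1], query))[:limit]

# The two rows inserted above (ids assumed to be 1 and 2).
items = [(1, [1, 2, 3]), (2, [4, 5, 6])]
query = [3, 1, 2]

for item_id, embedding in nearest_neighbors(items, query, limit=5):
    print(item_id, embedding, round(l2_distance(embedding, query), 3))
```

Row 1 comes first at a distance of about 2.449, followed by row 2 at about 5.745.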

You can use pgvector from any language with a Postgres client: C, C#, Perl, Java, even Dart, you name it. You can even generate and store vectors in one language and query them in another. The extension itself is written in C.
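One reason any client works is that a vector can travel as plain text in the '[1, 2, 3]' form used in the SQL above. The sketch below shows one way a client might build and parse that text form; the function names are hypothetical, and how the string is bound to a query parameter (and cast to the vector type) depends on the driver you use.

```python
def to_vector_literal(values):
    """Serialize a sequence of numbers as a '[x,y,z]' vector literal."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

def from_vector_literal(text):
    """Parse a '[x,y,z]' vector literal back into a list of floats."""
    inner = text.strip()[1:-1]
    if not inner.strip():
        return []
    return [float(part) for part in inner.split(",")]

literal = to_vector_literal([1, 2, 3])
print(literal)                       # [1.0,2.0,3.0]
print(from_vector_literal(literal))  # [1.0, 2.0, 3.0]
```

Because the exchange format is just text, a vector written by one program can be read back by another regardless of the language either is written in.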

However, very recently pgvecto.rs emerged, which is essentially pgvector written in Rust, and which offers extra advantages over the original:

  • Transaction support
  • Sufficient Result with Delete/Update/Filter
  • Vector Dimension Limit 65535
  • Prefilter on HNSW
  • Parallel Index build
  • Linearly faster with more cores
  • Index Persistence mmap file
  • WAL amplification 2x

Also, based on benchmarks, pgvecto.rs can be up to 2x faster than pgvector on HNSW indexes with the same configurations.

Speed is just one property of pgvecto.rs, however. It is also architected so that new algorithms are easy to add, letting contributors implement new indexes with little effort. For instance, while pgvecto.rs comes with two built-in index types, HNSW for maximum search speed and ivfflat for quantization-based approximate search, anyone can create additional indexes such as RHNSW, NGT, or custom types tailored to specific use cases.
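To give a feel for what an inverted-file index in the style of ivfflat does, here is a toy sketch of the general idea: vectors are grouped into lists around centroids, and a query scans only the lists nearest to it. This is an illustration under simplifying assumptions (hand-picked centroids, made-up data, hypothetical function names), not the pgvecto.rs implementation.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_lists(vectors, centroids):
    """Assign every vector to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec in vectors:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_l2(vec, centroids[i]))
        lists[nearest].append(vec)
    return lists

def ivf_search(lists, centroids, query, limit, probes=1):
    """Scan only the `probes` lists whose centroids are closest to `query`."""
    order = sorted(range(len(centroids)),
                   key=lambda i: squared_l2(query, centroids[i]))
    candidates = [vec for i in order[:probes] for vec in lists[i]]
    return sorted(candidates, key=lambda vec: squared_l2(vec, query))[:limit]

# Hypothetical data; a real index would learn its centroids, e.g. with k-means.
centroids = [[2, 2, 2], [10, 10, 10]]
vectors = [[1, 2, 3], [4, 5, 6], [9, 9, 9], [11, 10, 12]]
lists = build_lists(vectors, centroids)

print(ivf_search(lists, centroids, [3, 2, 1], limit=2, probes=1))
```

With probes=1 only one of the two lists is scanned, which is where the speed-up comes from. It is also why the search is approximate: a true neighbor sitting in an unscanned list would be missed, and raising probes trades speed for recall.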

Using it is similar to the procedure for pgvector:

CREATE EXTENSION vectors;

-- create table with a vector column
CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding vector(3) NOT NULL
);

You can then populate the table with vector data as follows.

-- insert values

INSERT INTO items (embedding)
VALUES ('[1, 2, 3]'), ('[4, 5, 6]');

-- or insert values using a casting from array to vector

INSERT INTO items (embedding)
VALUES (ARRAY[1, 2, 3]::real[]), (ARRAY[4, 5, 6]::real[]);

You can then call the distance functions through operators:

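-- squared Euclidean distance
-- (the explicit ::vector casts let Postgres resolve the operator for bare literals)
SELECT '[1, 2, 3]'::vector <-> '[3, 2, 1]'::vector;
-- negative dot product distance
SELECT '[1, 2, 3]'::vector <#> '[3, 2, 1]'::vector;
-- negative square cosine distance
SELECT '[1, 2, 3]'::vector <=> '[3, 2, 1]'::vector;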
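To see the numbers involved, the sketch below works through the same pair of vectors in plain Python. The first two functions follow the comments above directly. For the cosine-based operator the exact formula is not spelled out here, so the sketch only computes plain cosine similarity, the quantity such a distance is derived from.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def negative_dot_product(a, b):
    """Dot product with the sign flipped, so that smaller means more similar."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

a, b = [1, 2, 3], [3, 2, 1]
print(squared_euclidean(a, b))     # 8
print(negative_dot_product(a, b))  # -10
print(cosine_similarity(a, b))     # about 0.714, i.e. 10 / 14
```

The dot product is negated because ORDER BY sorts ascending: flipping the sign makes the most similar vectors come first, just as with a true distance.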

Or search for similar vectors simply like this:

-- query the similar embeddings
SELECT * FROM items ORDER BY embedding <-> '[3, 2, 1]' LIMIT 5;
-- query the neighbors within a certain distance
SELECT * FROM items WHERE embedding <-> '[3, 2, 1]' < 5;
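One thing to keep in mind with the distance filter is which distance the threshold applies to. Taking the operator comment above at face value, <-> yields the squared Euclidean distance in pgvecto.rs, so a threshold of 5 is compared against squared values. The following sketch, with hypothetical helper names and the two rows inserted earlier, shows the effect.

```python
def squared_euclidean(a, b):
    """Sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_distance(rows, query, threshold, distance=squared_euclidean):
    """Keep the (id, embedding) rows whose distance to `query` is below `threshold`."""
    return [row for row in rows if distance(row[1], query) < threshold]

# The two rows inserted earlier (ids assumed to be 1 and 2).
items = [(1, [1, 2, 3]), (2, [4, 5, 6])]

print(within_distance(items, [3, 2, 1], 5))   # [] since the squared distances are 8 and 35
print(within_distance(items, [3, 2, 1], 10))  # [(1, [1, 2, 3])]
```

With a plain (non-squared) L2 distance, row 1 would sit at about 2.83 and pass the threshold of 5, so it is worth checking which variant an operator returns before picking a cut-off.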

And with that, Postgres is here to rule them all. Because it can be extended without affecting the core, Postgres is truly open to innovation; the limit is the imagination of its open source community.


More Information

pgvector

pgvecto.rs
 

Related Articles

The DbDev Package Manager For PostgreSQL TLEs

Turn Your SQLite Database Into A Server

Learn To Chat with Your Data For Free

 


Last Updated ( Tuesday, 26 September 2023 )