PostgresML - Bring Your ML Workload To The Database
Written by Nikos Vaggalis   
Tuesday, 23 January 2024

PostgresML is a machine learning extension for PostgreSQL that lets you perform training and inference on text and tabular data using SQL queries. It opens up intriguing possibilities. Let's find out more.

Wouldn't it be awesome to run ML models on your live data without leaving the database? With the PostgresML extension, anyone who runs a PostgreSQL database can now run ML workloads using SQL.

Right now, the barrier to applying ML to your data is that you have to replicate your schema, structures and data to an OLAP warehouse, even though both schema and data already exist inside your database. Why not take that extra layer out of the picture and run your queries where the data actually lives?

On the other hand, it might sound like a terrible idea to train ML models on live data and run analytics on your OLTP database, which is already busy with its day-to-day workload. With PostgresML, however, models can be retrained periodically, whenever that makes sense or when the incoming data has changed significantly.

Along those lines, you could create a database view over your training data, which automatically reflects newly inserted rows. In any case, you have to weigh the convenience of making predictions on your data without the hassle of amending your tech stack (say, by adding an OLAP product in between) against the potential performance cost of doing so on your OLTP database. In return, you get feedback in real time, something of paramount importance in scenarios such as fraud detection.
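To make the view idea concrete, here is a minimal Python sketch. The `transactions` table, its columns, the view name `fraud_training` and the project name 'Fraud Detection' are all hypothetical; only `pgml.train(project_name, task, relation_name, y_column_name)` is PostgresML's own function. The statements are plain SQL strings with placeholders, so any DB-API 2.0 driver, such as psycopg, could execute them.

```python
# Minimal sketch: periodic retraining on a view over live OLTP data.
# Hypothetical: the `transactions` table and its columns, the view name
# and the project name. pgml.train() is the PostgresML training function.

# A plain (non-materialized) view is re-evaluated on every query, so each
# training run automatically sees rows inserted since the previous run.
CREATE_VIEW_SQL = """
CREATE OR REPLACE VIEW fraud_training AS
SELECT amount, merchant_id, is_fraud
FROM transactions
WHERE created_at > now() - interval '90 days'
""".strip()

# pgml.train(project_name, task, relation_name, y_column_name)
TRAIN_SQL = "SELECT * FROM pgml.train(%s::TEXT, %s::TEXT, %s::TEXT, %s::TEXT)"


def retrain_statement(project="Fraud Detection"):
    """Return the (sql, params) pair that retrains the project on the view."""
    return TRAIN_SQL, (project, "classification", "fraud_training", "is_fraud")


def retrain(cursor):
    """Run the retraining with any DB-API 2.0 cursor (e.g. psycopg).

    Call it from a scheduled job at whatever cadence suits the load on the
    primary; returns the rows reported back by pgml.train.
    """
    sql, params = retrain_statement()
    cursor.execute(sql, params)
    return cursor.fetchall()
```

The view is created once (by executing `CREATE_VIEW_SQL`); after that, only `retrain` needs to run on a schedule, and the time window in the view controls how much history each run sees.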

However, depending on the size of your dataset and how often it changes, you may want to offload training (or inference) to secondary PostgreSQL servers to avoid excessive load on your primary. There are three built-in mechanisms to help distribute the load: pg_dump, Foreign Data Wrappers and logical replication. There's yet another option, which I will outline in a future article about a related product.
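Of the three mechanisms, logical replication keeps a secondary server continuously in sync, so training can run there against near-live data. The Python sketch below only generates the standard PostgreSQL DDL for it; the table, publication and subscription names and the connection string are hypothetical. It assumes `wal_level = logical` on the primary and that the replicated tables already exist, with the same columns, on the secondary that has PostgresML installed.

```python
import re

# Minimal sketch: generate the DDL for PostgreSQL logical replication, so a
# secondary server with PostgresML can train on a live copy of the data.
# All names and the connection string are hypothetical examples.

_NAME = r"[a-z_][a-z0-9_]*"
_PLAIN = re.compile(_NAME)
_QUALIFIED = re.compile(rf"{_NAME}(\.{_NAME})?")  # optional schema prefix


def _check(name, pattern=_PLAIN):
    # DDL cannot use bind parameters, so only simple lowercase identifiers
    # are accepted before being interpolated into the statement text.
    if not pattern.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def replication_ddl(tables, primary_conninfo, pub="ml_pub", sub="ml_sub"):
    """Return (sql_for_primary, sql_for_secondary)."""
    table_list = ", ".join(_check(t, _QUALIFIED) for t in tables)
    # On the primary (requires wal_level = logical): publish table changes.
    on_primary = f"CREATE PUBLICATION {_check(pub)} FOR TABLE {table_list};"
    # On the secondary: by default the subscription copies the existing rows,
    # then keeps streaming new changes from the publication.
    conninfo = primary_conninfo.replace("'", "''")  # escape as SQL literal
    on_secondary = (
        f"CREATE SUBSCRIPTION {_check(sub)} "
        f"CONNECTION '{conninfo}' PUBLICATION {_check(pub)};"
    )
    return on_primary, on_secondary
```

For example, `replication_ddl(["public.transactions"], "host=primary.example.com dbname=shop user=replicator")` returns one statement per server: run the first on the primary and the second on the secondary.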

Now on to the main course. At the highest level, with PostgresML you can:

  • Perform natural language processing (NLP) tasks like sentiment analysis, question answering, translation, summarization and text generation

  • Access thousands of state-of-the-art language models, like GPT-2, GPT-J and GPT-Neo, from the Hugging Face model hub

  • Fine-tune large language models (LLMs) on your own text data for different tasks

  • Use your existing PostgreSQL database as a vector database by generating embeddings from text stored in the database.
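For the vector-database use case, embeddings are produced inside the database by PostgresML's `pgml.embed(transformer, text)` function, which returns a `REAL[]`. The sketch below builds two parameterized queries from Python: one fills an embedding column, the other ranks rows by similarity. The `documents(id, body, embedding REAL[])` table and the `intfloat/e5-small` model are assumptions, and the ranking query additionally assumes the pgvector extension, whose `<=>` operator computes cosine distance.

```python
# Minimal sketch: use PostgreSQL as a vector store with PostgresML embeddings.
# Assumptions: a `documents(id, body TEXT, embedding REAL[])` table, the
# 'intfloat/e5-small' Hugging Face model, and the pgvector extension for
# the `<=>` cosine-distance operator. pgml.embed(transformer, text) -> REAL[].

MODEL = "intfloat/e5-small"


def embed_missing_statement(model=MODEL):
    """(sql, params) that stores an embedding for every row still lacking one."""
    sql = (
        "UPDATE documents SET embedding = pgml.embed(%s::TEXT, body) "
        "WHERE embedding IS NULL"
    )
    return sql, (model,)


def nearest_statement(question, k=5, model=MODEL):
    """(sql, params) returning the k documents closest to `question`."""
    sql = (
        # Embed the question once in a CTE, then rank rows by cosine distance.
        "WITH q AS (SELECT pgml.embed(%s::TEXT, %s::TEXT)::vector AS v) "
        "SELECT d.id, d.body FROM documents d, q "
        "ORDER BY d.embedding::vector <=> q.v "
        "LIMIT %s"
    )
    return sql, (model, question, int(k))
```

Computing the question's embedding in a CTE keeps the model call out of the per-row `ORDER BY` expression; the same model must be used for the stored embeddings and the query.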

Some real-world use cases for PostgresML are:

  • Natural Language Processing: text classification, zero-shot classification, token classification, question answering, text generation and text-to-text generation

  • Forecasting
  • Real Time Fraud Detection
  • Tumor Detection with Binary Classification
  • Handwritten Digit Image Classification
  • Diabetes Progression with Regression
  • Deep Learning with Transformers
  • Working with Embeddings

And how do I use it? First, you've got to train a model on your data.
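In PostgresML, training and prediction are themselves SQL function calls. A minimal Python sketch of the same kind of task as the "Diabetes Progression with Regression" use case listed above: it loads the diabetes toy dataset with `pgml.load_dataset`, trains a regression model on the `target` column with `pgml.train`, then scores a feature vector with `pgml.predict`. The project name is arbitrary, the feature values are whatever you pass in, and the statements are meant for any DB-API 2.0 cursor, such as psycopg's, once PostgresML is installed.

```python
# Minimal sketch: train and use a regression model entirely in SQL.
# pgml.load_dataset, pgml.train and pgml.predict are PostgresML functions;
# the project name and any feature values you pass are illustrative.

PROJECT = "Diabetes Progression"

TRAINING_STEPS = [
    # Load the diabetes toy dataset into the table pgml.diabetes.
    ("SELECT * FROM pgml.load_dataset(%s::TEXT)", ("diabetes",)),
    # Train on it, predicting the `target` column; PostgresML deploys
    # trained models automatically by default.
    (
        "SELECT * FROM pgml.train(%s::TEXT, %s::TEXT, %s::TEXT, %s::TEXT)",
        (PROJECT, "regression", "pgml.diabetes", "target"),
    ),
]


def predict_statement(features, project=PROJECT):
    """(sql, params) scoring one feature vector with the deployed model."""
    return (
        "SELECT pgml.predict(%s::TEXT, %s::REAL[]) AS prediction",
        (project, [float(x) for x in features]),
    )


def train_and_predict(cursor, features):
    """Run the training steps, then return a single prediction."""
    for sql, params in TRAINING_STEPS:
        cursor.execute(sql, params)
    sql, params = predict_statement(features)
    cursor.execute(sql, params)
    return cursor.fetchone()[0]
```

The feature vector should contain one value per training column, in the same order as the columns of `pgml.diabetes` (excluding `target`).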

A PostgresML installation consists of three parts: a PostgreSQL database, the Postgres extension for machine learning and a dashboard app. The extension provides all the machine learning functionality and can be used on its own from any SQL IDE. The dashboard app provides an easy-to-use interface for writing SQL notebooks and for running and tracking ML experiments and models.

By far the easiest way to get started is with Docker:

docker run \
-it \
-v postgresml_data:/var/lib/postgresql \
-p 5433:5432 \
-p 8000:8000 \
ghcr.io/postgresml/postgresml:2.7.12 \
sudo -u postgresml psql -d postgresml

Note that PostgresML can also be accessed from your favorite tools and languages, Python for instance.
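Because PostgresML's features are exposed as ordinary SQL functions, any PostgreSQL driver can reach them. A minimal sketch with psycopg 3, which is an assumption since the article doesn't name a driver: it targets the Docker container started above (port 5433 on localhost; user and database `postgresml`, as in the `psql` command), assumes no password is required, and runs sentiment analysis through `pgml.transform`, whose JSONB result psycopg decodes into Python lists and dicts.

```python
# Minimal sketch: call PostgresML from Python via psycopg 3.
# Assumptions: `pip install "psycopg[binary]"`, and the Docker container
# from the article listening on localhost:5433 with user/database
# `postgresml` and no password. Adjust DSN for your own deployment.

DSN = "host=localhost port=5433 user=postgresml dbname=postgresml"

# pgml.transform runs a Hugging Face pipeline for `task` on every input.
SENTIMENT_SQL = (
    "SELECT pgml.transform(task => %s::TEXT, inputs => %s::TEXT[]) AS result"
)


def classify(texts, dsn=DSN):
    """Return pgml.transform's decoded JSONB result for the given texts."""
    import psycopg  # imported here so the module loads without the driver

    with psycopg.connect(dsn) as conn:
        row = conn.execute(
            SENTIMENT_SQL, ("text-classification", list(texts))
        ).fetchone()
    return row[0]
```

Calling `classify(["I love how simple ML has become!"])` should return the pipeline output, typically one label/score object per input; the explicit `::TEXT` casts pick the text-task variant of `pgml.transform` rather than leaving the overload to type inference.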

To wrap it up, PostgresML is part of the Postgres-for-everything movement, and now it's bringing your ML workload to the database!

More Information

PostgresML GitHub


Related Articles

It's 2024. Why Does PostgreSQL Still Dominate?





Last Updated ( Tuesday, 23 January 2024 )