PostgresML - Bring Your ML Workload To The Database
Written by Nikos Vaggalis
Tuesday, 23 January 2024
PostgresML is a machine learning extension for PostgreSQL that lets you perform training and inference on text and tabular data using SQL queries. It opens up intriguing possibilities. Let's find out more.
Wouldn't it be awesome to run ML models on your live data without leaving the database? With the PostgresML extension, anyone who owns a PostgreSQL database can now run ML workloads on top of SQL.
The barrier right now to applying ML to your data is that you would have to replicate your schema, structures and data to an OLAP warehouse when both schema and data already exist inside your database. Why not take that stack out of the picture and run your queries where the data actually lives?
On the other hand, it might sound like a terrible idea to train ML models on live data and run analytics on your OLTP database, which is already loaded by its usual day-to-day usage. With PostgresML, however, models can be retrained periodically, when that makes sense, or when the data being updated varies a lot.
Along those lines, you could create a database view of your training data which would be automatically refreshed when new data gets inserted. In any case, you have to balance the ease of making predictions on your data without the hassle of amending your tech stack, say by inserting an OLAP product, against the potential performance trade-offs of doing that on your OLTP database. By doing so, however, you also get feedback in real time, something of paramount importance in some scenarios, like fraud detection.
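As a rough sketch of the view-plus-retraining idea: a plain PostgreSQL view is evaluated at query time, so every call to `pgml.train` against it sees whatever rows exist at that moment. The `transactions` table, its columns, the `fraud_training` view and the `fraud_detector` project name are all hypothetical, and `conn` is assumed to be any DB-API connection (from psycopg2, for instance) to a database with the extension installed.

```python
from typing import Any, List, Tuple

# Hypothetical schema: a `transactions` table holding two features and a label.
CREATE_VIEW_SQL = (
    "CREATE OR REPLACE VIEW fraud_training AS "
    "SELECT amount, merchant_risk, is_fraud FROM transactions"
)

# pgml.train(project_name, task, relation_name, y_column_name, algorithm => ...)
TRAIN_SQL = "SELECT * FROM pgml.train(%s, %s, %s, %s, algorithm => %s)"


def train_params(project: str, algorithm: str = "xgboost") -> Tuple[str, ...]:
    """Arguments for TRAIN_SQL; the view and label names are illustrative."""
    return (project, "classification", "fraud_training", "is_fraud", algorithm)


def retrain(conn: Any, project: str = "fraud_detector") -> List[Any]:
    """(Re)create the training view, then train a new model on its current rows."""
    cur = conn.cursor()
    try:
        cur.execute(CREATE_VIEW_SQL)
        cur.execute(TRAIN_SQL, train_params(project))
        rows = cur.fetchall()
    finally:
        cur.close()
    conn.commit()
    return rows
```

Calling `retrain(conn)` on a schedule (from cron or a job runner, say) would give the periodic retraining described above; how often to run it depends on how quickly the data changes.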
However, depending on the size of your dataset and its change frequency, you may want to offload training (or inference) to secondary PostgreSQL servers to avoid excessive load on your primary. There are three built-in mechanisms to help distribute the load: pg_dump, Foreign Data Wrappers, and logical replication. There's yet another option, which I will outline in a future article on a related product.
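To make the logical replication option a little more concrete, here is a minimal sketch that only builds the two standard PostgreSQL statements involved: `CREATE PUBLICATION`, run on the primary, and `CREATE SUBSCRIPTION`, run on the secondary that will do the training. The helper functions and every name passed to them are hypothetical, and the identifier check is deliberately simplistic.

```python
import re
from typing import Sequence

# Conservative check for plain identifiers; anything else is rejected.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name: str) -> str:
    """Return name unchanged if it is a plain identifier, else raise."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def publication_sql(publication: str, tables: Sequence[str]) -> str:
    """Statement for the primary: publish changes to the given tables."""
    table_list = ", ".join(_ident(t) for t in tables)
    return f"CREATE PUBLICATION {_ident(publication)} FOR TABLE {table_list}"


def subscription_sql(subscription: str, publication: str, conninfo: str) -> str:
    """Statement for the secondary: subscribe to the primary's publication."""
    escaped = conninfo.replace("'", "''")  # escape quotes inside the literal
    return (
        f"CREATE SUBSCRIPTION {_ident(subscription)} "
        f"CONNECTION '{escaped}' PUBLICATION {_ident(publication)}"
    )
```

For example, `publication_sql("pgml_pub", ["transactions"])` yields `CREATE PUBLICATION pgml_pub FOR TABLE transactions`. Logical replication requires `wal_level = logical` on the primary, and `CREATE SUBSCRIPTION` normally has to run outside a transaction block, so a driver connection would need autocommit enabled.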
Now on to the main meal. At the highest level, with PostgresML you can
Some real-world use cases for PostgresML are:
A PostgresML installation consists of three parts: a PostgreSQL database, a Postgres extension for machine learning, and a dashboard app. The extension provides all the machine learning functionality and can be used independently from any SQL IDE. The dashboard app provides an easy-to-use interface for writing SQL notebooks and for performing and tracking ML experiments and ML models.
By far the easiest way to get started is with Docker:
docker run \
Note that PostgresML can also be accessed by your favorite tools and languages; for instance, Python:
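A minimal sketch of what such access could look like, assuming the psycopg2 driver and a model already trained under the hypothetical project name `fraud_detector`. The connection string is a placeholder, not the address of any real deployment; `pgml.predict` takes the project name and an array of feature values.

```python
from typing import Any, Sequence

# pgml.predict(project_name, features) scores one row of feature values.
PREDICT_SQL = "SELECT pgml.predict(%s, %s::real[]) AS prediction"


def predict(conn: Any, project: str, features: Sequence[float]) -> Any:
    """Run a single prediction through an open DB-API connection."""
    cur = conn.cursor()
    try:
        # The driver adapts the Python list to a PostgreSQL array.
        cur.execute(PREDICT_SQL, (project, list(features)))
        row = cur.fetchone()
        return row[0]
    finally:
        cur.close()


def main() -> None:
    """Call this once a PostgresML instance is reachable."""
    import psycopg2  # assumption: any DB-API driver would work similarly

    # Placeholder DSN; substitute your own host, port, user and database.
    conn = psycopg2.connect("postgresql://user:password@localhost:5432/mydb")
    try:
        print(predict(conn, "fraud_detector", [120.5, 0.8]))
    finally:
        conn.close()
```

Because the call is just a SQL query, the same pattern should carry over to other languages with a PostgreSQL driver.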
To wrap it up, PostgresML is part of the Postgres-for-everything movement. Now it's bringing your ML workload to the Database!
Last Updated ( Tuesday, 23 January 2024 )