Query Your Database In Natural Language With Vanna
Written by Nikos Vaggalis   
Thursday, 01 February 2024

Vanna is a framework with which you can talk to your database not with SQL, but in plain English.

It was expected that at some point someone would use Generative AI to compile English to SQL. SQL although easy to grasp has been the barrier to management's direct interaction with the data silos of the organization. The developer was the bridge between those two. He would get the request of the Management in English or another natural language and translate that to SQL/code in order to generate the necessary report. That role is now played by AI so that even those who do not understand SQL operations can easily use the ability to quickly query business data and generate reports.

But where the developer had the edge, is that he had knowledge of the schema, the internal details of the database and the business logic which he utilized to construct the query. An AI system without that knowledge was destined to fail. Nowadays however LLM's can be trained on your data and thus gain the knowledge that was missing.

Vanna is one of those tools. It aims to make the wealth of information stored in your personal or enterprise database approachable to anyone, as easy as chatting with ChatGPT.

For Vanna's LLM to work its magic, it first needs to be trained. To do that you can feed it pre-made SQL queries, database schemas or even your database's documentation so that it can become familiar with the data. For instance to train with DDL Statements, which contain information about the table names, columns, data types, and relationships you do:

name VARCHAR(100),
age INT

To train with documentation of business terminology or other definitions:

vn.train(documentation="Our business defines XYZ as ... ")

To train with SQL queries, something useful if you were already using queries, you can just copy and paste them:

vn.train(sql="SELECT name, age FROM my-table WHERE name = 'John Doe'")

Once the model is trained you can start asking questions in English (other languages are also supported)

The magic here is that because the underlying LLM model is a "RAG" it feeds itself with new information and as such
your questions will be added to the training data so that the model can self-improve automatically.

As a refresher, RAG or retrieval augmented generation, extends the capabilities of a LLM to gather domain specific knowledge, without the need to retrain the model. It goes without saying that is a very cost-effective approach and dramatically improves the quality of the answers you are provided.

After firing your question, Vanna will translate it into a SQL query which then it runs against the `database. For instance :

vn.ask("What are the top 10 customers by sales?")

Generates the following SQL:

When you run it you'll get a table filled with the result set as well as a nice bar chart for free.

Note that this example although was run against Snowflake, Vanna has connectors for other databases such as Postgres , Biquery and SQLite.

Starting with it is simple - you just need Python and an API key:

%pip install vanna

import vanna
from vanna.remote import VannaDefault
api_key = vanna.get_api_key('my-email@example.com';)

Set the the name of the RAG model to use. This is typically associated with a specific dataset. In this instance we load the RAG 'chinook' that refers to the Chinook sample database.

vanna_model_name = 'chinook'
vn = VannaDefault(model=vanna_model_name, api_key=api_key)

Connect to the Database. Here we're connecting to a SQLite database.


Finally we can ask Questions with vn.ask in order to generate SQL, run the SQL, show the table, and generate a chart.

vn.ask("What are the top 5 artists by sales?")

Of course, this example uses a pre-trained model. If you want to use Vanna on your own data you've got to train it first.
There's detailed instructions on how to go about it up on Vanna's official site documentation.

Because Vanna generates both the SQL, the dataset and the chart, it caters for everyone's needs; the developer, the manager , the data scientists. Will that make you forego writing SQL from now on? As always, LLMs are complementary to one's work and not supplementary.


More Information



Related Articles

Learn To Chat with Your Data For Free

Turn PostgreSQL Into A Vector Store


To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.


SplitBody - Using EMS To Free Your Mind

Can automating a repetitive movement via electrical muscle stimulation reduce mental workload while users concurrently perform an unrelated task? The SplitBody research study from the University  [ ... ]

SQL Turns 50

The first release of SQL was in June 1974. Designed at IBM by Donald D. Chamberlin and Raymond F. Boyce, it was based on the relational model proposed by E.F. Codd. SQL became the most widely used dat [ ... ]

More News

C book



or email your comment to: comments@i-programmer.info