|Query Your Database In Natural Language With Vanna
|Written by Nikos Vaggalis
|Thursday, 01 February 2024
Vanna is a framework with which you can talk to your database not with SQL, but in plain English.
It was expected that at some point someone would use Generative AI to compile English to SQL. SQL although easy to grasp has been the barrier to management's direct interaction with the data silos of the organization. The developer was the bridge between those two. He would get the request of the Management in English or another natural language and translate that to SQL/code in order to generate the necessary report. That role is now played by AI so that even those who do not understand SQL operations can easily use the ability to quickly query business data and generate reports.
But where the developer had the edge, is that he had knowledge of the schema, the internal details of the database and the business logic which he utilized to construct the query. An AI system without that knowledge was destined to fail. Nowadays however LLM's can be trained on your data and thus gain the knowledge that was missing.
Vanna is one of those tools. It aims to make the wealth of information stored in your personal or enterprise database approachable to anyone, as easy as chatting with ChatGPT.
For Vanna's LLM to work its magic, it first needs to be trained. To do that you can feed it pre-made SQL queries, database schemas or even your database's documentation so that it can become familiar with the data. For instance to train with DDL Statements, which contain information about the table names, columns, data types, and relationships you do:
To train with documentation of business terminology or other definitions:
vn.train(documentation="Our business defines XYZ as ... ")
To train with SQL queries, something useful if you were already using queries, you can just copy and paste them:
vn.train(sql="SELECT name, age FROM my-table WHERE name = 'John Doe'")
Once the model is trained you can start asking questions in English (other languages are also supported)
The magic here is that because the underlying LLM model is a "RAG" it feeds itself with new information and as such
As a refresher, RAG or retrieval augmented generation, extends the capabilities of a LLM to gather domain specific knowledge, without the need to retrain the model. It goes without saying that is a very cost-effective approach and dramatically improves the quality of the answers you are provided.
After firing your question, Vanna will translate it into a SQL query which then it runs against the `database. For instance :
vn.ask("What are the top 10 customers by sales?")
Generates the following SQL:
When you run it you'll get a table filled with the result set as well as a nice bar chart for free.
Note that this example although was run against Snowflake, Vanna has connectors for other databases such as Postgres , Biquery and SQLite.
Starting with it is simple - you just need Python and an API key:
%pip install vanna
Set the the name of the RAG model to use. This is typically associated with a specific dataset. In this instance we load the RAG 'chinook' that refers to the Chinook sample database.
vanna_model_name = 'chinook'
Connect to the Database. Here we're connecting to a SQLite database.
Finally we can ask Questions with vn.ask in order to generate SQL, run the SQL, show the table, and generate a chart.
vn.ask("What are the top 5 artists by sales?")
Of course, this example uses a pre-trained model. If you want to use Vanna on your own data you've got to train it first.
Because Vanna generates both the SQL, the dataset and the chart, it caters for everyone's needs; the developer, the manager , the data scientists. Will that make you forego writing SQL from now on? As always, LLMs are complementary to one's work and not supplementary.
or email your comment to: firstname.lastname@example.org