DataStax Adds Vector Search To Astra DB And DataStax Enterprise
Written by Kay Ewbank   
Thursday, 10 August 2023

DataStax has announced support for vector search on Astra DB and DataStax Enterprise, making it possible to store data as vector embeddings for uses such as generative AI applications like those built on GPT-4.

Astra DB is DataStax's NoSQL cloud database built on Apache Cassandra, while DataStax Enterprise (DSE) is the equivalent self-managed data platform, also built on Cassandra, aimed at companies that want to keep their data in-house.


The new vector search facility is available in Astra DB for Google Cloud, Microsoft Azure and Amazon Web Services (AWS) and in DataStax Enterprise for on-premises databases. It is also due to be added to the next release of Apache Cassandra.

Patrick McFadin, Vice President Developer Relations, DataStax, said:

"We try to keep our code base as close to each other for OSS Cassandra, DSE and Astra. DSE and Astra in this case are slightly ahead of the Cassandra 5 release with these features, but will be in parity when Cassandra 5 ships."

Cassandra 5 is expected later this year.

Vector search is a way of retrieving data that uses semantic meaning and similarity rather than specific keywords. Support for vector search makes it possible to query large volumes of unstructured data such as text, audio, images, and videos by semantic meaning, and it can be used to uncover hidden relationships and patterns.

The technique involves creating a numeric representation of the data (a vector embedding), then indexing and storing it in a way that lets developers ask "Given one thing, what other things are similar?" Cassandra's developers plan to use Lucene's Hierarchical Navigable Small World (HNSW) library, which they describe as the best ANN (approximate nearest neighbor) algorithm for Java, saying it provides a fast and efficient solution for finding approximate nearest neighbors in high-dimensional space. Cassandra also has a search mechanism called Storage Attached Indexes (SAI) that allows for different search implementations.
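To make the "what other things are similar?" question concrete, here is a minimal Python sketch of an exact nearest-neighbor lookup using cosine similarity. It is not HNSW and not DataStax's implementation: it simply scores every stored vector against the query, which is the brute-force approach that ANN indexes are designed to avoid on large data sets. The three-dimensional vectors and the item names are made up for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, items, k=2):
    """Score every item against the query and return the k best matches.

    This is an exact, brute-force scan: its cost grows linearly with the
    number of stored vectors, unlike an approximate index such as HNSW.
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings, invented for this example.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}

if __name__ == "__main__":
    query = [0.85, 0.15, 0.05]  # a vector that sits near the "cat" cluster
    for name, score in most_similar(query, EMBEDDINGS, k=2):
        print(f"{name}: {score:.3f}")
```

An approximate index gives up a guarantee of finding the very best match in return for not having to visit every vector.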

The DataStax team modified SAI and used it along with Lucene HNSW for the indexing and query syntax in Astra DB and DSE.
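As a rough idea of what that indexing and query syntax looks like, the Python sketch below builds the three CQL statements a vector search typically needs: a table with a vector column, an SAI index on that column, and an ANN query. The CQL keywords follow the vector extensions as we understand them from the Cassandra and Astra DB documentation, so treat the exact syntax as an assumption and check it against the version you are running. The table name, column names and helper functions are hypothetical, and nothing here connects to a database.

```python
def create_table_cql(table, dims):
    """CQL for a table with a fixed-dimension float vector column (hypothetical schema)."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"id int PRIMARY KEY, body text, embedding vector<float, {dims}>);"
    )


def create_index_cql(table):
    """CQL for a Storage Attached Index on the vector column (assumed syntax)."""
    return (
        f"CREATE CUSTOM INDEX IF NOT EXISTS {table}_ann_idx "
        f"ON {table} (embedding) USING 'StorageAttachedIndex';"
    )


def ann_query_cql(table, query_vector, limit):
    """CQL asking for the rows whose embeddings are nearest to query_vector."""
    literal = "[" + ", ".join(str(float(x)) for x in query_vector) + "]"
    return (
        f"SELECT id, body FROM {table} "
        f"ORDER BY embedding ANN OF {literal} LIMIT {limit};"
    )


if __name__ == "__main__":
    # Print the statements rather than executing them.
    print(create_table_cql("docs", 3))
    print(create_index_cql("docs"))
    print(ann_query_cql("docs", [0.1, 0.2, 0.3], 2))
```

The statements are built by string formatting only to keep the sketch self-contained; application code would normally pass the query vector as a bound parameter through a driver.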

DataStax also plans to use Cassio, the open source framework which was developed to integrate generative AI and ML into Cassandra, to provide a way to integrate vector search into applications. Cassio is a Python library that simplifies the task of using vector search with Cassandra, and the DataStax developers plan to use it as an interface between their software and GenAI libraries such as LangChain.

The DataStax team says the new facilities will provide users with better search accuracy and more relevant search results, including finding hidden relationships and patterns that traditional keyword searches might miss.

Vector search also means that Astra DB and DataStax Enterprise can perform similarity calculations and ranking directly within the database, eliminating the need to transfer large amounts of data to external systems.

The vector search capabilities are available now in Astra DB and DataStax Enterprise as a developer preview.



Alongside the introduction of vector search, DataStax is running a virtual "I Love AI" event aimed at application architects, software developers, practitioners and CTOs who want to work with generative AI. The event takes place on August 23rd, with two sessions timed to suit the USA and Europe and Asia, and is designed to give insights into the data platform and AI solutions needed, delivered by experts with real-world experience of putting AI into practice. Topics will include how to build generative AI apps with scale, governance and data security, and ways to overcome the biggest obstacles keeping generative AI from being enterprise ready. Registration is open.


More Information

Vector Search Developer Preview

I Love AI Event

Related Articles

DataStax Astra DB gets Change Data Capture

DataStax Extends Stargate

DataStax Adds gRPC To Stargate For Performant Microservices

Cassandra 4.1 Focuses On Pluggability

Cassandra 4 Improves Performance

Last Updated ( Thursday, 10 August 2023 )