Apache Lucene Adds Similarity Vector Searches
Written by Kay Ewbank   
Tuesday, 27 February 2024

Apache Lucene 9.10 has been released with support for similarity-based vector searches. Other improvements include block join compatible index sorting, and several improvements to ensure the software takes advantage of the now finalized JDK foreign memory API internally when running on Java 22 or later.

Apache Lucene is a high-performance search engine library written entirely in Java. The developers describe it as being suitable for nearly any application that requires structured search, full-text search, faceting, nearest-neighbor search on high-dimensionality vectors, spell correction or query suggestions. There's also a PyLucene sub project that provides Python bindings for Lucene Core.

Until recently, the Solr subproject was part of Lucene, but it has now moved to a separate Apache Top Level Project (TLP). Solr is a popular open source enterprise search platform built on Apache Lucene.

One of the technologies underpinning Lucene is Apache OpenNLP, an open source machine learning library for natural language processing (NLP) for Java.

Commercial uses of Lucene include Elasticsearch, a search and analytics solution that includes an HTTP web interface and schema-free JSON documents. Elasticsearch is built on Apache Lucene, and Amazon's OpenSearch is an open source fork of Elasticsearch.

The main improvement in the latest release is support for similarity-based search over indexed high-dimensionality numeric vectors, built on the Hierarchical Navigable Small World (HNSW) graph algorithm that Lucene uses for nearest-neighbor search. The new search finds all the vectors scoring above a 'resultSimilarity' threshold. It keeps traversing the HNSW graph for as long as better-scoring nodes are available, and stops once the best candidate in the lowest level scores below 'traversalSimilarity'.
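To make the threshold idea concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation and does not use the Lucene API: it scans every vector exhaustively instead of walking an HNSW graph, so 'traversalSimilarity' has no counterpart here. The class name, the method names, the choice of cosine similarity and the "at or above" comparison are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SimilarityThresholdSketch {

    // Cosine similarity between two vectors of equal length (assumed metric).
    static float cosine(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    // Returns the index of every vector whose similarity to the query is at
    // or above resultSimilarity. Exhaustive scan: no graph, no top-k limit.
    static List<Integer> searchAbove(float[][] vectors, float[] query, float resultSimilarity) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < vectors.length; i++) {
            if (cosine(vectors[i], query) >= resultSimilarity) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        float[][] vectors = {
            {1f, 0f},     // same direction as the query
            {0.9f, 0.1f}, // close to the query
            {0f, 1f},     // orthogonal to the query
            {-1f, 0f}     // opposite to the query
        };
        float[] query = {1f, 0f};
        System.out.println(searchAbove(vectors, query, 0.8f)); // prints [0, 1]
    }
}
```

The point of the sketch is the shape of the result: unlike a top-k nearest-neighbor search, the number of hits is not fixed in advance and depends only on how many vectors clear the threshold.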

The second improvement of note is that index sorting is now compatible with block joins: IndexWriter preserves document blocks that are indexed when index sorting is configured.
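What "preserving document blocks" means can be illustrated with a small plain-Java sketch. It does not use the Lucene API and is not how IndexWriter is implemented. It only assumes the block join convention that a parent document is the last document of its block, and it assumes that whole blocks are ordered by the parent's sort value. The Doc class, its fields and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class BlockPreservingSortSketch {

    // Toy document: an id, whether it is a block's parent, and a sort key.
    static final class Doc {
        final String id;
        final boolean isParent;
        final int sortKey;

        Doc(String id, boolean isParent, int sortKey) {
            this.id = id;
            this.isParent = isParent;
            this.sortKey = sortKey;
        }
    }

    // Reorders whole blocks by the parent's sort key.
    // Documents inside a block keep their original relative order.
    static List<Doc> sortBlocks(List<Doc> docs) {
        List<List<Doc>> blocks = new ArrayList<>();
        List<Doc> current = new ArrayList<>();
        for (Doc d : docs) {
            current.add(d);
            if (d.isParent) { // the parent closes its block
                blocks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            throw new IllegalArgumentException("trailing documents without a parent");
        }
        // List.sort is stable, so blocks with equal keys keep their indexing order.
        blocks.sort(Comparator.comparingInt((List<Doc> b) -> b.get(b.size() - 1).sortKey));
        List<Doc> sorted = new ArrayList<>();
        for (List<Doc> b : blocks) {
            sorted.addAll(b);
        }
        return sorted;
    }

    // Helper that extracts the ids, for printing and checking the order.
    static List<String> ids(List<Doc> docs) {
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.id);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("c1", false, 0), new Doc("c2", false, 0), new Doc("pA", true, 5),
            new Doc("c3", false, 0), new Doc("pB", true, 2));
        System.out.println(ids(sortBlocks(docs))); // prints [c3, pB, c1, c2, pA]
    }
}
```

A sort that treated each document independently would pull the children away from their parents, which is what breaks block joins; moving each block as a unit keeps them adjacent.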

The MMapDirectory has been improved to take advantage of the now finalized JDK foreign memory API internally when running on Java 22 (or later), and SIMD vectorization now takes advantage of the JDK vector incubator on Java 22.

A number of optimizations have also been added to speed up queries that match lots of terms that have short postings, and to make range queries on points end faster.

Lucene 9.10 is available now.

More Information

Lucene Website



Last Updated ( Tuesday, 27 February 2024 )