Apache Lucene Adds Similarity Vector Searches
Written by Kay Ewbank   
Tuesday, 27 February 2024

Apache Lucene 9.10 has been released with support for similarity-based vector searches. Other improvements include index sorting that is compatible with block joins, and several changes that let the software take advantage internally of the now finalized JDK foreign memory API when running on Java 22 or later.

Apache Lucene is a high-performance search engine library written entirely in Java. The developers describe it as suitable for nearly any application that requires structured search, full-text search, faceting, nearest-neighbor search on high-dimensionality vectors, spell correction or query suggestions. There's also a PyLucene subproject that provides Python bindings for Lucene Core.


Until recently, the Solr subproject was part of Lucene, but it has now moved to a separate Apache Top Level Project (TLP). Solr is a popular open source enterprise search platform built on Apache Lucene.

Lucene can also make use of Apache OpenNLP, an open source machine learning library for natural language processing (NLP) in Java, through an optional analysis module.

Commercial uses of Lucene include Elastic's Elasticsearch, a free and open search and analytics solution that offers an HTTP web interface and schema-free JSON documents. Elasticsearch is built on Apache Lucene, and Amazon OpenSearch is an open source fork of Elasticsearch.

The main improvement in the latest release is support for similarity-based vector searches. Lucene already supported indexing high-dimensionality numeric vectors for nearest-neighbor search using the Hierarchical Navigable Small World (HNSW) graph algorithm. Instead of returning a fixed number of nearest neighbors, the new searches find all the vectors scoring above a 'resultSimilarity' threshold. They do this by traversing the HNSW graph for as long as better-scoring nodes are available, or until the best candidate in the lowest level scores below 'traversalSimilarity'.
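The following Java sketch makes the two thresholds concrete. It rests on several stated assumptions: it uses a single graph layer rather than a full multi-level HNSW index, it uses dot-product similarity, and the names `SimilarityThresholdSearch` and `search` are hypothetical. It is not Lucene's implementation or API. The sketch follows one reading of the behaviour described above. The traversal cutoff is the lower of `traversalSimilarity` and the best score seen so far, so the search first climbs toward better-scoring nodes. It then expands every node at or above `traversalSimilarity`, while `resultSimilarity` only decides what gets collected.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical, single-layer sketch of a similarity-threshold graph search;
// not Lucene's HNSW implementation or API.
class SimilarityThresholdSearch {

    /** Dot-product similarity between two vectors of equal length. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Returns every visited node whose similarity to the query is at least
     * resultSimilarity, ordered from most to least similar.
     */
    static List<Integer> search(float[][] vectors, int[][] neighbors, int entry, float[] query,
                                float traversalSimilarity, float resultSimilarity) {
        float[] scores = new float[vectors.length];
        Set<Integer> visited = new HashSet<>();
        List<Integer> results = new ArrayList<>();
        // Best-first frontier: the highest-scoring candidate is polled first.
        PriorityQueue<Integer> candidates = new PriorityQueue<>(
            Comparator.comparingDouble((Integer n) -> scores[n]).reversed());

        // Score the entry point and seed the frontier with it.
        visited.add(entry);
        scores[entry] = dot(query, vectors[entry]);
        float maxSeen = scores[entry];
        if (scores[entry] >= resultSimilarity) {
            results.add(entry);
        }
        candidates.add(entry);

        while (!candidates.isEmpty()) {
            int current = candidates.poll();
            // Cutoff: the best score so far until it reaches traversalSimilarity,
            // then traversalSimilarity itself.
            if (scores[current] < Math.min(traversalSimilarity, maxSeen)) {
                break; // the best remaining candidate is not worth expanding
            }
            for (int next : neighbors[current]) {
                if (!visited.add(next)) {
                    continue; // already scored
                }
                scores[next] = dot(query, vectors[next]);
                maxSeen = Math.max(maxSeen, scores[next]);
                if (scores[next] >= resultSimilarity) {
                    results.add(next); // collected whether or not it is expanded later
                }
                if (scores[next] >= Math.min(traversalSimilarity, maxSeen)) {
                    candidates.add(next);
                }
            }
        }
        results.sort(Comparator.comparingDouble((Integer n) -> scores[n]).reversed());
        return results;
    }

    public static void main(String[] args) {
        float[][] vectors = {
            {0.0f, 1.0f}, {0.5f, 0.5f}, {0.9f, 0.1f}, {0.8f, 0.2f}, {0.1f, 0.9f}
        };
        // Undirected ring 0-1-2-3-4-0 stored as adjacency lists.
        int[][] neighbors = {{1, 4}, {0, 2}, {1, 3}, {2, 4}, {3, 0}};
        float[] query = {1.0f, 0.0f};
        System.out.println(search(vectors, neighbors, 0, query, 0.6f, 0.75f));
    }
}
```

Running `main` prints `[2, 3]`, because nodes 2 and 3 are the only ones scoring at least 0.75 against the query. Lowering `resultSimilarity` to 0.4 adds node 1 to the results without changing which nodes are expanded. That separation is what the two thresholds are for.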

The second notable improvement is that index sorting is now compatible with block joins: when index sorting is configured, IndexWriter preserves the document blocks that were indexed together.
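The core idea behind block-preserving index sorting can be shown with a small Java model. This is an assumption-laden illustration, not Lucene's IndexWriter code. It treats a block as a run of child documents followed by their parent, which is the layout block joins rely on, and it moves whole blocks according to the parent's sort key. The names `BlockPreservingSort`, `Doc` and `sortPreservingBlocks` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of index sorting that keeps parent/child blocks intact;
// not Lucene's IndexWriter implementation.
class BlockPreservingSort {

    /** A hypothetical indexed document: an id, a sort key, and whether it is a block's parent. */
    record Doc(String id, int sortKey, boolean isParent) {}

    /**
     * Reorders documents by their parent's sort key, keeping each block
     * (children followed by their parent) contiguous and in its original internal order.
     */
    static List<Doc> sortPreservingBlocks(List<Doc> docs) {
        // Split the flat document stream into blocks; each block ends with its parent.
        List<List<Doc>> blocks = new ArrayList<>();
        List<Doc> current = new ArrayList<>();
        for (Doc doc : docs) {
            current.add(doc);
            if (doc.isParent()) {
                blocks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            throw new IllegalArgumentException("documents after the last parent do not form a block");
        }

        // Sort whole blocks by the parent's key; List.sort is stable, so equal keys keep their order.
        blocks.sort(Comparator.comparingInt(block -> block.get(block.size() - 1).sortKey()));

        // Flatten the blocks back into a single document order.
        List<Doc> sorted = new ArrayList<>();
        for (List<Doc> block : blocks) {
            sorted.addAll(block);
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Children's sort keys are ignored; only the parent's key decides where a block goes.
        List<Doc> docs = List.of(
            new Doc("b-child1", 0, false),
            new Doc("b-child2", 0, false),
            new Doc("b-parent", 20, true),
            new Doc("a-child", 0, false),
            new Doc("a-parent", 10, true));
        System.out.println(sortPreservingBlocks(docs).stream().map(Doc::id).toList());
    }
}
```

Running `main` prints `[a-child, a-parent, b-child1, b-child2, b-parent]`. Sorting the documents one by one would scatter children away from their parents and break block-join queries. Moving whole blocks keeps every child immediately in front of its parent.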

The MMapDirectory has been improved to take advantage of the now finalized JDK foreign memory API internally when running on Java 22 (or later), and SIMD vectorization now takes advantage of the JDK vector incubator on Java 22.

A number of optimizations have also been added to speed up queries that match lots of terms that have short postings lists, and to make range queries on points end faster.

Lucene 9.10 is available now.


More Information

Lucene Website

Related Articles

Lucene Core and Solr updated to 3.3

Amazon Announces OpenSearch

Elastic 8 Enhances ElasticSearch

New Amazon Elasticsearch Service


Last Updated ( Tuesday, 27 February 2024 )