Apache Gravitino 0.9 Released
Written by Alex Denham   
Thursday, 29 May 2025

Apache Gravitino v0.9.0-incubating has been released, with optimizations to the fileset catalogs and model catalogs, making it easier for users to manage their unstructured AI data and model data.

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. By using a technical data catalog and metadata lake, users can manage access and perform data governance for multiple data sources, including filestores, relational databases, and event streams, while safely using multiple engines such as Spark, Trino, or Flink across multiple formats and cloud providers.


It provides users with unified metadata access for data and AI assets, and the developers describe it as the world's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake. 

Gravitino provides several key features, starting with unified metadata management: a single model and API for managing different types of metadata, including relational sources such as Hive and MySQL, and file-based sources such as HDFS and S3.


It provides a governance layer for managing metadata, with features such as access control, auditing, and discovery. It also connects directly to metadata sources via connectors, so that changes are instantly reflected between Gravitino and the underlying systems.

Gravitino can be deployed across multiple regions or clouds, allowing instances to share metadata for a global cross-region view. It integrates with query engines so that metadata can be accessed without modifying SQL dialects, and it is expanding to manage both data and AI assets, with support for AI models and features currently in development.

The improvements in the new release start with the model and fileset catalogs. Until now, the model catalog was immutable and therefore inflexible. Users can now alter models and model versions and add tags. Meanwhile, in the fileset catalog, Gravitino now supports placeholder-based path generation and multiple named storage locations within a single fileset. This means users can reference data across different file systems such as HDFS, S3, and GCS through a unified fileset interface, with each location identified by a unique name. The developers say the enhancements significantly improve flexibility for multi-cloud environments and complex data organization patterns while maintaining a clean abstraction layer for managing data assets.
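To make the idea of named locations and placeholder-based paths more concrete, here is a minimal Python sketch of how such a fileset could resolve a path. It is a toy model, not Gravitino's API: the `Fileset` class, the `render_location` helper, the double-brace placeholder syntax, and the example URIs are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumed placeholder syntax for this sketch: {{name}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_location(template: str, values: Dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its value; fail on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


@dataclass
class Fileset:
    """Hypothetical fileset holding several named storage locations."""
    catalog: str
    schema: str
    name: str
    locations: Dict[str, str] = field(default_factory=dict)
    default_location: Optional[str] = None

    def resolve(self, sub_path: str, location_name: Optional[str] = None) -> str:
        """Turn a path inside the fileset into a concrete storage URI."""
        chosen = location_name or self.default_location
        if chosen not in self.locations:
            raise KeyError(f"unknown location name {chosen!r}")
        base = render_location(
            self.locations[chosen],
            {"catalog": self.catalog, "schema": self.schema, "fileset": self.name},
        )
        return base.rstrip("/") + "/" + sub_path.lstrip("/")


# One logical fileset, two physical locations on different file systems.
fs = Fileset(
    catalog="ai_catalog",
    schema="training",
    name="images",
    locations={
        "hdfs": "hdfs://namenode:8020/{{catalog}}/{{schema}}/{{fileset}}",
        "s3": "s3a://example-bucket/{{schema}}/{{fileset}}",
    },
    default_location="hdfs",
)

print(fs.resolve("2025/05/batch.parquet"))
print(fs.resolve("2025/05/batch.parquet", "s3"))
```

The point of the design is that callers only ever name the fileset and, optionally, a location; the physical layout on each file system is derived from the template.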

The Gravitino Virtual File System (GVFS) now supports accessing multiple locations within filesets, and has been refactored with a pluggable architecture allowing custom operations and hooks. 
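A pluggable design with custom operations and hooks can be pictured roughly as follows. This Python sketch only illustrates the general pattern; the `HookedFileSystem` class, its method names, and the in-memory store are hypothetical and do not reflect the actual GVFS interfaces.

```python
from typing import Any, Callable, Dict, List

Operation = Callable[[str], Any]
PreHook = Callable[[str, str], str]        # (operation name, path) -> path
PostHook = Callable[[str, str, Any], Any]  # (operation name, path, result) -> result


class HookedFileSystem:
    """Hypothetical wrapper that runs hooks around pluggable operations."""

    def __init__(self, operations: Dict[str, Operation]) -> None:
        self._operations = dict(operations)
        self._pre_hooks: List[PreHook] = []
        self._post_hooks: List[PostHook] = []

    def register_operation(self, name: str, operation: Operation) -> None:
        """Add or replace an operation without touching the wrapper itself."""
        self._operations[name] = operation

    def add_pre_hook(self, hook: PreHook) -> None:
        self._pre_hooks.append(hook)

    def add_post_hook(self, hook: PostHook) -> None:
        self._post_hooks.append(hook)

    def call(self, name: str, path: str) -> Any:
        # Pre-hooks may rewrite the path, for example to normalize it.
        for pre in self._pre_hooks:
            path = pre(name, path)
        result = self._operations[name](path)
        # Post-hooks see the result and may replace it, for example to audit.
        for post in self._post_hooks:
            result = post(name, path, result)
        return result


# In-memory stand-in for a real storage backend.
store = {"/data/a.txt": b"hello"}
audit_log: List[str] = []


def normalize(name: str, path: str) -> str:
    return "/" + path.strip("/")


def audit(name: str, path: str, result: Any) -> Any:
    audit_log.append(f"{name} {path}")
    return result


gvfs_like = HookedFileSystem({"exists": lambda p: p in store})
gvfs_like.register_operation("read", lambda p: store[p])
gvfs_like.add_pre_hook(normalize)
gvfs_like.add_post_hook(audit)

print(gvfs_like.call("exists", "data/a.txt/"))
print(gvfs_like.call("read", "/data/a.txt"))
print(audit_log)
```

Keeping operations and hooks as plain callables means new behavior can be added by registration rather than by subclassing the file system wrapper.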

Gravitino 0.9 is available now on GitHub.  


More Information

Gravitino Website

Gravitino On GitHub

Related Articles

Apache Spark Now Understands English

Spark 3 Improves Python and SQL Support

DataBricks Open Sources All Of Delta Lake

Databricks Delta Lake Now Open Source

Apache Hive Adds Support For Set Operations
