Apache Cloudberry 2 Released
Written by Kay Ewbank
Thursday, 11 September 2025
Apache Cloudberry 2 has been released. According to the developers, this major upgrade delivers significant enhancements to the database kernel and represents a substantial leap forward in performance, reliability, and manageability.

Apache Cloudberry is a Massively Parallel Processing (MPP) database for large-scale data analytics. It is derived from PostgreSQL and from the last open-source version of Greenplum Database, but it is built on a more modern PostgreSQL kernel and offers more advanced enterprise capabilities. Cloudberry can serve as a data warehouse and can also be used for large-scale analytics and AI/ML workloads.

Cloudberry is designed to run efficient analytical queries on big data. To produce efficient query plans in distributed environments, it can use the built-in PostgreSQL optimizer as well as the open-source GPORCA optimizer. Other techniques that support fast queries include static and dynamic partition pruning, aggregate push-down, and join filtering. Cloudberry uses both rule-based and cost-based query optimization methods.

Cloudberry supports multiple storage formats, including heap storage, AO row storage, and AOCS column storage, and it also supports partitioned tables. Data protection features include function encryption and transparent data encryption (TDE); with TDE, the Cloudberry kernel encrypts data invisibly to users.

Most of the improvements in this release target query processing and optimization, particularly in the GPORCA optimizer. The Greenplum ORCA query optimizer is a cost-based optimizer that aims to generate efficient execution plans for complex queries by improving join ordering, handling features such as subqueries, and optimizing performance for partitioned tables.
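To make the difference between static and dynamic partition pruning concrete, here is a minimal conceptual sketch in Python. It is not Cloudberry or GPORCA code: the `Partition` class, the quarterly `sales_2025_*` partitions, and the dates are hypothetical. Static pruning discards partitions at planning time using constant predicate bounds, while dynamic pruning discards them at run time using key values that only become known during execution, for example from the other side of a join.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    """Hypothetical range partition covering the half-open interval [lower, upper)."""
    name: str
    lower: date
    upper: date

    def overlaps(self, lo: date, hi: date) -> bool:
        # True if the closed predicate range [lo, hi] intersects [lower, upper).
        return lo < self.upper and hi >= self.lower

    def contains(self, value: date) -> bool:
        return self.lower <= value < self.upper


def static_prune(partitions, lo, hi):
    """Plan-time pruning: the predicate bounds are constants in the query text."""
    return [p.name for p in partitions if p.overlaps(lo, hi)]


def dynamic_prune(partitions, runtime_keys):
    """Run-time pruning: keys arrive during execution (e.g. from a join input)."""
    return [p.name for p in partitions if any(p.contains(k) for k in runtime_keys)]


partitions = [
    Partition("sales_2025_q1", date(2025, 1, 1), date(2025, 4, 1)),
    Partition("sales_2025_q2", date(2025, 4, 1), date(2025, 7, 1)),
    Partition("sales_2025_q3", date(2025, 7, 1), date(2025, 10, 1)),
    Partition("sales_2025_q4", date(2025, 10, 1), date(2026, 1, 1)),
]

# Equivalent of: WHERE sale_date BETWEEN '2025-05-01' AND '2025-08-15'
print(static_prune(partitions, date(2025, 5, 1), date(2025, 8, 15)))
# -> ['sales_2025_q2', 'sales_2025_q3']

# Keys produced at run time by a hypothetical join against a dimension table
print(dynamic_prune(partitions, [date(2025, 2, 14), date(2025, 11, 3)]))
# -> ['sales_2025_q1', 'sales_2025_q4']
```

In both cases only the surviving partitions would need to be scanned; a real optimizer applies the same idea to partition bounds stored in the catalog rather than to Python objects.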
This version of Cloudberry has better support for index-only scans. When the GPORCA optimizer is used, a broader range of index types is now eligible, including covering indexes with INCLUDE columns. GPORCA can also use dynamic index-only scans to accelerate queries on partitioned tables. The developers say this combines partition pruning with index-only access to avoid heap lookups, significantly reducing I/O and improving performance. Index-only scans under GPORCA are also now supported on append-only (AO) tables and PAX tables, so the technique can be applied where traditional index scans on those tables were previously inefficient. In addition, GPORCA now supports backward index scans for queries with ORDER BY ... DESC.

BRIN index enhancements reduce disk space usage for empty indexes and improve performance by avoiding unnecessary page access, and BRIN indexes on AO/CO tables have also been improved. This version also adds support for queries using GROUP BY CUBE, enabling multi-dimensional grouping sets in query plans and expanding analytic query capabilities.

Beyond query handling, the main changes in this version aim to make the project ASF-compliant, through licensing and notice/disclaimer updates and a comprehensive alignment of source code headers with ASF standards. A number of binary artifacts have been removed from the source release for similar reasons, and the build process for Python and C++ components has been improved.

Cloudberry 2 is available now on GitHub and the Cloudberry website.
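The GROUP BY CUBE support mentioned above follows standard SQL semantics: `GROUP BY CUBE (region, product)` is shorthand for `GROUPING SETS ((region, product), (region), (product), ())`, i.e. every subset of the listed columns. The Python sketch below illustrates that expansion and the resulting aggregation on a few hypothetical rows. It is only a model of the semantics, not how Cloudberry executes the query; the column names and data are made up, and `None` stands in for the SQL NULL that marks a rolled-up column.

```python
from collections import defaultdict
from itertools import combinations


def cube_grouping_sets(columns):
    """Return all 2**n subsets of the grouping columns, largest first, as CUBE expands them."""
    return [subset
            for r in range(len(columns), -1, -1)
            for subset in combinations(columns, r)]


def group_by_cube(rows, columns, measure):
    """Sum `measure` for every grouping set; columns outside a set are reported as None."""
    results = []
    for gset in cube_grouping_sets(columns):
        totals = defaultdict(int)
        for row in rows:
            # Columns not in this grouping set collapse to None (SQL NULL).
            key = tuple(row[c] if c in gset else None for c in columns)
            totals[key] += row[measure]
        results.extend((*key, total) for key, total in totals.items())
    return results


rows = [
    {"region": "EU", "product": "A", "amount": 10},
    {"region": "EU", "product": "B", "amount": 5},
    {"region": "US", "product": "A", "amount": 7},
]

print(cube_grouping_sets(("region", "product")))
# -> [('region', 'product'), ('region',), ('product',), ()]

for result_row in group_by_cube(rows, ("region", "product"), "amount"):
    print(result_row)
```

The last row printed is `(None, None, 22)`, the grand total produced by the empty grouping set. Because the number of grouping sets doubles with each added column, having the optimizer plan CUBE natively matters for multi-dimensional analytics.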
Last Updated (Thursday, 11 September 2025)