Apache Pinot 1.4 Improves Multistage Engine
Written by Kay Ewbank
Tuesday, 14 October 2025
Apache Pinot 1.4 has been released with significant improvements to the Multistage Engine, Pauseless Consumption and the Time Series Engine, along with a wide range of other enhancements.
Pinot is a real-time distributed OLAP datastore purpose-built for low-latency, high-throughput analytics. It was originally developed at LinkedIn to run queries such as showing users who had viewed their profile. Pinot can perform typical analytical operations such as slice and dice, drill down, roll up, and pivot on large-scale multi-dimensional data. Apache describes Pinot as ideal for ingesting and immediately querying data from streaming or batch data sources, including Apache Kafka, Amazon Kinesis, Hadoop HDFS, Amazon S3, Azure ADLS, and Google Cloud Storage.
The improvements in the new version start with a new query mode for running Multistage Engine (MSE) queries against Pinot, heavily inspired by Uber's Presto over Pinot query architecture. MSE Lite Mode runs queries following a scatter-gather paradigm, with a configurable limit on the number of records returned by each instance of the leaf stage. It can scale to thousands of queries per second (QPS) on minimal hardware, meaning users can now run complex multi-stage queries that use features such as sub-queries and window functions at high QPS and low latency.
There's also a new query optimizer for the Multistage Engine that can automatically eliminate or simplify redundant Exchanges, the steps that redistribute data between query stages. The optimizer can simplify Exchanges for arbitrarily complex queries without the need for query hints. It supports group-by, joins and union-all, and can evaluate constant queries within the Broker itself.
The multi-stage engine now also supports multiple window functions in a single query plan. The team says this enables more expressive and efficient analytical queries, with improved stage fusion and execution planning.
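To make the MSE Lite Mode description more concrete, here is a toy Python model of scatter-gather with a per-leaf record cap. It is a sketch under stated assumptions, not Pinot's implementation: `Server`, `run_leaf` and `scatter_gather` are hypothetical names, the "leaf stage" is reduced to a simple filter, and we assume the gather step and the rest of the query run in a single place, which is how we read "scatter-gather" here.

```python
# Toy model of scatter-gather with a per-leaf record cap, loosely based on
# the description of MSE Lite Mode. Hypothetical names; not Pinot's code.
from dataclasses import dataclass, field
from itertools import islice
from typing import Callable, Iterable


@dataclass
class Server:
    """A hypothetical server holding one slice of a table's rows."""
    name: str
    rows: list = field(default_factory=list)

    def run_leaf(self, predicate: Callable[[dict], bool], limit: int) -> list:
        # Leaf stage: filter locally and return at most `limit` matching rows,
        # so no single server can send an unbounded result upstream.
        return list(islice(filter(predicate, self.rows), limit))


def scatter_gather(servers: Iterable[Server], predicate, leaf_limit: int, finish):
    # Scatter: every server runs the same leaf work on its own rows.
    partials = [s.run_leaf(predicate, leaf_limit) for s in servers]
    # Gather: concatenate the bounded partial results and finish the query
    # (here, whatever `finish` does) in one place.
    merged = [row for part in partials for row in part]
    return finish(merged)


servers = [
    Server("s1", [{"user": "a", "views": 3}, {"user": "b", "views": 7}]),
    Server("s2", [{"user": "c", "views": 5}, {"user": "d", "views": 1},
                  {"user": "e", "views": 9}, {"user": "f", "views": 4}]),
]

# Roughly "users with more than 2 views, most-viewed first", leaf limit 2.
top = scatter_gather(
    servers,
    predicate=lambda r: r["views"] > 2,
    leaf_limit=2,
    finish=lambda rows: [r["user"] for r in sorted(rows, key=lambda r: -r["views"])],
)
print(top)  # ['e', 'b', 'c', 'a']
```

In this toy, user `f` (4 views) would have outranked `a` but never reaches the gather step, because server `s2` stopped after its cap of two matching rows. That is the trade-off a per-leaf limit makes here: bounded work per server, at the cost of exactness whenever the cap is hit.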
The multi-stage engine also gains support for ASOF JOIN, allowing the time-aligned joins commonly used in time-series analytics.
Pauseless consumption has also been added in this version, improving real-time analytics by minimizing ingestion delays and improving data freshness. Until now, real-time data ingestion was paused during the build and upload phases of the previous segment, leaving a gap during which the most recent data could not be accessed. Pauseless consumption allows Pinot to keep ingesting data while it completes the build and upload phases of the previous segment.
Apache Pinot 1.4 is available now.
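ASOF JOIN is easiest to understand with a concrete case, such as matching each trade to the latest quote at or before it. The Python sketch below assumes the semantics most SQL engines give ASOF JOIN (the latest right-hand row with the same key and a timestamp no later than the left-hand row's). It does not cover Pinot's exact syntax or which comparison operators it allows, and `asof_join` is a hypothetical helper.

```python
# Toy ASOF JOIN: for each left row, find the most recent right row with the
# same key whose timestamp is <= the left row's timestamp. Hypothetical
# helper illustrating the general semantics; not Pinot's implementation.
from bisect import bisect_right
from collections import defaultdict


def asof_join(left, right, key, ts):
    # Group right rows by key, each group sorted by timestamp.
    groups = defaultdict(list)
    for row in sorted(right, key=lambda r: r[ts]):
        groups[row[key]].append(row)
    times = {k: [r[ts] for r in rows] for k, rows in groups.items()}

    out = []
    for row in left:
        k = row[key]
        # bisect_right returns the first index whose timestamp is > row[ts],
        # so the entry just before it is the latest one at or before row[ts].
        i = bisect_right(times.get(k, []), row[ts])
        match = groups[k][i - 1] if i > 0 else None
        # Unmatched left rows are kept with None (left-join style).
        out.append((row, match))
    return out


quotes = [
    {"sym": "X", "t": 5, "px": 10.5},
    {"sym": "X", "t": 1, "px": 10.0},
    {"sym": "Y", "t": 2, "px": 20.0},
]
trades = [
    {"sym": "X", "t": 4},  # latest X quote at or before t=4 is the one at t=1
    {"sym": "X", "t": 5},  # exact timestamp match at t=5
    {"sym": "Y", "t": 1},  # no Y quote yet, so no match
]
pairs = asof_join(trades, quotes, key="sym", ts="t")
for trade, quote in pairs:
    print(trade["t"], quote["px"] if quote else None)
```

Sorting the right-hand rows once and binary-searching for each left-hand row keeps every lookup logarithmic rather than a scan over all candidate rows. An inner-style ASOF join would simply drop the pairs whose match is `None`.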