Apache Impala 4 Supports Operator Multi-Threading
Written by Kay Ewbank   
Thursday, 29 July 2021

Apache Impala 4 has been released with many improvements, including multi-threading support across all operators and the ability to run all 99 TPC-DS queries without manual rewrites. The new version also improves authentication and authorization.

Impala is an open source, native analytic database for Apache Hadoop that provides a high-performance distributed SQL engine. It was originally developed by Cloudera, and donated to the Apache Software Foundation along with Apache Kudu.

Impala can be used to run SQL queries on data stored in HDFS, HBase, Apache Kudu, Amazon S3, and Microsoft ADLS without requiring data movement or transformation.

Multi-threading across operators removes an earlier limitation: a single query fragment ran in a quasi-single-threaded manner on each node. The scanners ran in multiple threads, but all other operators, such as joins and aggregations, ran in the main thread. The new release adds multi-threaded execution on a single node by running multiple fragment instances, each of which runs in a single thread. This takes better advantage of all the CPU cores and results in significant performance improvements for some queries, in some cases making them up to seven times faster.
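To make the idea concrete, here is a minimal Python sketch of the pattern described above: the input is split into partitions, each "fragment instance" aggregates its own partition in a single thread, and the partial results are merged at the end. It is an illustration only and not Impala's implementation; the function names and the `dop` parameter are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_fragment_instance(rows):
    """One hypothetical fragment instance: a single-threaded partial COUNT(*) per key."""
    counts = Counter()
    for key in rows:
        counts[key] += 1
    return counts


def parallel_group_count(rows, dop):
    """Split rows into `dop` partitions, aggregate each in its own thread, then merge."""
    if dop < 1:
        raise ValueError("dop must be at least 1")
    # Round-robin partitioning: instance i gets rows i, i + dop, i + 2 * dop, ...
    partitions = [rows[i::dop] for i in range(dop)]
    with ThreadPoolExecutor(max_workers=dop) as pool:
        partials = list(pool.map(run_fragment_instance, partitions))
    # Final merge step: add up the partial counts from every instance.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


rows = ["a", "b", "a", "c", "b", "a"]
result = parallel_group_count(rows, dop=3)
print(result)
```

CPython threads will not speed up a CPU-bound loop like this one, so the sketch shows only the structure: independent single-threaded instances followed by a merge.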

The degree of parallelism for operations that can benefit from multi-threaded execution is set by a query option called MT_DOP (Multi-Threading Degree Of Parallelism). Until now, Impala only supported setting MT_DOP for queries consisting solely of scans and aggregates. This limitation has now been removed.
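As a rough sketch of how the option might be applied, the Python helper below prepends a `SET MT_DOP=...` statement to a query that contains a join, the kind of query that previously could not use the option. The helper name `with_mt_dop` and the tables are hypothetical, and the non-negative-integer check is an assumption rather than Impala's documented range; the resulting statements would be run through impala-shell or another client.

```python
def with_mt_dop(query, dop):
    """Return the statements needed to run `query` with MT_DOP set to `dop`."""
    # Assumption: accept any non-negative integer; Impala may enforce its own limits.
    if not isinstance(dop, int) or isinstance(dop, bool) or dop < 0:
        raise ValueError("dop must be a non-negative integer")
    # Normalize the query so it ends with exactly one semicolon.
    return [f"SET MT_DOP={dop};", query.rstrip().rstrip(";") + ";"]


# Hypothetical tables, used only for illustration.
statements = with_mt_dop(
    "SELECT c.region, SUM(o.total) FROM orders o "
    "JOIN customers c ON o.customer_id = c.id GROUP BY c.region",
    dop=4,
)
print("\n".join(statements))
```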

Another improvement in the new release is support for all 99 TPC-DS queries without manual rewrites, including ROLLUP, CUBE and GROUPING SETS, and uncorrelated subqueries in the select list. Support has also been added for the INTERSECT and EXCEPT set operations.
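For readers unfamiliar with these grouping clauses, the following Python sketch lists the grouping sets that standard SQL defines for ROLLUP and CUBE. It illustrates the semantics only, not how Impala plans such queries, and the function and column names are hypothetical.

```python
from itertools import combinations


def rollup_sets(columns):
    """Grouping sets for GROUP BY ROLLUP(columns): every prefix, down to the grand total."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """Grouping sets for GROUP BY CUBE(columns): every subset of the columns."""
    return [
        subset
        for n in range(len(columns), -1, -1)
        for subset in combinations(columns, n)
    ]


print(rollup_sets(["year", "month"]))
# [('year', 'month'), ('year',), ()]
print(cube_sets(["year", "month"]))
# [('year', 'month'), ('year',), ('month',), ()]
```

The empty tuple stands for the grand total row, where no grouping column is applied.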

Authentication and authorization features have been strengthened in the new release, with the ability to integrate with Apache Knox, and support for SAML (Security Assertion Markup Language) authentication. Impala is also now FIPS (Federal Information Processing Standards) compliant. A number of LDAP (Lightweight Directory Access Protocol) features have been added, including support for LDAP search bind operations, and User LDAP search bind support. 

Other authentication and authorization improvements include support for Ranger row-filtering policies, and support for basic role-related statements with Ranger. Kudu table ownership is also supported.

The full list of improvements can be seen in the Impala release notes, and Impala is available for download now.

More Information

Impala Website

Impala 4 Release Notes

Related Articles

Apache Kudu Improves Web Interface

Hadoop SQL Query Engine Launched

Cloudera Impala Real Time Query On Hadoop 

Apache Arrow Adds Streaming Binary Format 

HBase Adds MultiWAL Support

Apache Kafka Adds New Streams API

Apache Beam Moves To Top Level

Spark BI Gets Fine Grain Security

 
