Apache Flink 1.9 Adds New Query Engine

Written by Kay Ewbank

Tuesday, 27 August 2019

Significant features of this release are batch-style recovery for batch jobs and a preview of the new Blink-based query engine for Table API and SQL queries.

Apache Flink is an open source platform for distributed stream and batch data processing. It consists of a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink includes several APIs: the DataSet API for static data, embedded in Java, Scala, and Python; the DataStream API for unbounded streams, embedded in Java and Scala; the Table API, with a SQL-like expression language embedded in Java and Scala; and the streaming SQL API, which enables SQL queries to be executed on streaming and batch tables, with a syntax based on Apache Calcite.


The main improvements in the new version are the addition of batch-style recovery for batch jobs and a preview of the new Blink-based query engine for Table API and SQL queries.

The new batch-style recovery has significantly reduced the time to recover a batch job from a task failure. This covers DataSet, Table API and SQL jobs. Until this version, if a task failed, the recovery of a batch job involved canceling all tasks and restarting the whole job, voiding all progress. You can now configure Flink to limit the recovery to only those tasks that are in the same failover region, the set of tasks that are connected via pipelined data exchanges.
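
The idea of a failover region can be illustrated with a short Python sketch. It is not Flink code: the task names, the edge list and the helper functions are all hypothetical, and it only models the rule described above, that tasks joined by pipelined data exchanges fall into the same region. Real Flink may also need to restart upstream regions when their intermediate results are no longer available, which this sketch ignores. In Flink 1.9 the behaviour is reportedly enabled with the `jobmanager.execution.failover-strategy: region` entry in flink-conf.yaml.

```python
from collections import defaultdict


def failover_regions(tasks, edges):
    """Map each task to its region: a connected component over pipelined edges."""
    neighbours = defaultdict(set)
    for src, dst, kind in edges:
        # Only pipelined exchanges tie two tasks into the same region;
        # blocking exchanges act as region boundaries.
        if kind == "pipelined":
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    region_of = {}
    for task in tasks:
        if task in region_of:
            continue
        # Depth-first walk to collect every task reachable via pipelined edges.
        region = set()
        stack = [task]
        while stack:
            current = stack.pop()
            if current in region:
                continue
            region.add(current)
            stack.extend(neighbours[current] - region)
        frozen = frozenset(region)
        for member in region:
            region_of[member] = frozen
    return region_of


def tasks_to_restart(failed_task, tasks, edges):
    """Return the tasks sharing a region with the failed task (simplified model)."""
    return failover_regions(tasks, edges)[failed_task]


# Hypothetical four-task batch job with one blocking exchange in the middle.
TASKS = ["source", "map", "join", "sink"]
EDGES = [
    ("source", "map", "pipelined"),
    ("map", "join", "blocking"),
    ("join", "sink", "pipelined"),
]

if __name__ == "__main__":
    print(sorted(tasks_to_restart("sink", TASKS, EDGES)))
```

In this made-up job a failure in `sink` restarts only `join` and `sink`, while `source` and `map` keep their progress, because the blocking exchange separates the two regions.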

The Blink-based query engine preview follows from the donation of Blink to Apache Flink. Blink’s query optimizer and runtime for the Table API and SQL have been integrated into Flink, and the query planner has been extended so that there are now two pluggable query processors to choose from for executing Table API and SQL statements: the pre-1.9 Flink processor and the new Blink-based query processor. The Blink-based query processor offers better SQL coverage and improved performance for batch queries because it has more extensive query optimization, including cost-based plan selection and more optimization rules. Because the new query processor is not yet fully integrated, the original processor remains the default in this release, though you can enable the Blink processor.
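
To give a feel for what cost-based plan selection means, here is a toy Python sketch. It is not the Blink optimizer and makes no claim about how Blink estimates cost: the table names, row counts, the single fixed join selectivity and the cost formula are all assumptions chosen for illustration. The sketch enumerates every left-deep join order, estimates the number of intermediate rows each would produce, and picks the cheapest.

```python
from itertools import permutations

# Hypothetical table statistics (row counts) for the illustration.
ROW_COUNTS = {"orders": 1_000_000, "customers": 50_000, "regions": 20}

# Assumed fraction of row pairs that survive each join.
SELECTIVITY = 0.001


def plan_cost(join_order, row_counts, selectivity):
    """Estimate cost as the total intermediate rows of a left-deep join."""
    rows = row_counts[join_order[0]]
    cost = 0
    for table in join_order[1:]:
        # Estimated output size of joining the running result with `table`.
        rows = rows * row_counts[table] * selectivity
        cost += rows
    return cost


def cheapest_plan(row_counts, selectivity):
    """Enumerate all join orders and return the one with the lowest estimate."""
    return min(
        permutations(row_counts),
        key=lambda order: plan_cost(order, row_counts, selectivity),
    )


if __name__ == "__main__":
    best = cheapest_plan(ROW_COUNTS, SELECTIVITY)
    print(best, plan_cost(best, ROW_COUNTS, SELECTIVITY))
```

With these made-up statistics the cheapest order joins the two small tables first and leaves the large `orders` table until last, which is the kind of decision a cost-based planner makes from table statistics rather than from the order in which the query was written.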

Elsewhere, the State Processor API is now fully available and can be used to read and write savepoints with Flink DataSet jobs. Finally, Flink 1.9 includes a reworked WebUI and previews of Flink’s new Python Table API and its integration with the Apache Hive ecosystem.


More Information

Flink website

Related Articles

Apache Flink 1.5.0 Adds Support For Broadcast State

Flink Gets Event-time Streaming

Flink Reaches Top Level Status
