|Apache Flink 1.9 Adds New Query Engine|
|Written by Kay Ewbank|
|Tuesday, 27 August 2019|
Significant features on this path are batch-style recovery for batch jobs and a preview of the new Blink-based query engine for Table API and SQL queries.
Apache Flink is an open source platform for distributed stream and batch data processing. It consists of a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink includes several APIs, including the DataSet API for static data embedded in Java, Scala, and Python; the DataStream API for unbounded streams embedded in Java and Scala; the Table API with a SQL-like expression language embedded in Java and Scala; and the streaming SQL API that enables SQL queries to be executed on streaming and batch tables, with a syntax is based on Apache Calcite.
The main improvements to the new version start with the addition of batch-style recovery for batch jobs, and a preview of the new Blink-based query engine for Table API and SQL queries.
The new batch-style recovery has significantly reduced the time to recover a batch job from a task failure. This covers DataSet, Table API and SQL jobs. Until this version, if a task failed, the recovery of a batch job involved canceling all tasks and restarting the whole job, voiding all progress. You can now configure Flink to limit the recovery to only those tasks that are in the same failover region, the set of tasks that are connected via pipelined data exchanges.
The Blink-based query engine preview is a development of the donation of Blink to Apache Flink. Blink’s query optimizer and runtime for the Table API and SQL have been integrated into Flink, and the query planner has been extended so that there are now two choices of pluggable query processors to execute Table API and SQL statements: the pre-1.9 Flink processor and the new Blink-based query processor. The Blink-based query processor offers better SQL coverage and improved performance for batch queries because it has more extensive query optimization including cost-based plan selection and more optimization rules. Because the query processor is still not fully integrated, in this release the original processor is still the default choice, though you can enable the Blink processor.
Elsewhere, the State Processor API is now fully available and can be used to read and write savepoints with Flink DataSet jobs. Finally, Flink 1.9 includes a reworked WebUI and previews of Flink’s new Python Table API and its integration with the Apache Hive ecosystem.
or email your comment to: firstname.lastname@example.org