Apache Flink 1.9 Adds New Query Engine
Written by Kay Ewbank   
Tuesday, 27 August 2019

Significant features on this path are batch-style recovery for batch jobs and a preview of the new Blink-based query engine for Table API and SQL queries.

Apache Flink is an open source platform for distributed stream and batch data processing. It consists of a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink includes several APIs, including the DataSet API for static data embedded in Java, Scala, and Python; the DataStream API for unbounded streams embedded in Java and Scala; the Table API with a SQL-like expression language embedded in Java and Scala; and the streaming SQL API that enables SQL queries to be executed on streaming and batch tables, with a syntax is based on Apache Calcite.

 

flinklogo

 

The main improvements to the new version start with the addition of batch-style recovery for batch jobs, and a preview of the new Blink-based query engine for Table API and SQL queries.

The new batch-style recovery has significantly reduced the time to recover a batch job from a task failure. This covers DataSet, Table API and SQL jobs. Until this version, if a task failed, the recovery of a batch job involved canceling all tasks and restarting the whole job, voiding all progress. You can now configure Flink to limit the recovery to only those tasks that are in the same failover region, the set of tasks that are connected via pipelined data exchanges.

The Blink-based query engine preview is a development of the donation of Blink to Apache Flink. Blink’s query optimizer and runtime for the Table API and SQL have been integrated into Flink, and the query planner has been extended so that there are now two choices of pluggable query processors to execute Table API and SQL statements: the pre-1.9 Flink processor and the new Blink-based query processor. The Blink-based query processor offers better SQL coverage and improved performance for batch queries because it has more extensive query optimization including cost-based plan selection and more optimization rules. Because the query processor is still not fully integrated, in this release the original processor is still the default choice, though you can enable the Blink processor.

Elsewhere, the State Processor API is now fully available and can be used to read and write savepoints with Flink DataSet jobs. Finally, Flink 1.9 includes a reworked WebUI and previews of Flink’s new Python Table API and its integration with the Apache Hive ecosystem.

 

 

flinklogo

More Information

Flink website

Related Articles

Apache Flink 1.5.0 Adds Support For Broadcast State

Flink Gets Event-time Streaming

FLink Reaches Top Level Status

 

 

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on, Twitter, Facebook or Linkedin.

Banner


Python 3.8 Adds Walrus Operator
17/10/2019

The latest release of Python, 3.8, is available with many new features and optimizations. Notable improvements include a walrus operator and positional-only parameters.



Waltz Write Ahead Log Open Sourced
23/09/2019

A distributed write-ahead log has been open sourced by WePay. Waltz was originally designed to be the ledger of money transactions on the WePay system and has since been generalized to be suitable for [ ... ]


More News

graphics

 



 

Comments




or email your comment to: comments@i-programmer.info