|Apache Flink 1.5.0 Adds Support For Broadcast State|
|Written by Kay Ewbank|
|Friday, 08 June 2018|
The latest version of Apache Flink has been released with a rewritten deployment and process model, and support for broadcast state.
Apache Flink is an open source platform for distributed stream and batch data processing. It consists of a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink provides several APIs: the DataSet API for static data embedded in Java, Scala, and Python; the DataStream API for unbounded streams embedded in Java and Scala; the Table API with a SQL-like expression language embedded in Java and Scala; and the streaming SQL API that enables SQL queries to be executed on streaming and batch tables, with a syntax based on Apache Calcite.
One addition to the latest release is a new SQL CLI client that is the first step to adding a service to execute streaming and batch SQL queries. The client provides a SQL shell to run exploratory queries on data streams.
The developers have redesigned and reimplemented large parts of Flink’s process model. They say that although the rework is not yet complete, the changes already in place enable more natural Kubernetes deployments, and mean that all requests to the JobManager now go through REST. The improvements also add support for dynamic resource allocation and dynamic release of resources on YARN and Mesos schedulers for better resource utilization. In a later version it will be possible to dockerize jobs and deploy them in a natural way as part of the container deployment.
The new support for broadcast state is something that developers using Flink have requested. Broadcast state is replicated across all parallel instances of a function, and might typically be used where you have two streams: a regular data stream alongside a control stream that serves rules, patterns, or other configuration messages. The messages of the control stream configure how the regular stream is processed. Because the rules or patterns are broadcast to all parallel instances of a function, they can be applied to all events of the regular stream.
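The idea can be illustrated with a small simulation. This is a minimal sketch in plain Python, not the Flink API: the `ParallelInstance` class, the `run` function, the message tuples, and the threshold rule are all hypothetical names invented for illustration. It shows only the routing: control messages reach every instance, while each regular event reaches a single instance.

```python
from dataclasses import dataclass, field


@dataclass
class ParallelInstance:
    """One parallel instance of a function (hypothetical stand-in, not a Flink class)."""
    index: int
    rules: dict = field(default_factory=dict)    # this instance's replica of the broadcast state
    matches: list = field(default_factory=list)  # events that satisfied a rule

    def on_control(self, name, threshold):
        # Control message: update this instance's copy of the rules.
        self.rules[name] = threshold

    def on_event(self, key, value):
        # Regular event: evaluate it against every rule currently held.
        for name, threshold in self.rules.items():
            if value > threshold:
                self.matches.append((key, value, name))


def run(messages, parallelism=3):
    """Route messages: control messages to all instances, events to one instance."""
    instances = [ParallelInstance(i) for i in range(parallelism)]
    for kind, *payload in messages:
        if kind == "control":
            # Broadcast: every instance receives the same control message.
            for inst in instances:
                inst.on_control(*payload)
        else:
            # Regular stream: pick one instance deterministically from the key.
            key, value = payload
            instances[sum(key.encode()) % parallelism].on_event(key, value)
    return instances


# Assumed example input: a threshold rule that is later tightened.
messages = [
    ("control", "high_temp", 30),
    ("event", "sensor-a", 35),
    ("event", "sensor-b", 20),
    ("control", "high_temp", 15),
    ("event", "sensor-b", 20),
]
instances = run(messages)
```

Whichever instance an event lands on, it is checked against the same rules, because every instance holds its own replica of the control stream's latest state.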
The next improvement is aimed at making failure recovery more efficient. In normal operation, Flink writes copies of an application’s state to remote, persistent storage and loads it back in case of a failure. This ensures state information is not lost, but recovering the state takes time. The new feature, task-local state recovery, also keeps a copy of the application’s state on the local disk of each machine, so recovery is faster.
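The recovery logic described here can be sketched as a preference for the local copy with a fallback to the remote one. The following is a simplified illustration in plain Python, not Flink's implementation: the `checkpoint` and `recover` functions, the `state.json` file name, and the use of two directories to stand in for remote storage and local disk are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

STATE_FILE = "state.json"  # hypothetical file name used by this sketch


def checkpoint(state, remote_dir, local_dir):
    """Write the state to remote storage and keep a second copy on local disk."""
    payload = json.dumps(state)
    (Path(remote_dir) / STATE_FILE).write_text(payload)  # durable primary copy
    (Path(local_dir) / STATE_FILE).write_text(payload)   # fast secondary copy


def recover(remote_dir, local_dir):
    """Restore from the local copy if it exists, otherwise from remote storage."""
    local = Path(local_dir) / STATE_FILE
    if local.exists():
        return json.loads(local.read_text()), "local"
    remote = Path(remote_dir) / STATE_FILE
    return json.loads(remote.read_text()), "remote"


# Assumed example: two temporary directories stand in for the two storage tiers.
remote_dir = tempfile.mkdtemp()
local_dir = tempfile.mkdtemp()
checkpoint({"count": 42}, remote_dir, local_dir)
state, source = recover(remote_dir, local_dir)
```

The remote copy stays the source of truth in this sketch. If the machine holding the local copy is lost, recovery still succeeds, only more slowly.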
Other improvements include extended join support for SQL and the Table API, with the addition of support for windowed outer equi-joins, and improved reading and writing of JSON messages from and to connectors.