Apache Flink ML 2.0 Released
Written by Kay Ewbank
Thursday, 27 January 2022
Flink ML 2.0.0 has been released. Flink ML is a library that provides APIs and infrastructure for building stream-batch unified machine learning algorithms that are easy to use and performant, with (near) real-time latency.
Apache Flink is an open source platform for distributed stream and batch data processing, with a streaming dataflow engine for data distribution and distributed computations over data streams.
The developers describe the new version as a major refactor of the earlier Flink ML library. It adds features that extend the Flink ML API and the iteration runtime, such as support for stages with multiple inputs and outputs, graph-based stage composition, and a new stream-batch unified iteration library.
The developers have also added five algorithm implementations in this release, the start of a long-term initiative to provide a large number of off-the-shelf algorithms in Flink ML.
The new support for stages with multiple inputs and multiple outputs means that algorithm developers can assemble a machine learning workflow as a directed acyclic graph (DAG) of pre-defined stages. Users can then configure and deploy this workflow without needing to know the implementation details of the graph. This improvement could considerably expand the applicability and usability of Flink ML.
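To make the idea of a DAG of multi-input multi-output stages more concrete, here is a minimal sketch in plain Java. It is not the Flink ML API: real Flink ML stages operate on Flink Tables, whereas the hypothetical `MimoStage`, `Port` and `StageGraph` types below simply pass `double[]` arrays around. Each node declares which outputs of earlier nodes feed its inputs, which keeps the graph acyclic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified multi-input multi-output stage (NOT the Flink ML API).
interface MimoStage {
    List<double[]> apply(List<double[]> inputs);
}

// Hypothetical reference to output `index` of node `node`; node -1 means the graph's own inputs.
record Port(int node, int index) {}

// Hypothetical DAG of stages that is itself a stage, so graphs can be nested.
final class StageGraph implements MimoStage {
    private final List<MimoStage> stages = new ArrayList<>();
    private final List<List<Port>> wiring = new ArrayList<>();
    private final List<Port> graphOutputs = new ArrayList<>();

    /** Adds a stage fed by the given ports and returns its node id. */
    int addStage(MimoStage stage, Port... inputs) {
        for (Port p : inputs) {
            // Only graph inputs or earlier nodes may be referenced, so no cycles can form.
            if (p.node() >= stages.size()) {
                throw new IllegalArgumentException("inputs must come from earlier nodes");
            }
        }
        stages.add(stage);
        wiring.add(List.of(inputs));
        return stages.size() - 1;
    }

    /** Exposes one node output as an output of the whole graph. */
    void addOutput(Port port) {
        graphOutputs.add(port);
    }

    @Override
    public List<double[]> apply(List<double[]> inputs) {
        // Insertion order is a valid topological order because of the check in addStage.
        List<List<double[]>> results = new ArrayList<>();
        for (int i = 0; i < stages.size(); i++) {
            List<double[]> args = new ArrayList<>();
            for (Port p : wiring.get(i)) {
                args.add(resolve(p, inputs, results));
            }
            results.add(stages.get(i).apply(args));
        }
        List<double[]> out = new ArrayList<>();
        for (Port p : graphOutputs) {
            out.add(resolve(p, inputs, results));
        }
        return out;
    }

    private static double[] resolve(Port p, List<double[]> inputs, List<List<double[]>> results) {
        return p.node() < 0 ? inputs.get(p.index()) : results.get(p.node()).get(p.index());
    }
}
```

Because `StageGraph` itself implements `MimoStage`, a whole graph can be dropped into a larger graph as a single stage, and the person using it only needs to know its inputs and outputs, not the wiring inside.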
The next improvement is support for online learning, through APIs that expose model data. This targets situations where a long-running job keeps processing training data and updating a machine learning model. The traditional Estimator/Transformer paradigm does not provide APIs to expose this model data in a streaming manner, so users have to call fit() repeatedly to update the model data, which is very inefficient. In the new release, model data can be exposed as an unbounded stream, so algorithm users can transfer it to web servers in real time and use the up-to-date model data for online inference.
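The contrast between repeatedly calling fit() and publishing model data as a stream can be sketched in plain Java. This is an illustration under stated assumptions, not Flink ML code: the hypothetical `OnlineLinearTrainer` stands in for a long-running training job, the callbacks it invokes stand in for an unbounded model-data stream, and `ServingEndpoint` stands in for a web server doing inference. The learning rule (one stochastic gradient step on a single-weight linear model) is chosen only to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model data: a version number plus the learned weight.
record ModelData(long version, double weight) {}

// Hypothetical long-running trainer: each training record updates the model, and
// every update is pushed downstream as a new ModelData element.
final class OnlineLinearTrainer {
    private final double learningRate;
    private final List<Consumer<ModelData>> subscribers = new ArrayList<>();
    private double weight = 0.0;
    private long version = 0;

    OnlineLinearTrainer(double learningRate) {
        this.learningRate = learningRate;
    }

    void subscribe(Consumer<ModelData> subscriber) {
        subscribers.add(subscriber);
    }

    /** One SGD step on squared error for y ≈ weight * x, then publish the new model data. */
    void onTrainingRecord(double x, double y) {
        double error = weight * x - y;
        weight -= learningRate * error * x;
        ModelData update = new ModelData(++version, weight);
        for (Consumer<ModelData> s : subscribers) {
            s.accept(update);
        }
    }
}

// Hypothetical inference side: it keeps only the newest model data it has received.
final class ServingEndpoint implements Consumer<ModelData> {
    // volatile so a separate serving thread would see the latest update
    private volatile ModelData latest = new ModelData(0, 0.0);

    @Override
    public void accept(ModelData update) {
        latest = update;
    }

    double predict(double x) {
        return latest.weight() * x;
    }

    long modelVersion() {
        return latest.version();
    }
}
```

The serving side never triggers training; it just consumes whatever model version arrives next, which is the behaviour the streaming model-data APIs are meant to enable instead of refitting on the whole history each time.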
Other improvements include simpler parameter handling for algorithms and new tools for composing a DAG of stages into a new stage. There is also a new stream-batch unified iteration library that can feed records back to preceding operators and track the progress of rounds inside the iteration.
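The two capabilities of the iteration library, feeding records back into the loop and tracking which round the loop is in, can be illustrated with a small single-threaded Java sketch. The names `LoopBody`, `RoundResult` and `SimpleIteration` are hypothetical and are not part of Flink ML; the real library runs inside Flink's distributed runtime and also handles unbounded streams, whereas this sketch runs bounded, synchronous rounds and stops when nothing is fed back or a round limit is reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical result of one pass of the loop body: records to feed back, and final outputs.
record RoundResult<T>(List<T> feedback, List<T> output) {}

// Hypothetical loop body, told which round it is processing.
interface LoopBody<T> {
    RoundResult<T> process(int round, List<T> records);
}

final class SimpleIteration {
    /**
     * Runs the body round by round, feeding each round's feedback records back in as the
     * next round's input. onRoundEnd is called after every completed round so callers can
     * track progress. Stops when no records are fed back or maxRounds is reached.
     */
    static <T> List<T> iterate(List<T> initial, LoopBody<T> body, int maxRounds, IntConsumer onRoundEnd) {
        List<T> results = new ArrayList<>();
        List<T> current = initial;
        for (int round = 0; round < maxRounds && !current.isEmpty(); round++) {
            RoundResult<T> r = body.process(round, current);
            results.addAll(r.output());
            current = r.feedback();
            onRoundEnd.accept(round);
        }
        return results;
    }
}
```

For example, a body that halves every value greater than 1 and feeds it back, emitting values of 1 as output, turns the inputs 16 and 3 into two outputs of 1 after five rounds (rounds 0 to 4).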
Flink ML 2.0 is available now.
Last Updated: Thursday, 27 January 2022