Apache Flink ML 2.0 Released
Written by Kay Ewbank   
Thursday, 27 January 2022

Flink ML 2.0.0 has been released. Flink ML is a library that provides APIs and infrastructure for building stream-batch unified machine learning algorithms that are easy to use and performant, with (near-)real-time latency.

Apache Flink is an open source platform for distributed stream and batch data processing, built around a streaming dataflow engine that handles data distribution and distributed computation over data streams.


The updated version of Flink ML is described as a major refactor of the earlier Flink ML library. It adds new features that extend the Flink ML API and the iteration runtime, including support for stages with multiple inputs and outputs, graph-based stage composition, and a new stream-batch unified iteration library.

The developers have also added five algorithm implementations in this release, which is the start of a long-term initiative to provide a large number of off-the-shelf algorithms in Flink ML.

The new support for multi-input, multi-output stages means that algorithm developers can assemble a machine learning workflow as a directed acyclic graph (DAG) of predefined stages. Users can then configure and deploy this workflow without knowing the implementation details of the graph. This improvement could considerably expand the applicability and usability of Flink ML.
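To make the idea concrete, here is a minimal plain-Python sketch of a DAG of multi-input, multi-output stages. It is not the Flink ML API: the `Stage` and `GraphStage` classes and their `run` method are hypothetical names invented for illustration, and each "table" is just a Python list. The graph works out the execution order itself, so stages can be listed in any order.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List


class Stage:
    """Hypothetical stage: reads named input tables, writes named output tables."""

    def __init__(self, name: str, inputs: List[str], outputs: List[str],
                 fn: Callable[..., tuple]):
        self.name = name
        self.inputs = inputs
        self.outputs = outputs
        self.fn = fn  # fn(*input_tables) -> tuple with one value per output


class GraphStage:
    """Hypothetical DAG of stages that can itself be run like a single stage."""

    def __init__(self, stages: List[Stage], inputs: List[str], outputs: List[str]):
        self.stages = {s.name: s for s in stages}
        self.inputs = inputs
        self.outputs = outputs
        # Which stage produces each named table.
        producer = {out: s.name for s in stages for out in s.outputs}
        # A stage depends on the stages that produce its inputs.
        deps = {s.name: {producer[i] for i in s.inputs if i in producer}
                for s in stages}
        self.order = list(TopologicalSorter(deps).static_order())

    def run(self, *values):
        tables: Dict[str, object] = dict(zip(self.inputs, values))
        for name in self.order:
            stage = self.stages[name]
            results = stage.fn(*(tables[i] for i in stage.inputs))
            tables.update(zip(stage.outputs, results))
        return tuple(tables[o] for o in self.outputs)


# Two inputs -> "split" (two outputs) -> "count" (two inputs, one output).
split = Stage("split", ["features", "labels"], ["train", "test"],
              lambda feats, labels: (list(zip(feats, labels))[:2],
                                     list(zip(feats, labels))[2:]))
count = Stage("count", ["train", "test"], ["sizes"],
              lambda tr, te: ((len(tr), len(te)),))
# Stages are deliberately listed out of order; the graph sorts them.
graph = GraphStage([count, split], inputs=["features", "labels"],
                   outputs=["sizes", "test"])
sizes, held_out = graph.run([1.0, 2.0, 3.0], [0, 1, 1])
print(sizes, held_out)  # (2, 1) [(3.0, 1)]
```

Because `GraphStage` exposes its own `inputs`, `outputs` and `run`, a whole graph could be wrapped in another `Stage` and reused as a single step, which mirrors the idea of composing a DAG of stages into a new stage.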

The next improvement is support for online learning, via APIs that expose model data. This targets situations where a long-running job keeps processing training data and updating a machine learning model. The traditional Estimator/Transformer paradigm provides no API for exposing this model data in a streaming manner, so users have to call fit() repeatedly to update the model, which is very inefficient. In the new release, model data can be exposed as an unbounded stream; algorithm users can then transfer the model data to web servers in real time and use the latest model data for online inference.
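The streaming-model idea can be illustrated with a Python generator. This is a hypothetical sketch, not Flink ML code: `train_online` consumes a possibly unbounded iterator of training batches and yields a new model-data record after each batch, and the "model" is simply a running mean so the arithmetic is easy to follow. A consumer can push each record to a serving system and always score against the latest one, instead of calling `fit()` again from scratch.

```python
from typing import Dict, Iterable, Iterator, List


def train_online(batches: Iterable[List[float]]) -> Iterator[Dict[str, float]]:
    """Hypothetical online trainer: emits updated model data after every batch.

    The "model" is a running mean, standing in for real model parameters.
    `batches` may be unbounded; nothing is buffered beyond two counters.
    """
    count, total = 0, 0.0
    for version, batch in enumerate(batches, start=1):
        count += len(batch)
        total += sum(batch)
        if count == 0:
            continue  # nothing seen yet, so no model to publish
        yield {"version": version, "mean": total / count}


def predict(model: Dict[str, float], x: float) -> float:
    """Online inference against the latest model data: offset from the mean."""
    return x - model["mean"]


latest = None
for model in train_online(iter([[2.0], [4.0, 6.0], [8.0]])):
    latest = model  # a real system might push this to a serving endpoint
    print(model["version"], model["mean"], predict(latest, 5.0))
# 1 2.0 3.0
# 2 4.0 1.0
# 3 5.0 0.0
```

Because the generator is lazy, the same loop works unchanged when the batch source never ends; each model update costs work proportional to one batch rather than to the whole history.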

Other improvements include simpler parameter handling for algorithms and new tools for composing a DAG of stages into a new stage. There is also a new stream-batch unified iteration library that can send records back to preceding operators and track the progress of rounds inside the iteration.
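The feedback-loop behaviour of an iteration can be sketched in a few lines of Python. This is a simplified, synchronous model, not the Flink ML iteration library: the hypothetical `iterate` function feeds records returned by a `body` function back into the next round, lets other records leave the loop tagged with the round that produced them, and stops when no records are fed back or `max_rounds` is reached. The real library works on distributed, possibly unbounded streams, which this sketch does not attempt.

```python
from typing import Callable, Iterable, List, Tuple

# body(record) -> (records fed back to the next round, records leaving the loop)
Body = Callable[[int], Tuple[List[int], List[int]]]


def iterate(initial: Iterable[int], body: Body,
            max_rounds: int = 100) -> Tuple[List[Tuple[int, int]], int]:
    """Hypothetical round-based iteration with a feedback edge.

    Returns (outputs tagged with the round that produced them, rounds run).
    """
    current = list(initial)
    outputs: List[Tuple[int, int]] = []
    rounds = 0
    while current and rounds < max_rounds:
        next_round: List[int] = []
        for record in current:
            feedback, out = body(record)
            next_round.extend(feedback)          # goes back to the loop head
            outputs.extend((rounds, o) for o in out)
        current = next_round
        rounds += 1                              # progress: one round done
    return outputs, rounds


def halve(x: int) -> Tuple[List[int], List[int]]:
    """Feed even numbers back halved; let odd numbers leave the iteration."""
    return ([x // 2], []) if x % 2 == 0 else ([], [x])


outputs, rounds = iterate([12, 5, 8], halve)
print(outputs, rounds)  # [(0, 5), (2, 3), (3, 1)] 4
```

The `max_rounds` guard matters: an input of `0` is even and halves to itself, so without a bound it would be fed back forever.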

Flink ML 2.0 is available now.


