Apache Flink ML 2.0 Released
Written by Kay Ewbank   
Thursday, 27 January 2022

Flink ML 2.0.0 has been released. Flink ML is a library that provides APIs and infrastructure for building stream-batch unified machine learning algorithms that are easy to use and performant, with (near) real-time latency.

Apache Flink is an open source platform for distributed stream and batch data processing, with a streaming dataflow engine for data distribution and distributed computations over data streams.


The updated version of Flink ML is described as a major refactor of the earlier Flink ML library. Its new features extend the Flink ML API and the iteration runtime, and include support for stages with multiple inputs and multiple outputs, graph-based stage composition, and a new stream-batch unified iteration library.

The developers have also added five algorithm implementations in this release, which is the start of a long-term initiative to provide a large number of off-the-shelf algorithms in Flink ML.

The new support for stages requiring multi-input multi-output means that algorithm developers can assemble a machine learning workflow as a directed acyclic graph (DAG) of pre-defined stages. This workflow can then be configured and deployed without users knowing the implementation details of this graph. This improvement could considerably expand the applicability and usability of Flink ML.
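To make the idea of a workflow built as a DAG of multi-input multi-output stages more concrete, here is a minimal sketch in plain Python. It does not use the Flink ML API; the names `Stage`, `StageGraph`, `add` and `run` are hypothetical and chosen only for illustration, and ordinary lists stand in for tables.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Sequence, Tuple


# Hypothetical names throughout; this is not the Flink ML API.
@dataclass
class Stage:
    name: str
    # Takes a list of input tables and returns a list of output tables.
    fn: Callable[[List[list]], List[list]]


class StageGraph:
    """A DAG of stages. Each input is a (producer name, output index) pair."""

    def __init__(self) -> None:
        self.stages: Dict[str, Stage] = {}
        self.inputs: Dict[str, List[Tuple[str, int]]] = {}

    def add(self, stage: Stage, inputs: Sequence[Tuple[str, int]]) -> None:
        self.stages[stage.name] = stage
        self.inputs[stage.name] = list(inputs)

    def run(self, sources: Dict[str, List[list]]) -> Dict[str, List[list]]:
        # `sources` maps source names to the tables they already provide.
        results = dict(sources)
        deps = {name: {p for p, _ in ins} for name, ins in self.inputs.items()}
        # Execute every stage after all of its producers have run.
        for name in TopologicalSorter(deps).static_order():
            if name in results:
                continue  # a source, nothing to compute
            args = [results[p][i] for p, i in self.inputs[name]]
            results[name] = self.stages[name].fn(args)
        return results


graph = StageGraph()
# One input, two outputs: split the numbers into evens and odds.
graph.add(
    Stage("split", lambda ins: [[x for x in ins[0] if x % 2 == 0],
                                [x for x in ins[0] if x % 2 == 1]]),
    [("src", 0)],
)
# Two inputs, one output: add the two tables element by element.
graph.add(
    Stage("merge", lambda ins: [[a + b for a, b in zip(ins[0], ins[1])]]),
    [("split", 0), ("split", 1)],
)
out = graph.run({"src": [[1, 2, 3, 4]]})
```

The caller only supplies the source table and reads `out["merge"]`; the wiring between `split` and `merge` stays hidden inside the graph, which is the property the release notes describe.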

The next improvement is the addition of support for online learning with APIs exposing model data. The support has been added to handle situations where there's a long-running job that keeps processing training data and updating a machine learning model. The traditional Estimator/Transformer paradigm does not provide APIs to expose this model data in a streaming manner, meaning users have to repeatedly call fit() to update model data, which is very inefficient. The new release means model data can be exposed as an unbounded stream, and algorithm users can then transfer the model data to web servers in real-time and use the up-to-date model data to do online inference.
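The difference between repeatedly calling fit() and exposing model data as a stream can be sketched in plain Python with a generator. This is an illustration under stated assumptions, not Flink ML code: `model_stream` and `predict` are hypothetical names, the model is a one-variable linear regression trained by stochastic gradient descent, and a finite list stands in for an unbounded training stream.

```python
from typing import Iterable, Iterator, Tuple

Model = Tuple[float, float]  # (weight, bias) of y = weight * x + bias


def model_stream(samples: Iterable[Tuple[float, float]],
                 lr: float = 0.1) -> Iterator[Model]:
    """Yield an updated model after every training sample (hypothetical helper)."""
    w, b = 0.0, 0.0
    for x, y in samples:
        err = (w * x + b) - y
        # One SGD step on the squared error of this single sample.
        w -= lr * err * x
        b -= lr * err
        yield w, b


def predict(model: Model, x: float) -> float:
    w, b = model
    return w * x + b


# Training data drawn from y = 2x + 1; in a real job this would never end.
training = [(-1.0, -1.0), (1.0, 3.0)] * 100

latest: Model = (0.0, 0.0)
for latest in model_stream(training):
    # A serving process could pick up `latest` here and use it immediately.
    pass

prediction = predict(latest, 3.0)
```

Each value yielded by the generator is a complete, usable model, so a consumer can serve predictions from the most recent one while training continues, rather than waiting for a full retraining pass.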

Other improvements include simpler parameter handling for algorithms, and new tools for composing a DAG of stages into a new stage. There's also a new stream-batch unified iteration library that can transmit records back to preceding operators and track the progress of rounds inside the iteration.
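The two capabilities of the iteration library mentioned above, feeding records back to an earlier operator and tracking rounds, can be illustrated with a small plain-Python loop. This is only a sketch of the concept; `iterate`, `body` and `on_round_end` are hypothetical names and do not correspond to the Flink ML iteration API.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def iterate(initial: Iterable[float],
            body: Callable[[List[float]], Tuple[List[float], List[float]]],
            on_round_end: Optional[Callable[[int, int], None]] = None,
            max_rounds: int = 100) -> Tuple[List[float], int]:
    """Run `body` repeatedly, feeding its feedback records back as input.

    `body(records)` returns (feedback, outputs). The loop stops when the
    feedback is empty or `max_rounds` is reached.
    """
    records = list(initial)
    outputs: List[float] = []
    rounds = 0
    while records and rounds < max_rounds:
        records, emitted = body(records)
        outputs.extend(emitted)
        if on_round_end is not None:
            # Report the finished round and how many records go back in.
            on_round_end(rounds, len(records))
        rounds += 1
    return outputs, rounds


def halve(records: List[float]) -> Tuple[List[float], List[float]]:
    # Values still >= 1 after halving are fed back; the rest leave the loop.
    halved = [r / 2 for r in records]
    feedback = [r for r in halved if r >= 1]
    done = [r for r in halved if r < 1]
    return feedback, done


progress: List[Tuple[int, int]] = []
results, total_rounds = iterate(
    [8.0, 3.0], halve, on_round_end=lambda rnd, n: progress.append((rnd, n))
)
```

Here `progress` records, for each round, how many records were sent back for another pass, which is the kind of per-round visibility a training algorithm needs to decide when to stop.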

Flink ML 2.0 is available now.


More Information

Flink website

Related Articles

Apache Flink 1.9 Adds New Query Engine

Apache Flink 1.5.0 Adds Support For Broadcast State

Flink Gets Event-time Streaming

Flink Reaches Top Level Status




Last Updated ( Thursday, 27 January 2022 )