Apache Beam Moves To Java 8
Written by Kay Ewbank   
Wednesday, 28 February 2018

Apache Beam, the open source programming SDK for defining batch and streaming data-parallel processing pipelines, is now available in a new version that moves to Java 8 and Spark 2.x.


Apache Beam has a number of SDKs that you can use to build a program that defines a pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Beam began life at Google, where it is used for the Google Cloud Dataflow (GCD) service, and it uses the same API as GCD.

The latest version now uses Java 8 as its supported Java version, and the code and examples in Beam have been reworked to take advantage of Java 8 improvements such as lambdas, streams, and improved type inference.
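To give a flavor of the Java 8 features mentioned, here is a minimal sketch of a word count written with lambdas, a stream pipeline and inferred generic types. It uses only the standard library and is not taken from the Beam code base; the class name Java8WordCount and the method countWords are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class, not part of Apache Beam.
class Java8WordCount {

    // Counts how often each word occurs across all lines.
    // Lambdas and method references stand in for anonymous inner classes,
    // and the compiler infers the generic types of the collector.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Split every line into lower-case words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                // split() can yield an empty leading token, so drop blanks.
                .filter(word -> !word.isEmpty())
                // Group identical words and count them, sorted by key.
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Beam moves to Java 8", "Java 8 has lambdas");
        System.out.println(countWords(lines));
    }
}
```

Before Java 8 the same logic would typically need explicit loops or anonymous classes, which is the kind of boilerplate the reworked examples can now avoid.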

Beam's Spark runner has also been updated to the Spark 2.x development line, both to improve performance and to prepare for future compatibility with the Structured Streaming APIs. Beam's pipeline runners translate the data processing pipeline you define in your Beam program into an API compatible with the distributed processing back-end of your choice.

Support for AWS S3 has also been improved. In previous versions, AWS S3 was supported via the HadoopFileSystem, but the new release adds native support for S3, which improves performance.

The final improvement of note is the addition of the Splittable DoFn API to the Python SDK, along with Splittable DoFn support in the Python streaming DirectRunner.

Splittable DoFn Example


DoFn is a Beam SDK class that defines a distributed processing function. The DoFn object contains the processing logic that gets applied to the elements in the input collection. It processes one element at a time. Splittable DoFn is a generalization of DoFn that can be used to develop more powerful IO connectors than before, with shorter, simpler, more reusable code.
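The idea can be sketched in plain Java without the Beam SDK. The sketch below is an illustration only and does not use Beam's real classes: ElementFn, OffsetRange, applyPerElement and processRange are all hypothetical names. It shows a per-element function applied one element at a time and, for the splittable case, the work on a single element being described by a range that can be divided into independent parts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; none of these types belong to the Beam SDK.
class SplittableSketch {

    // Stand-in for a DoFn: handles one element, emitting zero or more outputs.
    interface ElementFn<I, O> {
        void process(I element, Consumer<O> out);
    }

    // Stand-in for a restriction: a half-open range [from, to) of work
    // inside a single element.
    static final class OffsetRange {
        final int from;
        final int to;

        OffsetRange(int from, int to) {
            this.from = from;
            this.to = to;
        }

        // Splits the range into up to two non-empty halves that could be
        // processed independently.
        List<OffsetRange> split() {
            int mid = from + (to - from) / 2;
            List<OffsetRange> parts = new ArrayList<>();
            if (mid > from) parts.add(new OffsetRange(from, mid));
            if (to > mid) parts.add(new OffsetRange(mid, to));
            return parts;
        }
    }

    // Ordinary case: the function sees each element exactly once, in turn.
    static <I, O> List<O> applyPerElement(List<I> input, ElementFn<I, O> fn) {
        List<O> output = new ArrayList<>();
        for (I element : input) {
            fn.process(element, output::add);
        }
        return output;
    }

    // Splittable case: process only the slice of the element that the
    // given range covers.
    static String processRange(String element, OffsetRange range) {
        return element.substring(range.from, range.to);
    }

    public static void main(String[] args) {
        List<Integer> lengths = applyPerElement(
                Arrays.asList("a", "bb", "ccc"), (s, out) -> out.accept(s.length()));
        System.out.println(lengths);

        String element = "hello";
        for (OffsetRange part : new OffsetRange(0, element.length()).split()) {
            System.out.println(processRange(element, part));
        }
    }
}
```

In this toy version the two halves of one element are processed in a loop, but because each range is self-contained they could just as well be handed to different workers, which is the property that makes the splittable form attractive for IO connectors reading large files or streams.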


More Information

Beam Website

Related Articles

Apache Beam Moves To Top Level

Apache Spark 2.0 Released

Flink Gets Event-time Streaming

Google Announces Big Data the Cloud Way


Last Updated ( Wednesday, 28 February 2018 )
