Apache Beam Moves To Java 8
Written by Kay Ewbank
Wednesday, 28 February 2018
Apache Beam, the open source programming SDK for defining batch and streaming data-parallel processing pipelines, is now available in a new version that moves to Java 8 and Spark 2.x.
Apache Beam has a number of SDKs that you can use to build a program that defines a pipeline. The pipeline is then executed by one of Beam's supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Beam began life at Google, where it is used in the Google Cloud Dataflow (GCD) service, and it uses the same API as GCD.
The latest version now targets Java 8 as its supported Java version, and the code and examples in Beam have been reworked to take advantage of Java 8 improvements such as lambdas, streams, and improved type inference. Beam's Spark runner has also been updated to the Spark 2.x development line, both to improve performance and for future compatibility with the Structured Streaming APIs. The Beam Pipeline Runners translate the data processing pipeline you define with your Beam program into the API of the distributed processing back-end of your choice.
Support for AWS S3 has also been improved. In previous versions, AWS S3 was supported via the HadoopFileSystem; the new release adds native support for S3, which improves performance. The final improvement of note is the addition of the Splittable DoFn API to the Python SDK, along with Splittable DoFn support in the Python streaming DirectRunner.
DoFn is a Beam SDK class that defines a distributed processing function. The DoFn object contains the processing logic that gets applied to the elements in the input collection. It processes one element at a time. Splittable DoFn is a generalization of DoFn that can be used to develop more powerful IO connectors than before, with shorter, simpler, more reusable code.
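
To make the DoFn idea concrete, here is a minimal sketch in plain Python. It deliberately does not use the Beam SDK: `ToyDoFn`, `SplitWords`, `toy_par_do`, `split_range` and `read_range` are hypothetical names invented for this illustration. The first half models a function that is applied to one element at a time; the second half models the splittable idea, assuming a unit of work can be described by an offset range that is divided into smaller ranges and processed independently.

```python
from typing import Iterable, Iterator, List, Tuple


class ToyDoFn:
    """Toy stand-in for the DoFn idea (not the Beam class): per-element logic."""

    def process(self, element: str) -> Iterator[str]:
        raise NotImplementedError


class SplitWords(ToyDoFn):
    """Emits zero or more lowercase words for each input element."""

    def process(self, element: str) -> Iterator[str]:
        for word in element.split():
            yield word.lower()


def toy_par_do(elements: Iterable[str], fn: ToyDoFn) -> List[str]:
    """Applies fn.process to one element at a time and collects the outputs."""
    outputs: List[str] = []
    for element in elements:
        outputs.extend(fn.process(element))
    return outputs


def split_range(start: int, stop: int, parts: int) -> List[Tuple[int, int]]:
    """Splits the half-open range [start, stop) into at most `parts` sub-ranges."""
    size = stop - start
    bounds = [start + (size * i) // parts for i in range(parts + 1)]
    # Drop empty sub-ranges, which appear when parts exceeds the range size.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def read_range(lines: List[str], start: int, stop: int) -> Iterator[str]:
    """Processes only the slice of work described by [start, stop)."""
    for index in range(start, stop):
        yield lines[index]


if __name__ == "__main__":
    print(toy_par_do(["Hello Beam", "Java 8"], SplitWords()))
    lines = ["line %d" % i for i in range(10)]
    for lo, hi in split_range(0, len(lines), 3):
        print((lo, hi), list(read_range(lines, lo, hi)))
```

In a real pipeline the runner, not a local loop, decides where and when each element or sub-range is processed. That is why being able to split the work for a single element matters for IO connectors that read large inputs.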
Last Updated (Wednesday, 28 February 2018)