Apache Samza Improves State Management
Written by Kay Ewbank   
Friday, 27 March 2020

There's a new version of Apache Samza, an open source framework for developing and running stream processing jobs. Samza was originally developed alongside Kafka by LinkedIn before being made open source and taken into the Apache Software Foundation fold.

The new version improves the management and monitoring of local state, extends the Samza SQL API, and adds a new system producer for Azure Blob Storage.

Samza is designed to be usable by non-programmers as well as developers. It uses Apache Kafka for messaging, and Apache Hadoop YARN for fault tolerance, processor isolation, security, and resource management. It supports local state via a RocksDB store that allows a stateful application to scale up to 1.1 million events/sec on a single machine with an SSD.


The state management improvements in the new release start with the addition of KV metrics that track the maximum serialized value size written to RocksDB. This can be used to estimate the best message size limit to set for Kafka-backed stores. A problem whereby Samza RocksDB metrics have not emitted values since Samza 1.1 has also been fixed. Another improvement adds a null check before incrementing the bytesSerialized metrics. Until now, the missing check could cause code to fail with a null pointer exception.
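To illustrate the idea behind the size metric and the null check, here is a minimal Python sketch. It is not Samza's code or API: the class `SerializedSizeMetrics` and its field names are hypothetical, chosen only to show a running byte total, a maximum-size gauge, and a guard that skips missing values before they are measured.

```python
class SerializedSizeMetrics:
    """Hypothetical stand-in for a store's serialization metrics (not Samza's API)."""

    def __init__(self):
        self.bytes_serialized = 0   # running total of serialized bytes
        self.max_record_size = 0    # largest single serialized value seen so far

    def record(self, serialized_value):
        # Null check first: a missing value has no length, so measuring it
        # would raise an error instead of updating the metrics.
        if serialized_value is None:
            return
        size = len(serialized_value)
        self.bytes_serialized += size
        self.max_record_size = max(self.max_record_size, size)


metrics = SerializedSizeMetrics()
for value in (b"abc", None, b"a much longer value"):
    metrics.record(value)

print(metrics.bytes_serialized, metrics.max_record_size)
```

In a sketch like this, the maximum gauge is the figure you would compare against a message size limit, while the total keeps counting normally even when some values are missing.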

Handling of job metadata has also been improved to overcome a problem whereby the application master took too long to save job metadata when a remote server was under heavy load. The job model manager now uses a batch put method rather than flushing after every message.
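The difference between flushing per message and using a batch put can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Samza's implementation: `CountingStore`, `save_per_message` and `save_batched` are hypothetical names, and a "flush" here simply stands in for one round trip to a remote server.

```python
class CountingStore:
    """Hypothetical metadata store that counts flushes (stand-ins for round trips)."""

    def __init__(self):
        self.data = {}
        self.flushes = 0

    def put(self, key, value):
        self.data[key] = value

    def put_all(self, entries):
        # Batch put: accept every entry in a single call.
        self.data.update(entries)

    def flush(self):
        self.flushes += 1


def save_per_message(store, entries):
    # Old pattern: one flush after every single write.
    for key, value in entries.items():
        store.put(key, value)
        store.flush()


def save_batched(store, entries):
    # New pattern: write everything, then flush once.
    store.put_all(entries)
    store.flush()


entries = {f"task-{i}": f"model-{i}" for i in range(100)}
slow, fast = CountingStore(), CountingStore()
save_per_message(slow, entries)
save_batched(fast, entries)

print(slow.flushes, fast.flushes)
```

Both stores end up holding the same data, but the batched version pays for a single flush rather than one per entry, which is where the saving comes from when the remote server is slow to respond.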

The improvements to Samza SQL mean it will now handle SQL statements with a trailing semicolon, and will support subqueries in joins. It will also handle the original names of UDFs better, and validate the argument types of Samza SQL UDFs at the execution planning phase.


More Information

Samza Site

Related Articles

Apache Samza Adds SQL

Apache Bigtop Adds OpenJDK 8 Support 

Apache Fluo Improves Spark Integration

Kafka 1 Becomes More Tolerant

Comparing Kafka To RabbitMQ

Apache Kafka Adds New Streams API

GoKa Stream Processing For Kafka

 


Last Updated ( Saturday, 04 April 2020 )