Apache Samza Adds SQL
Written by Kay Ewbank   
Monday, 08 January 2018

There's a new version of Apache Samza that adds Samza SQL along with support for both Azure EventHubs and AWS Kinesis. Samza is a stream processing framework originally developed alongside Kafka by LinkedIn before being made open source and taken over by the Apache Software Foundation.

The idea behind Samza is to provide a simple way to develop and run stream processing jobs that can be used by non-programmers as well as developers. Samza uses Apache Kafka for messaging, and Apache Hadoop YARN for fault tolerance, processor isolation, security, and resource management. It supports local state via a RocksDB store, which allows a stateful application to scale up to 1.1 million events/sec on a single machine with an SSD.

Samza has a simple callback-based “process message” API comparable to MapReduce. It supports managed state via snapshotting and restoration of a stream processor’s state. When a processor is restarted, Samza restores its state to a consistent snapshot. It also provides fault tolerance by working with YARN to transparently migrate tasks to another machine if the active machine in the cluster fails. Samza relies on Kafka to guarantee that messages are processed in the order they were written to a partition, and that no messages are ever lost.
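
To make the callback and snapshot ideas concrete, here is a minimal sketch in Python. It is not Samza's API: the `CountingTask` class and its `process`, `snapshot`, and `restore` methods are hypothetical names chosen for illustration, and the state is a plain in-memory dictionary rather than a RocksDB store.

```python
import copy
from collections import defaultdict


class CountingTask:
    """Hypothetical stream task that counts messages per key."""

    def __init__(self):
        self.state = defaultdict(int)

    def process(self, message):
        # Callback invoked once for every incoming message.
        self.state[message["key"]] += 1

    def snapshot(self):
        # Return an independent copy of the current state (a checkpoint).
        return copy.deepcopy(dict(self.state))

    def restore(self, snapshot):
        # Rebuild the state from a previously taken snapshot.
        self.state = defaultdict(int, snapshot)


def run(task, messages):
    # Stand-in for the framework loop that feeds messages to the callback.
    for message in messages:
        task.process(message)


task = CountingTask()
run(task, [{"key": "a"}, {"key": "b"}, {"key": "a"}])
checkpoint = task.snapshot()

# Simulate a restart: a fresh task restores the snapshot and carries on.
restarted = CountingTask()
restarted.restore(checkpoint)
run(restarted, [{"key": "a"}])
print(dict(restarted.state))
```

The point of the sketch is only that the task author writes the per-message callback, while taking and restoring snapshots is something a framework can do around it.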

Samza is partitioned and distributed at every level. It has a pluggable API that means it can be run with other messaging systems and in other execution environments, though it is designed to work out of the box with Kafka and YARN. Samza is written in Scala and Java.
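
As a rough illustration of what partitioning means for a stream, the following Python sketch routes keyed messages to partitions and keeps the original order within each partition. The function names and the CRC32-based hash are assumptions made for this example; they are not the partitioner that Kafka or Samza actually uses.

```python
import zlib


def partition_for(key, num_partitions):
    # Stable hash of the key, so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    # Append each (key, value) message to its partition in arrival order.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


messages = [("alice", 1), ("bob", 1), ("alice", 2), ("carol", 1), ("bob", 2)]
partitions = route(messages, num_partitions=3)

for index, partition in enumerate(partitions):
    print(index, partition)
```

Because every message for a given key lands in one partition, a task that owns that partition sees that key's messages in the order they were written.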

The new release of Samza adds three main new features. The first is Samza SQL, a high-level API designed to make stream processing accessible to anyone who can write SQL. The developers say Samza SQL can be used to obtain quick real-time insights and to quickly create stream processing applications.

Samza SQL is based on Apache Calcite, an open source SQL language framework used by several Apache projects. The way Samza SQL works is that you write a normal SQL query, and the API deals with creating, configuring, and managing the pipeline.
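
To give a feel for what a streaming SQL query stands for, the sketch below expresses a simple filter-and-project query as a small Python pipeline. The query text, the `page_views` stream, and its field names are hypothetical, and the `where` and `select` helpers are illustrative only; they say nothing about how Samza SQL or Calcite plan or execute queries.

```python
# Illustrative query (stream and field names are hypothetical):
#   SELECT member_id, page FROM page_views WHERE country = 'US'


def where(stream, predicate):
    # Keep only the messages that satisfy the predicate (the WHERE clause).
    return (message for message in stream if predicate(message))


def select(stream, fields):
    # Keep only the named fields of each message (the SELECT list).
    return ({field: message[field] for field in fields} for message in stream)


page_views = [
    {"member_id": 1, "page": "/home", "country": "US"},
    {"member_id": 2, "page": "/jobs", "country": "DE"},
    {"member_id": 3, "page": "/feed", "country": "US"},
]

pipeline = select(
    where(iter(page_views), lambda message: message["country"] == "US"),
    ["member_id", "page"],
)
results = list(pipeline)
print(results)
```

With a SQL layer, the user writes only the query in the comment; wiring up the equivalent of the pipeline is left to the framework.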

The second is an Azure EventHubs producer, consumer and checkpoint provider, and the third is an AWS Kinesis consumer. Other improvements include durable state in the high-level API, ZooKeeper-based deployment stability, and multi-stage batch processing.

More Information

Samza Site

Related Articles

Apache Bigtop Adds OpenJDK 8 Support 

Apache Fluo Improves Spark Integration

Kafka 1 Becomes More Tolerant

Comparing Kafka To RabbitMQ

Apache Kafka Adds New Streams API

GoKa Stream Processing For Kafka

Copyright © 2018 i-programmer.info. All Rights Reserved.