Apache Samza Adds SQL
Apache Samza Adds SQL
Written by Kay Ewbank   
Monday, 08 January 2018

There's a new version of Apache Samza that adds Samza SQL and both Azure EventHubs and AWS Kinesis. Samza is an open source framework originally developed alongside Kafka by LinkedIn before being made open source and taken over by the Apache Software Foundation.

The idea behind Samza is to provide a simple way to develop and run stream processing jobs that can be used by non-programmers as well as developers. Samza uses Apache Kafka for messaging, and Apache Hadoop YARN for fault tolerance, processor isolation, security, and resource management.  It has support for local state via a RocksDB store that allows a stateful application to scale up to 1.1 Million events/sec on a single machine with SSD.

Samza has a simple callback-based “process message” API comparable to MapReduce. It supports managed state via snapshotting and restoration of a stream processor’s state. When a processor is restarted, Samza restores its state to a consistent snapshot. It also provides fault tolerance by working with YARN to transparently migrate tasks to another machine if the active machine in the cluster fails. Kafka is used to process messages in the order they were written to a partition, so that no messages are ever lost.

Samza is partitioned and distributed at every level. It has a pluggable API that means it can be run with other messaging systems and in other execution environments, though it is designed to work out of the box with Kafka and YARN. Samza is written in Scala and Java.

The new release of Samza adds three main new features. The first is Samza SQL. This is a high level API that is designed to expand the target audience for stream processing to make it accessible to anyone who can write SQL. The developers say Samza SQL can be used to obtain quick real time insights,  and to quickly create stream processing applications. 

 

 

Samza SQL is based on Apache Calcite, an open source SQL language framework used by several Apache projects. The way Samza SQL works is that you write a normal SQL query, and the API deals with creating, configuring, and managing the pipeline.

The second improvement is an Azure EventHubs producer, consumer and checkpoint provider. An AWS Kinesis consumer has also been added. Other improvements include durable state in high-level API, Zookeeper-based deployment stability, and multi-stage batch processing.

 apache

More Information

Samza Site

Related Articles

Apache Bigtop Adds OpenJDK 8 Support 

Apache Fluo Improves Spark Integration

Kafka 1 Becomes More Tolerant

Comparing Kafka To RabbitMQ

Apache Kafka Adds New Streams API

GoKa Stream Processing For Kafka

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on, Twitter, FacebookGoogle+ or Linkedin.

 

Banner


MistyThe Robot For Programmers
13/01/2018

It is arguable that what we need is a sort of IBM PC of the robot world; a robot that we can use without worrying too much about hardware and just get on with programming. Misty is aiming to be just s [ ... ]



Microsoft Opens Door To Its AI School
29/12/2017

Microsoft has made a big push into AI over the last couple of years and has its Cognitive Toolkit and its Cognitive Services APIs available to all comers. Now it has launched AI School with resources  [ ... ]


More News

 

 
 

 

blog comments powered by Disqus

 
 

   
Banner
RSS feed of news items only
I Programmer News
Copyright © 2018 i-programmer.info. All Rights Reserved.
Joomla! is Free Software released under the GNU/GPL License.