Storm Reaches 1.0
Written by Kay Ewbank   
Monday, 18 April 2016

The new version of Apache Storm brings improved performance, new APIs and state management, along with a long list of other new features and improvements.

Apache Storm is a distributed real-time analytics system for Hadoop that can be used for processing large volumes of rapidly changing data.

Storm 1.0 is up to 16 times faster than previous versions, with latency reduced by up to 60%. The developers say that most users can expect the new version to run three times faster.

New in this version is Storm Pacemaker, an optional Storm daemon designed to process heartbeats from workers as an alternative to ZooKeeper. As a Storm cluster scales up, ZooKeeper becomes a bottleneck because of the high volume of writes from workers sending heartbeats. Pacemaker is an in-memory store, so it avoids the excessive network traffic involved in maintaining consistency. It functions as a simple in-memory key/value store with ZooKeeper-like, directory-style keys and byte array values.
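To make that data model concrete, here is a minimal Python sketch of an in-memory key/value store with directory-style keys and byte array values. It is a hypothetical illustration, not Pacemaker's API: the class name, its methods and the `/workerbeats/...` paths are made up for this example. Because nothing is replicated, each heartbeat write is a single local dictionary update, which is why an in-memory store can sidestep consistency traffic.

```python
from typing import Dict, List, Optional


class InMemoryHeartbeatStore:
    """Hypothetical toy store: directory-style string keys, byte array values.

    Not Pacemaker's real API; it only mirrors the data model described above.
    """

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    @staticmethod
    def _normalize(path: str) -> str:
        # Treat "/a/b/", "a/b" and "/a/b" as the same key.
        return "/" + path.strip("/")

    def put(self, path: str, value: bytes) -> None:
        # No replication: a write is one in-memory dictionary update.
        self._data[self._normalize(path)] = bytes(value)

    def get(self, path: str) -> Optional[bytes]:
        return self._data.get(self._normalize(path))

    def delete(self, path: str) -> None:
        self._data.pop(self._normalize(path), None)

    def children(self, path: str) -> List[str]:
        # Names of the immediate children under a "directory" prefix.
        prefix = self._normalize(path).rstrip("/") + "/"
        names = set()
        for key in self._data:
            if key.startswith(prefix):
                names.add(key[len(prefix):].split("/", 1)[0])
        return sorted(names)


if __name__ == "__main__":
    store = InMemoryHeartbeatStore()
    # Hypothetical layout: one heartbeat per worker, grouped by topology.
    store.put("/workerbeats/topo-1/node-a:6700", b"hb-1")
    store.put("/workerbeats/topo-1/node-b:6700", b"hb-2")
    print(store.children("/workerbeats/topo-1"))
```

Listing children of a prefix is what gives the flat dictionary its ZooKeeper-like, directory-style feel without any real tree structure.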

The first of the API improvements is a distributed cache API, which can be used to share files (BLOBs) among topologies. Storm 1.0 comes with two implementations of the distributed cache API: one backed by the local file system on Supervisor nodes, and one backed by Apache Hadoop HDFS.
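Having two backends behind one API means topologies can read shared files by key without caring where they are stored. The Python sketch below shows that shape with an abstract interface and a local file system backend only. `BlobStore`, `LocalFsBlobStore` and their methods are hypothetical names, not Storm's classes; an HDFS-backed variant would implement the same two methods.

```python
import abc
import hashlib
import tempfile
from pathlib import Path


class BlobStore(abc.ABC):
    """Hypothetical distributed-cache interface: store and fetch blobs by key."""

    @abc.abstractmethod
    def put(self, key: str, data: bytes) -> None:
        ...

    @abc.abstractmethod
    def get(self, key: str) -> bytes:
        ...


class LocalFsBlobStore(BlobStore):
    """Hypothetical backend that keeps each blob as a file under one directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so arbitrary key strings become safe file names.
        return self.root / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def put(self, key: str, data: bytes) -> None:
        self._path(key).write_bytes(data)

    def get(self, key: str) -> bytes:
        path = self._path(key)
        if not path.exists():
            raise KeyError(key)
        return path.read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store: BlobStore = LocalFsBlobStore(Path(tmp))
        store.put("geoip-db", b"...lookup table bytes...")
        # Two different (hypothetical) topologies read the same file by key.
        for topology in ("clickstream", "fraud-check"):
            print(topology, len(store.get("geoip-db")))
```

Callers only depend on `BlobStore`, so swapping in another backend does not change topology code.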

There's also a new native streaming Window API. Previous versions of Storm left developers to build their own windowing logic, with no recommended or high-level abstractions for defining a window in a standard way in a topology. Window-based computations are common in stream processing: the unbounded stream of data is split into finite sets based on some criterion (e.g. time), and a computation is applied to each group of events. The new native windowing API lets you specify windows with two parameters: the length or duration of the window, and the interval at which the window slides.
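The two window parameters can be illustrated with a small Python sketch of time-based sliding windows. It is a batch simulation of the semantics, not Storm's windowing API: `sliding_windows` is a hypothetical helper, and it assumes that the window evaluated at time t contains events with timestamps in the half-open interval (t - length, t].

```python
from typing import Iterable, List, Tuple

Event = Tuple[int, float]  # (timestamp in seconds, value)


def sliding_windows(events: Iterable[Event], length: int, slide: int,
                    end: int) -> List[Tuple[int, List[float]]]:
    """Split timestamped events into time-based sliding windows.

    A window is evaluated every `slide` seconds up to `end`; the window
    evaluated at time t holds the events with t - length < timestamp <= t.
    """
    if length <= 0 or slide <= 0:
        raise ValueError("length and slide must be positive")
    ordered = sorted(events)
    windows = []
    t = slide
    while t <= end:
        contents = [value for ts, value in ordered if t - length < ts <= t]
        windows.append((t, contents))
        t += slide
    return windows


if __name__ == "__main__":
    events = [(1, 1.0), (2, 2.0), (4, 3.0), (6, 4.0)]
    # Apply a computation (here, a sum) to each window of events.
    for t, contents in sliding_windows(events, length=4, slide=2, end=6):
        print(t, contents, sum(contents))
```

With a length of 4 seconds and a slide of 2 seconds the windows end at t = 2, 4 and 6 and sum to 3.0, 6.0 and 7.0. Consecutive windows overlap, so the event at t = 4 is counted twice; setting the slide equal to the length would give non-overlapping (tumbling) windows instead.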

Another improvement in this release is HA (high availability) Nimbus. The Storm Nimbus service can be lost without affecting running topologies, but losing the Nimbus node does degrade the ability to deploy new topologies and reassign work across the cluster. To overcome this problem, Storm 1.0 adds support for HA Nimbus: multiple instances of the Nimbus service run in a cluster and perform leader election when a Nimbus node fails, and Nimbus hosts can join or leave the cluster at any time.
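The leader-election behaviour can be pictured with a toy, single-process Python simulation. This is a hypothetical sketch and not how Storm implements HA Nimbus; a real cluster coordinates election through a shared service such as ZooKeeper. Here each host receives a sequence number when it joins, and the live host with the lowest number is the leader, similar in spirit to ZooKeeper's ephemeral-sequential-node election recipe. `ToyNimbusCluster` and the host names are illustrative.

```python
from typing import Dict, List, Optional


class ToyNimbusCluster:
    """Hypothetical in-process simulation of leader election among Nimbus hosts."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._members: Dict[str, int] = {}  # host -> join sequence number

    def join(self, host: str) -> None:
        # Hosts may join at any time; late joiners queue behind earlier ones.
        if host not in self._members:
            self._members[host] = self._next_seq
            self._next_seq += 1

    def leave(self, host: str) -> None:
        # A failed or departing host drops out; re-election happens on next query.
        self._members.pop(host, None)

    def leader(self) -> Optional[str]:
        # The live host with the lowest sequence number leads.
        if not self._members:
            return None
        return min(self._members, key=self._members.__getitem__)

    def followers(self) -> List[str]:
        lead = self.leader()
        return sorted(h for h in self._members if h != lead)


if __name__ == "__main__":
    cluster = ToyNimbusCluster()
    for host in ("nimbus-1", "nimbus-2", "nimbus-3"):
        cluster.join(host)
    print("leader:", cluster.leader())
    cluster.leave("nimbus-1")  # simulate the leader failing
    print("new leader:", cluster.leader())
```

A failed leader that rejoins goes to the back of the queue instead of reclaiming leadership, which avoids needless leader changes.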

Other improvements include a stateful bolt API with automatic checkpointing; an automatic backpressure mechanism that can be used to throttle the input to a topology; and a Resource Aware Scheduler that takes into account both the memory and CPU resources available in a cluster.
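The core idea of resource-aware scheduling, that a task must fit on a node in both memory and CPU at the same time, can be sketched as a greedy first-fit placement in Python. This is a deliberately simple stand-in, not the strategy Storm's Resource Aware Scheduler actually uses; the `Node`, `Task` and `schedule` names, and expressing CPU as a percentage of one core, are assumptions for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    free_mem_mb: float
    free_cpu: float  # assumption: percent of one core, so 100 = one full core
    tasks: List[str] = field(default_factory=list)


@dataclass
class Task:
    name: str
    mem_mb: float
    cpu: float


def schedule(tasks: List[Task], nodes: List[Node]) -> Dict[str, str]:
    """Hypothetical greedy first-fit placement checking memory AND CPU.

    Tasks are placed largest-memory-first; each goes to the first node with
    enough free memory and enough free CPU. Raises if a task cannot be placed.
    """
    assignment: Dict[str, str] = {}
    for task in sorted(tasks, key=lambda t: (t.mem_mb, t.cpu), reverse=True):
        for node in nodes:
            if node.free_mem_mb >= task.mem_mb and node.free_cpu >= task.cpu:
                node.free_mem_mb -= task.mem_mb
                node.free_cpu -= task.cpu
                node.tasks.append(task.name)
                assignment[task.name] = node.name
                break
        else:
            raise RuntimeError(f"no node has room for task {task.name}")
    return assignment


if __name__ == "__main__":
    nodes = [Node("n1", 1024, 100), Node("n2", 2048, 400)]
    tasks = [Task("spout", 512, 50), Task("bolt-a", 768, 100), Task("bolt-b", 256, 80)]
    print(schedule(tasks, nodes))
```

In the example `bolt-a` fills all of n1's CPU. `bolt-b` would still fit on n1 by memory alone, but because CPU is checked too it is placed on n2. A memory-only scheduler would have overcommitted n1's processor.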


Last Updated ( Monday, 18 April 2016 )
 
 

   
Copyright © 2017 i-programmer.info. All Rights Reserved.