Twitter plans to open source Storm, the real-time data processing technology that formed part of its recent acquisition of BackType.
The news was announced in a blog post by Twitter's Nathan Marz, who also outlined how the platform might be used: for stream processing, continuous computation, and distributed RPC (remote procedure calls).
Marz says the code will be released in September at the Strange Loop developer conference.
According to Marz, when used for stream processing, Storm is fault-tolerant and scalable, and it can process a stream of new data and update databases in real time.
When used for continuous computation, Storm can run a continuous query and stream the results to clients in real time, for example by streaming trending topics on Twitter into browsers.
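To make the idea of a continuous query concrete, here is a minimal single-process sketch in Python. It is not Storm code and does not use Storm's API; the function name `trending_topics`, the sliding-window size, and the sample hashtags are all assumptions made for illustration. It shows a query that emits a fresh ranking every time a new message arrives, rather than producing one final answer.

```python
from collections import Counter, deque
from typing import Iterable, Iterator, List, Tuple


def trending_topics(
    hashtags: Iterable[str], window: int = 5, top_n: int = 2
) -> Iterator[List[Tuple[str, int]]]:
    """Yield the current top hashtags after each new message arrives.

    Hypothetical helper: the query never 'completes'; it emits a new
    result for every input for as long as the stream keeps producing.
    """
    recent = deque()    # the last `window` hashtags seen
    counts = Counter()  # occurrences of each hashtag inside the window
    for tag in hashtags:
        recent.append(tag)
        counts[tag] += 1
        if len(recent) > window:
            # Drop the oldest hashtag so only the recent window is counted.
            expired = recent.popleft()
            counts[expired] -= 1
            if counts[expired] == 0:
                del counts[expired]
        # Sort by count (descending), then by name, so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        yield ranked[:top_n]


if __name__ == "__main__":
    stream = ["#storm", "#hadoop", "#storm", "#java", "#storm", "#java"]
    for snapshot in trending_topics(stream, window=4):
        print(snapshot)  # a real system would push this to connected browsers
```

In a real deployment the input would be an unbounded feed and each snapshot would be pushed to clients; here a short list stands in for the stream.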
Finally, when used for distributed RPC, Storm can parallelize an intensive query on the fly. For this use, Storm is set up as a distributed function that waits for invocation messages. When it receives an invocation, it computes the query and sends back the results.
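The invocation pattern described above can be sketched in a few lines of Python. This is an illustration only, not Storm's API: the names `serve`, `call`, `partial_sum`, and `STOP` are hypothetical, the "query" (a sum of squares) is a stand-in, and threads in one process stand in for the machines of a cluster.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

STOP = object()  # sentinel telling the server loop to shut down


def partial_sum(chunk):
    """The per-worker piece of the query: sum the squares in one chunk."""
    return sum(x * x for x in chunk)


def serve(invocations: queue.Queue, workers: int = 4) -> None:
    """Wait for invocation messages, run each query in parallel, reply."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            message = invocations.get()  # blocks until an invocation arrives
            if message is STOP:
                break
            numbers, reply_to = message
            # Split the input into roughly equal chunks, one per worker.
            size = max(1, -(-len(numbers) // workers))  # ceiling division
            chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
            result = sum(pool.map(partial_sum, chunks))
            reply_to.put(result)  # send the result back to the caller


def call(invocations: queue.Queue, numbers):
    """Client side: send an invocation and block until the result returns."""
    reply_to = queue.Queue(maxsize=1)
    invocations.put((numbers, reply_to))
    return reply_to.get(timeout=5)


if __name__ == "__main__":
    inbox = queue.Queue()
    threading.Thread(target=serve, args=(inbox,), daemon=True).start()
    print(call(inbox, list(range(10))))
    inbox.put(STOP)
```

The caller sees an ordinary function call, while the work behind it is fanned out and the partial results are combined before the reply is sent.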
Marz points out that a Storm cluster is superficially similar to a Hadoop cluster, but whereas on Hadoop you run "MapReduce jobs", on Storm you run "topologies". Jobs and topologies are themselves very different: one key difference is that a MapReduce job eventually finishes, whereas a topology processes messages forever (or until you kill it).
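The difference in lifetime can be shown with a small Python sketch. Neither function uses Hadoop or Storm; `run_job`, `run_topology`, and the character-counting workload are hypothetical names and tasks chosen only to contrast a computation that returns with one that keeps running until it is told to stop.

```python
import queue
import threading


def run_job(records):
    """Batch style: consume a finite input, return a result, and finish."""
    return sum(len(r) for r in records)


def run_topology(inbox: queue.Queue, totals: dict, killed: threading.Event) -> None:
    """Streaming style: keep processing messages until explicitly killed."""
    while not killed.is_set():
        try:
            message = inbox.get(timeout=0.05)  # wait briefly for the next message
        except queue.Empty:
            continue  # nothing yet; keep running rather than exiting
        totals["chars"] = totals.get("chars", 0) + len(message)
        inbox.task_done()


if __name__ == "__main__":
    print(run_job(["ab", "cde"]))  # returns once the input is exhausted

    inbox, totals, killed = queue.Queue(), {}, threading.Event()
    worker = threading.Thread(target=run_topology, args=(inbox, totals, killed))
    worker.start()
    inbox.put("ab")
    inbox.put("cde")
    inbox.join()   # both messages handled, but the worker is still running
    print(totals)
    killed.set()   # the only way the loop ends
    worker.join()
```

The batch function has a natural end; the streaming loop has none, so its running total is always a result so far rather than a final answer.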
The blog post goes into detail about how to use Storm and is well worth reading.