|Apache Kafka 0.8.2 Now Ready For You To Trial|
|Written by Kay Ewbank|
|Friday, 06 February 2015|
OK, we got the Kafka joke out of the way in the headline. So what is Kafka and why does it need a manager?
Apache Kafka 0.8.2 has been released, alongside news that Yahoo is making Kafka Manager available as an open source version.
Kafka is a fault-tolerant, low-latency, high-throughput distributed messaging system that is used in data pipelines at several companies. It was originally created at LinkedIn, where it forms a critical part of LinkedIn’s infrastructure and transmits data to all systems and applications. It is a top-level Apache project and is currently under active development.
The new version of Kafka has a new Java producer client. Until now, Kafka’s JVM clients haven’t changed much since the original release, and alongside the new producer Java client, the team is rewriting the consumer client. The new producer removes the distinction between the “sync” and “async” producer. Writing about the new version on the Confluent blog Neha Narkhede said that
“effectively, all requests are sent asynchronously but always return a future response object that returns the offset as well as any error that may have occurred when the request is complete.”
The post adds that batching is now done whenever possible, meaning that the sync producer, under load, can get performance as good as the async producer. The batching means that any messages that arrive while a send is in progress are batched together.
The new version has yet another version of a delete topic feature. This is the third attempt at getting a working delete topic feature. Nerkhede says
“You would think deleting the data would be easy and not deleting it would be the hard part. But each of our previous attempts had flaws that appeared with usage. This time significant testing has been done and we are hopeful that your data will really be gone. “
A major improvement to the new version is in offset management. Kafka works out which messages have been consumed using offsets, markers it stores for the consumer’s position in the log of messages. Until now such markers were stored by the JVM client in ZooKeeper. When consumers want to update their position frequently, this can become a bottleneck, so the developers have added native offset storage functionality and a new API for making use of this. This makes offset storage horizontally scalable.
Other improvements include automated leader rebalancing, controlled shutdown, and stronger durability guarantees.
Alongside the release of Kafka 0.8.2, Yahoo is making Kafka Manager available in an open source version. Writing about the release on the Yahoo Engineering blog, Hiral P says that Kafka is used by many teams across Yahoo, including for the real-time analytics pipeline, where the Kafka cluster handles a peak bandwidth of more than 20Gbps (of compressed data).
The developers at Yahoo created a web-based tool called Kafka Manager to make it easier to maintain the Kafka clusters. The interface makes it easier to identify topics which are unevenly distributed across the cluster or have partition leaders unevenly distributed across the cluster.
It supports management of multiple clusters, preferred replica election, replica re-assignment, and topic creation. The manager was built with Scala, and the web console is based on the Play Framework which interacts with an actor based in-memory model built with Akka and Apache Curator. The Yahoo developers have also ported some utilities from Apache Kafka to work with the Apache Curator framework as well. Yahoo uses Curator to inspect the state of the cluster from Zookeeper.
The management utility has now been made available as an open source version on GitHub.
To be informed about new articles on I Programmer, install the I Programmer Toolbar, subscribe to the RSS feed, follow us on, Twitter, Facebook, Google+ or Linkedin, or sign up for our weekly newsletter.
or email your comment to: firstname.lastname@example.org
|Last Updated ( Friday, 06 March 2015 )|