Kafka: The Definitive Guide

Authors: Neha Narkhede, Gwen Shapira and Todd Palino
Publisher: O'Reilly
Pages: 322
ISBN: 978-1491936160
Print: 1491936169
Kindle: B0758ZYVVN
Audience: Developers using Kafka
Rating: 4.5
Reviewer: Kay Ewbank

Kafka is increasingly popular for moving large amounts of streaming data. This guide, subtitled "Real-time data and stream processing at scale", was written to show how the people who built Kafka operate and use it.

The authors are from Confluent and LinkedIn, and were part of the team that developed Kafka. They say that they wrote the book from the perspective of asking 'what are the most useful things we can share with new users to take them from beginner to expert?'



Some parts of the book are aimed at developers, while others are more useful for Kafka administrators. It opens with a general introduction to Kafka and what it does, followed by a chapter on installing Kafka.

Having got those openers out of the way, the authors get into the heart of the book, beginning with a chapter on Kafka Producers and how to write messages to Kafka. Next comes a chapter on Kafka Consumers, and how to read data from Kafka. Both chapters have plenty of code snippets that illustrate the concepts being discussed. The samples are there to show the concepts; they are not full programs that you could copy, paste and run.

A chapter on Kafka Internals is next, looking at how Kafka handles replication and partitions, how it handles requests from producers and consumers, and how it deals with message storage. All these topics are explained with the aim of giving a better understanding of why Kafka behaves in certain ways in certain situations.


The next chapter is titled Reliable Data Delivery, and looks at reliability guarantees and how to configure brokers.

A chapter on building data pipelines comes next, starting with what to think about when building a pipeline, then going on to an introduction to Kafka Connect, with examples of connectors between a file source and a file sink, and between MySQL and Elasticsearch. There's also a discussion of alternatives to Connect.

Cross-cluster data mirroring is the next topic to be considered. The rest of the book concentrates on using a single Kafka cluster, but this chapter shows how to handle the situation where you need to copy data between clusters using MirrorMaker, Kafka's cross-cluster data replicator, including how to configure and tune it.

A chapter on administering Kafka is next, mainly looking at the Kafka command-line utilities that you can use for basic cluster administration. However, as the authors point out, there are better third-party tools available on the Kafka website. This chapter is followed by a look at how to monitor a Kafka cluster using the Java Management Extensions (JMX) interface. The authors discuss the different metrics, which ones are critical to monitor all the time, and what you should do in response to different results. They also look at which metrics are useful when debugging problems.

The final chapter looks at stream processing and how Kafka Streams works. This is Kafka's stream-processing library, and the authors show how to use it to build and run a topology. The chapter ends with some stream-processing use cases.

Overall, I found this book to be clearly written, and it gave me a good explanation of what Kafka is capable of. The code samples illustrated the points well, and the authors obviously have a detailed knowledge of everything about Kafka. The one drawback of this expertise is that it sometimes led them to give a much shorter explanation of a point or concept where I'd have preferred a slower, more detailed description. That's still a minor point, and if you need to learn about Kafka, this is a very good book.



