Kafka: The Definitive Guide

Author: Neha Narkhede, Gwen Shapira and Todd Palino
Publisher: O'Reilly
Pages: 322
ISBN: 978-1491936160
Print: 1491936169
Kindle: B0758ZYVVN
Audience: Developers using Kafka
Rating: 4.5
Reviewer: Kay Ewbank

Kafka is increasingly popular for moving large amounts of streaming data. This guide, subtitled Real-time data and stream processing at scale, has been written to show how the people who built Kafka control and use it.

The authors are from Confluent and LinkedIn, and were among the team responsible for developing Kafka. They say that they wrote the book from the perspective of asking 'what are the most useful things we can share with new users to take them from beginner to expert'.

The book has some parts that are aimed at developers, others that are more useful for administrators of Kafka. It opens with a general introduction to Kafka and what it does, followed by a chapter on installing Kafka.

Having got those openers out of the way, the authors get into the heart of the book, beginning with a chapter on Kafka Producers and how to write messages to Kafka. Next comes a chapter on Kafka Consumers, and how to read data from Kafka. Both chapters have plenty of code snippets that illustrate the concepts being discussed. The samples are there to show the concepts rather than being full programs that you could copy and paste to produce a program you could run.

A chapter on Kafka Internals is next, looking at how Kafka replication works, how it handles requests from producers and consumers, and how it deals with message storage. There are explanations of how Kafka handles replication and partitions. All these topics are explained with the idea of giving a better understanding of why Kafka behaves in certain ways in certain situations.

The next chapter is titled Reliable Data Delivery, and looks at reliability guarantees and how to configure brokers.A chapter on building data pipelines comes next, starting with what to think about when building a pipeline, then going on to an introduction to Kafka Connect, with examples on connectors between a file source and a file sink, and between MySQL and ElasticSearch. There's also a discussion of alternatives to Connect.

Cross-cluster data mirroring is the next topic to be considered. The rest of the book concentrates on single Kafka cluster use, but this chapter shows how to handle the situation where you need to copy data between clusters using Kafka's MirrorMaker cross-cluster data replicator, including configuring and tuning it.

A chapter on administering Kafka is next, mainly looking at Kafka's command line utilities that you can use for basic cluster administration. However, as the authors point out, there are better third party tools available on the Kafka website. This chapter is followed by a look on how to monitor a Kafka cluster using the Java Management Extension (JMX) interface. The authors discuss the different metrics, which are the critical ones to monitor all the time, and what you should do in response to different results. They also look at which metrics are useful when debugging problems.

The final chapter looks at stream processing and how Kafka Streams works. This is Kafka's stream-processing library, and the authors show how to use it to build a topology and use it. The chapter ends with some stream processing use cases.

Overall, I found this book to be clearly written and it gave me a good explanation of what Kafka is capable of. The code samples illustrated the points well, and the authors obviously have a detailed knowledge of everything about Kafka. The one drawback of this is that sometimes it led to them giving a much shorter explanation of a point or concept where I'd have preferred a slower, more detailed description. That's still a minor point, and if you need to learn about Kafka, this is a very good book.

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Oracle PL/SQL By Example, 6th Ed

Author: Elena Rakhimov
Publisher: Oracle Press
Pages: 480
ISBN: 978-0138062835
Print: 0138062838
Audience: Developers interested in Oracle PL/SQL
Rating: 4
Reviewer: Kay Ewbank

This is the sixth edition of a well established title that has been updated for the latest version of PL/SQL (21c).

+ Full Review

Object-Oriented Python

Author: Irv Kalb
Publisher: No Starch Press
Date: January 2022
Pages: 416
ISBN: 978-1718502062
Print: 1718502060
Kindle: ‎ B0957SHYQL
Audience: Python developers
Rating: 3
Reviewer: Mike James
Python, Object-Oriented? Not a lot of programmers know that!

+ Full Review

More Reviews

Last Updated ( Saturday, 28 November 2020 )

Recent Articles

Recent Book Reviews

Popular Articles