Understanding Software Dynamics (Addison-Wesley)
Written by Kay Ewbank
Author: Richard L. Sites
This book looks at the different reasons why software runs too slowly and what developers can do about it. It starts with how to measure the problem, moves on to observing what's going wrong, and finishes with how to fix it.
The first seven chapters of the book examine in detail how to measure the problem, starting with an overview of 'my program is too slow', then looking at ways to measure the use of CPU, memory, CPU and memory interaction, disks and SSDs, networks, and disk and network interaction.
Part two of the book is titled 'Observation', and here Sites looks at tools and techniques that you can use to measure and quantify the problems a program is having. There's a good chapter on logging tools, and other useful examinations of how to interpret aggregate measures, dashboards that show several real-time measures at once, and tracing tools.
Part three moves on to how to build and use a Linux-based kernel-user trace tool to find out how a server is using its CPU cores, and how to use this to identify where problems such as program interactions are happening. This is a detailed description that covers the kernel patches you need to construct, the Linux loadable module, how to control tracing at runtime, and how to carry out post-processing.
The final part of the book is titled 'Reasoning', and here Sites looks at how to use everything you've learned so far to find, understand and fix problems with your code. The section starts with an overview of what to look for, then goes into a number of case studies of specific problems and what they look like to the monitoring tools. The case studies include CPU-bound user-mode execution, code that runs normally on some runs but slowly on others, and multi-threaded programs in which some threads are left waiting for a CPU to be assigned. Other case studies cover apps that use lots of memory and trigger paging to disk, apps that write multiple megabytes to disk, network-limited apps, code that waits for software locks, and problems caused by time delays and queuing delays.
The case study chapters are fascinating reads, with suggestions for multiple experiments to narrow down where the problems lie, and explanations of what the resulting data teaches.
This is a really good book. Richard Sites has been coding since 1959 and is a member of the US National Academy of Engineering. He's carried out code tracing and analysis at DEC, Adobe, Google and Tesla, and throughout the book you feel this is him passing on his hard-earned knowledge in a really understandable way. Highly recommended.