Author: Christian Bolton, Rob Farley et al.
Audience: Troubleshooting DBAs and Developers
Reviewer: Ian Stirk
Chapter 5 Query processing and execution
What happens when you submit SQL? This is the topic of chapter 5. In overview, the SQL passes to a parser (which ensures the syntax is correct), an algebrizer (name resolution, binding etc.), an optimizer (which decides how to process the data efficiently), and finally the execution engine. The chapter focuses on the optimization step, discussing parallel plans, statistics, plan caching and recompilation. Various hints are discussed as a method of influencing optimization. The various SQL plan operators are discussed, together with how to read a query plan. SQLOS is expanded upon in this chapter, detailing the use of memory nodes, CPU nodes, schedulers, workers and tasks.
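Plan caching of the kind the chapter describes can be observed directly from the dynamic management views. The snippet below is a sketch of that idea (it assumes a SQL Server 2012 instance and VIEW SERVER STATE permission), not code from the book:

```sql
-- List cached plans with their SQL text and XML show-plan, most reused first.
-- Requires VIEW SERVER STATE permission.
SELECT TOP (20)
       cp.usecounts,             -- how often the cached plan has been reused
       cp.objtype,               -- Adhoc, Prepared, Proc, ...
       st.[text]     AS sql_text,
       qp.query_plan             -- XML plan, viewable graphically in SSMS
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
ORDER BY cp.usecounts DESC;
```

Low reuse counts on near-identical ad hoc statements are one of the recompilation/caching symptoms the chapter discusses.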
Another interesting background chapter, but most of the information is available elsewhere. (4/5)
Chapter 6 Locking and concurrency
Locking is necessary to ensure data integrity; however, locking typically reduces concurrency. This chapter discusses the need for locking and how the different isolation levels can improve concurrency at the expense of an increasing number of possible data problems. The chapter starts with a look at the ACID properties of transactions. It moves on to look at problems relating to lost updates, dirty reads, non-repeatable reads, phantom reads, and the Halloween effect. Lock escalation is discussed together with the lock hierarchy and the lock modes (S, U, X, Sch-S, Sch-M, IS, IU, IX, SIX, SIU, UIX). In the past the NOLOCK hint was often used on SQL statements to reduce locking/blocking and so improve concurrency; more recently this has been superseded by the snapshot or read committed snapshot setting at the database level, which has the added bonus of reducing deadlocks too. There is some good sample code to illustrate the concepts discussed.
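As a sketch of the database-level alternative to NOLOCK mentioned above (the database name SalesDB is a hypothetical placeholder), read committed snapshot can be switched on with one statement, and the resulting lock activity inspected via a DMV:

```sql
-- Enable read committed snapshot so readers use row versions instead of
-- shared locks. 'SalesDB' is a placeholder database name.
ALTER DATABASE SalesDB
    SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Inspect currently held/requested locks and their modes (S, U, X, IS, ...).
SELECT resource_type, request_mode, request_status, request_session_id
FROM sys.dm_tran_locks
ORDER BY request_session_id;
```

WITH ROLLBACK IMMEDIATE is used because the ALTER needs exclusive access to the database to take effect.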
This chapter is definitely needed to understand how SQL Server works, and since locking is often a major problem in live systems, this well-written and detailed chapter earns its place. (5/5)
Chapter 7 Latches and spinlocks
Locks are needed to protect transactions; similarly, latches are needed to protect in-memory data structures when multiple threads try to update them concurrently. A spinlock is similar to a latch, but when the memory isn't available it busy-waits, repeatedly re-checking for a while rather than yielding. Both are a consequence of large data volumes/throughputs on multicore processors, so they may be seen increasingly in the future as data volumes grow. The symptoms of both latch and spinlock contention are described: with latches, the number of transactions per second falls as the number of latch waits rises; with spinlocks, CPU usage tends to rise exponentially as transactions per second decrease. Code snippets are provided to measure the degree of latch and spinlock contention. There is an excellent example of latch contention using a clustered index key based on an identity column. Unlike locks, latches and spinlocks cannot be influenced by hints; luckily, the chapter shows how good schema and query design can reduce their occurrence.
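Contention measurements of the kind the chapter describes come from two DMVs; the queries below are a sketch of the sort of snapshot involved (the counters are cumulative since the last service restart), not the book's own code:

```sql
-- Top latch classes by accumulated wait time since the last restart.
SELECT TOP (10) latch_class, waiting_requests_count, wait_time_ms
FROM sys.dm_os_latch_stats
ORDER BY wait_time_ms DESC;

-- Top spinlocks by spins; a high spins_per_collision suggests contention.
SELECT TOP (10) name, collisions, spins, spins_per_collision
FROM sys.dm_os_spinlock_stats
ORDER BY spins DESC;
```

In practice two snapshots are taken a few minutes apart and the deltas compared, so that historical noise doesn't mask a current problem.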
This chapter seems to belong to the troubleshooting section rather than internals. I also wonder, since it is currently a relatively niche concern, if it warrants inclusion in this book at all. The chapter is both interesting and innovative. It would have been nice to see a note about Microsoft’s forthcoming Hekaton technology that’s designed to eliminate latches and spinlocks. (4/5)
Chapter 8 Knowing Tempdb
With each recent version of SQL Server, the product's own use of tempdb has increased. Additionally, application databases typically make extensive use of tempdb (e.g. for sorting). Tempdb is often the most active database on a server, so a problem with tempdb can be a problem for all the other databases. The chapter discusses the many uses of tempdb, and provides details on troubleshooting various common problems (i.e. latch contention, I/O performance, and space issues). Lastly, various tempdb best practices are discussed. Some useful SQL is provided. This is an enjoyable chapter, discussing tempdb usage, monitoring, problems and solutions. (4/5)
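Tempdb space usage of the kind the chapter troubleshoots can be broken down by consumer with a query along these lines (a sketch; page counts are converted to KB at 8 KB per page):

```sql
-- How tempdb space is split between user objects (temp tables, table
-- variables), internal objects (sorts, spools) and the version store.
SELECT
    SUM(user_object_reserved_page_count)     * 8 AS user_objects_kb,
    SUM(internal_object_reserved_page_count) * 8 AS internal_objects_kb,
    SUM(version_store_reserved_page_count)   * 8 AS version_store_kb,
    SUM(unallocated_extent_page_count)       * 8 AS free_space_kb
FROM tempdb.sys.dm_db_file_space_usage;
```

Knowing which of the three consumers is growing narrows the space-issue diagnosis considerably.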
Chapter 9 Troubleshooting methodology and practices
This chapter provides a practical and detailed 10-step approach to troubleshooting SQL Server problems. These steps are:
1. Define the problem (and success criteria)
2. Gather logs and an end-to-end diagram of the system
3. Isolate the problem
4. Ascertain the problem's impact (losing £10,000 per hour etc.)
5. Engage the correct resources
6. Identify potential causes
7. Plan/co-ordinate across teams
8. Select a communication plan (managers/end users/tech team)
9. Identify the root cause (various iterations; isolated, repeatable)
10. Test and implement
In many ways this chapter relates to soft skills, but it does provide useful and comprehensive guidance on how to approach a problem. Various background considerations are given, together with a brief outline of tools used in subsequent chapters. If you don't currently have a methodology in place (or your approach is ad hoc), this set of steps may prove useful to you. (5/5)
Chapter 10 Viewing Server performance with PerfMon and the PAL tool
Now we get down to the nitty-gritty. This and the following two chapters (covering PerfMon, SQLDiag, and SQL Nexus) work very well together; collectively they show you how to monitor, collect and analyse performance metrics. Often you'll want to go outside SQL Server itself and monitor the server from a Windows perspective, looking at memory, CPU, disk, and network usage; these can highlight problem areas that can then be targeted for further investigation. The chapter provides an overview of the new SQL Server counters available in 2012, relating to availability replicas, batch response statistics, database replicas, FileTable, memory broker clerks, and memory nodes. An introduction to using PerfMon is given for the beginner (is this really necessary?!), together with factors that may need attention (e.g. sampling interval, performance impact etc.). The chapter provides details of which counters to investigate (CPU, memory, disk, SQL Server), why they are important, and what values are problematic (this could prove very useful).
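Many of the SQL Server PerfMon counters the chapter covers are also exposed inside the engine via a DMV, so they can be sampled with plain T-SQL; a sketch, assuming SQL Server 2012:

```sql
-- Sample the new 2012 'Batch Resp Statistics' counter object from T-SQL.
-- object_name is prefixed 'SQLServer:' on a default instance, or
-- 'MSSQL$<instance>:' on a named instance.
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Batch Resp Statistics%';
```

This only covers the SQL Server counter objects; OS-level counters (memory, disk, network) still need PerfMon itself.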
Having collected various metrics, you now need to interpret them, and this is where the PAL (Performance Analysis of Logs) tool is useful. PAL takes the PerfMon log as input, asks some questions about the server (e.g. amount of memory, number of CPUs etc.), and then analyses the data in terms of expected values for that specification of server. The PAL output report highlights any alerts and puts the counter values into context. The Relog tool is also introduced; this useful utility allows you to create new PerfMon logs with different sampling intervals, counters and time periods.
This chapter starts out at an introductory level, but covers a lot of useful ground, and incorporating PAL and Relog is a bonus. An easy and interesting read. (5/5)