|Pro SQL Server Wait Statistics|
Page 1 of 2
Author: Enrico van de Laar
This book aims to troubleshoot problems and improve the performance of SQL Server queries by using Wait Statistics, how does it fare?
When SQL code is not running on the CPU, SQL Server records the reason why it is not running (e.g. blocking or waiting for IO to complete). Analyzing these Wait Statistics (Wait Stats), and making changes based on them, can improve the performance of your queries.
The book is targeted at DBAs and SQL developers who want to troubleshoot problems and improve the performance of their SQL Servers and queries. There are 771 types of Wait Stats in SQL Server 2014, and the most common ones are discussed here.
The book assumes little specific knowledge about SQL Server; however, an understanding will make the book easier to read. The book contains three chapters relating to the Wait Stats analysis method, and nine chapters on wait types relating to specific areas of processing.
Below is a chapter-by-chapter exploration of the topics covered.
Part I Foundations of Wait Statistics Analysis
Chapter 1 Wait Statistics Internals
This chapter opens with a discussion on what Wait Stats are: they indicate why SQL code is not running on the CPUs. They are not well-known to DBAs or developers. A brief history of Wait Stats is given, having existed in earlier versions of SQL Server, they became more usable with the advent of Dynamic Management Views (DMVs) in SQL Server 2005.
The chapter continues with a look at SQLOS, which provides operating system (OS) functionality, allowing SQL Server to take advantage of hardware improvements. Next, schedulers, tasks, and worker threads are discussed, and their work illustrated with the use of DMVs.
The chapter ends with a look at Wait Stats and how they relate to the various queues (i.e. running, runnable, and waiter list). The movement of queries between the queues is shown with examples.
This chapter provides a useful overview of what Wait Stats are, together with the SQL Server OS architecture, and the flow between sessions, requests, tasks and threads. There’s a reference to Tom Davidson’s helpful paper “SQL Server 2005 Waits and Queues” - which is still applicable today.
I don’t agree with everything that’s said. The technique is not new, indeed Davidson’s related paper dates from 2006. Additionally, people that deal with performance tuning will be very familiar with Wait Stats, it’s a well-known tuning technique.
This chapter contains well written discussions, helpful references to websites, inter-chapter links, example SQL code to support the assertions, useful diagrams, outputs, and practical instructions. These traits apply to the book as a whole.
Chapter 2: Querying SQL Server Wait Statistics
This chapter opens with a look at the DMV sys.dm_os_wait_stats which contains accumulative Wait Stats data, by wait type, for the instance, since the last restart, reboot, or manual reset. The various columns in the DMV are explained. Next, the DMV sys.dm_os_waiting_tasks is discussed, this contains wait information for queries that are currently executing, and its major columns are discussed. Lastly, the DMV sys.dm_exec_requests is discussed, this contains information about the currently executing queries, and this can be used to obtain the execution plan and the SQL text.
The chapter continues by discussing how to combine various DMVs together to detect what queries are currently waiting. An example is provided that shows how these DMVs can be used to detect a blocking problem. A useful flowchart is provided of the method involved.
It is also possible to view Wait Stats via the Performance Monitor (Perfmon) tool. This is briefly shown, and the advantage of graphical output highlighted.
The chapter ends with a look at using Extended Events (XE) to capture Waits Stats. XE is a lightweight monitoring tool, intended to replace the deprecated SQL Profiler/Trace tool. A step-by-step example is shown of recording the Wait Stats for a given query, via its session id.
This chapter provides a useful insight into capturing Wait Stats of currently executing queries. The underlying DMVs and their columns are discussed, and useful examples provided. There’s a helpful reference to Grant Fritchey’s free book “Execution Plan Basics”, and Jonathan Kehayias’s XE posts.
The chapter is largely concerned with using Wait Stats to investigate problems that are currently occurring. It should be noted that other methods exist (e.g. sp_who2) to discover blocking etc. Perhaps some emphasis could have been given to using the accumulated Wait Stats information to analyze the overall health on the SQL Server instance – again this is a well known technique.
While some example code shows the query’s underlying text, if the query is a stored procedure or a batch, the specific line of SQL executing is not shown – however, it is relatively easy to get this useful additional information, and this should have been included.
Chapter 3: Building a Solid Baseline
The chapter opens with a discussion on the importance of baselines. If you’re familiar with your own SQL Server processing, you may know when certain wait type values are unusual. However this is not possible if you manage a large number of servers.
The chapter continues with an examination of what baselines are, and the need to refresh them periodically. Comparing the current Wait Stats with the historic baseline should highlight differences that probably reflect the underlying problem.
Using baselines can have its weaknesses, and these are discussed next. These include: too much information, understanding the performance metric, and focussing on the wrong metric. SQL code is provided to create, maintain, and compare baselines and measurements. A SQL Server Agent job is used to schedule the recording of the Wait Stats information.
The chapter ends with a look at baseline analysis. Essentially, a comparison is made of the Wait Stats that exist during the problem time and the baseline, any differences are noted. An example is provided of reports taking a long time to run, the comparison highlights disk latency as being very high, contacting the stage team confirms a disk problem.
This chapter provides a useful understanding of the importance of baselines in assessing if current waits are within an expected range or highlight a cause for concern.
The author says “Without a solid baseline your measurements mean nothing!” - I think this is wrong. If you record the Wait Stats for a given query (e.g. using XE), or for all queries over a given time (e.g. 10 minutes), or use the Wait Stats accumulated since the instance was restarted, you can use the relatively position of the wait types to determine the underlying problems. Again this is a known method. Another concern is, if the baseline contains high values for certain wait types, they may not be investigated, they will be taken as normal – since it’s the baseline!
|Last Updated ( Wednesday, 23 November 2022 )|