SQL Server Advanced Troubleshooting and Performance Tuning (O'Reilly)

Chapter 3. Disk Subsystem Performance

The previous two chapters set the background; this and the next few chapters look at troubleshooting from a resource-problem perspective. Traditionally, many SQL Server systems have been IO bound; on a modern system you would expect average disk latencies of 1-2ms for log files, and 3-5ms for data files. 

The chapter opens with a look at the various components of an IO subsystem, including the buffer pool cache, data disks, and the log file. Data is read from the disks into the buffer pool, where the application interacts with it; any changes are made in the buffer pool, written to the log, and eventually to the data files. Latches are used to protect data in memory. A useful diagram describes this processing. 

The author encourages a holistic approach to IO problems. It is common for the storage team and the DBAs to argue about where the IO problems lie. Various tools can be used to simulate a SQL Server workload, and DiskSpd is discussed. The DMV sys.dm_io_virtual_file_stats is often the starting place to identify the database and the data/log file having the biggest problem. It is possible to obtain the queries using the most IO, and these should be investigated for improvement (e.g. missing indexes). 
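As an illustration of this starting point (my own sketch, not the book's code), average latencies per file can be derived from the DMV's stall and count columns:

```sql
-- Average IO latency per database file (counters accumulate since restart)
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.name AS file_name,
       mf.type_desc,                                          -- ROWS or LOG
       vfs.num_of_reads,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
ORDER BY vfs.io_stall DESC;   -- worst-suffering files first
```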

DMV metrics are typically accumulated since the last restart or reboot. It is possible to create a DMV snapshot of values for a given time period. Here, a snapshot of the current DMV values is taken, then the query you want to investigate is run, then a second DMV snapshot is taken, and finally the delta between the 2 snapshots is calculated. Code for this is provided. 
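The snapshot/delta technique can be sketched as follows (my own hedged example, not the book's code; the temporary table name is arbitrary):

```sql
-- Snapshot 1: capture the cumulative counters
SELECT database_id, file_id, num_of_reads, io_stall_read_ms,
       num_of_writes, io_stall_write_ms
INTO #io_snap1
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

-- ... run the workload you want to measure ...

-- Snapshot 2: re-read the DMV and report the delta
SELECT DB_NAME(s2.database_id) AS database_name, s2.file_id,
       s2.num_of_reads      - s1.num_of_reads      AS reads,
       s2.io_stall_read_ms  - s1.io_stall_read_ms  AS read_stall_ms,
       s2.num_of_writes     - s1.num_of_writes     AS writes,
       s2.io_stall_write_ms - s1.io_stall_write_ms AS write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS s2
JOIN #io_snap1 AS s1
  ON s1.database_id = s2.database_id AND s1.file_id = s2.file_id
ORDER BY read_stall_ms + write_stall_ms DESC;
```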

Additionally, Performance Monitor counters and OS metrics can be investigated. A list of the more important counters is provided; these mostly relate to the physical disk. 

Where possible, the DBA should talk with the storage team; various checks can be undertaken, including:

  • the VM host is not overprovisioned
  • the HBA queue depth is optimal
  • the vendor’s storage optimization recommendations have been applied

The chapter ends with a look at some of the more common IO related waits, what they mean, and how they might be fixed.

This chapter provides a very useful look at how to identify IO related problems, together with their potential solutions. Useful code is provided throughout.

Chapter 4. Inefficient Queries

Many of the chapters look at problems relating to components, like CPUs, memory, and IO. In each case, the underlying problem might be inefficient SQL code; this could be due to missing indexes, stale statistics, etc. Inefficient queries can increase IO, memory usage, CPU usage, and blocking. 

It may be possible to improve query performance by adding hardware; however, while this may be an answer in the short term, it tends not to be scalable and often the problem returns. The author highlights that performance tuning may not give the best Return on Investment (RoI).

Code is provided to find the top 50 queries using the most CPU, and another query for finding those using the most IO. However, these queries sort by average CPU and average IO usage; I wonder if they should be sorted by total CPU and total IO instead. If a query takes 20 minutes of CPU to run, and runs twice, its average CPU time is 20 minutes; if another query takes 2 minutes of CPU to run, and is run 1,000 times, its average is 2 minutes. But improving the latter query by 50% saves 1,000 minutes, whereas improving the first query by 50% saves only 20 minutes, so I think we should focus on the total value rather than the average.

There’s some useful code for looking at related queries – those which are stored in the cache separately, but which really could be the same, i.e. they have the same query hash. Often you must enable forced parameterization to reuse the plans (which saves space and can improve performance significantly). 
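The idea can be sketched like this (my own illustration, not the book's code; the threshold of 10 plans is an arbitrary assumption):

```sql
-- Find query hashes that have accumulated many separately cached plans,
-- a sign that forced parameterization might help
SELECT qs.query_hash,
       COUNT(DISTINCT qs.plan_handle) AS cached_plans,
       SUM(qs.execution_count)        AS total_executions,
       SUM(qs.total_worker_time)      AS total_cpu_us
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
HAVING COUNT(DISTINCT qs.plan_handle) > 10
ORDER BY cached_plans DESC;
```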

There’s a useful reminder that getting performance data can be expensive, so it makes sense to run these queries with the OPTION (MAXDOP 1) hint. DMVs do not record all queries, since some are not cached (e.g. those with RECOMPILE). That said, using DMVs is often a fast and ‘good enough’ initial method for investigating performance problems. 

For more exact methods, you might want to look at Extended Events (XEs), SQL Traces, and Query Store; these are examined next. The use of SQL Trace has been superseded by XEs, so only XEs are discussed here. Contrary to most information sources, which regard XEs as lightweight, the author suggests XEs can sometimes be demanding on resources, so they should be used carefully. Useful code is provided to capture and process queries with high CPU and IO usage. 
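A minimal XE session along these lines could capture high-CPU statements (my own sketch, not the book's code; the session name, file target, and 100ms threshold are assumptions):

```sql
-- Capture completed statements using more than ~100ms of CPU
CREATE EVENT SESSION HighCpuQueries ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.sql_text, sqlserver.database_name)
    WHERE cpu_time > 100000)                    -- cpu_time is in microseconds
ADD TARGET package0.event_file (SET filename = N'HighCpuQueries.xel');

ALTER EVENT SESSION HighCpuQueries ON SERVER STATE = START;
```

Remember the author's caveat: even a filtered session like this imposes some overhead, so stop and drop it once the investigation is done.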

DMVs that use cached plans can omit queries, and XEs and SQL Traces can be expensive to run. The author suggests Query Store as the solution. Query Store does have some overhead, especially if you capture details of all the queries (it may be preferable to ignore queries with little resource usage). Brief details are provided on the 6 Query Store reports. 
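Enabling Query Store while skipping insignificant queries might look like this (a sketch under my own assumptions; the database name is hypothetical):

```sql
-- Enable Query Store, letting AUTO capture mode ignore
-- infrequent, trivially cheap queries (SQL Server 2016+)
ALTER DATABASE MyDatabase
SET QUERY_STORE = ON
    (OPERATION_MODE = READ_WRITE,
     QUERY_CAPTURE_MODE = AUTO);
```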

The data in the Query Store can be accessed via system views; these are discussed together with a useful diagram. Again, useful code to obtain the 50 most IO intensive queries is provided (this can be altered for CPU usage etc). There’s a very useful tip about using DBCC CLONEDATABASE; this copies a database’s structure, without its data, but includes its Query Store, allowing analysis of the data away from a production database.
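Joined together, the Query Store views support queries of roughly this shape (my own sketch, not the book's code; the clone name is hypothetical):

```sql
-- Top 50 most IO-intensive queries recorded by Query Store
SELECT TOP (50)
       q.query_id,
       qt.query_sql_text,
       SUM(rs.count_executions)                           AS executions,
       SUM(rs.avg_logical_io_reads * rs.count_executions) AS total_logical_reads
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
GROUP BY q.query_id, qt.query_sql_text
ORDER BY total_logical_reads DESC;

-- Clone schema, statistics, and Query Store (no data) for offline analysis
DBCC CLONEDATABASE (MyDatabase, MyDatabase_Clone);
```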

This chapter provided a useful overview of various built-in tools for investigating performance, with helpful code provided throughout. I’m not sure why this chapter was placed here; I would have expected the chapters on CPU, memory, IO, and network to be adjacent to one another.

Chapter 5. Data Storage and Query Tuning

Understanding how data is stored should help you understand how data is retrieved and modified, enabling you to identify areas for performance improvement.

First, there’s a look at the 3 types of data storage: rows (traditional), columns (typically optimal for OLAP), and in-memory (typically optimal for OLTP). Most of the chapter focuses on the traditional row-based storage (i.e. heaps, clustered indexes, non-clustered indexes). Some common problems are identified, including: excessive lookups, mismatched data types, and non-SARGable predicates (the latter typically means an index cannot be used). Some tips relating to composite and non-clustered indexes are provided. There are some useful diagrams and discussions relating to table/index navigation.
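A classic example of the non-SARGable problem (my own illustration; the table, column, and index names are hypothetical):

```sql
-- Non-SARGable: the function wrapped around the column prevents
-- an index seek, forcing a scan of every row
SELECT OrderId FROM dbo.Orders
WHERE YEAR(OrderDate) = 2022;

-- SARGable rewrite: a range predicate on the bare column lets
-- an index on OrderDate be used for a seek
SELECT OrderId FROM dbo.Orders
WHERE OrderDate >= '20220101' AND OrderDate < '20230101';
```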

Next, there’s a look at index fragmentation, which typically means it takes longer to retrieve the data. Its causes and possible solutions are outlined (e.g. a FILLFACTOR of 85% can often reduce page splits). It’s recommended to convert heaps to clustered indexes. The author encourages you to investigate and use Ola Hallengren’s SQL Server Maintenance Solution, which includes index maintenance scripts/jobs.
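Applying that FILLFACTOR during an index rebuild looks like this (a sketch; the index and table names are hypothetical):

```sql
-- Rebuild a fragmented index, leaving 15% free space per leaf page
-- to absorb inserts/updates and so reduce page splits
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (FILLFACTOR = 85, ONLINE = ON);  -- ONLINE requires Enterprise edition
```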

Database statistics record the density and distribution of column values, typically of indexes. It’s important to have up-to-date statistics, since they are used by the optimizer to estimate the efficiency of data access; this includes whether an index is used, and how it will be used (i.e. seek, scan, lookup). Sometimes the default statistics update method is suboptimal, and a proactive regular update stats job may be preferable. 
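Such a proactive refresh might be as simple as (my own sketch; the table name is hypothetical):

```sql
-- Refresh statistics on one table with a full scan for maximum accuracy
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Or refresh all out-of-date statistics in the current database
EXEC sp_updatestats;
```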

From SQL Server 2014 onwards, there’s a new Cardinality Estimator; this uses the stats and other inputs to estimate the number of rows retrieved. The new Cardinality Estimator typically improves performance, but some queries can regress; these can be changed to use the legacy Cardinality Estimator. SQL Server 2022 can use Cardinality Estimation feedback to dynamically determine the correct Cardinality Estimator to use.
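A regressed query can be pinned to the legacy estimator with a hint (my own illustration; the table, columns, and variable are hypothetical):

```sql
-- Force the legacy (pre-2014) Cardinality Estimator for one query only,
-- leaving the rest of the workload on the new estimator
SELECT OrderId, Total
FROM dbo.Orders
WHERE CustomerId = @CustomerId
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```

For a whole database, the equivalent database-scoped configuration option (LEGACY_CARDINALITY_ESTIMATION) can be set instead.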

There’s a useful section on how to analyze Execution Plans, showing both row mode and batch mode execution. There’s a selective examination of some of the more useful operators. The author suggests SolarWinds Plan Explorer is a must-have free tool for examining execution plans. 

Discussing the storage background leads to a very useful section on common problems and inefficiencies. These include inefficient code, incorrect join types, and excessive key lookups. In each case the problem is discussed, and some potential solutions are offered.

The chapter ends with a reiteration that indexes and workloads should be looked at holistically. When I read this, I was expecting it to lead to a discussion of the Database Engine Tuning Advisor (DTA), which can help determine your ‘optimal’ data structures based on a workload – however it was not discussed.

This was a useful chapter, showing how tables, heaps, clustered and non-clustered indexes, can impact performance. Common problems are identified, and some very useful tips and solutions are provided.

Chapter 6. CPU Load

In the past, I’ve found that most SQL Server systems were limited by their IO subsystem, but in recent years, I’m finding that IO subsystems are improving, and the bottleneck is now often related to CPU and/or memory.

A high number of logical reads (i.e. data read from memory) can use a lot of CPU, and these reads can be due to nonoptimized queries. Similarly, row-by-row cursor processing can also impact CPU usage. Query Store and the DMV sys.dm_exec_procedure_stats can be used to identify the most CPU intensive queries; these should be examined for improvements (e.g. missing indexes, stale stats). 
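A sketch of the DMV side of that investigation (my own illustration, not the book's code):

```sql
-- Most CPU-intensive stored procedures (totals accumulate while cached)
SELECT TOP (50)
       OBJECT_NAME(ps.object_id, ps.database_id)  AS procedure_name,
       ps.execution_count,
       ps.total_worker_time                       AS total_cpu_us,
       ps.total_worker_time / ps.execution_count  AS avg_cpu_us
FROM sys.dm_exec_procedure_stats AS ps
ORDER BY ps.total_worker_time DESC;
```

Note that, in line with the Chapter 4 discussion, sorting by the total rather than the average surfaces frequently run procedures whose cumulative cost is highest.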

Some useful scripts for troubleshooting high CPU load are provided (e.g. the last 256 minutes of server CPU usage). Advice is provided on some common nonoptimized query patterns to look for, including queries that scan millions of rows, sort, or aggregate.

Next, query compilation and plan caching are discussed – both can impact CPU load. The issue of parameter sniffing is examined, with useful example code, and various solutions to address it are examined (e.g. disabling parameter sniffing at the database level). 
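Two of the standard remedies can be sketched as follows (my own illustration; the table, column, and variable names are hypothetical):

```sql
-- Option 1: optimize for the "average" value rather than the sniffed parameter
SELECT OrderId FROM dbo.Orders
WHERE CustomerId = @CustomerId
OPTION (OPTIMIZE FOR UNKNOWN);

-- Option 2: disable parameter sniffing for the whole database
-- (SQL Server 2016+; run in the target database)
ALTER DATABASE SCOPED CONFIGURATION SET PARAMETER_SNIFFING = OFF;
```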

Following on, compilation and parameterization are discussed. The plan cache can contain many plans for ad-hoc queries that are really the same. Enabling Forced Parameterization can fix this problem, reducing compilation/recompilation, and thus CPU load. You should examine the types of plans in the plan cache to determine if forcing parameterization could be advantageous (code is provided for this). 
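That inspect-then-enable sequence might look like this (my own sketch, not the book's code; the database name is hypothetical):

```sql
-- Check how much of the cache is consumed by ad-hoc plans
SELECT cp.objtype,
       COUNT(*)                                          AS plans,
       SUM(CAST(cp.size_in_bytes AS bigint)) / 1048576   AS size_mb
FROM sys.dm_exec_cached_plans AS cp
GROUP BY cp.objtype;

-- If 'Adhoc' plans dominate, forced parameterization may help
ALTER DATABASE MyDatabase SET PARAMETERIZATION FORCED;
```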

The chapter ends with a look at Parallelism. Queries can often run faster if they are run across several CPUs. However, there is a cost associated with splitting the work and then merging/aggregating the results back together. On OLTP systems, this overhead of parallelism can be a problem. Two settings are involved with parallelism:

  • max degree of parallelism (MAXDOP) – specifies the number of CPUs that can be used when a query executes in parallel

  • cost threshold for parallelism – the optimizer produces a cost for the query; if this is greater than the value of cost threshold for parallelism then the query can run in parallel

The default value for cost threshold for parallelism is 5 (an abstract cost unit, not seconds); on modern systems this is too low, and the author suggests a value of 50 should be used. Some DBAs set MAXDOP to 1 to get rid of the parallelism waits; however, this only hides the problem, and limits the use of the CPUs.
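Raising the threshold rather than disabling parallelism could be done like this (a sketch; the MAXDOP value of 8 is a hypothetical figure to be sized to the hardware):

```sql
-- Raise cost threshold for parallelism instead of setting MAXDOP = 1
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;
```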

There’s a very useful suggestion of enabling Forced Parameterization and disabling Parameter Sniffing (with the latter being a basic problem of cached plans).

I found this a very useful chapter, especially with the growing number of systems having CPU related concerns.



Last Updated (Wednesday, 24 August 2022)