Author: Diana Dee et al
The book’s introduction says "This is a book for [SQL Server] DBAs, for things you think they really ought to know…", so how does it fare?
This book is a diverse collection of SQL Server-related articles by 15 experienced but previously unpublished authors. It stems from a collaboration between Midnight DBA, who hatched the idea of collecting the tribal knowledge of its community of DBAs, and Red Gate Publishing, who teamed up with them to produce it. The topics covered range from compression and page-level data content through to Agile development and project management for DBAs.
Below is a chapter-by-chapter exploration of the topics covered.
Chapter 1 SQL Server Storage Internals 101
This chapter provides an introduction to the physical storage internals within SQL Server. The premise is that you can take advantage of this internals knowledge to deliver more optimal solutions.
The chapter starts with a look at rows, discussing the fixedvar format - a structure that details the layout of each record. Next, page layout is examined in terms of the header, body and record offset array. The content of the physical page is described using the DBCC PAGE command.
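The DBCC commands described here can be sketched as follows. This is a minimal, hypothetical example (the page id 1000 is a placeholder; in practice you would take a real page id from the DBCC IND output), not the chapter's own listing:

```sql
-- Send DBCC output to the client rather than the error log
DBCC TRACEON (3604);

-- List the pages belonging to Person.Person (index_id 1 = clustered index)
DBCC IND ('AdventureWorks2008R2', 'Person.Person', 1);

-- Dump one page: database, file id, page id, print option 3
-- (option 3 shows the page header plus per-row, per-column detail)
DBCC PAGE ('AdventureWorks2008R2', 1, 1000, 3);
```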
Heaps and indexes are discussed in relation to what they are, and how they change as their data changes. The problems of fragmentation and forwarded records are described. B-trees are discussed in detail. There’s some useful SQL code for obtaining the number of forwarded rows.
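The chapter's own query isn't reproduced in this review, but a query of this kind would typically be built on sys.dm_db_index_physical_stats; a sketch along those lines (assuming the current database, and noting that forwarded_record_count is only populated in SAMPLED or DETAILED mode):

```sql
-- Find heaps in the current database that contain forwarded records
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       ips.forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ips
WHERE ips.index_type_desc = 'HEAP'
  AND ips.forwarded_record_count > 0;
```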
The author uses the information discussed to get a storage capacity estimate. I do wonder if the calculation is a little too complex, since this is an estimate, and various factors could invalidate it (e.g. varying row size and number of rows).
The chapter contains an error in its use of the DBCC IND command: there is no closing quote mark around Person.Person, i.e. DBCC IND (AdventureWorks2008R2, ‘Person.Person, 1). Additionally, when investigating sys.database_files, rows are selected WHERE type = 0, but the text does not say what ‘type = 0’ means (it means files of type ‘row’, i.e. data files).
Overall the chapter provides a useful introduction into the content of the rows and pages of a database. Additionally, the overview of heaps and indexes is useful together with the problems that changing data cause.
Chapter 2 SQL Server Compression
If the data required by users is on the same page or on nearby pages, data compression should improve performance: with more data on each page, you get more data per read/write.
The chapter begins with an overview of compression, its benefits (typically less I/O and reduced storage) and its costs (typically increased CPU usage). Row compression is discussed as effectively moving data into a smaller data type, thus saving storage. A useful example is discussed, and links to further information provided. Page compression is the removal of duplicate values, having 3 stages of compression (row, column prefix and page dictionary), again a useful example is provided.
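Enabling either form of compression is a rebuild-time option. As a quick sketch (the AdventureWorks index name is illustrative, not taken from the chapter):

```sql
-- Rebuild a table with page compression (use DATA_COMPRESSION = ROW for row compression)
ALTER TABLE Person.Person
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Compression can also be applied per index
ALTER INDEX IX_Person_LastName_FirstName_MiddleName
    ON Person.Person
    REBUILD WITH (DATA_COMPRESSION = ROW);
```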
The relationship between compression ratios and CPU overhead is examined: row compression can add up to 10% CPU overhead, and page compression up to 20%. The use of the Microsoft routine sp_estimate_data_compression_savings to estimate any saving is discussed. The impact of compression can be measured using Performance Monitor counters (e.g. % processor time). Finally, the impact of data usage patterns on compression choice is examined (scanned data is often a good compression candidate), and compression of older data (especially partitioned data) is discussed.
The author rightly asserts you should test compression on your own system to determine if it is worthwhile. Finally, the impact of data compression on Backup Compression and Transparent Data Encryption is briefly discussed.
There’s some useful SQL for calculating the ratio of index updates relative to the number of scans, which should help determine whether an index is a good candidate for compression. However, there’s a small error (on Safari): “s.user seeks” should read “s.user_seeks”, and similarly “s.user updates” should read “s.user_updates”.
This chapter provides a useful overview of the benefits of compression (more data per read/write), at the cost of greater CPU usage. The examples provided for row and page compression are especially useful in helping illustrate how they work. There’s a useful point about not compressing everything, but use the Microsoft supplied compression estimation tool to determine what should possibly be compressed. There’s some useful links to further information on compression.
Chapter 3 Verifying Backups Using Statistical Sampling
Taking backups is good, but unless you verify them you are taking risks. The system discussed here automatically performs restores and verifies they are viable. For large systems, verifying all backups is not feasible; the author discusses a sampling method that allows you to assign a degree of confidence that your backups are good, e.g. a 95% confidence level.
The chapter starts with the assertion that performing backups with the WITH CHECKSUM option, and restoring via the RESTORE VERIFYONLY option, still leaves plenty of room for errors. The only true way to ensure a backup is valid is to restore it and run DBCC CHECKDB on it.
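The three levels of checking described above can be sketched as follows (the backup paths and logical file names are placeholders, not the author's code):

```sql
-- 1. Back up with page checksums verified as the pages are read
BACKUP DATABASE AdventureWorks2008R2
TO DISK = 'D:\Backups\AdventureWorks2008R2.bak'
WITH CHECKSUM;

-- 2. Quick validation of the backup header (and, WITH CHECKSUM, the page checksums)
RESTORE VERIFYONLY
FROM DISK = 'D:\Backups\AdventureWorks2008R2.bak'
WITH CHECKSUM;

-- 3. The only conclusive test: restore a copy and run an integrity check
RESTORE DATABASE AdventureWorks2008R2_Verify
FROM DISK = 'D:\Backups\AdventureWorks2008R2.bak'
WITH MOVE 'AdventureWorks2008R2_Data' TO 'D:\Verify\AW_Data.mdf',
     MOVE 'AdventureWorks2008R2_Log'  TO 'D:\Verify\AW_Log.ldf';
DBCC CHECKDB ('AdventureWorks2008R2_Verify') WITH NO_INFOMSGS;
```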
For larger systems, where time/resources are limited, only a sample of the backups can be checked. You can use the statistical method provided to determine the number of backups you should verify, for a given confidence level.
Only selected parts of the author’s code are discussed in the chapter; the rest of this automated method is described in pseudocode.
There’s a very useful discussion of sample size, confidence levels, margin of error, response distribution, and population size – all useful in understanding the author’s routine that calculates the number of backups to verify based on a given level of confidence etc.
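The author's up_GetSampleSize routine isn't reproduced in this review, but the standard (Cochran) sample-size formula it presumably builds on can be sketched in T-SQL (all parameter values are illustrative):

```sql
-- Standard sample-size formula with finite-population correction.
-- Not the author's routine: a hypothetical sketch of the underlying statistics.
DECLARE @N  float = 1000;   -- population size: number of backups taken
DECLARE @z  float = 1.96;   -- z-score for a 95% confidence level
DECLARE @p  float = 0.5;    -- response distribution (0.5 = worst case)
DECLARE @e  float = 0.05;   -- margin of error (5%)

DECLARE @n0 float = (@z * @z * @p * (1 - @p)) / (@e * @e);  -- infinite-population size

-- Finite-population correction: 278 backups to verify for N = 1000
SELECT CEILING(@n0 / (1 + (@n0 - 1) / @N)) AS BackupsToVerify;
```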
Unfortunately, in both the book and on Safari, the supplied routine up_GetSampleSize contains a small error: there are no declarations for the variables @Q3 and @Q4.
I found this chapter particularly interesting; using statistics to make an informed decision about the number of backups to verify is a great idea!
Chapter 4 Performance Tuning with SQL Trace and Extended Events
SQL Trace is a great tool for monitoring and troubleshooting; however, it has limitations and is marked as deprecated, meaning it will be removed from a future version of SQL Server. Luckily, there’s another tool that can take over – Extended Events.
The chapter opens with a look at the various ways we can monitor systems, including DMVs, PerfMon, Extended Events (XE) and SQL Trace. The latter will be with us for a while, and owing to its ease of use is a very popular tool.
The chapter continues with an in-depth look at SQL Trace: how it works, its advantages and disadvantages, and some best practices (e.g. don’t save the trace to a table, avoid events that fire often). An example trace is given, and its various sections explained. Viewing and filtering data from the trace file via the fn_trace_getinfo function is also explained. The free and very helpful trace analysis tool ClearTrace (www.scalesql.com/cleartrace/) is examined. Finally, potential solutions to common performance problems are discussed (e.g. missing indexes, stale statistics, parameter sniffing).
The second section of the chapter discusses XE, a lightweight and more flexible replacement for SQL Trace. Creating an XE session via the GUI is shown with step-by-step instructions (there’s also a wizard). The next section discusses the conversion of your existing traces to XE sessions; various methods are given, including manual conversion, a SQL CLR utility, and Jonathan Kehayias’s converter.
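The GUI steps have a T-SQL equivalent; a minimal sketch of an XE session (the session name, threshold and file path are assumptions for illustration, not from the chapter):

```sql
-- A minimal session capturing statements that run longer than one second
CREATE EVENT SESSION [LongRunningQueries] ON SERVER
ADD EVENT sqlserver.sql_statement_completed
(
    ACTION (sqlserver.sql_text, sqlserver.database_id)
    WHERE (duration > 1000000)   -- duration is measured in microseconds
)
ADD TARGET package0.event_file
(
    SET filename = 'D:\XE\LongRunningQueries.xel'
);

ALTER EVENT SESSION [LongRunningQueries] ON SERVER STATE = START;
```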
This was an interesting chapter, a mix of old and new, with a balanced discussion of the pros and cons of both SQL Trace and XE. Additionally, there are some very useful links.
Last Updated: Friday, 14 February 2014