|Pro SQL Server Internals|
Page 3 of 7
Chapter 12 Temporary Tables
This chapter discusses various temporary holding structures. SQL Server makes extensive use of temporary tables itself, and so do developers.
Temporary tables live in tempdb and hold temporary results. There are 2 types, local – these are visible only in the session that created them (and called routines), and global – these are visible from all session. Temporary tables are often used to simplify queries, additionally, since they often contain a subset of data, they have better statistics and produce good execution plans. Sometimes creating and deleting temporary tables can impact various system pages (IAM/SGAM/PFS) causing a general performance degradation – to obviate this, small temporary tables may be cached.
Table variables are also created in tempdb, they are similar to temporary tables, but they have more limitations. Limitations include restricted scope, and inability to create indexes (except PK/unique constraints). No statistics are kept, it’s assumed they have one row, unless you use a statement level recompile. For a small number of rows, they may perform better than temporary tables.
For completeness, user-defined table types and table-valued parameters (TVP) are examined. Many tests are provided that compare the processing of inserts, XML, SQLBulkCopy, and TVP speeds with differing row counts (TVP are generally very fast, almost as fast as bulk insert).
Since tempdb is often the busiest database, its performance can be critical to the performance of all other databases. It is suggested tempdb is placed on the fastest disks available, and be spread over several disks. The exact number of files should be tested for your own workload. It is suggested that trace flag T118 is enabled on all systems since it reduces contention on tempdb without side effects.
This chapter discussed the importance of the various temporary holding structures, especially in relation to their impact on performance and how it can be improved. There are some good points about temporary tables giving the optimizer more opportunity to produce better execution plans. There’s an interesting grid that compares temporary tables with table variables.
I was surprised the chapter didn’t mention large temporary tables should be allowed to go out of scope, instead of dropping them explicitly - this allows SQL Server to perform the clean up when it has time (a bit like the garbage collector in .NET or java).
Chapter 13 CLR
This chapter starts by differentiating between declarative languages like SQL, which say what needs to be done, and imperative languages like C#, which say how something needs to be done. SQL is optimized for set-based processing, but has limited functionality. The CLR can be used to extend SQL’s functionality by using .NET code, which tends to be better than SQL at calculations and string processing.
SQL Server controls its .NET runtime (memory etc). It’s used to create, compile, and sign assemblies (similar to executables). The various permission sets are described: safe, external_access, and unsafe. Troubleshooting using Extended Events and DMVs is briefly covered. Many DBAs feel uneasy in extending the security surface area via the CLR. By default CLR usage is disabled – the system CLR is always enabled and used internally by SQL Server.
The last section discusses performance considerations. In many ways it is difficult to compare SQL and CLR performance, and this should be so, because ideally they have different uses. Generally, SQL is better for data access and set-based processing, and the CLR is better for calculations.
This chapter provides a good overview of CLR usage, and the problems of persuading DBAs to enable it. There are useful links to troubleshooting DMVs. While C# code is used in the many examples, no attempt is made to create it in Visual Studio, or indeed explain it. The CLR is an all or nothing database level setting, it might be easier to induce DBAs to enable it, if it could be enabled with finer granularity e.g. for safe/external_access/unsafe.
Chapter 14 CLR Types
It is possible to use the CLR to create your own data types, having their own properties, similar to the built-in CLR data types e.g. geography datatype. The chapter start by discussing current methods of creating user-defined datatypes, including their limitations e.g. can’t alter it. A CLR example is provided of a USPostalAddress structure, which can be validated against valid US states, and the address is parsed. The assembly can be altered to create an amended version.
Spatial datatypes (e.g. geography) are discussed with reference to their rich methods. Performance impact is also considered, since they often take up lots of space. The Hierarchyid datatype is useful when working with trees, again this comes at a cost – maintaining the hierarchy can be expensive.
This chapter contains a useful example of using a CLR type. Although the code is supplied, it is not explained, e.g. what is a struct? There’s a good section on expanding the SQL datatypes with CLR types, with the warning that this flexibility often comes at a cost.
Chapter 15 Data Partitioning
The world is producing more data, it’s taking longer to process, and can lead to timeout of clients. Partitioning table and indexes can help. The chapter opens with reasons why you might want to partition e.g. action occurs against recent operational data, which may be relatively small compared with all the data. Index maintenance gets progressively longer as data grows, sometimes eating into non-maintenance time. More recent data could benefit by being placed on faster storage.
Various data partitioning techniques are discussed, including:
Sliding window functionality is discussed with reference to data purging. Purge for partitioned views and tables being relatively simple and fast. Lastly, potential problems are discussed, these include: the partitioned column must be part of the clustered index, so it uses more space, compression may help this, but it needs testing.
This chapter discusses the very real need for partitioning. Various techniques are given. There’s a good example of an order entry system that stores data for 2.5 years, with the aims of partitioning the data on monthly basis. There’s a good comparison of using partitioned tables and views, with the pros and cons of each given. Partitioning problems and their solutions are given. I was a little surprised there was no reference to big data.
Chapter 16 System Design Considerations
The chapter opens with a reminder that databases don’t exist in isolation, they interact with client applications and are part of a wider system. The main emphasis of the chapter is the importance of a layered architecture, which improves maintainability and eases refactoring.
Typically the architecture is partitioned into a data layer, a business layer and a presentation layer. There may be a services layer that allows systems to bypass the presentation layer and interact directly with the business layer.
The chapter then concentrates on the data layer, identifying the various methods of interaction:
The chapter finished by examining Microsoft’s Entity Framework 6 which provides object relational mapping (ORM) capabilities. Patterns examined include the repository pattern, and unit of work pattern. Examples include using stored procedures, lazy data loading, parameterized queries, use of IN (often slow), deletions (tricky), and optimistic concurrency.
This chapter provides a view of database interaction from a client perspective. The importance of a layered architecture is highlighted. Various methods of interaction are examined, and their pros and cons given. There’s a useful link to Microsoft’s architecture guide (patterns and practices). There’s an interesting point about SQL clients that call sp_reset_connection, however this doesn’t reset the transaction isolation level, resulting in difficult to debug problems.
Chapter 17 Lock Types
This chapter discusses the various locks within SQL Server. Data modifications take an exclusive lock and hold it until the end of the transaction. Locks are needed to ensure data consistency, but they come at a cost – they are bad for concurrency.
There’s a useful example showing various locks using the DMV sys.dm_tran_locks in combination with BEGIN TRANS i.e. don’t commit the transaction so we can see its locks. There’s an interesting matrix of Shared, Update and Exclusive lock compatibility, which should prove useful in understanding your locking problems. Transaction isolation level is discussed in relation to pessimistic and optimistic locking, together with the potential problems of each e.g. dirty reads.
This chapter had no subsections or summary section, which left it feeling a little unstructured, however that actual content was very good. In many ways it contains background information necessary for subsequent chapters. Various figures show SQL profiler being used to obtain the lock information, however its use is not explained in the body of the text. Additionally figure 17-3 highlights the row count (20+ million), again this is not explained in the text. Perhaps this chapter was a little rushed!
|Last Updated ( Tuesday, 01 May 2018 )|