|Expert SQL Server In-Memory OLTP|
Page 1 of 2
Author: Dmitri Korotkevitch
This book aims to explain SQL Server In-Memory OLTP, which can dramatically improve performance, how does it fare?
In-Memory OLTP (Online Transaction Processing) is the primary reason to migrate to SQL Server 2014. Under the right conditions it can offer significant performance benefits, with code typically running 20 to 40 times faster.
This book is targeted at SQL Server developers and architects wanting to know how to take advantage of the potential performance improvements. Some background knowledge of SQL Server, especially performance, will be useful.
This is a single topic book, having 11 chapters and 4 short appendices, contained within around 240 working pages.
Below is a chapter-by-chapter exploration of the topics covered.
Chapter 1 Why In-Memory OLTP?
The chapter opens with a look at the background of In-Memory OLTP. Historically, memory was expensive, it made sense to store data on disks and bring it into memory for processing. With the advent of much cheaper memory, a different processing model could be used. In-Memory OLTP was designed for scalable fast processing. The design resulted in In-Memory structures, a concurrency model that eliminates locks and latches, and natively compiled stored procedures that use significantly fewer CPU cycles – all of which can significantly improve performance.
Additionally, In-Memory OLTP is integrated seamlessly with other SQL Server 2014 components, there is no need for system refactoring – this is a major advantage when compared with other database vendors. There are some limitations, but you can expect these to reduce with future versions of SQL Server.
The chapter continues with a look at In-Memory OLTP architecture. While In-Memory objects are fully integrated into SQL Server, and can be used transparently, internally they are different. The objects are stored in memory, they live outside the buffer pool, and have different structures compared with on-disk structures. Classic (i.e. interpreted) SQL can access In-Memory tables via interop, and this is typically 2 to 4 times faster than accessing the corresponding on-disk table. Natively compiled stored procedures can only access In-Memory tables, and are typically much faster than using interpreted SQL.
This chapter provides a useful overview of how In-Memory OLTP arose, together with its major components.
Useful discussions, diagrams, practical example code, website links, and inter-chapter links are given throughout. These traits apply to the whole of the book.
Chapter 2 In-Memory OLTP Objects
This chapter opens with a look at preparing a database to use In-Memory OLTP. Firstly a separate filegroup with the keyword CONTAINS MEMORY_OPTIMIZED_DATA is required, a code example of this is provided. Next, sample code is provided to create an In-Memory table, and the various keywords discussed (memory_optimized, durability). Indexes are created in-line at the same time as the creation of the table. Various limitations are briefly discussed, including: no triggers, no foreign keys, and no off-row storage. The two types of index are briefly discussed, hash indexes are optimized for key lookups, and nonclustered indexes are useful for searching for a range of data.
The chapter continues with a look at working with In-Memory tables. Interpreted SQL treats In-Memory tables like on-disk tables, and most SQL is supported but there are a few restrictions (e.g. can’t truncate tables). Next, natively compiled stored procedures are examined, these can only access In-Memory tables, and are compiled into DDLs for improved performance.
The chapter ends with a look at the optimistic concurrency model used by In-Memory OLTP, this is lock and latch free, using versioned records similar to the snapshot isolation level. Example code is provided to compare the performance of interpreted SQL running against both In-Memory tables and on-disk tables, and this is compared with running natively compiled stored procedures against In-Memory tables. The results show that In-Memory tables perform better than on-disk tables, additionally In-Memory OLTP provides improved scalability.
This chapter provides helpful example code to create the relevant In-Memory structures, additionally, the performance advantage of In-Memory tables, and natively compiled stored procedures are demonstrated.
Chapter 3 Memory-Optimized Tables
This chapter provides greater implementation detail. First, the structure of on-disk and In-Memory tables are discussed and compared. In-Memory tables don’t use pages (except the Bw-tree), and data rows reference each other via pointers, a table needs to include at least 1 index, and indexes are embedded into the table.
The chapter continues with a review of the optimistic concurrency model used. In-Memory table data is never updated, instead the existing row is marked as deleted and a new row inserted, this is controlled via the BeginTs and EndTs columns on each row. This approach means there is no need for locks and latches, improving performance and scalability. Deleted rows are removed later by the Garbage Collector. The format of the data row is discussed, having a row header (containing: BeginTs, EndTs, StmtId, idxLinkcount, Index pointer array), and a Payload (the actual data).
Next, the process of compiling the In-Memory tables is discussed. A separate DLL is created for each table, these DLLs are used to access and manipulate data in payload section of the row. Is it not possible to alter a table or index after it has been created – instead you need to drop and recreate the table/indexes. The section continues with a look at the various limitations, these relate to: data types, table constraints, and database-level limitations.
The chapter ends with a brief look at how In-Memory OLTP interacts with High Availability (HA) technologies. AlwaysOn failover cluster, Availability Groups, log shipping, and replication are supported. Mirroring is not supported, and this is to be expected since AlwaysOn Availability Groups is the preferred HA mechanism.
This chapter provides a helpful comparison of on-disk and In-Memory tables. There’s a useful overview of the new Multiversion Concurrency Control (MVCC), showing why locks and latches are no longer necessary. The discussion of the data row format provides a helpful insight into the internals. The limitations discussed should prove useful when migrating existing systems to In-Memory OLTP. Although simple, I liked the explanation of the Halloween protection.
Chapter 4 Hash Indexes
This chapter discusses a new type of index, the hash index. The chapter starts with an overview of what hashing is - it provides a compact value of data, which typically improves performance. Ideally each hash will have its own bucket, but uniqueness can’t be guaranteed and collisions can occur, this can impact performance.
After examining the structure of the hash index, the chapter continues with a discussion on the importance of the bucket count. The bucket size is pre-allocated, too many and space is wasted, too few and collisions occur resulting in traversing more rows in bucket to get to correct data - decreased performance. Example code is provided that illustrates the performance impact of having too high and too low bucket counts. Ideally, the bucket count should relate to the number of unique keys, but also take into account any future expected growth.
It is not possible to alter the bucket count after the table has been created, instead the table needs to be dropped, recreated with the required bucket count, and the data reloaded. Various Dynamic Management Views (DMVs) are shown to examine bucket counts and row chain lengths etc.
The chapter next looks at hash indexes and SARGability – the ability to use predicates to search via an index. Hash indexes are very efficient for point-lookup equality searches. Is should be noted, for composite indexes, you need to specify all the columns in the composite index – there is no partial matching as is possible with B-tree indexes.
The chapter ends with a look at statistics on In-Memory tables. Statistics describe the distribution and density of column values, and are used by the optimizer to estimate cardinality, and thus produce an efficient query plan. Unlike on-disk tables/indexes, the statistics are not updated automatically as the data changes. It is important to update the statistics of In-Memory tables manually using UPDATE STATISTICS, or sp_updatestats – this is especially true after a large data load or system restart.
This chapter provides a very useful discussion of hash indexes, describing the importance of the bucket count, and its role in performance. The importance of manually updated the statistic is noted.
Chapter 5 Nonclustered Indexes
Nonclustered indexes, like their on-disk counterparts, are optimized for searching for a range of values. The chapter has example code to create a nonclustered index, and this needs to be part of the table creation, it cannot be created afterwards – again the index is cooked into the table, to alter an index you need to drop and create the table and indexes. It’s noted that character data needs to be defined with BIN2 binary collation (i.e. case-sensitive).
Nonclustered indexes are good for index seeks that locate a subset of data. The SARGability rules are similar to on-disk indexes, however the index is unidirectional, so if you want to search an index in both directions, you need to create 2 indexes. Various example code is provided.
The chapter continues with at look at internals. A variation on the traditional B-tree is used - the Bw-tree, and this is described. An index cannot be altered once created, index delta records are created for any inserts and deletes, an update creates 2 delta records – a delete and an insert. SQL Server needs to traverse and analyze all delta records when accessing an index page, and long chains can degrade performance. Delta records are periodically consolidated and new index pages created, and the old page is marked for collection by the Garbage Collector.
The chapter next looks at how to obtain information about the nonclustered Indexes, this is primarily via the DMVs sys.dm_db_xtp_index_stats and sys.dm_db_xtp_nonclustered_index_stats, example code showing their usage is discussed.
The chapter ends with a look at when to use hash or nonclustered indexes. Hash index are best for point lookups with equality predicates, and good if the workload and data are relatively static. Nonclustered indexes have much wider usage, as well as being useful for identifying a subset of rows, they may be useful if the hash has long chain due to an incorrect bucket_count – code is given that illustrates this. Overall, the author suggests nonclustered indexes are safer choice, especially if it is difficult to estimate the bucket count.
This chapter provides a useful discussion about nonclustered indexes, describing their internal structure, how they’re created, its SARGability rules, how to obtain information about them, and how they compare to hash indexes.
|Last Updated ( Tuesday, 01 May 2018 )|