Beginning T-SQL (3e)

Chapter 7 Grouping and Summarizing Data

Aggregate functions summarize data. Several are provided in T-SQL, including COUNT, SUM, AVG, MIN, and MAX. Aggregate functions ignore NULL values, the exception being COUNT(*). The chapter discusses various examples together with their output results.
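To give a flavour of the material, here is a minimal sketch of my own (the temporary table and data are hypothetical, not the book's examples) showing that COUNT(*) counts every row while the other aggregates skip NULLs:

```sql
-- COUNT(*) counts all rows; COUNT(col), SUM and AVG ignore NULLs
CREATE TABLE #Sales (Amount DECIMAL(10, 2) NULL);
INSERT INTO #Sales (Amount) VALUES (100.00), (200.00), (NULL);

SELECT COUNT(*)      AS AllRows,      -- 3 (the NULL row is counted)
       COUNT(Amount) AS NonNullRows,  -- 2 (the NULL row is ignored)
       SUM(Amount)   AS Total,        -- 300.00
       AVG(Amount)   AS Average      -- 150.00 (300.00 / 2, not / 3)
FROM #Sales;

DROP TABLE #Sales;
```

Note how AVG divides by the number of non-NULL values – a common source of surprise.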

The GROUP BY and ORDER BY clauses are discussed with reference to data aggregation, and common problems are highlighted. The WHERE clause can’t contain an aggregate function, but the associated HAVING clause can.
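A hypothetical illustration of the distinction (the AdventureWorks-style table name is my own choice, not taken from the book):

```sql
-- WHERE filters rows before grouping; HAVING filters the groups afterwards
SELECT CustomerID,
       SUM(TotalDue) AS TotalSpent
FROM Sales.SalesOrderHeader
WHERE OrderDate >= '20130101'      -- row-level filter: no aggregates allowed here
GROUP BY CustomerID
HAVING SUM(TotalDue) > 10000.00    -- group-level filter: aggregates allowed
ORDER BY TotalSpent DESC;
```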

The performance section introduces the SET STATISTICS IO command, which records the IO associated with a query, allowing related or changed queries to be compared in terms of their IO usage.
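Usage is simply a pair of SET commands bracketing the queries of interest (the query shown is a hypothetical stand-in):

```sql
SET STATISTICS IO ON;

SELECT COUNT(*) FROM Sales.SalesOrderDetail;  -- any query under investigation

SET STATISTICS IO OFF;
-- The Messages tab then reports, per table touched, the scan count and
-- the number of logical, physical and read-ahead reads.
```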

This is a useful chapter, containing detail that will prove useful in the developer’s everyday work. I was surprised there was no reference to the ability to create your own aggregate functions via the CLR. Also, since STATISTICS IO is discussed as a means of measuring performance, I wonder whether STATISTICS TIME should also have been introduced (since users are concerned about query duration).

Chapter 8 Discovering Windowing Functions

Windowing functions allow you to perform processing on a set (window) of rows. The basis of these functions is the OVER clause. The chapter opens with a look at various ranking functions, specifically: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. Each is shown with example code and output results.
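As a sketch of how the four ranking functions differ on tied values (my own data, not the book's):

```sql
CREATE TABLE #Scores (Score INT);
INSERT INTO #Scores (Score) VALUES (90), (85), (85), (70);

SELECT Score,
       ROW_NUMBER() OVER (ORDER BY Score DESC) AS RowNum,    -- 1, 2, 3, 4
       RANK()       OVER (ORDER BY Score DESC) AS Rnk,       -- 1, 2, 2, 4 (gap after the tie)
       DENSE_RANK() OVER (ORDER BY Score DESC) AS DenseRnk,  -- 1, 2, 2, 3 (no gap)
       NTILE(2)     OVER (ORDER BY Score DESC) AS Half       -- 1, 1, 2, 2 (two equal groups)
FROM #Scores;

DROP TABLE #Scores;
```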

Results can be summarized with window aggregates, and PARTITION BY specifically is examined. Next, the concept of framing is examined with reference to the frame boundaries CURRENT ROW, UNBOUNDED PRECEDING, and UNBOUNDED FOLLOWING. The common requirement of calculating running totals is much easier to satisfy using windowing functions, and an example is provided.
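The running-total pattern looks something like this (a hypothetical sketch of my own, not the book's code):

```sql
CREATE TABLE #Orders (OrderDate DATE, Amount MONEY);
INSERT INTO #Orders (OrderDate, Amount)
VALUES ('20150101', 10), ('20150102', 20), ('20150103', 30);

SELECT OrderDate, Amount,
       SUM(Amount) OVER (ORDER BY OrderDate
                         ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND CURRENT ROW) AS RunningTotal  -- 10, 30, 60
FROM #Orders;

DROP TABLE #Orders;
```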

The chapter continues with a look at the window analytic functions, specifically LAG, LEAD, FIRST_VALUE, LAST_VALUE, PERCENT_RANK, CUME_DIST, PERCENTILE_CONT, and PERCENTILE_DISC. Practical problems that make use of windowing functions include removing duplicates and the islands problem – solutions to both of these are provided. In the performance section, it’s noted that maintenance is typically easier with windowing functions; however, they should be tested, since they can degrade performance.
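The duplicate-removal pattern is worth sketching, since it crops up so often (my own minimal example):

```sql
-- Number each copy within a group of duplicates, then delete all but the first
CREATE TABLE #Customers (Email VARCHAR(100));
INSERT INTO #Customers (Email) VALUES ('a@x.com'), ('a@x.com'), ('b@x.com');

WITH Numbered AS (
    SELECT Email,
           ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Email) AS rn
    FROM #Customers
)
DELETE FROM Numbered WHERE rn > 1;   -- removes the second 'a@x.com'

SELECT Email FROM #Customers;        -- a@x.com, b@x.com
DROP TABLE #Customers;
```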

This chapter explains the common windowing functions included with SQL Server. Useful examples are provided and discussed. Including the version of SQL Server that introduced each windowing function should prove useful. There’s a helpful reference to Ben-Gan’s work on windowing functions – it might have been useful to have included a reference to his Window Functions book.

Chapter 9 Advanced WHERE Clauses

The chapter opens with a look at pattern matching, where you supply only part of the item to search for. Specifically, the LIKE clause is examined with reference to the wildcard characters % (any string of zero or more characters), _ (any single character), [] (any single character from the set), and [^] (any single character not in the set). A useful method of searching for data that itself contains wildcard characters is also given (you need to escape them). As expected, excellent example code is given and discussed.
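A hypothetical sketch of the wildcards, and of escaping a literal wildcard (my own data):

```sql
CREATE TABLE #Words (W VARCHAR(20));
INSERT INTO #Words (W) VALUES ('cat'), ('hat'), ('chat'), ('50% off');

SELECT W FROM #Words WHERE W LIKE '_at';                  -- cat, hat
SELECT W FROM #Words WHERE W LIKE '[ch]at';               -- cat, hat
SELECT W FROM #Words WHERE W LIKE '[^c]at';               -- hat
SELECT W FROM #Words WHERE W LIKE '50!% off' ESCAPE '!';  -- 50% off (literal %)

DROP TABLE #Words;
```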

Multiple predicates can be combined in the WHERE clause by the use of AND, OR, and NOT. Since this can get complex, and you need to remember operator precedence (NOT is evaluated first, then AND, then OR), it is recommended to always use brackets to explicitly identify what you are aiming to achieve.
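A small sketch of why this matters (hypothetical data of my own):

```sql
CREATE TABLE #Products (Colour VARCHAR(10), Size VARCHAR(2));
INSERT INTO #Products (Colour, Size)
VALUES ('Red', 'S'), ('Blue', 'L'), ('Red', 'L');

-- AND binds tighter than OR, so this reads as: Red OR (Blue AND L) -> 3 rows
SELECT Colour, Size FROM #Products
WHERE Colour = 'Red' OR Colour = 'Blue' AND Size = 'L';

-- Brackets make the (probably intended) meaning explicit -> 2 rows
SELECT Colour, Size FROM #Products
WHERE (Colour = 'Red' OR Colour = 'Blue') AND Size = 'L';

DROP TABLE #Products;
```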

The last section looks at Full-Text Search. This allows certain text columns to be searched quickly, using the CONTAINS and FREETEXT predicates. Useful links are provided for further information.

Chapter 10 Manipulating Data

Up to now, the book has concentrated on SELECTing data; this chapter focuses on changing data. It opens with a look at the different ways of adding rows, including: adding a single row, adding multiple rows with one statement, inserting rows from another table, and inserting missing rows (using NOT EXISTS). This is followed by creating and populating a table in one statement (SELECT INTO). A useful section on common insert errors is given (e.g. PK violation, NULL entered into a NOT NULL column, too few columns, etc.).
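The insert variants can be sketched as follows (hypothetical tables of my own):

```sql
CREATE TABLE #Target (Id INT, Name VARCHAR(20));
CREATE TABLE #Source (Id INT, Name VARCHAR(20));
INSERT INTO #Source (Id, Name) VALUES (3, 'Three'), (4, 'Four');

INSERT INTO #Target (Id, Name) VALUES (1, 'One');                -- single row
INSERT INTO #Target (Id, Name) VALUES (2, 'Two'), (3, 'Three');  -- multiple rows

INSERT INTO #Target (Id, Name)                                   -- missing rows only
SELECT s.Id, s.Name
FROM #Source AS s
WHERE NOT EXISTS (SELECT 1 FROM #Target AS t WHERE t.Id = s.Id);

SELECT Id, Name INTO #Copy FROM #Target;                         -- create and populate in one go

DROP TABLE #Target; DROP TABLE #Source; DROP TABLE #Copy;
```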

The chapter proceeds with a look at deleting rows, leading with the observation that you need to be careful when modifying data! This is followed by a brief look at the associated TRUNCATE TABLE statement. Lastly, the UPDATE statement is examined, including updating via a JOIN. All the modification statements (INSERT/DELETE/UPDATE) are illustrated with multiple examples.
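The update-via-JOIN pattern looks something like this (a sketch with hypothetical tables):

```sql
CREATE TABLE #Products (ProductID INT, Price MONEY);
CREATE TABLE #PriceChanges (ProductID INT, NewPrice MONEY);
INSERT INTO #Products (ProductID, Price) VALUES (1, 10.00), (2, 20.00);
INSERT INTO #PriceChanges (ProductID, NewPrice) VALUES (2, 25.00);

UPDATE p
SET    p.Price = c.NewPrice
FROM   #Products AS p
JOIN   #PriceChanges AS c ON c.ProductID = p.ProductID;  -- only product 2 changes

DROP TABLE #Products; DROP TABLE #PriceChanges;
```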

Finally, the performance section discusses the observation that beginners often process data one row at a time, using procedural code, rather than the preferred set-based processing.

This chapter contains good basic information and examples on modifying data (INSERT/UPDATE/DELETE). The section on common errors associated with modifying data should prove useful for beginners. I was surprised no mention is made of the sequence object (it is given in chapter 14), or SCOPE_IDENTITY. There’s a good point that it is better to create a table and then insert into it, rather than do this in one statement, since the latter can lock system tables and cause problems for other users.

Chapter 11 Writing Advanced Queries

The chapter opens with a look at some of the more advanced features of CTEs, including: using multiple CTEs, referencing a CTE multiple times, joining a CTE to another CTE, using a CTE in a recursive query, and modifying data via a CTE.
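A minimal recursive CTE, for flavour (my own example, not the book's):

```sql
-- Generate the numbers 1 to 5
WITH Numbers AS (
    SELECT 1 AS n                 -- anchor member
    UNION ALL
    SELECT n + 1 FROM Numbers     -- recursive member
    WHERE n < 5
)
SELECT n FROM Numbers;            -- 1, 2, 3, 4, 5
```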

There’s a useful discussion concerning the separation of the aggregation query from the rest of the query. Various methods can achieve this, including subqueries and derived tables, but CTEs often provide clearer separation. The OUTPUT clause is briefly discussed; it can retrieve the modified data via the internal inserted and deleted tables, and this data can be saved to a table for later analysis.
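A sketch of capturing deleted rows with OUTPUT (hypothetical tables of my own):

```sql
CREATE TABLE #T     (Id INT, Name VARCHAR(20));
CREATE TABLE #Audit (Id INT, Name VARCHAR(20));
INSERT INTO #T (Id, Name) VALUES (1, 'a'), (2, 'b');

DELETE FROM #T
OUTPUT deleted.Id, deleted.Name INTO #Audit (Id, Name)
WHERE Id = 1;

SELECT Id, Name FROM #Audit;   -- (1, 'a') saved for later analysis
DROP TABLE #T; DROP TABLE #Audit;
```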

The MERGE statement is examined next. This combines inserts, deletes, and updates into a single statement – often giving better performance, but at the cost of increased complexity. This statement is often used in data warehouse processing.
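The shape of a MERGE-based upsert, as a hypothetical sketch:

```sql
CREATE TABLE #Target (Id INT, Name VARCHAR(20));
CREATE TABLE #Source (Id INT, Name VARCHAR(20));
INSERT INTO #Target (Id, Name) VALUES (1, 'old'), (2, 'gone');
INSERT INTO #Source (Id, Name) VALUES (1, 'new'), (3, 'added');

MERGE #Target AS t
USING #Source AS s ON t.Id = s.Id
WHEN MATCHED THEN
    UPDATE SET t.Name = s.Name              -- row 1: 'old' -> 'new'
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Name) VALUES (s.Id, s.Name) -- row 3 inserted
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;                                 -- row 2 deleted

DROP TABLE #Target; DROP TABLE #Source;
```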

GROUPING SETS allow different levels of grouping to be combined within one statement, something previously achieved, with more difficulty, via UNIONs. CUBE and ROLLUP are briefly discussed; these add subtotals to aggregates via the GROUP BY clause. CUBE gives subtotals for all possible combinations of the grouping levels, while ROLLUP gives subtotals for the hierarchy.
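A ROLLUP sketch (my own data) showing where the subtotal rows appear:

```sql
CREATE TABLE #Sales (Region VARCHAR(10), City VARCHAR(10), Amount MONEY);
INSERT INTO #Sales (Region, City, Amount)
VALUES ('North', 'Leeds', 10), ('North', 'York', 20), ('South', 'Bath', 5);

SELECT Region, City, SUM(Amount) AS Total
FROM #Sales
GROUP BY ROLLUP (Region, City);
-- Detail rows, then a NULL-City subtotal per Region, then a NULL/NULL grand total

DROP TABLE #Sales;
```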

Pivoting converts a column of data values to column headings. The older method of doing this involved the CASE statement; later versions of T-SQL have the PIVOT command. The author argues that using the CASE statement is often easier to understand. The UNPIVOT command is also examined. The chapter ends with a look at paging, where you obtain a subset of rows from a given starting position. Example code is given for paging using both ROW_NUMBER and OFFSET/FETCH NEXT.
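The OFFSET/FETCH form of paging, sketched against a hypothetical table:

```sql
-- Page 3 of the results, at 10 rows per page
SELECT Id, Name
FROM dbo.Products            -- hypothetical table name
ORDER BY Id                  -- ORDER BY is required for OFFSET/FETCH
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY;
```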

This chapter builds naturally on the previous chapters to give techniques that provide advanced solutions to problems. The various CTE examples should prove useful in solving a range of problems developers are likely to face. There’s a helpful discussion on separating aggregation logic from the rest of the query – this should also improve maintainability. It might have been useful to state explicitly that the OUTPUT clause is a great debugging technique, but it is implied. It might also have been useful to mention that the MERGE statement executes as a single transaction.

Chapter 12 Understanding T-SQL Programming Logic

This chapter contains various topics that are loosely tied together via programming. These items provide structures that extend the language features you’ve learned so far. The chapter opens with a look at variables, which hold data. Examples are provided showing how to declare and initialize variables.
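Variable declaration and assignment, sketched briefly:

```sql
DECLARE @Counter INT = 0;     -- declare and initialize in one statement
DECLARE @Name VARCHAR(50);    -- declared only: NULL until assigned
SET @Name = 'Widget';
SELECT @Counter = COUNT(*) FROM sys.objects;  -- assign from a query
PRINT @Name;
```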

The IF/ELSE construct is examined next; this allows decisions to be made. BEGIN and END are used to group statements for a given condition. It’s possible to nest IF/ELSE constructs, where formatting and comments can improve code maintainability. IF EXISTS is used to efficiently check whether at least one matching row exists.
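For flavour, a minimal sketch of my own combining IF EXISTS with BEGIN/END blocks:

```sql
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'Orders')
BEGIN
    PRINT 'Orders table found';
END
ELSE
BEGIN
    PRINT 'Orders table missing';
END;
```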

Next, the WHILE construct is examined; this repeats processing while a given condition remains true. Nested loops are also examined. It is possible to exit a loop early via the BREAK command, or to skip the remainder of the current iteration via the CONTINUE command.
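A minimal sketch of WHILE using both BREAK and CONTINUE:

```sql
DECLARE @i INT = 0;
WHILE @i < 10
BEGIN
    SET @i = @i + 1;
    IF @i = 3 CONTINUE;   -- skip the rest of this iteration
    IF @i = 6 BREAK;      -- leave the loop early
    PRINT @i;             -- prints 1, 2, 4, 5
END;
```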

The remainder of the chapter concentrates on temporary tables. Firstly, temporary tables and table variables are compared – the author discusses the many misunderstandings and controversies surrounding them. Temporary tables are very good for breaking up complex queries, making the work easier to understand, and can often improve performance. Local and global temporary tables are discussed. The chapter continues with code to create temporary tables and table variables.
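The three flavours side by side, as a quick sketch:

```sql
CREATE TABLE #Local   (Id INT);    -- local temp table: this session only
CREATE TABLE ##Global (Id INT);    -- global temp table: visible to all sessions
DECLARE @TableVar TABLE (Id INT);  -- table variable: batch-scoped, no statistics

DROP TABLE #Local; DROP TABLE ##Global;
```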

The performance section discusses the use of cursors, which perform procedural row-by-row processing, and compares this to the often faster and more maintainable set-based processing.

This chapter provides structures that extend data retrieval and modification code. There’s a useful section that dispels many temporary table myths. Perhaps STATISTICS TIME could have been used to compare the performance of the cursor and set-based processing examples.

Chapter 13 Managing Transactions

Transactions ensure data quality. This chapter starts with a look at the four tenets of transactions: the properties Atomic, Consistent, Isolated, and Durable (ACID). Next, explicit transactions are discussed; they start with the BEGIN TRANSACTION command and end with COMMIT or ROLLBACK. The XACT_ABORT setting is discussed with examples; when enabled, it automatically rolls back a transaction when a runtime error relating to data integrity occurs (e.g. a PK violation).
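The basic shape, sketched with a hypothetical accounts table of my own:

```sql
CREATE TABLE #Accounts (Id INT, Balance MONEY);
INSERT INTO #Accounts (Id, Balance) VALUES (1, 500), (2, 500);

SET XACT_ABORT ON;   -- a runtime error now rolls back the whole transaction

BEGIN TRANSACTION;
    UPDATE #Accounts SET Balance = Balance - 100 WHERE Id = 1;
    UPDATE #Accounts SET Balance = Balance + 100 WHERE Id = 2;
COMMIT TRANSACTION;  -- both updates succeed, or neither does

DROP TABLE #Accounts;
```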

The chapter continues with a look at error handling. In previous versions of SQL Server, @@ERROR was used for error processing; however, this is messy and inefficient. Newer versions of SQL Server use TRY/CATCH blocks, which are more consistent with modern programming languages and practices. The THROW statement allows errors to be raised.
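The modern pattern looks roughly like this (a sketch, not the book's code):

```sql
BEGIN TRY
    BEGIN TRANSACTION;
    -- data modifications go here
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;   -- re-raise the original error to the caller
END CATCH;
```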

Since transactions take out locks, they can block other queries, which has implications for performance. It is recommended to keep transactions short, and not dependent on user input or confirmation. It is possible to bypass the locking by lowering the isolation level or using the NOLOCK hint, but this can result in dirty reads. The SNAPSHOT isolation level, which uses row versioning, is the preferred method of improving concurrency.
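Enabling snapshot isolation, as a sketch (the database and table names here are assumptions of mine):

```sql
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;  -- once, per database

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;             -- then, per session
BEGIN TRANSACTION;
    SELECT Balance FROM dbo.Accounts WHERE Id = 1;    -- row-versioned read: no blocking, no dirty reads
COMMIT TRANSACTION;
```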

This chapter was a useful introduction to transactions, their properties, and how they are used. There’s a useful link to Kalen Delaney’s book “SQL Server Concurrency: Locking, Blocking and Row Versioning”. It might have been useful to explicitly state what a transaction is, i.e. a mechanism ensuring that all the queries within it are treated as one unit of work; however, this is implied.



Last Updated (Wednesday, 14 January 2015)