Page 3 of 3
Chapter 15 Reliability and Scalability
The authors suggest that reliability and scalability problems often occur together in SSIS, hence this chapter’s grouping.
The chapter opens with a discussion of the use of Checkpoints to enable the restart of a package from a given point. Checkpoints enhance recoverability, and are typically set after a long running piece of work (so you can restart from after that point). The various checkpoint settings are explained, and a very simple example then given, allowing you to concentrate on the workings of the Checkpoint itself. This is followed with more complex examples including: nested containers, and failing parent not the package.
Various examples of Package Transactions are given, using the Distributed Transaction Co-ordinator (DTC) or native transactions. The TransactionOption property (i.e. Supported, NotSupported, Required) is explained, again with examples.
Error outputs are discussed in relation to improving reliability and scalability. The pattern of separating out error rows, correcting them, and integrating them back in, is discussed – again the emphasis is on applying error processing to only those rows that don’t match ‘normal’ processing.
The section on ‘Scaling Out’ looks at the Lookup transform (caching data), data pipeline (multi-threaded engine for true parallelism), scaling out memory (use memory instead of staging tables). Lastly, scaling across machines is discussed.
I was a bit surprised, in light of its prevalence in the media, that any reference to Big Data was not discussed here.
Chapter 16 Understanding and Tuning the Data Flow Engine
This chapter provides a detailed look at the Data Flow engine. Since much SSIS development work concerns the Data Flow, its optimization should lead to improved performance.
Serial and parallel tasks, together with synchronous and a synchronous processing are discussed. The importance of the memory buffer architecture is shown. The distinction between blocking (aggregate, fuzzy lookup, sort, script etc), semi-blocking (merge, pivot etc), and non-blocking (audit, conditional split, data conversion, derived column, union all etc) components is given. Obviously when you have a choice, you should aim for solutions with minimal blocking.
Reports and log-related events can be used to examine, understand, and monitor the data flow. There are many steps in the data flow pipeline that can be logged. Many tips are given for Data Flow tuning, including:
- Limit synchronous processing
- Reduce staging and disk I/O
- Reduce reliance on RDBMS (a common mistake ex-DBAs make)
- Careful use of row-based transformations,
Of course, you need to monitor and profile before you optimize. Using the aforementioned reports and pipeline logging events will be helpful in detecting the bottlenecks. Additionally, various Performance Monitor (perfMon) counters are discussed.
In essence, the chapter states the more you understand the underlying Data Flow architecture, the more options you’ll have in creating solutions that perform and scale well.
Chapter 17 SSIS Software Development Life Cycle
The chapter opens with a review of main types of Software Development Life Cycle (SDLC), specifically waterfall and agile. It notes that previously, database work has traditionally little integration with SDLCs, however this is changing.
The importance of versioning (ability to restore to a previous version of the code), and Source Control (the system that hold the versions) is discussed. The examples given involve the Subversion (SVN) source control system. There’s a useful walkthrough of setting up subversion, and integrating it with Visual Studio, together with example usage.
Visual Studio Team System (VSTS) is Microsoft’s powerful SDLC and project management tool. A walkthrough of setting up and integrating VSTS is given.
This chapter discusses SDLC and versioning, which are traditionally missing from database work. The chapter explains how to use SVN and VSTS to provide this functionality. For me, this is a useful but non-core chapter, it should have been moved closer to the end of the book (error handling, the next chapter, being more relevant).
Chapter 18 Error and Event Handling
This chapter discusses features that allow you to easily control the workflow of your SSIS packages at a low-level, by using precedence constraints. Additionally, error and event handling are discussed.
Precedence constraints determine what to do after a task finishes with success, failure or completion. They are especially powerful when combined with expressions. Several useful examples are provided.
The rest of the chapter is concerned with event handling, which allows you to respond to certain events being raised, by writing your own code. The events include OnError, OnPreExecute, OnPostExecute, OnPreValidate, OnWarning, OnQueryCancel, and OnTaskFailed. Error handling is discussed in detail, with reference to breakpoints and component failure.
Finally, logging is discussed, aiding both monitoring and troubleshooting. Log data can be written to text files, SQL Server trace file, SQL Server table, Windows Event Log, XML file, and the SSIS catalog. As with event handling, logs can be written in response to various events being raised.
This will undoubtedly be a popular chapter, developers often spend a long time in debug/troubleshooting mode. In some ways, the content of this chapter seemed muddled, in that both normal and error handling concepts are treated together, and related topics exist elsewhere.
Chapter 19 Programming and Extending SSIS
SSIS provides plenty of built-in components, and other sites can provide others (e.g. www.codeplex.com). However, there may be times when none of the available components do exactly what you want (e.g. process a file with a proprietary format). This chapter provides examples and templates that allow you to develop your own components.
Specifically, 3 components are created: a source adapter, a transformation, and a destination adapter. Each walkthrough discusses the functionality and code required to create the component, which are built using C#. To get the most out of this chapter you probably need to be a .NET developer or perhaps have aspiration to be one.
Chapter 20 Adding a User Interface to Your Component
This chapter is an extension of the previous chapter. When you create your own components, the default user interface supplied to its properties is basic. The aim of this chapter is create a more user-friendly User Interface for your custom component. Perhaps this chapter could have been combined with the previous one?
Chapter 21 External Management and WMI Task Implementation
This chapter adds a new SSIS package to one of the projects from Chapter 19 and is again aimed at .NET developers with code presented in C# and VB. It first looks at the Managed Object Model Code Libary, then goes through Pacage Operations, Package Log Provides nd Package Configuarations. In the secocond half of the chapter the focus moves to Windows Management Instrumentation (WMI) tasks and you learn how to use the WMI Reader Task and the WMI Event Watcher Task in your packages.
Chapter 22 Administering SSIS
This chapter contains a miscellany of functionality under the general heading of administration. The SSIS catalog provides centralised information about SSIS, for example, package parameters, execution success, and execution times. The walkthrough describes how you create the SSIS catalog and the SSISDB database (this has information and objects for the SSIS catalog).
The chapter moves on to the factors involved in choosing and using the different deployment models (i.e. Project or Package deployment). Project deployment is preferred since you can take advantage of the SSIS catalog. It is possible to swap between the different deployment models. Note the older package deployment model deploys to the file system or MSDB database.
Example SQL is provided to query the SSIS catalog, including SQL to execute and validate packages. Using the various catalog menu options to script to a new window, provides a good way to learn about the underlying SQL used to query the catalog.
Various command line utilities are discussed, these include:
- DTExec (for configuring and executing packages)
- DTExecUI (wrapper for DTExec)
- DTUtil (a hidden tool, useful for moving and signing packages)
The final section discusses scheduling packages using SQL Server Agent. Lastly package execution can be monitored using perfMon counters, and various in-build reports (additionally, custom reports are also obtainable e.g. www.complex.com).
The book contains some very useful appendixes. Namely:
- SSIS Crib Notes (provides a lookup list of what functionality you require against the relevant component to perform it)
- SSIS Internal Views and Stored Procedures (this briefly outlines the objects in the SSISDB catalog, it should prove very useful when examining package execution)
- Interviewing for an ETL Developer Position (contains a good sample of ETL questions, concentrating on technical aspects of development. It also has a link to the relevant book chapter for further information). For example: discuss why you use checkpoints and how they affect the execution of an SSIS package (see Chapter 15).
This book aims to take you to a professional level of understanding of SSIS, and I think it succeeds admirably. The book is an easy read, the authors have taken the time to explain things in a clear yet concise manner. The book assumes no previous knowledge of SSIS, and will take your level of understanding to around level 8 (out of 10). The book as a whole has a good flow between chapters.
The book is replete with step-by-step examples, with good use of screenshots to further aid understanding. Most of the examples are based on the sample Microsoft databases, allowing you follow along, any additional files can be downloaded from the publisher. There’s a helpful (if short) summary at the end of each chapter, and useful links between the chapters. There are plenty of incidental tips introduced during the discussions (e.g. use the Union All transform as a dummy destination).
As the authors admit in Chapter 1, SSIS in SQL Server 2014 is very similar to SSIS in SQL Server 2012, very little has changed. Indeed, perhaps 95% of this book is the ‘same’ as the SQL Server 2012 version of this book. I would suggest if you have the 2012 version of the book you don’t need this version. Similarly, if you deal with SQL Server 2012 and don’t have an SSIS book, you could use this 2014 version of the book. Perhaps publishers should produce a book(let) of changes that exist between the two book versions too.
However joyous it may be for the authors, I found the religious references in the acknowledgments both comical and disconcerting (e.g. I want to thank god). If you swap ‘god’ with Huitzilopochtli or Zeus you’ll understand what I mean.
This book is suitable for both developers that are new to SSIS development, and experienced developers looking for a more complete understanding. I highly recommend it.