Page 1 of 3
Author: Brian Knight et al
Audience: Developers, DBAs, Architects
Reviewer: Ian Stirk
A popular SQL Server Integration Services (SSIS) book receives an upgrade for 2014 – how does it fare?
This is a revised version of a popular SSIS book, updated for SQL Server 2014. SSIS allows you to extract, transform and load (ETL) data. The tool can be used by a variety of users (developers, DBAs, and casual users) and at various levels of expertise.
The introduction highlights the importance of SSIS in transforming and moving data, in a speedy manner. Compared with other related products (e.g. Informatica and DataStage), SSIS is cheap, in that it comes free with SQL Server – similar products can cost many thousands of dollars. The book’s initial chapters are aimed at developers that are new to SSIS, while the later chapters contain patterns and practices targeted at more experienced developers.
Chapter 1 Welcome to SQL Server Integration Services
The chapter starts with a short history of SSIS, with its roots in Data Transformation Services (DTS), evolving into SSIS in SQL Server 2005. Each successive new version of SQL Server typically adding new features and enhanced functionality, this was especially so of SQL Server 2012.
The Import and Export Wizard is introduced as an entry point into SSIS, allowing simple movement and transformation of data. For more complex solutions, SQL Server Data Tools (SSDT) is used, this is where most SSIS work is done.
The architecture of SSIS is outlined, with the major components being:
- Packages (comparable to executables, containing workflow and business logic)
- Control Flow (the brain of the package, orchestrating workflow)
- Data Flow (heart/pump, moving and transforming data from source to destination)
- Variables (allow dynamic evaluation of expressions and decision making)
- Parameters (like variables, allow package to be dynamic)
- Error Handling and Logging (useful for troubleshooting and auditing)
An outline of the common Control Flow tasks is given, together with common sources, destinations, and transformations used by the Data Flow component. Finally, an overview of the various editions of SQL Server is given.
This chapter puts SSIS into context, and provides an overview of what to expect in the rest of the book. The summary list of Control Flow tasks, together with the summary list of Data Flow sources, destinations and transformations, are useful for getting an early understanding of the potential functionality of SSIS. The chapter contains useful links to more detailed chapters. Helpful examples are provided, together with plenty of screenshots and an easy reading style (as it is throughout the book).
Chapter 2 The SSIS Tools
This chapter examines the main tools used in SSIS, namely:
- Import and Export Wizard (useful for quick and simple solutions)
- SSDT (this is a subset of Visual Studio 2013)
- Solutions window (has packages, connections, project parameters etc)
- SQL Server Management Studio (SSMS)
The chapter’s emphasis is on SSDT, since this is where most developers spend most of their time. It is now decoupled from the SQL Server installation, and can be downloaded from the Microsoft website, this should facilitate more frequent updates.
The SSIS package designer is explained in detail. It’s used to specify workflow and data movement, using component building blocks. It contains 5 tabs:
- Control Flow (used for workflow, has precedence constraints)
- Data Flow (adding a Data Flow task to the Control Flow tab creates a Data Flow tab)
- Parameters (these are parameters passed to the package)
- Event Handlers (enables workflows to handle errors, warnings and completion)
- Package Explorer (summarises all design panes in a single view)
The chapter provides a useful overview of the main SSIS wizards and tools. In particular, the Import and Export Wizard, and SSDT with its various tabs explained. In both cases, useful step-by-step walkthrough examples are provided. The separation of workflow (Control Flow tab) and data (Data Flow tab) is especially useful, since this is often a source of confusion when you start to work on SSIS.
Chapter 3 SSIS Tasks
This chapter discusses Control Flow tasks, which are components that perform units of work e.g. send an email when ETL is complete. The task editor is briefly explained, its content varies with the task type. The following tasks are examined and some simple example usage provided:
- Analysis Services Tasks (Analysis Services Execute DDL Task, Analysis Services Processing Task, Data Mining Query Task)
- Data Flow Tasks (Data Profiler, File System Task, FTP Task, Web Service Task, XML Task)
- RDBMS Server Tasks (Bulk Insert Task, Execute SQL Task)
- Workflow Tasks (Execute Package Task, Execute Process Task, Message Queue Task, Send Mail Task, WMI Data Reader Task, WMI Event Watcher Task)
- SMO Administration Tasks (Transfer Database Task, Transfer Error Messages Task, Transfer Logins Task, Transfer Master Stored Procedures Task, Transfer Jobs Task, Transfer SQL Server Objects Task)
This chapter provides a very useful overview of the more common Control Flow tasks, explaining why and how they’re used. There is perhaps too much reference to SSIS’s predecessor DTS – which might be confusing to developers new to ETL.
Chapter 4 The Data Flow
The previous chapter looked at Control Flow tasks, this chapter takes a detailed look at the most commonly used task, the Data Flow task, which links the Control Flow and Data Flow tabs. This is where SSIS developers spend most of their time. In essence Data Flow is largely concerned with taking data from a source, transforming it, and writing it out to a destination.
The following sources are examined, and example usage given: OLE DB, Excel, Flat File, Raw File, XML, and ADO.NET. Similarly, the following destinations are examined, and example usage given: Excel, Flat File, OLE DB, Raw File, Recordset, Data Mining Model Training, and DataReader.
The following transformations are examined, and often example usage given: Aggregate, Conditional Split, Data Conversion, Derived Column, Lookup, Cache, Row Count, Script, Slowly Changing Dimension, Sort, Union All, Audit, Character Map, Copy Column, Data Mining Query, DQS Cleansing, Export Column, Fuzzy Lookup, Fuzzy Grouping, Import Column, Merge, Merge Join, Multicast, OLE DB Command, Percentage and Row Sampling, Pivot, Unpivot, Term Extraction, and Term Lookup.
Finally, a useful step-by-step data flow example is provided. The example takes transactional data from the AdventureWorks database, and performs aggregation, sorting and calculates new columns.
This chapter provides a wide-ranging look at the various sources, destinations, and common transformations used in SSIS work. In most cases, helpful examples are given, additionally some useful tips are provided e.g. use the FastParse option with the Flat File source to improve performance. The impact of synchronous and asynchronous transformations is discussed.
Chapter 5 Using Variables, Parameters, and Expressions
This chapter looks at how variables, parameters and expressions can be used to make packages dynamic, this can be important when running and deploying solutions on different servers e.g. UAT, Production etc.
Examples are provided of using variables, parameters and expressions. The importance of data types and data conversion is explained in some detail – since this is a common problem area. Expressions are especially powerful when combined with conditional logic. The expression builder is also described, together with expression syntax e.g. line continuation and string concatenation.
The chapter explains why dynamic objects are important, especially with reference to expressions. There’s a useful table that maps the different data types between SSIS, SQL Server and .NET – this should help resolve some common data type problems. Typical expression problems together with their solutions are given.