Authors: Sedat Yogurtcuoglu, Henk van der Valk, and Thomas Kejser
Technical Reviewers: David Noor, Bob Bojanic, Xiaoning Ding, An Liu, Carl Rabeler, Robert Zare
Editor: Beth Inghram
The Derived Column transformation is one of the most common data flow items used in Microsoft® SQL Server® Integration Services packages. Typically, existing column values are overwritten by an expression or a new column is added to the data flow using the Derived Column transformation.
Consider the situation where you need to change multiple columns in the pipeline. Because the Derived Column task is a synchronous task, splitting the work done into multiple Derived Column tasks will not add a new execution tree to the execution plan of the Integration Services package. This means that you would expect that using a single Derived Columns task to change multiple columns to have the same speed as using multiple Derived Column transformations to change the columns.
However, our testing has shown that if you want to change more than one column in a data flow by using a Derived Column task, there is a performance benefit to splitting that transformation into multiple Derived Column tasks. In this note we will show that doing many smaller pieces of work in multiple Derived Column tasks is significantly faster than doing all the work in a more complex work in a single Derived Column task.
We performed the tests using SQL Server 2008 R2 RC0 running on a 96-core Unisys ES7000 Server. To eliminate other variables, we used simple package design that reads from a flat file source, calculates data in one or more Derived Column tasks, and then sends the result to a Row Count task transformation.
Before we ran the Derived Column transformation, we needed a baseline of the throughput of reading a file without transformations. To measure this, we used the data source and a row count.
Fig. 1: Baseline without transformations
The input file contained 1 million rows with 96 columns and has a size of 1 GB. It took 23 seconds, with a throughput of 50 MB per second, to read the data into a row count.
The Integration Services flat file reader retrieved data with 128 KB Read IOPS, which was completed within 1 millisecond. The average CPU utilization of the the DTEXEC process was less than 5 percent. There was room to spare for our transformations, and there was no I/O bottleneck.
Next, we established the baseline for the Derived Column task. Initially, we tested with 96 transformations in a single Derived Column task. The package we ran looks like this.
Fig. 2: One derived column task
The run time was 56 seconds, and throughput was 20 MB per second.
We then proceeded to split the transformations into more Derived Column tasks. For example, we ran a package like this.
Fig. 3: Derived Column task package
Each Derived Column task included 24 tranformations, or a quarter of the original 96 transformations. The run time was 24 seconds, and throughput was 47 MB per second.The following table and graph show the tests we did with varying numbers of Derived Column tasks used to perform the 96 transformations.
Row count only
1 DC
2 DC
4 DC
8 DC
16 DC
32 DC
64 DC
96 DC
Duration (seconds)
23
56
32
24
25
26
35
CPU Util% (avg)
5
0.25
0.354
0.289
0.315
0.263
0.204
0.223
11.84
DTEXEC.exe process
140
263
349
354
432
561
540
Effective throughput approx. (Read Bytes /sec )
50
20
36
47
Buffer Memory (bytes max)
41.940K
73.396K
41.490K
52.426K
62.911K
# Buffers in use (max)
8
10
13
16
17
* DC stands for derived column
** The tests were performed on a 96-core Unisys ES7000/7600R server
Fig. 4: Measuring derived columns (DC) and duration
Even inside a single execution tree in Integration Services you can increase parallelism. As long as source throughput keeps up, Integration Services will spawn multiple buffers (up to five for sources, more for asynchronous transformations) in a single execution tree.
Fig. 5: Multiple buffers within an execution tree
The Derived Column task performs each column transformation serially inside a single task on the currently active buffer. However, when you split the column transformations into multiple Derived Column tasks, each individual column transformation can now be fully parallelized across multiple buffers. As you will notice from the test results, there is a sweet spot around four or five buffers where throughput reaches the peak. Because the data source cannot provide more than five buffers concurrently, five is also the maximum amount of added parallelism we can gain in this scenario by splitting into multiple Derived Column tasks. If you have a scenario with multiple sources, this sweet spot may be different in your case.
Synchronous tasks do NOT add a new execution tree to the execution plan of the Integration Services package – and hence, you would not expect synchronous tasks chained together to increase throughput of a pipeline. However, even inside one synchronous execution tree, you can increase parallelism by splitting the transformation into multiple task steps.
If you have a Derived Column task that contains complex operations, splitting the work in the Derived Column task into multiple tasks can improve overall throughput performance.
Pingback from Twitter Trackbacks for Increasing Throughput of Pipelines by Splitting Synchronous Transformations into Multiple Tasks - Technical Notes [sqlcat.com] on Topsy.com
Pingback from How to Speed Up SSIS Derived Columns Transformations | Henk's tech blog
A new technical note about splitting up transforms to increase performance has been posted on the SQLCAT
Pingback from Primera Bravo Pro ? Best Printer for Cds & Dvds | DVD HOME
A derived column taszk egy szinkron transzformációt megvalósító egység, így definíció szerint a taszk
Pingback from Increasing Throughput of Pipelines by Splitting Synchronous Transformations into Multiple Tasks | Best Servers Directory
Pingback from Work Flow or Stored Procedures « Data Warehousing and Business Intelligence
Pingback from Work Flow vs Stored Procedures « Data Warehousing and Business Intelligence