Increasing Throughput of Pipelines by Splitting Synchronous Transformations into Multiple Tasks

  • Comments 8

Increasing Throughput of Pipelines by Splitting Synchronous Transformations into Multiple Tasks

Authors: Sedat Yogurtcuoglu, Henk van der Valk, and Thomas Kejser

Technical Reviewers: David Noor, Bob Bojanic, Xiaoning Ding, An Liu, Carl Rabeler, Robert Zare

Editor: Beth Inghram

Introduction

The Derived Column transformation is one of the most common data flow items used in Microsoft® SQL Server® Integration Services packages. Typically, existing column values are overwritten by an expression or a new column is added to the data flow using the Derived Column transformation.

Consider the situation where you need to change multiple columns in the pipeline. Because the Derived Column task is a synchronous task, splitting the work done into multiple Derived Column tasks will not add a new execution tree to the execution plan of the Integration Services package. This means that you would expect that using a single Derived Columns task to change multiple columns to have the same speed as using multiple Derived Column transformations to change the columns.

However, our testing has shown that if you want to change more than one column in a data flow by using a Derived Column task, there is a performance benefit to splitting that transformation into multiple Derived Column tasks. In this note we will show that doing many smaller pieces of work in multiple Derived Column tasks is significantly faster than doing all the work in a more complex work in a single Derived Column task.


Test Setup

We performed the tests using SQL Server 2008 R2 RC0 running on a 96-core Unisys ES7000 Server. To eliminate other variables, we used simple package design that reads from a flat file source, calculates data in one or more Derived Column tasks, and then sends the result to a Row Count task transformation.

Before we ran the Derived Column transformation, we needed a baseline of the throughput of reading a file without transformations. To measure this, we used the data source and a row count.

image

Fig. 1: Baseline without transformations

The input file contained 1 million rows with 96 columns and has a size of 1 GB. It took 23 seconds, with a throughput of 50 MB per second, to read the data into a row count.

The Integration Services flat file reader retrieved data with 128 KB Read IOPS, which was completed within 1 millisecond. The average CPU utilization of the the DTEXEC process was less than 5 percent. There was room to spare for our transformations, and there was no I/O bottleneck.

Next, we established the baseline for the Derived Column task. Initially, we tested with 96 transformations in a single Derived Column task. The package we ran looks like this.

image

Fig. 2: One derived column task

The run time was 56 seconds, and throughput was 20 MB per second.

We then proceeded to split the transformations into more Derived Column tasks. For example, we ran a package like this.

image

Fig. 3: Derived Column task package

Each Derived Column task included 24 tranformations, or a quarter of the original 96 transformations. The run time was 24 seconds, and throughput was 47 MB per second.The following table and graph show the tests we did with varying numbers of Derived Column tasks used to perform the 96 transformations.

 

Row count only

1 DC

2 DC

4 DC

8 DC

16 DC

32 DC

64 DC

96 DC

Duration (seconds)

23

56

32

24

24

25

25

26

35

CPU Util% (avg)

5

 0.25

 0.354

 0.289

 0.315

 0.263

 0.204

 0.223

11.84

DTEXEC.exe process

CPU Util% (avg)

5

140

263

349

349

354

432

561

540

Effective throughput approx.
(Read Bytes /sec )

50

20

36

47

47

47

47

32

32

Buffer Memory (bytes max)

41.940K

     73.396K   

     73.396K   

     41.490K

     41.940K

     41.940K

     52.426K

     62.911K

73.396K

# Buffers in use (max)

5

8

10

10

10

10

13

16

17

 

 

 

 

 

 

 

 

 

* DC stands for derived column

** The tests were performed on a 96-core Unisys ES7000/7600R server

image

Fig. 4: Measuring derived columns (DC) and duration

Why This Works

Even inside a single execution tree in Integration Services you can increase parallelism. As long as source throughput keeps up, Integration Services will spawn multiple buffers (up to five for sources, more for asynchronous transformations) in a single execution tree.

image

Fig. 5: Multiple buffers within an execution tree

The Derived Column task performs each column transformation serially inside a single task on the currently active buffer. However, when you split the column transformations into multiple Derived Column tasks, each individual column transformation can now be fully parallelized across multiple buffers. As you will notice from the test results, there is a sweet spot around four or five buffers where throughput reaches the peak. Because the data source cannot provide more than five buffers concurrently, five is also the maximum amount of added parallelism we can gain in this scenario by splitting into multiple Derived Column tasks. If you have a scenario with multiple sources, this sweet spot may be different in your case.

Conclusion

Synchronous tasks do NOT add a new execution tree to the execution plan of the Integration Services package – and hence, you would not expect synchronous tasks chained together to increase throughput of a pipeline. However, even inside one synchronous execution tree, you can increase parallelism by splitting the transformation into multiple task steps.

If you have a Derived Column task that contains complex operations, splitting the work in the Derived Column task into multiple tasks can improve overall throughput performance.

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post
Page 1 of 1 (8 items)
Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post