Bulk Loading Data into a Table with Concurrent Queries

Rate This
  • Comments 1
Authors: Alexei Khalyako, Stuart Ozer
Contributor: Sanjay Mishra
Technical Reviewers: Mark Souza, Denny Lee, Burzin Patel

Introduction

This article describes load scenarios for the common data warehouse scenario in which many queries read data from a table while the table is loaded. The key question that administrator/developer has to answer before deciding on a strategy is whether the read queries can afford to wait. If they can wait, you won’t want to complicate the loading process by running queries simultaneously with the load. However, if you have a business need to run queries while you load data, read on.

For these tests, we used Microsoft® SQL Server® 2008. We created two identical tables, each containing 18 million rows totaling 2.6 GB. One table was organized as a heap, and the other was organized as a clustered index. These tables were used as targets into which data was bulk inserted. The input data file for the bulk insert had 65,535 rows. We were not concerned about performance differences between loading heaps or indexes; we were looking instead at the impact of queries concurrent with loads. 

One of the test’s goals was to understand whether read committed snapshot isolation (RCSI) makes any difference in query concurrency during loads. Does it help keep readers “unblocked” from the loading process? And how does the WITH TABLOCK option affect the bulk loading process?

Bulk Insert into Heap Table

For our first test, we loaded data into the heap with RCSI turned off (that is, with the default READ COMMITTED isolation level).

BULK INSERT Without TABLOCK hint, Read Committed

To imitate a data warehouse workload, we had five concurrent connections executing random SELECT statements on the table. While the SELECT statements were running, we started the bulk load operation. We used the sys.dm_exec_requests dynamic management view (DMV) to monitor requests on the server, and we observed that the BULK INSERT statement was waiting in the queue with LCK_M_IX wait type. Figure 1. shows the output of sys.dm_exec_requests while the BULK INSERT statement was being blocked by the SELECT statement. As soon as the BULK INSERT operation started loading data (Figure 2), all SELECT statements were blocked by the BULK INSERT statement until it completed.

 

Figure 1 : BULK INSERT is waiting for the SELECT statement to complete

 

Figure 2 : BULK INSERT is loading data, and SELECT statements are blocked

BULK INSERT With TABLOCK hint, Read Committed

For this test, the same five SELECT statements were running concurrently when we started the bulk insert, but this time we specified the WITH TABLOCK option. This time the sys.dm_exec_requests DMV indicated that the BULK INSERT operation was waiting with the LCK_M_BU wait type (Figure 3). The LCK_M_BU wait type occurs in SQL Server when a task is waiting to acquire a Bulk Update (BU) lock. Just as with our first test, as soon as BULK INSERT started loading data, we saw that the SELECT statements (Figure 4) were blocked by the bulk insert operation.

 

Figure 3: BULK INSERT with TABLOCK waiting for the SELECT statement to complete

 

Figure 4: SELECT statements waiting for the BULK INSERT statement with TABLOCK to complete

For our second test, we did the same test but with RCSI enabled. We wanted to see whether it would produce different results.

BULK INSERT Without TABLOCK Hint, Read Committed Snapshot Isolation

When we loaded data into the heap table without using the TABLOCK hint, we did not observe any waits. We concluded that option provides the maximum amount of concurrency, if you need to load data into a table while you run SELECT queries.

 

 Figure 5: BULK INSERT into HEAP table with no TABLOCK hint

BULK INSERT with TABLOCK Hint, Read Committed Snapshot Isolation

Despite the fact that we were using RCSI while we were loading data into the table, the sys.dm_exec_requests DMV indicated that the BULK INSERT operation was waiting for a LCK_M_IX lock and that it was being blocked by an active SELECT statement.

Two other SELECT queries, which were issued after the BULK INSERT command started, were observed waiting on the LCK_M_S lock type.

 

Figure 6: BULK INSERT into HEAP with the TABLOCK hint

As soon as the SELECT statement with session_id 59 (in Figure 6) was completed, BULK INSERT started loading data, with the selects in sessions 55 and 60 continuing to wait until the BULK INSERT command completed.

We did observe another interesting effect. So far we were dealing only with the BULK INSERT command. But SQL Server 2008 introduced bulk optimization the INSERT INTO … SELECT statement. This operation behaved like BULK INSERT in all the tests on the heap table. The only exception was that if we loaded data into the table using INSERT INTO … SELECT with the TABLOCK hint under RCSI mode, none of the readers was blocked.

  
Isolation level TABLOCK specified? TIME to load data (min:sec) WAIT TYPE
Read Committed NO TABLOCK 3:34 LCK_M_IX
Read Committed WITH TABLOCK 1:28

LCK_M_BU*

Read Committed Snapshot Isolation NO TABLOCK 0:04 NONE
Read Committed Snapshot Isolation WITH TABLOCK 1:03 LCK_M_IX
 

*Very short; difficult to measure how long it was taken

Table 1: Duration of data loading into the heap table

BULK INSERT into a Table with Clustered Index

Next, we loaded data into the table with a clustered index and with RCSI turned off. This test didn’t show any surprises and behaved as expected: All read operations were blocked while BULK INSERT was loading data into the table.

If RCSI was enabled, no blocking was observed with BULK INSERT running with concurrent read operations. As we expected, enabling RCSI eliminated locks for read operations that allowed simultaneous loading of the data into the table.

 

Figure 7: BULK INSERT into a table with a clustered index, no TABLOCK hint, and RCSI turned off

 

Figure 8: BULK INSERT into a table with a clustered index, a TABLOCK hint, and RCSI turned off

 

Figure 9: BULK INSERT into a table with a clustered index and RCSI enabled

Figure 9 shows SELECT statements reading from the table while a BULK INSERT is running. BULK INSERT behaved similarly both with and without the TABLOCK hint. This helped us conclude that if you are loading data with RSCI enabled, the TABLOCK hint in the BULK INSERT statement doesn’t play a significant role, as far as concurrency is concerned.

Isolation level TABLOCK specified? TIME to load data (min:sec) WAIT TYPE
Read Committed WITH TABLOCK 6:49 LCK_M_IX
Read Committed NO TABLOCK 10:18 LCK_M_X
Read Committed Snapshot Isolation WITH TABLOCK 01:01 NONE
Read Committed Snapshot Isolation NO TABLOCK  0:48 NONE
 

Table 2: Duration of data loading into the table with a clustered index

 

The only time that readers were blocked by bulk operations was when data was loaded into an empty table with the clustered index built on it. In that special case only, we observed that read operations were blocked by bulk load operations and read operations if they had the LCK_M_SCH_S wait type. This lock was released only after the loading batch was completed.

Conclusion

RSCI can provide great benefits, if you are bulk loading into the table with concurrent readers working on it . In most cases, it provides you with the ability to read while bulk loads are performed. Bulk loading does not affect the size of the version store in tempdb under RCSI; however, RCSI cannot be enabled at the table level. It can be enabled on the entire database only. Therefore you should carefully analyze your workload to ensure that other operations on your database will not cause the tempdb size to explode.

Note that RCSI will introduce 14 noncompressible bytes into every row in every table in the database (http://blogs.msdn.com/sqlserverstorageengine/archive/2008/03/30/overhead-of-row-versioning.aspx). An alternative strategy for concurrent readers is to execute queries using the READ UNCOMMITTED isolation level (also known as a dirty read), but this requires application changes, and it can deliver transitionally inconsistent results. RCSI requires no application change, and it guarantees consistent results.

For more information about data loading strategies and scenarios, see the Data Loading Performance Guide at http://msdn.microsoft.com/en-us/library/dd425070.aspx

  

 

Attachment: Figure1.gif
Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post
Page 1 of 1 (1 items)