This article describes load scenarios for the common data warehouse scenario in which many queries read data from a table while the table is loaded. The key question that administrator/developer has to answer before deciding on a strategy is whether the read queries can afford to wait. If they can wait, you won’t want to complicate the loading process by running queries simultaneously with the load. However, if you have a business need to run queries while you load data, read on.
For these tests, we used Microsoft® SQL Server® 2008. We created two identical tables, each containing 18 million rows totaling 2.6 GB. One table was organized as a heap, and the other was organized as a clustered index. These tables were used as targets into which data was bulk inserted. The input data file for the bulk insert had 65,535 rows. We were not concerned about performance differences between loading heaps or indexes; we were looking instead at the impact of queries concurrent with loads.
One of the test’s goals was to understand whether read committed snapshot isolation (RCSI) makes any difference in query concurrency during loads. Does it help keep readers “unblocked” from the loading process? And how does the WITH TABLOCK option affect the bulk loading process?
For our first test, we loaded data into the heap with RCSI turned off (that is, with the default READ COMMITTED isolation level).
To imitate a data warehouse workload, we had five concurrent connections executing random SELECT statements on the table. While the SELECT statements were running, we started the bulk load operation. We used the sys.dm_exec_requests dynamic management view (DMV) to monitor requests on the server, and we observed that the BULK INSERT statement was waiting in the queue with LCK_M_IX wait type. Figure 1. shows the output of sys.dm_exec_requests while the BULK INSERT statement was being blocked by the SELECT statement. As soon as the BULK INSERT operation started loading data (Figure 2), all SELECT statements were blocked by the BULK INSERT statement until it completed.
Figure 1 : BULK INSERT is waiting for the SELECT statement to complete
Figure 2 : BULK INSERT is loading data, and SELECT statements are blocked
For this test, the same five SELECT statements were running concurrently when we started the bulk insert, but this time we specified the WITH TABLOCK option. This time the sys.dm_exec_requests DMV indicated that the BULK INSERT operation was waiting with the LCK_M_BU wait type (Figure 3). The LCK_M_BU wait type occurs in SQL Server when a task is waiting to acquire a Bulk Update (BU) lock. Just as with our first test, as soon as BULK INSERT started loading data, we saw that the SELECT statements (Figure 4) were blocked by the bulk insert operation.
Figure 3: BULK INSERT with TABLOCK waiting for the SELECT statement to complete
Figure 4: SELECT statements waiting for the BULK INSERT statement with TABLOCK to complete
For our second test, we did the same test but with RCSI enabled. We wanted to see whether it would produce different results.
When we loaded data into the heap table without using the TABLOCK hint, we did not observe any waits. We concluded that option provides the maximum amount of concurrency, if you need to load data into a table while you run SELECT queries.
Figure 5: BULK INSERT into HEAP table with no TABLOCK hint
Despite the fact that we were using RCSI while we were loading data into the table, the sys.dm_exec_requests DMV indicated that the BULK INSERT operation was waiting for a LCK_M_IX lock and that it was being blocked by an active SELECT statement.
Two other SELECT queries, which were issued after the BULK INSERT command started, were observed waiting on the LCK_M_S lock type.
Figure 6: BULK INSERT into HEAP with the TABLOCK hint
As soon as the SELECT statement with session_id 59 (in Figure 6) was completed, BULK INSERT started loading data, with the selects in sessions 55 and 60 continuing to wait until the BULK INSERT command completed.
We did observe another interesting effect. So far we were dealing only with the BULK INSERT command. But SQL Server 2008 introduced bulk optimization the INSERT INTO … SELECT statement. This operation behaved like BULK INSERT in all the tests on the heap table. The only exception was that if we loaded data into the table using INSERT INTO … SELECT with the TABLOCK hint under RCSI mode, none of the readers was blocked.
LCK_M_BU*
*Very short; difficult to measure how long it was taken
Table 1: Duration of data loading into the heap table
Next, we loaded data into the table with a clustered index and with RCSI turned off. This test didn’t show any surprises and behaved as expected: All read operations were blocked while BULK INSERT was loading data into the table.
If RCSI was enabled, no blocking was observed with BULK INSERT running with concurrent read operations. As we expected, enabling RCSI eliminated locks for read operations that allowed simultaneous loading of the data into the table.
Figure 7: BULK INSERT into a table with a clustered index, no TABLOCK hint, and RCSI turned off
Figure 8: BULK INSERT into a table with a clustered index, a TABLOCK hint, and RCSI turned off
Figure 9: BULK INSERT into a table with a clustered index and RCSI enabled
Figure 9 shows SELECT statements reading from the table while a BULK INSERT is running. BULK INSERT behaved similarly both with and without the TABLOCK hint. This helped us conclude that if you are loading data with RSCI enabled, the TABLOCK hint in the BULK INSERT statement doesn’t play a significant role, as far as concurrency is concerned.
Table 2: Duration of data loading into the table with a clustered index
The only time that readers were blocked by bulk operations was when data was loaded into an empty table with the clustered index built on it. In that special case only, we observed that read operations were blocked by bulk load operations and read operations if they had the LCK_M_SCH_S wait type. This lock was released only after the loading batch was completed.
RSCI can provide great benefits, if you are bulk loading into the table with concurrent readers working on it . In most cases, it provides you with the ability to read while bulk loads are performed. Bulk loading does not affect the size of the version store in tempdb under RCSI; however, RCSI cannot be enabled at the table level. It can be enabled on the entire database only. Therefore you should carefully analyze your workload to ensure that other operations on your database will not cause the tempdb size to explode.
Note that RCSI will introduce 14 noncompressible bytes into every row in every table in the database (http://blogs.msdn.com/sqlserverstorageengine/archive/2008/03/30/overhead-of-row-versioning.aspx). An alternative strategy for concurrent readers is to execute queries using the READ UNCOMMITTED isolation level (also known as a dirty read), but this requires application changes, and it can deliver transitionally inconsistent results. RCSI requires no application change, and it guarantees consistent results.
For more information about data loading strategies and scenarios, see the Data Loading Performance Guide at http://msdn.microsoft.com/en-us/library/dd425070.aspx
Pingback from Pages tagged "concurrent"