Table-Valued Functions and tempdb Contention

  • Comments 3

Published: January 2011

Author: Steve Howard

Technical Reviewers: Sunil Agarwal, Shaun Tinline-Jones, Alexei Khalyako, Thomas Kejser, Mike Ruthruff, Kun Cheng

Overview

tempdb contention can be caused by using multi-statement table-valued functions (TVFs) in certain parts of queries, such as the WHERE clause of a query or the column list of a SELECT query. With a multi-statement TVF, a table variable is created and dropped on each call to the TVF, leading to potentially thousands of table variable creations, allocations, and deallocations for a single query. Simultaneous execution of these queries that have multi-statement TVFs can create contention, and this contention can negatively affect the performance of SQL Server even if best practices for tempdb configuration are applied.

This technical note explores the detection, cause, and elimination of tempdb contention caused by using multi-statement TVFs in certain parts of queries. The paper uses examples to demonstrate several options for reducing contention and provides extensive links for more detailed information.

NOTE: This technical note is intended for experienced database administrators (DBAs) and software developers who have an understanding of Microsoft SQL Server fundamentals.  

Detecting tempdb Contention

When troubleshooting a problem with SQL Server performance, the first question to ask is: “What is SQL Server waiting for?” For the answer, you can use the Dynamic Management View (DMV) sys.dm_exec_requests.

Querying the sys.dm_exec_requests DMV returns information about each request that is executing within SQL Server. The query shows you information about currently executing user processes that are waiting: what each process is currently doing and what each process is waiting for.

The following code shows such a query.

SELECT

      r.session_id

      ,r.status

      ,r.command

      ,r.database_id

      ,r.blocking_session_id

      ,r.wait_type

      ,AVG(r.wait_time) AS [WaitTime]

      ,r.wait_resource

 

FROM

      sys.dm_exec_requests AS r

      INNER JOIN

      sys.dm_exec_sessions AS s ON(r.session_id = s.session_id)

WHERE

      r.wait_type IS NOT NULL

      AND s.is_user_process = 1

GROUP BY GROUPING SETS((

      r.session_id

      ,r.status

      ,r.command

      ,r.database_id

      ,r.blocking_session_id

      ,r.wait_type

      ,r.wait_time

      ,r.wait_resource), ())

NOTE: You may also find it helpful to query sys.dm_os_wait_stats for evidence of PAGELATCH waits. While the sys.dm_os_wait_stats DMV does not link PAGELATCH waits to a specific session or wait resource, high levels of PAGELATCH waits may indicate that further investigation into allocation contention is needed.

For example, running the query on a poorly performing SQL Server may return the following result set.

Table 1. First result set showing contention on tempdb

session_id

status

command

database_id

blocking_session_id

wait_type

WaitTime

wait_resource

109

suspended

INSERT

5

600

PAGELATCH_UP

29

2:1:1

118

suspended

INSERT

5

600

PAGELATCH_UP

24

2:1:1

120

suspended

INSERT

5

600

PAGELATCH_UP

16

2:1:1

150

suspended

INSERT

5

600

PAGELATCH_UP

24

2:1:1

165

suspended

INSERT

5

600

PAGELATCH_UP

24

2:1:1

175

suspended

INSERT

5

600

PAGELATCH_UP

1

2:1:1

206

suspended

INSERT

5

600

PAGELATCH_UP

6

2:1:1

210

suspended

INSERT

5

600

PAGELATCH_UP

5

2:1:1

242

suspended

INSERT

5

600

PAGELATCH_UP

30

2:1:1

247

suspended

INSERT

5

600

PAGELATCH_UP

15

2:1:1

259

suspended

INSERT

5

600

PAGELATCH_UP

7

2:1:1

260

suspended

INSERT

5

600

PAGELATCH_UP

7

2:1:1

305

suspended

INSERT

5

600

PAGELATCH_UP

29

2:1:1

358

suspended

INSERT

5

600

PAGELATCH_SH

1

2:1:1

361

suspended

INSERT

5

600

PAGELATCH_UP

24

2:1:1

370

suspended

INSERT

5

600

PAGELATCH_UP

17

2:1:1

401

suspended

SELECT

5

600

PAGELATCH_UP

24

2:1:1

419

suspended

INSERT

5

600

PAGELATCH_UP

17

2:1:1

450

suspended

INSERT

5

600

PAGELATCH_UP

16

2:1:1

482

suspended

INSERT

5

600

PAGELATCH_UP

21

2:1:1

492

suspended

INSERT

5

600

PAGELATCH_UP

26

2:1:1

510

suspended

INSERT

5

600

PAGELATCH_UP

10

2:1:1

518

suspended

INSERT

5

600

PAGELATCH_UP

29

2:1:1

540

suspended

INSERT

5

600

PAGELATCH_UP

24

2:1:1

561

suspended

INSERT

5

600

PAGELATCH_UP

21

2:1:1

562

suspended

INSERT

5

600

PAGELATCH_UP

29

2:1:1

607

suspended

INSERT

5

600

PAGELATCH_UP

29

2:1:1

609

suspended

INSERT

5

600

PAGELATCH_UP

22

2:1:1

617

suspended

INSERT

5

600

PAGELATCH_UP

9

2:1:1

656

suspended

INSERT

5

600

PAGELATCH_UP

7

2:1:1

662

suspended

INSERT

5

600

PAGELATCH_UP

17

2:1:1

665

suspended

INSERT

5

600

PAGELATCH_UP

25

2:1:1

667

suspended

INSERT

5

600

PAGELATCH_UP

11

2:1:1

NULL

NULL

NULL

NULL

NULL

NULL

18

NULL

 

NOTE: When using grouping sets with one empty set of parentheses as in the preceding example, one row has NULLs for all columns except the aggregated column. The aggregated column in this one row is an aggregate of the aggregations. In the example above, the WaitTime column is averaged for each process, and the grouped row is an average of the averages.

In the earlier example, many processes are blocked by session_id 600, but because none of these queries are waiting for locks, this is not considered lock blocking. The wait_type shows that the suspended queries are waiting for PAGELATCH_UP and shows that the suspended queries have been waiting for an average of 18 milliseconds (ms). While 18 ms is a relatively short period of time, the large number of processes being blocked by a single resource means that cumulative wait degrades system performance and leads to long waits for the query response for users.

The database_id in the earlier example is 5, but this is not the location of the resource for which the processes are waiting. To find the location of the bottleneck, see the wait_resource column. In the earlier example, every process is waiting for a resource identified as 2:1:1. For a PAGELATCH, this resource description follows the format DatabaseID:fileID:pageID.

A databaseID of 2 is always tempdb. A fileID of 1 is always the primary data file for a database (fileID 2 is the transaction log). These results therefore show that there is contention on pageID 1 of the primary data file on tempdb.

From the results in Table 1 and recognizing that the contention is on fileID 1, an experienced DBA may decide that what is needed is one equally sized data file for each core, as recommended in Storage Top 10 Best Practices (http://technet.microsoft.com/en-us/library/cc966534.aspx). However, a query for tempdb files on this 8-core computer shows that it already has 8 equally sized tempdb data files, as shown in Table 2.

SELECT DB_NAME(database_id) as database_name, physical_name, size

FROM sys.master_files

WHERE DB_NAME(database_id) = 'tempdb'

Table 2. Results from sys.master_files shows tempdb with eight equally sized data files

database_name

physical_name

size

Tempdb                                                      

d:\mssql_tempdb\tempdev.mdf                                                                     

32,768                 

tempdb

D:\MSSQL_TEMPDB\templog.ldf

64

tempdb

D:\MSSQL_TEMPDB\tempdev2.ndf

32,768

tempdb

D:\MSSQL_TEMPDB\tempdev3.ndf

32,768

tempdb

D:\MSSQL_TEMPDB\tempdev4.ndf

32,768

tempdb

D:\MSSQL_TEMPDB\tempdev5.ndf

32,768

tempdb

D:\MSSQL_TEMPDB\tempdev6.ndf

32,768

tempdb

D:\MSSQL_TEMPDB\tempdev7.ndf

32,768

tempdb

D:\MSSQL_TEMPDB\tempdev8.ndf

32,768


 

(Note that this computer also has eight processor cores, though this is not visible here. The number of cores can usually be obtained by running msinfo32 from a command prompt.)

NOTE: You can query sys.database_files from any database to get a full list of the files and information about whether the files are transaction logs or rows. For more information about this file catalog view, see sys.database_files (Transact-SQL) (http://msdn.microsoft.com/en-us/library/ms174397.aspx).

For the preceding example, a subsequent query of sys.dm_exec_requests returns the following results, which show the contention moving to different files in tempdb:

Table 3. Subsequent queries show contention moving to different files in tempdb

session_id

status

command

database_id

blocking_session_id

wait_type

WaitTime

wait_resource

53

suspended

INSERT

2

541

PAGELATCH_UP

31

2:3:1

65

suspended

INSERT

5

541

PAGELATCH_UP

37

2:3:1

113

suspended

SELECT

5

541

PAGELATCH_UP

24

2:3:1

152

suspended

INSERT

5

541

PAGELATCH_UP

24

2:3:1

179

suspended

INSERT

5

541

PAGELATCH_UP

5

2:3:1

182

suspended

INSERT

5

541

PAGELATCH_UP

24

2:3:1

228

suspended

INSERT

5

541

PAGELATCH_UP

34

2:3:1

882

suspended

INSERT

5

740

PAGELATCH_UP

38

2:3:1

(Rows removed for brevity)

932

suspended

INSERT

5

740

PAGELATCH_UP

26

2:3:1

NULL

NULL

NULL

NULL

NULL

NULL

23

NULL

Further querying shows that the contention migrates among all of the data files periodically and does not always appear on pageID 1.

For example, a query may return the following results, which show the contention on a different pageID.

Table 4. Contention sometimes shows on pageID 3 instead of pageID 1

session_id

status

command

database_id

blocking_session_id

wait_type

WaitTime

wait_resource

118

suspended

INSERT

5

493

PAGELATCH_UP

2

2:7:3

120

suspended

INSERT

5

493

PAGELATCH_UP

5

2:7:3

150

suspended

INSERT

5

490

PAGELATCH_UP

28

2:7:3

259

suspended

INSERT

5

609

PAGELATCH_UP

28

2:7:3

260

suspended

INSERT

5

609

PAGELATCH_UP

9

2:7:3

296

suspended

INSERT

5

609

PAGELATCH_UP

25

2:7:3

305

suspended

INSERT

5

609

PAGELATCH_UP

7

2:7:3

321

suspended

INSERT

5

609

PAGELATCH_UP

0

2:7:3

351

suspended

INSERT

5

609

PAGELATCH_UP

36

2:7:3

876

suspended

INSERT

5

609

PAGELATCH_UP

2

2:7:3

(Rows removed for brevity )

NULL

NULL

NULL

NULL

NULL

NULL

16

NULL

Understanding the Contention

After you have detected tempdb contention, you can start to determine what this contention means.

Overview of PFS, GAM, and SGAM Pages

Every data file begins with the same series of pages, which keep track of file allocation and usage:

·         PageID 1 is the Page Free Space or PFS page.
The PFS page keeps track of which pages are allocated. It also keeps track of how full (in percentage) those pages are.

·         PageID 2 is the Global Allocation Map or GAM page.
The GAM page keeps track of which extents are allocated and which are available for allocation.

·         PageID 3 is the Shared Global Allocation Map or SGAM page.
Because it is inefficient to allocate a full extent for small objects, the first eight pages for any object are allocated on shared extents, and subsequent space for a table or index is allocated on uniform extents. The SGAM page keeps track of which extents are shared extents.

You can verify the type of any page contained in the SQL Server buffer by querying sys.dm_os_buffer_descriptors.

SELECT DB_NAME(database_id) as dbname,page_type

FROM sys.dm_os_buffer_descriptors

WHERE database_id = 2 AND file_id = 1 AND page_id = 1

You should change the database_id, file_id, and page_id to reflect the page you are interested in. For example, querying sys.dm_os_buffer_descriptors for the wait_resource in Table 1 shows that pageID 1 is a PFS page:

Table 5. Results of buffer descriptors

dbname

page_type

tempdb

PFS_PAGE

Whenever data is inserted into a table that does not have a clustered index, the PFS page must be accessed to determine which page contains enough free space to hold the row that is being inserted. If data is being inserted by several processes simultaneously, all of these processes need to find free space in which to place the data.

These processes can all insert data into separate tables without contention on any Index Allocation Map (IAM) pages because each table or index has its own set of IAMs. Each process, however, must refer to the PFS page to find a page with enough free space to hold the row(s) being inserted, and each process must then update the PFS page after data is inserted. If a large number of separate processes are inserting data into heap tables, and those tables are in an area of a data file where allocation is tracked by a single PFS, the referral to the PFS page and the subsequent updating of this page can become a bottleneck.

Because many different processes can create and drop objects at the same time, and because by default the first eight pages are allocated in shared extents, each process must refer to the SGAM to find available shared extents, and each process must then update the SGAM when the extent is no longer used. With a large number of processes dealing with rapid allocation and deallocation of pages on small tables or indexes, SGAM updating can become a bottleneck.

In the earlier example, both PFS and SGAM contention can occur, and both can be caused by the same set of processes.

Trace Flag 1118 Fails to Resolve the Issue

When DBAs identify SGAM contention, they typically enable trace flag 1118. After turning trace flag 1118 on in the earlier example, contention no longer shows on SGAM; contention may show on additional PFS pages, however, and the total contention may actually increase.

Table 6. Enabling trace flag 1118 results in a change in the pattern of contention

session_id

status

command

database_id

blocking_session_id

wait_type

WaitTime

wait_resource

65

suspended

INSERT

5

932

PAGELATCH_UP

20

2:8:24264

113

suspended

INSERT

5

932

PAGELATCH_UP

9

2:8:24264

152

suspended

INSERT

5

932

PAGELATCH_UP

15

2:8:24264

179

suspended

INSERT

5

932

PAGELATCH_UP

9

2:8:24264

182

suspended

INSERT

5

932

PAGELATCH_UP

7

2:8:24264

210

suspended

INSERT

5

932

PAGELATCH_UP

12

2:8:24264

212

suspended

INSERT

5

932

PAGELATCH_UP

24

2:8:24264

214

suspended

INSERT

5

932

PAGELATCH_UP

12

2:8:24264

(Rows removed for brevity)

791

suspended

INSERT

5

932

PAGELATCH_UP

24

2:8:24264

NULL

NULL

NULL

NULL

NULL

NULL

20

NULL

Querying sys.dm_os_buffer_descriptors shows that page 24264 is another PFS page. If you continue querying in this example, you may notice that the contention migrates among PFS pages across the tempdb data files, and the total contention has not been reduced.

Identifying the Cause of the Contention

A DBA can usually reduce or even eliminate tempdb contention, but there are some scenarios in which the tempdb contention cannot be resolved by the DBA. The earlier example is such a case because the contention is caused by what is happening in the query l

sys.dm_exec_requests contains a sql_handle column, a statement_start_offset column, and a statement_end_offset column. You can use these columns to examine the waiting processes and see which queries are causing the contention. To use these columns, modify the query that has been used to get the waiting processes as follows.

SELECT r.session_id, r.status, r.command,

r.database_id, r.blocking_session_id, r.wait_type,

r.wait_time, r.wait_resource, t.text,

stmt = SUBSTRING(t.text, (r.statement_start_offset/2) + 1,

      CASE r.statement_end_offset

            WHEN -1 THEN DATALENGTH(t.text)

            ELSE (r.statement_end_offset - r.statement_start_offset)/2

      end)

FROM sys.dm_exec_requests r join sys.dm_exec_sessions s on r.session_id = s.session_id

CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t

WHERE wait_type IS NOT NULL and s.is_user_process = 1

This query returns the information you typically see with the waiting processes and returns the full batch text in the TEXT column and the specific statement within each batch in the STMT column.

In the example, the results show many rows that are executing a function and a few that are executing a stored procedure, as follows.*

CREATE FUNCTION Person.USR_GetTopDuplicateCustomer  (@base  int) 

RETURNS @retval TABLE 

(  

      AddressLine1  NCHAR(600)  NOT NULL,  

      AddressLine2  NCHAR(600)  NULL,  

      City    NVARCHAR(30)  NOT NULL,  

      StateProvinceID  INT     NOT NULL,  

      CT     INT     NOT NULL 

) 

AS 

BEGIN  

 

      INSERT INTO @retval  

      select AddressLine1, AddressLine2, City, StateProvinceID, ct = COUNT(*)   

      from Person.Address o  

      WHERE AddressID >= @base  

      GROUP BY AddressLine1, AddressLine2, StateProvinceID, City  

      HAVING COUNT(*) > 1  

 

      return 

end

 

The specific line in the function is the text of the STMT column.*

            INSERT INTO @retval  

      select AddressLine1, AddressLine2, City, StateProvinceID, ct = COUNT(*)   

      from Person.Address o  

      WHERE AddressID >= @base  

      GROUP BY PersonID, AddressLine1, AddressLine2, StateProvinceID, City  

      HAVING COUNT(*) > 1  

Another statement within another batch is also running among the contending statements.*

SELECT DISTINCT

      (Select top 1 AddressLine1

            from Person.USR_GetTopDuplicateCustomer(AddressID)) as AddressLine1,  

      (Select top 1 ISNULL(AddressLine2, '')

            from Person.USR_GetTopDuplicateCustomer(AddressID)) as AddressLine2,  

      (SELECT TOP 1 City

            FROM Person.USR_GetTopDuplicateCustomer(AddressID)) as City,  

      (SELECT TOP 1 StateProvinceID

            FROM Person.USR_GetTopDuplicateCustomer(AddressID)) as StateProvinceID,  

      (Select TOP 1 ct

            FROM Person.USR_GetTopDuplicateCustomer(AddressID)) as ct 

FROM Person.AllAddresses   a

WHERE AddressID IN

      (SELECT AddressID FROM Person.USR_IsInTop1000Customers(@base)) 

 

The TEXT column shows the full batch text and indicates that this statement is found in a stored procedure named “Person.GetNextDuplicateCustomerSet.”

At this point in the example, you can see the call stack by querying sys.dm_exec_requests. The stored procedure Person.GetNextDuplicateCustomerSet contains a SELECT query that calls the function “Person.USR_IsInTop1000Customers” once for each row evaluated in the SELECT query. If the name of the function is correct, you expect this to result in a maximum of 1,000 rows. The Person.USR_GetTopDuplicateCustomer function is then called five times for each of those 1,000 rows—one call for each column returned by the correlated subqueries in the column list. Although the Person.USR_GetTopDuplicateCustomer TVF is simple, it is not constructed as an inline TVF.

Each call to a multi-statement TVF requires SQL Server to create a table variable to hold the return values, allocate pages to that table as required, search for free space, and update the corresponding PFS page as data is added. When the table variable goes out of scope, SQL Server must deallocate the space used by that variable. All of this work on table variables occurs in tempdb. As larger numbers of users begin to use the system, and as the size of the data contained in the table variables increases, contention develops in the SGAM and PFS pages used to track allocation and space available for these table variables.

This tempdb contention causes SQL Server to be unable to use its available resources efficiently. To improve the throughput and to scale the application further, you must eliminate this “hot spot contention.”

NOTE: Although there are calls to TVFs in the column list in this function, there are cases of tempdb contention caused by queries with calls to TVFs only in the WHERE clause.

When Did Contention Start?

In the earlier example, contention appeared with as few as three concurrent users, and average wait times for latches became higher than 10 ms with as few as 25 concurrent users. As the number of concurrent users increased, both the number of processes waiting for latches and the average time each process waited for a latch increased linearly. Changing the amount of data being stored in the table variables also had a linear effect on latch wait time for any observed load level.

Reducing or Eliminating Contention

When you have identified the cause of the contention, you can start to determine how to reduce or eliminate it.

In the earlier example, there is nothing further a DBA can do with configuration to alleviate the contention. Changes must be made to the queries and to the method used to build the desired result set. Assuming that the existing queries result in the correct data set, you should concentrate on pulling this data set more efficiently.

The options to eliminate the contention are:

·         Take advantage of the new syntax (as of Microsoft SQL Server 2005) to eliminate the need for the correlated subqueries and to lower the number of times the TVFs are called.

·         Use a JOIN clause to replace the correlated subquery in the WHERE clause.

·         Eliminate the use of TVFs for the data in the SELECT list.

Each of these solutions may work in some scenarios.

Eliminating the Correlated Subqueries in the Selected Columns

Depending on the query, you can eliminate calls to TVFs in a SELECT list by doing any of the following:

·         Use an APPLY operator in the FROM clause instead of individual correlated subqueries for each column.

·         Pull the data into a temporary or staging table and performing an appropriate JOIN to that table.

·         Rewrite the query so that temporary objects are not necessary to pull the appropriate data.

Using an APPLY Operator

In queries such as that in the earlier example, a single TVF is called multiple times in a series of correlated subqueries, and each call to that function uses the same parameter. Prior to SQL Server 2005, this was the only way to use this TVF to retrieve the data. SQL Server 2005, however, introduced the APPLY operator, and this provides another option.

In the example, the query is constructed such that it uses the TVF in correlated subqueries on each column: The table variable is built and dropped once for each column when the function is called on each row. A CROSS APPLY or OUTER APPLY clause offers an advantage over the multiple correlated sub-queries in the SELECT list because one table variable is returned for each row. In the example, the function is called for each of the five columns, so it is possible to reduce the number of times the table variable is returned for each row from five to one by using an APPLY operator.

Following is an example of writing a query to use the APPLY operator.

NOTE: You may need to modify the TVF to ensure that it only returns one row when only the first row of the results set is used.

SELECT DISTINCT

        d.AddressLine1

      , d.AddressLine2

      , d.City

      , d.StateProvinceID

      , d.ct 

FROM Person.AllAddresses   a

CROSS APPLY Person.USR_GetTopDuplicateCustomer(AddressID) d

WHERE AddressID IN

      (SELECT AddressID FROM Person.USR_IsInTop1000Customers(@base))

This query replaces a call to Person.USR_GetTopDuplicateCustomer for each column in each row with a single call to Person.USR_GetTopDuplicateCustomer for the row. This reduces the number of calls to that stored procedure by 80 percent.

Note, however, that Person.USR_IsInTop1000Customers still gets called for each row of the results set, and because the argument used to call it never changes, the same result set is returned every time. You can therefore further reduce the potential for contention by obtaining this result set only once, as follows.

-- retrieve the list of top 1,000 customers first

 

SELECT AddressID INTO #top1000Customers

FROM Person.USR_IsInTop1000Customers(@base)

 

-- use the temp table

-- instead of calling the TVF for every row

SELECT DISTINCT

        d.AddressLine1

      , d.AddressLine2

      , d.City

      , d.StateProvinceID

      , d.ct 

FROM Person.AllAddresses   a

CROSS APPLY Person.USR_GetTopDuplicateCustomer(AddressID) d

WHERE AddressID IN

      (SELECT AddressID FROM #top1000Customers)

NOTE: Multi-statement TVFs are very different from inline TVFs. Temporary objects are not created by inline TVFs as they are with multi-statement TVFs. Inline TVFs may be another option to consider, but note that even when using inline TVFs, using the correlated subqueries in the column list of the SELECT is inefficient. You should therefore use the CROSS APPLY clause when possible with inline TVFs.

Using Temporary Tables with JOINS

Another option is to rewrite the procedure so that the multi-statement TVF is not called for each column, or even for each row. Sometimes queries using multi-statement TVFs are written to reuse the logic in a TVF; with SQL Server, however, you must consider how the data is retrieved.

In the earlier example, the purpose of the query is to determine which addresses out of the next 1,000 addresses appear more than once in the table. Although the user-defined function makes it easy to reuse logic, the extra overhead of allocating and deallocating space for the table variable being returned by the function puts an additional load on SQL Server and causes a bottleneck in tempdb where the TVFs are created.

A JOIN clause can often replace the logic of the correlated subqueries in the column list of the SELECT query. In the earlier example, a JOIN on a single object eliminates the need to make calls to the TVF in the column list. If the data used in the SELECT lists can be pulled into a temporary table once per execution of the stored procedure, a JOIN can then be used to include this data in the final result set rather than using the TVFs in the query.

Using temporary staging tables provides a way to break down the logic if it is too complex to be efficiently performed in a single query. Using temporary tables also lets temporary objects be created only once per execution, and this reduces the potential for tempdb contention even more than using CROSS APPLY does.

Following is an example of how you can use temporary tables with joins for this particular example.

-- get the next 1,000 addresses

 

SELECT TOP 1000 AddressID

INTO #top1000Customers

FROM Person.Address WHERE AddressID >= @base

ORDER BY AddressID

 

-- get the list of duplicate addresses

 

select AddressLine1, AddressLine2, City, StateProvinceID, ct = COUNT(*)

INTO #duplicateAddresses

FROM Person.Address

WHERE AddressID IN (SELECT AddressID FROM #top1000Customers)

GROUP BY AddressLine1, AddressLine2, StateProvinceID, City

HAVING COUNT(*) > 1

 

 

SELECT a.AddressID, a.AddressLine1, ISNULL(a.AddressLine2, '') AddressLine2,

      a.City, a.StateProvinceID, b.ct

FROM Person.Address a JOIN

#top1000Customers ta ON a.AddressID = ta.AddressID

JOIN

#duplicateAddresses b ON a.AddressLine1 = b.AddressLine1

      AND ((a.AddressLine2 = b.AddressLine2)

OR (a.AddressLine2 IS NULL AND b.AddressLine2 IS NULL))

      AND a.City = b.City AND a.StateProvinceID = b.StateProvinceID

Rewriting the Query to Eliminate the Need for Temporary Objects

When steps that perform significant transformations produce the query results, using a temporary table gives SQL Server a way to calculate statistics on intermediate result sets and makes query performance more predictable. However, with a simple query, the results can be efficiently limited within the query without creating a temporary object.

In this example, the logic of the query can be expressed as follows.

/*

      get the list from this set of AddressIDs where duplicates exist,

      and give the count of the occurrences of this in the

next 1,000 addresses

*/

 

SELECT a.AddressID, a.AddressLine1, ISNULL(a.AddressLine2, '') AddressLine2,

      a.City, a.StateProvinceID, b.ct

FROM Person.Address a JOIN

(

      SELECT TOP 1000 AddressID FROM Person.Address WHERE AddressID >= @base

      ORDER BY AddressID

) ta ON a.AddressID = ta.AddressID -- limit to 1,000

JOIN

(

      select AddressLine1, AddressLine2, City, StateProvinceID, ct = COUNT(*)

      from Person.Address

      WHERE AddressID >= @base

      GROUP BY AddressLine1, AddressLine2, StateProvinceID, City

      HAVING COUNT(*) > 1

) b ON a.AddressLine1 = b.AddressLine1

      AND ((a.AddressLine2 = b.AddressLine2)

            OR (a.AddressLine2 IS NULL AND b.AddressLine2 IS NULL))

      AND a.City = b.City AND a.StateProvinceID = b.StateProvinceID

In this example, the additional work created by multiple calls to the multi-statement TVF is eliminated by rewriting the query. This significantly increases the performance of each individual execution, and more importantly, it removes the “hotspot” on tempdb.

No contention develops on tempdb when the same load is applied with the rewritten stored procedure, which returns the same data. Contention does not develop on tempdb even when the load is increased to 100 times as many users as the original load. Query performance improves from about 45 seconds per execution with the original query to less than 1 second per execution with the rewritten query. However, as frequently occurs with performance tuning, a bottleneck on a different resource develops at the higher level of throughput.

Conclusion

Transact-SQL programming or querying practices that rapidly create and drop temporary objects can create significant bottlenecks on tempdb, reducing SQL Server throughput even when best practices for configuration of tempdb are followed.

Multi-statement TVFs create table variables as return values. Using multi-statement TVFs that are called once or more for each row processed in a query can lead to tempdb contention because of the number of times the table variables must be created and dropped in rapid succession.

In the examples described in this white paper, the contention tended to appear on one file at a time, but contention migrated periodically from file to file. Contention was most often observed on the PFS page, but contention occasionally appeared on the SGAM page also.

tempdb contention can be reduced or eliminated by altering the Transact-SQL programming so that these TVFs are not called on each row. Alternately, you can create a temporary table once for each execution if statistics are needed on an intermediate result set, or you can perform the entire operation in a single query, removing the function logic to this query. The single-query alternative was demonstrated in this white paper, eliminating the use of tempdb and therefore any contention on tempdb. Avoiding the multi-line TVF produced the best query performance and eliminated the tempdb contention.

Because of the tempdb contention that multi-statement TVFs can cause, you should avoid using multi-statement TVFs when other ways of deriving the result set are available.

 


 

Links for Further Information

The following links provide further information.

For general information, see:

·         Troubleshooting Performance Problems in SQL Server 2008 (http://msdn.microsoft.com/en-us/library/dd672789.aspx)

·         Storage Top 10 Best Practices (http://technet.microsoft.com/en-us/library/cc966534.aspx)

For information about SQL Server data structures that manage extent allocations and track free space, see:

Managing Extent Allocations and Free Space (http://msdn.microsoft.com/en-us/library/ms175195.aspx)

For information about execution-related DMVs and functions (Transact-SQL), see:

·         sys.dm_exec_requests (Transact-SQL) (http://msdn.microsoft.com/en-us/library/ms177648.aspx)

·         sys.dm_exec_sql_text (Transact-SQL) (http://msdn.microsoft.com/en-us/library/ms181929.aspx)

·         sys.dm_exec_sessions (Transact-SQL) (http://msdn.microsoft.com/en-us/library/ms176013.aspx)

For database and files catalog views (Transact-SQL), see:

·         sys.master_files (Transact-SQL) (http://msdn.microsoft.com/en-us/library/ms186782.aspx)

·         sys.database_files (Transact-SQL) (http://msdn.microsoft.com/en-us/library/ms174397.aspx)

For a useful DML statement (Transact-SQL), see:

FROM (Transact-SQL) (http://msdn.microsoft.com/en-us/library/ms177634(v=SQL.105).aspx)

For information about how to design user-defined functions that return a table data type, see:

Table-Valued User Defined Functions (http://msdn.microsoft.com/en-us/library/ms191165.aspx)

For information about using the ROLLUP, GROUP BY, and GROUPING SETS operators to manipulate the results set, see:

Using GROUP BY with ROLLUP, CUBE, and GROUPING SETS (http://msdn.microsoft.com/en-us/library/bb522495.aspx)

For a helpful Microsoft Support article, see:

Recommendations to reduce allocation contention in SQL Server tempdb database (http://support.microsoft.com/kb/2154845)


 

Did this paper help you? Please give us your feedback. Tell us, on a scale of 1 (poor) to 5 (excellent), how you would rate this paper and why you have given the paper this rating.

For example:

·         Are you rating this paper high due to having good examples, excellent screen shots, clear writing, or another reason?

·         Are you rating this paper low due to poor examples, fuzzy screen shots, or unclear writing?

This feedback will help us improve the quality of white papers we release. Thank you!

Send feedback.

 

 

                                                                                                                             



* The formatting in the example was changed to enhance readability.

 

* The formatting in the example was changed to enhance readability.

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post
Page 1 of 1 (3 items)