Using Filtered Statistics with Partitioned Tables.

  • Comments 11

Author: Alexei Khalyako

Reviewers and contributors: Eric Hanson, Dimitri Artemov, Burzin Patel, Thomas Kejser

Introduction 

We’ve worked with a customer on architectural design review and preparing a migration of a big Data Warehouse from SQL Server 2005 to SQL Server 2008.  Base on the customer’s requests, there were various advantages of the migration to SQL Server 2008 identified. One of the requests was related to control of the query resource consumption.

The main role of the Data Warehouse in the customer’s scenario  is to collect Call Data records, which have to be analyzed later and used by several other business applications. So, in general once per day new data come into a several Terabyte partitioned table and then business apps and business users were running report-type of queries. Some of the relatively small reports were taking very long times. As DBA started analyzing those queries they recognized that the difference between REQUIRED and REQUESTED memory by the query was huge (like amount of required memory was 18Mb and requested – 10 GB)!

As the part of optimization we were able to lower the appetite of those queries by using Resource Governor. We even managed to get more of the heavy queries executed at the same time. However this didn’t address the other question – why queries were requesting way more memory?

Analyzing query plans we found out that the Statistics were disabled on this table. The reason for this was that customer had to load every day very big amount of data into the new partition. Statics was becoming not valid and should have been updated.  Update process was starting on the entire table (which is several Tb) and by the tome update was finished – the new portion of the data was arriving. And so on, and so on... Customer decided to disable and delete statistics on the entire table.
Another important moment to note: customer stated that most of the report queries run against only one partition in partitioned table.

Base on the AdwentureWorks database and the table SalesOrderHeader I created a little sample to illustrate this situation. I made a partitioned table SalesOrderHistory partitioned by OrderDate by date and covering one month 2008-07.
The table structure looks like:

 

CREATE TABLE [SalesOrderHistory](
              [SalesOrderID] [int]  NOT NULL,
              [RevisionNumber] [tinyint] NOT NULL,
              [OrderDate] [datetime] NOT NULL,
              [DueDate] [datetime] NOT NULL,
              [ShipDate] [datetime] NULL,
              [Status] [tinyint] NOT NULL,
              [SalesOrderNumber]  [nvarchar](25) NOT NULL,
              [CustomerID] [int] NOT NULL,
              [ShipToAddressID] [int] NULL,
              [BillToAddressID] [int] NULL,
              [CreditCardApprovalCode] [varchar](15) NULL,
              [SubTotal] [money] NOT NULL,
              [TaxAmt] [money] NOT NULL,
              [Freight] [money] NOT NULL,
              [TotalDue]  [money] NOT NULL,
              [Comment] [nvarchar](max) NULL,
              [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
              [ModifiedDate] [datetime] NOT NULL)
              ON day_scheme ([OrderDate])

The query will look like a:

Select  b.Name, a.SalesorderId, a.OrderDate from SalesOrderHistory a
Inner Join Store b on a.CustomerID = b.CustomerID
WHERE a.OrderDate = '2008-07-13 00:00:00.000'and a.SalesOrderID = 43659
Group by b.Name, a.SalesorderId, a.OrderDate 

Query executed in 5 sec.

Let’s see the plan with no statistics on the table:  






 

 

 

 

 

 

So, we see the warning that statistics is missing, if we check the estimated number of rows in the output of SELECT is 393 rows

 

 

 


 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

If we run DMV to check memory requested and granted to this query we will see following:

select session_id,requested_memory_kb,required_memory_kb,granted_memory_kb,query_cost

from sys.dm_exec_query_memory_grantswhere session_id >50  

session_id requested_memory_kb required_memory_kb granted_memory_kb query_cost
52 1360 1280 1360 5,737777

 

Now we will try to fix it. In SQL 2008 we’ve got new Filtered Statistics. In order to learn more about this feature please read http://msdn.microsoft.com/en-us/library/ms190397.aspx Having in mind that customer told us that 80% of the queries are hitting only one partition we may apply filtered statistics to improve those queries performance. I can try to create different statistics per each column which is included in SELECT statement:

CREATE STATISTICS Day0713_SalesOrderID on SalesOrderHistory (SalesOrderID )where   OrderDate =  '2008-07-13 00:00:00.000'

CREATE STATISTICS Day0713_OrderDate on SalesOrderHistory (OrderDate) where   OrderDate =  '2008-07-13 00:00:00.000'

CREATE STATISTICS Day0713_CustomerID on SalesOrderHistory (CustomerID ) where   OrderDate =  '2008-07-13 00:00:00.000' 

After creating those statistics and executing the same query as mentioned above we see that the query execution time was few milliseconds and the plan looked much better:

 

 

 


 

 

 

 

 

 

 

 

 

 

 

Let’s check how much memory this query requested now and how much was granted:

select session_id,requested_memory_kb,required_memory_kb,granted_memory_kb,query_cost

from sys.dm_exec_query_memory_grantswhere session_id >50 

session_id requested_memory_kb required_memory_kb granted_memory_kb query_cost
52 1024 512 1024 5,678132

We see that comparable to running the same query with no statistics now with filtered statistics on the table enabled has the query much lower expectations on how much memory it may need.

One of the great advantages of the filtered statistics is that if we switch in new partition into the partitioned table filtered statistics built on the older partitions does not become invalid and you don’t need to update it. You only need to create new filtered statistics for the new partition which is much faster process comparable to if you need to update or build statistics for entire table. In case of the real data in the customer’s lab we were able to get even better results using filtered statistics on much bigger amount of data. If you have similar workload, where most of the queries are going against single partition in partitioned table you may want to play with the filtered indexes and check if it will help you increasing performance of your queries.

 

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post
Page 1 of 1 (11 items)
Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post