SQL Server Customer Advisory Team - SQL Server Best Practices

Enabling SQL Server customers to navigate the most challenging frontiers of large scale data management.

Technical Notes

Reintroducing Usage-Based Optimization in SQL Server 2008 Analysis Services

Author: Denny Lee
Contributors / Reviewers: Richard Tkachuk, Akshai Mirchandani, Robert Zare, Thomas Kejser, Gang Xiao, Cristian Petculescu

 

Introduction

From the early days of Microsoft® SQL Server® Analysis Services, one of the query performance optimization techniques was for developers to use the usage-based optimization feature. Developers would create their OLAP cube and cube partitions and then set the aggregations for the cube at a low percentage. It was common to see developers set the aggregations for their partitions at 0% - 5%. After the OLAP database was deployed to the production servers, there would be a beta test period for users to work with the cube and to store the components of the queries executed against the cube. Afterwards developers could build aggregations based on what users had queried, typically optimizing for the slowest queries. After this was done, these aggregations could significantly improve the performance of these initially slow queries. With the release of SQL Server 2008 Analysis Services, an improved usage-based optimization algorithm has been included that enhances Analysis Services’ ability to create better aggregations.

 

Customer Databases

To write this technical note, we performed some extensive testing on various customer databases to provide the lessons learned below. To do this, the query and database criteria were that the queries resulted in a lot of storage engine-heavy queries (as opposed to formula engine-heavy queries). This could be identified by viewing the Analysis Services profiler trace and noting that there were a lot of Event Class 12\Event SubClass 22 events (Query SubCube Verbose\Non Cache data). Specifically, we wanted queries that were not able to use the cache and needed to hit the file system.

Upon the application of usage-based optimization, we verified that aggregations were being used by verifying the Event Class 60 (Get Data from Aggregation) events in the Analysis Services profiler trace in SQL Server Profiler. For more information about the Analysis Services events, see Query Events in SQL Server Books Online. As well, you can find a translation of the events within the [Analysis Services Root Folder]\OLAP\bin\Resources\1033\tracedefinition100.xml file. For more information about how to use the Profiler to identify query performance bottlenecks, see Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services.

 

Recommendations

If usage-based optimization is used correctly, you can consistently reduce the length of long-running queries by using usage-based optimization to create smart aggregations, as shown in Figure 1.

image

Figure 1: Using usage-based optimization to improve query performance

We learned the following key lessons for using the usage-based optimization feature of SQL Server 2008 Analysis Services:

  • Usage-based opimtization should only be applied to long-running queries only. Application of usage-based optimization to any and all queries may have the inadvertent effect of reducing query improvement against your long-running queries and increasing processing costs as well.
  • Sometimes handmade aggregations will be more optimal in the situation where you want to improve the query performance of a specific query (as opposed to a set of queries). For more information about this technique, see the Analysis Services 2005 Aggregation Strategy technical note. As a general rule, you will still want to start with usage-based optimization to see whether you can get optimal query performance first. Then try applying your handmade aggregations.
  • Do not blindly apply and replace existing aggregations with usage-based optimization, because while you may improve one set of queries, you may slow down other queries.
  • When using usage-based optimization, you should set the aggregation level to 100% (which is now the default). You should lower it only if processing time is unacceptable or aggregation size increases above the threshold (for example, 1.5 times the size of the facts).
  • Test, test, test – make sure you test your queries with your original aggregations, new usage-based optimization aggregations, handmade aggregations, and/or no aggregations. It is important to do this to better understand the query characteristics of your cube.

For more information about the results and tests performed, please read-ahead – if you’re not interested – that’s it! :-)

 

Results

As noted above, we executed a large number of queries on various customer databases. Table 1 contains twelve queries that are representative of the larger customer query tests where we compared the query times before and after we applied usage-based optimization. (Times are in milliseconds.)

QueryID No aggregations Before usage-based optimization After usage-based optimization % difference Factor (UBO is x times faster)
Q1   32,391 229 99% 141
Q2   37,377 256 99% 146
Q3   337 36,532 -10,740% -108
Q4       30% 1.4
Q5 267,582 276,071 361,255 -31% -1.3
Q6   4,679 3,870 17% 1.2
Q7   933 983 -5% -1.1
Q8   2,627 2,746 -5% -1.1
Q9   3,759 2917 22% 1.3
Q10   84,647 30,861 64% 2.7
Q11   350,652 9,339 97% 37.5
Q12   328,924 11,198 97% 29.4

All of the tests (including the values in Table 1) had the cache and file system cleared before query execution to ensure that the query results were not obtained from the cache.

 

Discussion

As can be seen in Figure 2, queries Q1 and Q2 show a two-order of magnitude improvement in query performance. But, in the case of query Q3, there is a two-order of magnitude performance degradation in query performance. This occurred because Q3 was already very fast(337 milliseconds) due to an existing set of aggregations. When the usage-based optimization aggregations were applied, they replaced the existing aggregations instead of adding to them, This replacement caused the performance degradation with Q3.

image

Figure 2: First set of usage-based optimization query tests

When we reapplied the aggregations such that both the usage-based optimization aggregations and the original aggregations were applied, all three queries had return times in the hundreds of milliseconds, that is, less than one second.

Lesson Learned: While this is not really specific to usage-based optimization aggretations, remember to be careful and ensure that any new aggregations that you create with usage-based optimization do not override existing aggregations that you may want to keep.

Figure 3 breaks down query Q5, which shows the same query execution times when no aggregations, handmade aggregations (Before UBO), and usage-based optimization aggregations (After UBO). As seen here, the application of usage-based optimization may sometimes result in the creation of aggregations that may slow down query performance.

image

Figure 3: Application of usage-based optimization may be slower

There are times when the creation of any aggregations may slow down query performance. There are any number of reasons that could cause this, ranging from cube design to the way the MDX statements were written. But as you can see in Figure 3, when aggregations were removed from the cube and the query was re-executed, query performance actually improved. As noted in the Table 2, even though the number of subcube requests to the storage engine were the same, the duration is significantly higher for usage-based optimization.

 
Event SubEvent Events Duration
Usage-Based Optimization No Aggregations Handmade Aggregations
Query Subcube Cache data 1329 2,960 464 512
Query Subcube Non-cache data 204 1,514 1,223 1,480

Table 2: Usage-based optimization has longer duration to query process the same number of events

Lesson Learned: Make sure to test all of your queries with your original aggregations (if you had any) and without any aggregations at all. There are times when the Analysis Services MOLAP engine can scan through the file system faster than aggregations can be gone through. This can be especially apparent in the case where the size of the aggregations is larger than the data itself.

As well, the “Before UBO” query in Figure 3 was against a cube where handmade aggregations were made following the guidelines within the Analysis Services 2005 Design Strategy technical note. If we discount the “No Agg” query, it becomes apparent that there are times where handmade aggregations can be faster. This is because when you use the Aggregation Manager to design your own aggregations, you can optimize for that specific query. Note that you can also use usage-based optimization to design aggregations for a single query. But the advantage of usage-based optimization is that it can create and combine aggregations for all of your select queries (for example,replace aggregations 5, 17, and 33 with a new one that covers them all).

Lesson Learned: To improve the performance of individual queries, sometimes you can get faster query performance by creating handmade aggregations because you can optimize for that specific query.

As can be seen in Figure 4, all of these queries here had relatively fast query times – that is, all of them were under 5s query time. In some cases, usage-based optimiation had improved performance (Q9, Q6) while in some other cases, there was a negligible performance degradation (Q7, Q8) that may or may not have anything to do with the use of aggregations.

image

Figure 4: Usage-based optimization query comparison for fast queries

Nevertheless, the key lesson learned here is that the application of usage-based optimization aggregations may have very little effect when you apply them to fast queries.

Lesson Learned: Make sure that when you apply usage-based optimization, you apply it to your long-running queries only. Usage-based optimiation is a complex algorithm that creates aggregations based on the queries that you select. If you end up selecting many queries that are already fast, the impact of these aggregations will only be slight and may even reduce the impact of your long-running queries.

After we take into account the above lessons learned, we can consistently reduce the length of long-running queries by using usage-based optimization to create smart aggregations, as shown in Figure 5.

image

Figure 5: Using usage-based optimization to improve query performance

 

Conclusion

A summary of the recommendations to optimize your use of usage-based optimization can be found in the Recommendation section earlier in this technical note. Ultimately, these tests and lessons learned provided us with some valuable lessons learned and affirmation of this feature.

Comments

 

2005 oder 2008 | hilpers said:

Pingback from  2005 oder 2008 | hilpers

January 17, 2009 12:41 PM
 

Technical Notes said:

Author: Carl Rabeler Reviewers: Mark Souza, Prem Mehra, Lindsey Allen, Mike Weiner, James Podgorski,

February 24, 2009 12:34 PM
 

Top 10 Lists said:

Author: Carl Rabeler Reviewers: Mark Souza, Prem Mehra, Lindsey Allen, Mike Weiner, James Podgorski,

February 24, 2009 2:09 PM
 

Reintroducing Usage-Based Optimization in SQL Server 2008 Analysis Services « Denny Lee's SQL BI and DW Musings said:

Pingback from  Reintroducing Usage-Based Optimization in SQL Server 2008 Analysis Services «  Denny Lee's SQL BI and DW Musings

November 9, 2009 10:21 PM
 

Using Trace to Monitor and Audit Analysis Services | MarkTab Data Mining said:

Pingback from  Using Trace to Monitor and Audit Analysis Services | MarkTab Data Mining

August 2, 2010 7:09 AM
 

Resource Monitoring | MarkTab Data Mining said:

Pingback from  Resource Monitoring | MarkTab Data Mining

August 6, 2010 7:04 AM

About denny.lee

Denny Lee is a Senior Program Manager based out of Redmond, WA in the SQL Customer Advisory Team (SQLCAT) - DW/BI Group. He has more than 13 years experience as a developer and consultant implementing software solutions to complex OLTP and data warehousing problems. His industry experience includes accounting, human resources, automotive, retail, web analytics, telecommunications, and healthcare. He had helped create the first OLAP Services reporting application in production at Microsoft and is a co-author of "Professional Microsoft PowerPivot for Excel and SharePoint", “Professional Microsoft SQL Server Analysis Services 2008 with MDX”, “Professional Microsoft SQL Server 2000 Data Warehousing with Analysis Services” and “Transforming Healthcare through Information [Ed. Joan Ash] (2009)”. In addition to contributing to the SQLCAT Blog, SQL Server Best Practices, and SQLCAT.com, you can also review Denny's Blog (http://dennyglee.com). Denny specializes in developing solutions for Enterprise Data Warehousing, Analysis Services, and Data Mining; he also has focuses in the areas of Privacy and Healthcare.
sql, server, best practices, whitepapers, analysis services, data mining, olap, datawarehouse, datawarehousing, availability, clustering, capacity, collation, data types, data warehouse, database, design, index, mirroring, optimization, partitions, performance, precision, processing, querying, scalability, security, reporting services, integration services
Copyright 2008 Microsoft Corporation. All Rights Reserved.