Reintroducing Usage-Based Optimization in SQL Server 2008 Analysis Services

Rate This
  • Comments 13

Author: Denny Lee
Contributors / Reviewers: Richard Tkachuk, Akshai Mirchandani, Robert Zare, Thomas Kejser, Gang Xiao, Cristian Petculescu

 

Introduction

From the early days of Microsoft® SQL Server® Analysis Services, one of the query performance optimization techniques was for developers to use the usage-based optimization feature. Developers would create their OLAP cube and cube partitions and then set the aggregations for the cube at a low percentage. It was common to see developers set the aggregations for their partitions at 0% - 5%. After the OLAP database was deployed to the production servers, there would be a beta test period for users to work with the cube and to store the components of the queries executed against the cube. Afterwards developers could build aggregations based on what users had queried, typically optimizing for the slowest queries. After this was done, these aggregations could significantly improve the performance of these initially slow queries. With the release of SQL Server 2008 Analysis Services, an improved usage-based optimization algorithm has been included that enhances Analysis Services’ ability to create better aggregations.

 

Customer Databases

To write this technical note, we performed some extensive testing on various customer databases to provide the lessons learned below. To do this, the query and database criteria were that the queries resulted in a lot of storage engine-heavy queries (as opposed to formula engine-heavy queries). This could be identified by viewing the Analysis Services profiler trace and noting that there were a lot of Event Class 12\Event SubClass 22 events (Query SubCube Verbose\Non Cache data). Specifically, we wanted queries that were not able to use the cache and needed to hit the file system.

Upon the application of usage-based optimization, we verified that aggregations were being used by verifying the Event Class 60 (Get Data from Aggregation) events in the Analysis Services profiler trace in SQL Server Profiler. For more information about the Analysis Services events, see Query Events in SQL Server Books Online. As well, you can find a translation of the events within the [Analysis Services Root Folder]\OLAP\bin\Resources\1033\tracedefinition100.xml file. For more information about how to use the Profiler to identify query performance bottlenecks, see Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services.

 

Recommendations

If usage-based optimization is used correctly, you can consistently reduce the length of long-running queries by using usage-based optimization to create smart aggregations, as shown in Figure 1.

image

Figure 1: Using usage-based optimization to improve query performance

We learned the following key lessons for using the usage-based optimization feature of SQL Server 2008 Analysis Services:

  • Usage-based opimtization should only be applied to long-running queries only. Application of usage-based optimization to any and all queries may have the inadvertent effect of reducing query improvement against your long-running queries and increasing processing costs as well.
  • Sometimes handmade aggregations will be more optimal in the situation where you want to improve the query performance of a specific query (as opposed to a set of queries). For more information about this technique, see the Analysis Services 2005 Aggregation Strategy technical note. As a general rule, you will still want to start with usage-based optimization to see whether you can get optimal query performance first. Then try applying your handmade aggregations.
  • Do not blindly apply and replace existing aggregations with usage-based optimization, because while you may improve one set of queries, you may slow down other queries.
  • When using usage-based optimization, you should set the aggregation level to 100% (which is now the default). You should lower it only if processing time is unacceptable or aggregation size increases above the threshold (for example, 1.5 times the size of the facts).
  • Test, test, test – make sure you test your queries with your original aggregations, new usage-based optimization aggregations, handmade aggregations, and/or no aggregations. It is important to do this to better understand the query characteristics of your cube.

For more information about the results and tests performed, please read-ahead – if you’re not interested – that’s it! :-)

 

Results

As noted above, we executed a large number of queries on various customer databases. Table 1 contains twelve queries that are representative of the larger customer query tests where we compared the query times before and after we applied usage-based optimization. (Times are in milliseconds.)

QueryID No aggregations Before usage-based optimization After usage-based optimization % difference Factor (UBO is x times faster)
Q1   32,391 229 99% 141
Q2   37,377 256 99% 146
Q3   337 36,532 -10,740% -108
Q4       30% 1.4
Q5 267,582 276,071 361,255 -31% -1.3
Q6   4,679 3,870 17% 1.2
Q7   933 983 -5% -1.1
Q8   2,627 2,746 -5% -1.1
Q9   3,759 2917 22% 1.3
Q10   84,647 30,861 64% 2.7
Q11   350,652 9,339 97% 37.5
Q12   328,924 11,198 97% 29.4

All of the tests (including the values in Table 1) had the cache and file system cleared before query execution to ensure that the query results were not obtained from the cache.

 

Discussion

As can be seen in Figure 2, queries Q1 and Q2 show a two-order of magnitude improvement in query performance. But, in the case of query Q3, there is a two-order of magnitude performance degradation in query performance. This occurred because Q3 was already very fast(337 milliseconds) due to an existing set of aggregations. When the usage-based optimization aggregations were applied, they replaced the existing aggregations instead of adding to them, This replacement caused the performance degradation with Q3.

image

Figure 2: First set of usage-based optimization query tests

When we reapplied the aggregations such that both the usage-based optimization aggregations and the original aggregations were applied, all three queries had return times in the hundreds of milliseconds, that is, less than one second.

Lesson Learned: While this is not really specific to usage-based optimization aggretations, remember to be careful and ensure that any new aggregations that you create with usage-based optimization do not override existing aggregations that you may want to keep.

Figure 3 breaks down query Q5, which shows the same query execution times when no aggregations, handmade aggregations (Before UBO), and usage-based optimization aggregations (After UBO). As seen here, the application of usage-based optimization may sometimes result in the creation of aggregations that may slow down query performance.

image

Figure 3: Application of usage-based optimization may be slower

There are times when the creation of any aggregations may slow down query performance. There are any number of reasons that could cause this, ranging from cube design to the way the MDX statements were written. But as you can see in Figure 3, when aggregations were removed from the cube and the query was re-executed, query performance actually improved. As noted in the Table 2, even though the number of subcube requests to the storage engine were the same, the duration is significantly higher for usage-based optimization.

 
Event SubEvent Events Duration
Usage-Based Optimization No Aggregations Handmade Aggregations
Query Subcube Cache data 1329 2,960 464 512
Query Subcube Non-cache data 204 1,514 1,223 1,480

Table 2: Usage-based optimization has longer duration to query process the same number of events

Lesson Learned: Make sure to test all of your queries with your original aggregations (if you had any) and without any aggregations at all. There are times when the Analysis Services MOLAP engine can scan through the file system faster than aggregations can be gone through. This can be especially apparent in the case where the size of the aggregations is larger than the data itself.

As well, the “Before UBO” query in Figure 3 was against a cube where handmade aggregations were made following the guidelines within the Analysis Services 2005 Design Strategy technical note. If we discount the “No Agg” query, it becomes apparent that there are times where handmade aggregations can be faster. This is because when you use the Aggregation Manager to design your own aggregations, you can optimize for that specific query. Note that you can also use usage-based optimization to design aggregations for a single query. But the advantage of usage-based optimization is that it can create and combine aggregations for all of your select queries (for example,replace aggregations 5, 17, and 33 with a new one that covers them all).

Lesson Learned: To improve the performance of individual queries, sometimes you can get faster query performance by creating handmade aggregations because you can optimize for that specific query.

As can be seen in Figure 4, all of these queries here had relatively fast query times – that is, all of them were under 5s query time. In some cases, usage-based optimiation had improved performance (Q9, Q6) while in some other cases, there was a negligible performance degradation (Q7, Q8) that may or may not have anything to do with the use of aggregations.

image

Figure 4: Usage-based optimization query comparison for fast queries

Nevertheless, the key lesson learned here is that the application of usage-based optimization aggregations may have very little effect when you apply them to fast queries.

Lesson Learned: Make sure that when you apply usage-based optimization, you apply it to your long-running queries only. Usage-based optimiation is a complex algorithm that creates aggregations based on the queries that you select. If you end up selecting many queries that are already fast, the impact of these aggregations will only be slight and may even reduce the impact of your long-running queries.

After we take into account the above lessons learned, we can consistently reduce the length of long-running queries by using usage-based optimization to create smart aggregations, as shown in Figure 5.

image

Figure 5: Using usage-based optimization to improve query performance

 

Conclusion

A summary of the recommendations to optimize your use of usage-based optimization can be found in the Recommendation section earlier in this technical note. Ultimately, these tests and lessons learned provided us with some valuable lessons learned and affirmation of this feature.

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post
Page 1 of 1 (13 items)
Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post