Author: Eric Jacobsen, Christian Petculescu
Updated by Denny Lee
Goal
This document describes how to build Create Cache commands. Create Cache for Analysis Services (AS) was introduced in SP2 of SQL Server 2005. It can be used to make one or more queries run faster by populating the OLAP storage engine cache first.
Some customers have found certain queries benefit other later queries. For example, ascmd.exe could be used every hour to execute all queries in a directory keeping the cache ready for subsequent user queries. The other approach, which has been used, is to create a cache query for each user query. This is feasible if the MDX query is part of a report, then one simply adds another query that has the side effect of populating the cache, thereby speeding up the next query.
The root of the problem is that during a query, the AS Server does only local optimizations. Calculations, mixed granularities, and other more complex MDX statements can result in a chatty communication between the FE (Formula Engine) and the SE (Storage engine). In AS2000 and earlier, this was also a network round trip.
By issuing a cache statement, we can populate the cache with one or more subcubes that cover the regions of cube space that the query will actually need. We often find approximately the same time taken for each subcube query, so the effect can be dramatic overall.
With this methodology, the collection of MDX queries will appear as first executing inside the storage engine, and second inside the formula engine. In addition to reducing overall time, this can make it easier to predict the effect of multi-user load testing, because the first part uses 100% of all CPUs, and the second part uses 100% of one CPU.
Summary Steps
It is an iterative process. I would describe the identification of potential scenarios where create cache would help:
- Run profiler
- Run the query looking for Non-cache SE queries (query subcube filtered by subevent = 2)
- Look at the total time of the query vis-à-vis of the sum of the times of the non-cache SE queries.
- If:
- They are pretty close and
- There are many non-cache SE queries within the same range of time (not just 1 or 2 time consuming)
then create cache might help.
There could be variations, but here are one set of steps that have been used successfully:
- Extract all MDX queries as separate files.
- Add a Create Cache statement to correspond for every MDX query file.
- Run Clear Cache, Create Cache, then the user query.
- Verify that the Create Cache is effective and improve, if necessary.
- Work on next query
- When done with all queries, combine the Create Cache into 1 or more Create Cache queries.
- Verify the combined Create Cache.
Detailed Description
Below are details about each of the steps.
1. Extract MDX queries as separate files
Placing the queries in separate MDX files, it can be faster to work one by one and verify if each query is handled correctly by Create Cache before moving on.
- Start a trace.
- Run the report.
- Stop the trace.
- Extract the queries. One way is with SQLProfiler, /File /Export /Extract SQLServer Analysis Serverices Events / Extract All Queries. This creates a text file, with each query on a separate line. When queries have multiple lines this might be confusing, but is easy to add an extra line or otherwise edit. Either copy each query to a separate file, or, for every query that is being worked on, comment out the other queries.
2. Add Create Cache for every MDX query
Below is an example of a Create Cache statement.
create cache
for [MyCube]
as (
{ [USA].[Oregon], [USA].[Colorado], [USA].[Florida], [USA].[Washington] }
* { [Measures].[mybasemeasure] }
* { [2006].children, parallelperiod( [Time].[Year], 1, [2006].[Q1].[Jan] ), YTD( parallelperiod( [Time].[Year], 1, [2006].[Q1].[Jan] ) ) }
* { [Products].[Shoes].children }
)
It should be apparent that it is basically a crossjoin of each dimension member that is specified in the query. Note that set expressions are allowed.
2.a. Add Specified Members
First add all members specified in the query. NOTE: Create Cache covers static analysis rather than dynamic. MDX with dynamic members will not benefit from this approach.
2.b. Add Calculated Members and Definitions
During execution, the Formula Engine can issue a subcube query which includes calculated members.
For example, if the cube has a calculated member:
[MyDim].[Calc123] as [MyDim].[A] + [MyDim].[B]
one should include the following members in Create Cache:
[MyDim].[Calc123] as [MyDim].[A] + [MyDim].[B]
2.c. Account for Custom Rollups
During execution, there may be custom rollups that affect the calculation. For example, a custom rollup may involve QTD() or YTD(). This might be hard to detect since the actual calculations are stored in the relational database. Custom rollups can be viewed from BI Development Studio. The dimension must be processed (otherwise the calculations still only exist on the relational database).
- Go to Solution Explorer, double-click on the dimension you want to examine.
- Click on the Browser tab.
- Find the icon for Member Properties, click on it. Select "Custom Rollup", "Custom Rollup Properties".
- It will display the Custom Rollup (formula) and other properties (such as solve order) for each member.
For example, if calculations include YTD or PeriodsToDate, the member list should account for that. So, if the query includes:
[Date].[2006].[June],
ParallelPeriod( [Date].[2006].[June], [Date].[Year], -1 )
YTD( [Date].[2006].[June] )
the only calculations resulting in output are
[Date].[2005].[June], [Date].[2006].[Jan] ..[Date].[2006].[June]
However, because of indirect relationships, you should also add the member set:
YTD( ParallelPeriod( [Date].[2006].[June], [Date].[Year], -1 )
3. Run Clear Cache, Create Cache, User Query
Now, it is time to run the Clear Cache statement and examine its effect. Note that the Cube ID can be specified or left blank to clear the entire database cache entries.
<!-- Can be used through ADOMD.NET ExecuteNonQuery call, or passed to ADODB as CommandText or SQL Server Management Studio XMLA query -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<ClearCache>
<Object>
<DatabaseID>FoodCmp</DatabaseID>
<CubeID>SalesCube</CubeID>
</Object>
</ClearCache>
</Batch>
3.a. Connection string parameters
We recommend these parameters to be added at the connection string.
Provider=msolap.3;Datasource=MyServer;Initial Catalog=MyDatabase;Timeout=300;Disable Prefetch Facts=true;Cache Ratio=1
| Provider |
The name of the OLE DB provider, msolap. |
| Datasource |
The name of the server. Might be named instance like MyServer\MyInstance. |
| Initial Catalog |
The name of the database to use. |
| Timeout |
Optional number of seconds for command timeout. This can help to avoid very long runs until the queries are optimized to run faster. |
| Disable Prefetch Facts |
Optional new parameter to disable the Formula Engine heuristic that sends queries for possibly more data than it is requested. |
| Cache Ratio |
Optional parameter to control the conversion from a specified set of members to the subcube query sent to the SE Storage Engine. Use 1 to send only the specified list. |
3.b. Ways to Execute Queries
Because the connection string needs to be modified, SQL Server Management Studio cannot be used. There are several alternatives:
- One could write a small program in C#.
- ascmd.exe can execute queries and output time. The ascmd utility can be found on Complex under the SSAS Product samples at: http://www.codeplex.com/MSFTASProdSamples; please refer to the Command-line Utility Sample.
- Use the mdx sample app from AS2000.
4. Verify Create Cache is Effective
The goal of using Create Cache is to isolate the storage engine and formula engine work and prevent the user query from executing subcube commands. The SE is multi threaded when the FE is single threaded. Therefore, one can just use Task Manager to verify if the SE is running.
During a good run, initially all the CPUs are at 100% since the SE is busy with the Create Cache command. Then, only one CPU reaches 100% because the FE is busy with the formulas and preparing the result set.
Task Manager is the general way to verify, but to be sure, create a trace and look for event Query Subcube, subevent 2 Non Cache. The cache subcube queries are fast and generally not an issue for AS. (In fact it shows how effective it is to create a cache first.)
4.a. Improve, if necessary
If the user query results in Task Manager CPU spikes (caused by subcube queries), one only needs to determine which members from the main query are not present in the Create Cache statement.
It might be helpful to look at trace information for Query Subcube Verbose. It might be that by comparing the ones for the Create Cache with the ones for the user query, some differences can be seen, and that could help discover which members should be added to the Create Cache query.
Another approach is to simplify the query until the problem does not happen. Try to use binary search to find the critical part of the user query that causes the extra subcubes.
4.b. Eliminate All Subcubes
This general technique works well when one can eliminate all subcubes. However, it might not be possible to know the set of members in advance (static analysis). If this is not true, for instance for complex calculations, then the Create Cache technique might not be as beneficial. Adding members that are not used can be cheap in some cases, for example to add one more product in an already large specified set. Or it might be expensive, for example to include another large set of data that otherwise would avoid entire partitions. One will need to experiment and measure the result, and because sometimes it will be a win and sometimes a loss, we recommend considering several user parameters.
5. Work on Next Query
6. After working with all the main queries, combine the Create Cache into 1 or More Create Cache statements
The best number of Create Cache queries depends on their structure (which dimension members involved, which measure group) and member sets.
- Different measure groups should be separated unless the dimension members are consistent. Then, it is just a matter of convenience to combine, since under the covers the measure groups are physically separate.
- If one query is covered or almost covered by another one, they should be combined.
- If queries are disjoint or mostly disjoint, keep separate.
We have not experimented greatly, but here is one suggestion. Execute both scenarios, as separate and combined. Measure execution time and size of resulting DataCache (from perfmon counter). Use single combined Create Cache if the combination reduces time to 67% of separate commands and consumes space less than 3 times as much as separate.
7. Verify the combined Create Cache
Execute all queries, in this pattern:
- Clear Cache
- Create Cache (possibly many queries)
- User queries
Verify the set of queries in the same way as each single query.