Author: Carl Rabeler
Contributor: Scott Hulke
Reviewers: Richard Tkachuk, Thomas Kejser, Denny Lee
When designing your SQL Server 2005 (or 2008) Analysis Services solution, you may be faced with the design requirement to display multiple hierarchies in a parent-child dimension. A common implementation that we have seen is the use of custom rollups to accomplish this task. However, we have seen performance issues with this implementation when implemented at scale. For example, in one site that we worked with recently, certain MDX queries took almost a full minute to return results when executed against cold cache.
Marco Russo describes a different implementation in his Many-to-many revolution article—this implementation uses the many-to-many dimension relationship feature in SQL Server 2005 Analysis Services. Richard Tkachuk describes a third implementation in his Duplicate Members in Analysis Services 2005 blog. We implemented a variation of the many-to-many dimension design at the customer site to determine if its performance was significantly better. We discovered that its performance is dramatically better, particularly when the data requested is neither cached in Analysis Services nor cached in the file system.
Note: There are numerous approaches to addressing this problem space. This paper addresses only one particular approach. Other approaches could be superior based on various criteria.
As a result of this performance improvement, we decided to bring this implementation into the SQL Server Customer Advisory Team lab and test both implementations at scale to verify the results and to look more closely at the performance characteristics of each implementation. We were particularly interested in looking at the performance differences between the two implementations when the file system cache was cold because the performance difference seemed to be most dramatic under those circumstances. We discovered that as the size of the database increased, the many-to-many dimension design implementation outperformed the custom rollup implementation by a factor of more than two to one when the file system cache was warm, and by a factor of almost seven with the largest database we tested when the file system cache was cold.
Setting Up the Test Environment
For our test environment, we modified the Adventure Works DW sample database, creating two different size databases. To the first database, we added 100,000 new orders and to the second database we added 500,000 new orders. Each new order had an average of 16 line items. We then created two new Analysis Services projects for each database—one each for the custom rollup implementation and one for the many-to-many implementation.
Configuring the Adventure Works DW relational data warehouse
To configure each Adventure Works DW sample database, we performed the following steps:
1. Added the new orders to the FactResellerSales table. Furthermore, we distributed these sales across all 296 members in the DimEmployee dimension table (only about 20 have sales in the sample database).
2. Added an EmployeeID and a CustomRollup column to the DimEmployee table. The EmployeeID column holds the unique ID for each employee (formerly in the EmployeeKey field). The EmployeeKey field is then repurposed to hold a unique value for each employee hierarchical rollup as the table is being repurposed to hold a row-per-employee hierarchy rather than a row per employee. This is necessary for the many-to-many implementation. The CustomRollup column is used to hold the string that represents the dimension key to Analysis Services for use by the custom rollup implementation.
3. Added seven new employees to the DimEmployees table. These new employees had no sales in their own right, but were used to demonstrate alternate rollup hierarchies.
4. Added a new hierarchy for each employee who had sales in the FactResellerSales table. This was done by adding a new row to the DimEmployee table to represent the new hierarchy. The new row used a new, unique EmployeeKey (previous EmployeeKey for that employee plus 1,000,000), an EmployeeID equal to the previous EmployeeKey, and a parentEmployeeKey that was randomly selected from one of the new seven employees added in step 3. That gives each employee with sales two different rollup paths—the original one from the sample plus a new one where they roll up to one of the seven new employees. For example, the employee with the EmployeeKey of 272 generated as a new employee with an EmployeeID of 1,000,272, might have a parent with the EmployeeKey of 300, would have an EmployeeID value of 272, and would have a CustomRollup value of [Employee].[Employee].&[272].
5. Defined a view (vFactEmployee) consisting of the EmployeeKey and EmployeeID for each employee. This view will be used as the intermediate fact table for the many-to-many implementation. This view identifies to Analysis Services the alternate rollup path (instead of the CustomRollup column used for the custom rollup implementation).
Configuring the Custom Rollup Analysis Services Project (AWDW-CR)
To configure the custom rollup Analysis Services project, we performed the following steps:
1. Defined an Employee dimension identical to the Employee dimension in the Adventure Works sample project.
2. For the Employees attribute (the parent key), we set the value for the CustomRollup property to DimEmployee.CustomRollup (WChar).
3. Defined the Adventure Works cube with a single measure group—the Reseller Sales measure group. We defined the Reseller Sales measure group to be similar to the Reseller Sales measure group in the Adventure Works sample project (removing measure expressions to keep our cube simple).
4. Deployed the project and processed the database.
Note: When using the custom rollup implementation, data that is rolled up by using multiple paths is double-counted at the top level.
Configuring the Many-to-Many Analysis Services Project (AWDW-M2M)
To configure the many-to-many Analysis Services project, we performed the following steps:
1. Defined an Employee dimension identical to the Employee dimension in the Adventure Works sample project.
2. Defined a fact dimension based on the vFactEmployee view (Fact Employee). This dimension has a single attribute, the EmployeeID value. This dimension serves as the intermediate dimension in the many-to-many relationship.
3. Defined the Adventure Works cube with two measure groups—the Reseller Sales measure group and the Fact Employee measure group. We defined the Reseller Sales measure group similar to the Reseller Sales measure group in the Adventure Works sample project (removing measure expressions to keep our cube simple). We defined the Fact Employee measure group with a single measure, the EmployeeCount measure. This measure group serves as the intermediate measure group in the many-to-many relationship.
4. Defined measure group to dimension relationships as follows:
· Fact Employee dimension > Fact Employee measure group = Fact relationship
· Fact Employee dimension > Reseller Sales measure group = Regular relationship
· Employee dimension > Fact Employee measure group = Regular relationship
· Employee dimension > Reseller Sales measure group = Many-to-many relationship
5. Deployed the project and processed the database.
Note: When using the many-to-many implementation, data that is rolled up by using multiple paths is not double-counted at the top level. Also, in some cases, you may need to add a calculation in the MDX script to roll up calculated cells to display the desired results. If you had to add many of these calculations, performance could be affected.
Testing the Implementations
To test the implementations, we used the following MDX query:
select
{[Employee].[Employees].&[297],
[Employee].[Employees].&[297].children} on rows,
[Measures].[Reseller Sales Amount] on columns
from [Adventure Works]
The first execution of this query was performed with a cold file cache. A second execution of this query was then performed after clearing the Analysis Services cache. This process was repeated 10 times for each of the two blown-out databases.
Results
The results of our testing of each implementation are as follows:
100,000 New Orders/Scale of 0 – 2500 ms
500,000 New Orders/Scale of 0 – 35000 ms
With either warm or cold file cache, the many-to-many implementation was the clear performance winner compared to the custom rollup implementation. For the database with 100,000 new orders, query performance doubled with the many-to-many implementation compared to the custom rollup implementation. This held true regardless of whether we tested with warm file cache or cold file cache. However, for the database with 500,000 new orders, query performance increased by a factor of six with the many-to-many implementation when the query was executed against cold file cache.
Conclusion
When your design requirements call for multiple hierarchies in a parent-child dimension, consider using a many-to-many dimension implementation for best query performance. The benefit of this implementation increases as the size of your fact table increases. Furthermore, the performance impact of the many-to-many dimension implementation is most dramatic with larger fact table sizes and when the data requested to resolve the query cannot be found in the file system cache. Data may not be in file system cache for a number of reasons. It can be because the data requested by a particular query has never been retrieved into the file system cache. However, data may not be in the file system cache because the size of the file system cache is finite and stale data can be pushed out of the file system cache by newer data. The less the amount of memory available to the operating system for its file cache and the busier the system, the shorter the amount of time any data can remain in the file system cache.
So, for best performance with multiple hierarchies in a parent-child dimension, consider implementing your solution by utilizing many-to-many dimensions.