Author: Carl Rabeler
Reviewers: Mark Souza, Prem Mehra, Lindsey Allen, Mike Weiner, James Podgorski, Donald Farmer, Robert Bruckner, Sanjay Mishra, Lukasz Pawlowski, Jeff Bernhardt, Nicholas Dritsas
Microsoft SQL Server 2008 provides new functionality not found in previous versions and numerous performance and usability enhancements that are specifically designed for business intelligence (BI) solutions. These features are designed to improve performance, increase developer productivity, and enrich the end-user experience. The intent of this article is to provide a list, with some details, of the top performance reasons to use SQL Server 2008 for your new business intelligence solutions and to upgrade to SQL Server 2008 for your existing business intelligence solutions. The reasons begin with the performance enhancements that do not require modifications to your existing business intelligence solutions and then proceed to technology.
Important: For information about upgrading your existing Business Intelligence solutions to SQL Server 2008, see the SQL Server 2008 Upgrade Technical Reference Guide.
With SQL Server 2005 Reporting Services, the reporting engine has limited ability to exercise control over resource management. As a result, very large reports and heavy workloads by a large number of concurrent users can result in out-of-memory conditions. In addition, with SQL Server 2005 Reporting Services, long reports can take a long time to render, because every page is processed before the first page is returned to the user.
With SQL Server 2008 Reporting Services, the Report Engine has been redesigned to enable Reporting Services to control resources (threads, memory and state) to ensure that sufficient memory is always available to execute very large reports and heavy workloads from a large number of concurrent users (by paging and releasing memory in response to memory pressure). This is accomplished by:
In working with SQL Server 2008 Reporting Services in our lab, SQL Server 2008 Reporting Services was able to respond to 3–4 times the total number of users and their requests on the same hardware without HTTP 503 Service Is Unavailable errors compared with SQL Server 2005 Reporting Services, regardless of the type of renderer. In stark contrast, SQL Server 2005 Reporting Services generated excessive HTTP 503 Service Is Unavailable errors as the number of users and their requests increased, regardless of the report renderer.
Our tests clearly demonstrated that the new resource management architecture of the report server enables SQL Server 2008 Reporting Services to scale very well, particularly on the new four-processor, quad-core processors. With our test workload, SQL Server 2008 Reporting Services consistently outperformed SQL Server 2005 with the PDF and XLS renderers on the four-processor, quad-core hardware platform (16 cores) both in terms of response time and in terms of total throughput. As a result, we recommend with SQL Server 2008 Reporting Services that you scale up to four-processor, quad-core servers for performance and scale out to a two-node deployment for high availability. Thereafter, as demand for more capacity occurs, add more four-processor, quad-core servers. With SQL Server 2005 Reporting Services, we recommend that you scale out above four processors due to resource bottlenecks. Our tests clearly demonstrate that SQL Server 2008 Reporting Services utilizes your hardware resources more efficiently than SQL Server 2005 Reporting Services, particularly with the new four-processor, quad-core servers. For more information, see Scaling Up Reporting Services 2008 vs. Reporting Services 2005: Lessons Learned.
The SQL Server Analysis Services formula engine devises a query execution plan to retrieve data requested by a query. Each node of the query execution plan utilizes one of two types of evaluation modes: a subspace computation mode or a cell-by-cell evaluation mode. If the subspace computation mode is used, the Analysis Services formula engine operates only on cells with data that contribute to the result. If the cell-by-cell evaluation mode is used, the Analysis Services formula engine operates on each cell that theoretically could contribute to the result, regardless of whether it contains data. In a sparse cube, which is typical, retrieving data using the cell-by-cell evaluation mode is inefficient for calculation-intensive queries, because many null cells (which do not contribute to the result) are operated on.
In SQL Server 2005, before Service Pack 2 (SP2), the query execution plans devised by the Analysis Services formula engine primarily contained nodes that utilized the cell-by-cell evaluation mode. In SP2 of SQL Server 2005, a small number of MDX functions support the subspace computation mode. In SQL Server 2008 Analysis Services, the vast majority of MDX functions use the subspace computation mode. In working with our customers with SQL Server 2008 Analysis Services, we have observed tremendous performance gains with the following:
These performance gains with SQL Server 2008 Analysis Services are achieved without modification of your business intelligence solution. In addition, you can frequently further increase the performance of cubes with complex calculations by performing additional tuning of your MDX calculations in the cube. For more information, see Performance Improvements in MDX in SQL Server 2008 Analysis Services and the SQL Server 2008 Analysis Services Performance Guide.
In SQL Server 2005 Integration Services, each execution tree in a DataFlow task within a package is generally assigned a single worker thread, and, under certain conditions, each execution tree actually shares a thread with other execution trees. This approach has the following benefits:
However, this approach has the following downsides that frequently outweigh the preceding benefits, particularly with long or branched execution trees:
Note: Inserting an asynchronous transformation in the data flow to increase parallelism by creating a second execution tree (a performance design trick on high-performance multiprocessor computers) has performance overhead, because all data must be copied for the new execution tree.
In SQL Server 2008 Integration Services, pipeline parallelism has been enhanced with architecture improvements to the data flow engine. Now worker threads are dynamically assigned to individual components from a common threadpool. As a result, the data flow engine automatically utilizes all cores on multicore computers without the need to resort to design tricks. The increased pipeline parallelism in SQL Server 2008 Integration Services from threadpooling increases the speed at which packages can execute, and it makes better use of your existing resources, provided there are no other resource bottlenecks (such as I/O or memory). In working with our customers with SQL Server 2008 Integration Services, we learned that packages with long chains of synchronous transformations and most packages running on multiprocessor computers saw significant performance increases.
Note: For more information about SQL Server 2008 Integration Services from the SQL Server Integration Services team, see the SSIS Team Blog. For additional performance information, see ETL World Record and Top 10 SQL Server Integration Services Best Practices.
In SQL Server 2005, backups of large Analysis Services databases have limited scalability, and Analysis Services supports only a single location for all Analysis Services databases within an instance of Analysis Services, although measure group partitions within a database can be distributed across multiple drives to distribute the I/O. Furthermore, after processing a single database for an Analysis Services instance on a processing server, you cannot detach the processed database and then attach it to a query server for increased scalability and availability. In addition, in SQL Server 2005 Analysis Services, attaching read-only copy of an Analysis Services database to multiple servers for increased scalability and availability is not natively supported; this capability is supported only through the use the SAN snapshot technologies (see Scale-Out Querying with Analysis Services Using SAN Snapshots).
In SQL Server 2008 Analysis Services, these issues are resolved with the following scalability and availability enhancements:
Note: With large databases in which history partitions do not change, you will want to use high-speed SAN copy capabilities to clone the detached database or use Robocopy. For information about using Robocopy, see Sample Robocopy Script to customer synchronize Analysis Services databases.
The following diagram illustrates the use of these new features with a single processing server attached to a database in read/write mode and four query servers attached to a single read-only database. A load balancer distributes queries among the four query servers.
Figure 1: Query Scale-Out using Read-only Databases
You design SQL Server Analysis Services solutions in Business Intelligence Development Studio (BIDS). In SQL Server 2005, when you design your Analysis Services solution in Business Intelligence Development Studio, it is too easy to design a solution that does not follow best practices performance guidelines and as a result have performance issues. For example, the failure to design appropriate attribute relationships is a significant design and performance issue, but SQL Server 2005 Business Intelligence Development Studio does not notify you if you fail to create such relationships. In addition, with SQL Server 2005 Analysis Services, you frequently have to manually design aggregations to achieve optimum performance.
In SQL Server 2008 Analysis Services, the Business Intelligence Development Studio development environment includes the following improvements to promote best practices and help developers design high-performing solutions:
In SQL Server 2005 Reporting Services, you can create powerful and functional charts and graphs that enable you to hyperlink or drill down to other reports and Web sites. You can also integrate these reports and graphs into dashboards. However, making these reports and graphs pop and sizzle is sometimes difficult and time-consuming.
SQL Server 2008 Reporting Services offers many authoring enhancements that increase developer productivity. In addition, full-featured versions of the Dundas chart and gauge controls are now included, giving you the ability to create beautifully rendered charts and gauges.
In addition, with SQL Server 2008 Reporting Services, there are many additional authoring enhancements. These include:
Chart enhancements include scale breaks, annotations, custom color palettes, merged charts, and multiple axes. New chart types include the following types of charts:
The new gauges include the following gauge types:
For more information, see Reporting Services and What’s New (Reporting Services).
The disk I/O subsystem is one of the most common bottlenecks for many relational data warehouse implementations. Additional disks are frequently added to reduce read/write latencies. This can be expensive, especially on high-performing storage systems. At the same time, the need for storage space continues to increase due to rapid growth of the data, as does the cost of managing databases (backup, restore, transfer, and so on).
In SQL Server 2008, data compression addresses some of these problems. With data compression, you can selectively compress any table, table partition, or index, resulting in a smaller on-disk footprint, smaller working-set memory size, and reduced I/O. Working with customers, we have seen that enabling data compression has resulted in a 50-80% saving in disk space in many environments. This savings in disk space directly translates into less money being required for high speed disk storage. Relational data warehouses with I/O bottlenecks and without processor saturation may also see an increase in overall performance.
SQL Server supports two types of data compression: row compression, which compresses the individual columns of a table, and page compression, which compresses data pages using row, prefix, and dictionary compression. The compression results are highly dependent on the data types and data contained in the database; however, in general we’ve observed that using row compression results in lower processor overhead but saves less space. Page compression, on the other hand, results in higher processor overhead, but it results in much larger space savings. Page compression is a superset of row compression, implying that an object or partition of an object that is compressed using page compression also has row compression applied to it. Compressed pages remain compressed in memory until rows or columns on the pages are accessed.
Both row and page compression can be applied to a table or index in an online mode without interruption to the application availability, but this compression can take a long time. Partitions of a partitioned table cannot be compressed or uncompressed online, however. In our testing we found that using a hybrid approach where only the largest few tables were compressed resulted in the best overall performance – this approach saved significant disk space but had a minimal impact on performance. Because additional disk space is required during the compression process, we found that it is best begin by compressing the smallest of the objects that you choose to compress. By doing so, you utilize the least disk space during the compression process and avoid running out of disk space during the compression process.
To determine how compressing an object will affect its size, you can use the sp_estimate_data_compression_savings system stored procedure. Database compression is supported only in the SQL Server 2008 Enterprise and Developer editions. It is fully controlled at the database level, and it does not require any application change.
Maintaining a consistent level of service by preventing runaway queries and guaranteeing resources for mission-critical workloads has been a challenge for the SQL Server relational engine. In the past there was easy way to guarantee a certain amount of resources to a set of queries and to prioritize access to SQL Server relational engine resources – all queries had equal access to all the available relational engine resources.
For example:
In SQL Server 2008, Resource Governor addresses these issues by enabling users to differentiate workloads, allocating resources as they are requested, controlling parallelism for particular workloads (for example, by user name or computer name), and substantially increasing concurrency with resource intensive queries. The Resource Governor limits can easily be reconfigured in real time with minimal impact on currently executing workloads. The allocation of the workload to a resource pool is configurable at the connection level, and the process is completely transparent to applications. This new functionality is particularly useful on systems with large memory and many cores.
The diagram below depicts the resource allocation process. In this scenario, three workload pools (the Admin workload, the OLTP workload, and the Report workload) are configured, and the OLTP workload pool is assigned a high priority. In parallel, two resource pools (the Admin pool and the Application pool) are configured with specific memory and processor (CPU) limits as shown. Finally, the Admin workload is assigned to the Admin pool and the OLTP and Report workloads are assigned to the Application pool.
Figure 2: Using Resource Governor to Allocate Resources
Below are some other points you need to consider when you use Resource Governor:
While bulk load operations have substantial performance benefits over row-by-row operations, the logging of these bulk load operations (typically inserts) in high-volume data loading scenarios is frequently the cause of disk I/O bottlenecks. Unlike fully logged operations, which use the transaction log to keep track of every changed row, minimally logged operations keep track of extent and page allocations and metadata changes only. Because much less information is tracked in the transaction log, a minimally logged operation is often faster than a fully logged operation if logging is the bottleneck. Furthermore, because fewer writes go the transaction log, a much smaller log file with a lighter I/O requirement becomes viable. Minimally logged operations are available only if your database is in bulk-logged or simple recovery mode. For more information, see Operations That Can Be Minimally Logged. Note that performing a bulk operation in a bulk-logged database has impact on the backup strategy for the database. Additionally, the rollback of a minimally logged operation is fully logged, which can result in the rollback of a minimally logged operation taking significantly longer than the originally logged operation. For more information about the implications, see Backup Under the Bulk-Logged Recovery Model .
SQL Server 2008 introduces INSERT…SELECT as a new way to perform minimally logged insert operations, allowing Transact-SQL based INSERT statements to be minimally logged under certain circumstances:
Example 1: You have a table clustered in a key that contains even integer key values 0-16. The table has four leaf pages, the pages are not full, and they can hold two more rows on each page. You bulk load eight new rows, with uneven key values 1-15. The new rows fit in the existing pages. The illustration below shows how this table will look before and after the load operation.
Figure 3: A fully logged insert under trace flag 610
In this example, no new pages are allocated and trace flag 610 will not give you any minimal logging.
Example 2: Consider an alternative scenario: The table initially now has two pages, both full, containing the key values 0-7. You bulk load rows with key values 8-16.
Figure 4: A minimally logged insert under trace flag 610
In this example, the pages holding key values 8-15 (in light blue above) will be minimally logged with trace flag 610.
For more information on using this new functionality to improve data loading performance, see The Data Loading Performance Guide.
SQL Server 2008 introduces the following additional features that improve the performance of business intelligence solutions.
Most data warehouse queries are designed to follow a star schema and can process hundreds of millions of rows in a single query. By default, the query optimizer detects queries against star schemas and creates efficient query plans for them. One method the optimizer can use to generate an efficient plan is to use bitmap filtering. A bitmap filter uses a compact representation of a set of values from a table in one part of the operator tree to filter rows from a second table in another part of the tree. Essentially, the filter performs a semi-join reduction; that is, only the rows in the second table that qualify for the join to the first table are processed. In SQL Server 2008, bitmap filtering can be introduced in the query plan after optimization, as in SQL Server 2005, or introduced dynamically by the query optimizer during query plan generation. If the filter is introduced dynamically, it is referred to as an optimized bitmap filter. Optimized bitmap filtering can significantly improve the performance of data warehouse queries that use star schemas by removing nonqualifying rows from the fact table early in the query plan. Without optimized bitmap filtering, all rows in the fact table are processed through some part of the operator tree before the join operation with the dimension tables removes the nonqualifying rows. If optimized bitmap filtering is applied, the nonqualifying rows in the fact table are eliminated immediately. Optimized bitmap filtering is available only in the Enterprise, Developer, and Evaluation editions of SQL Server. For more information, see Optimizing Data Warehouse Query Performance Through Bitmap Filtering.
SQL Server 2008 improves query processing performance on partitioned tables for many parallel plans, changes the way parallel and serial plans are represented, and enhances the partitioning information provided in both compile-time and run-time execution plans. SQL Server 2005 uses a single thread per partition parallel query execution strategy, but SQL Server 2008 can allocate multiple threads to a single partition. This improvement is of particular importance to data warehouse environments; fact tables are often candidates for partitioning, because they typically contain a few columns with a very large number of records. For more information, see Query Processing Enhancements on Partitioned Tables and Indexes.
In SQL Server 2008, you can perform multiple data manipulation language (DML) operations in a single statement by using the MERGE statement. For example, you may need to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table. Typically, this is done by executing a stored procedure or batch that contains individual INSERT, UPDATE, and DELETE statements. However, this means that the data in both the source and target tables is evaluated and processed multiple times—at least once for each statement. By using the MERGE statement, you can replace the individual DML statements with a single statement. This can improve query performance because the operations are performed within a single statement, minimizing the number of times the data in the source and target tables are processed. However, performance gains depend on having correct indexes, joins, and other considerations in place. For more information, see Optimizing MERGE Statement Performance.
In SQL Server 2008, this new feature provides an easy way to capture changes to data in a set of database tables so these changes can be transferred to a second system such as a data warehouse. Change data capture is designed to capture insert, update, and delete activity applied to SQL Server tables, and to make the details of the changes available in an easily consumed relational format. The change tables used by change data capture contain columns that mirror the column structure of a tracked source table, along with the metadata needed to understand the changes that have occurred. Change data capture is available only in the Enterprise, Developer, and Evaluation editions of SQL Server. For more information, see Tuning the Performance of Change Data Capture in SQL Server 2008.
In SQL Server 2008, the Reporting Services database stores additional information that enables you to analyze report execution log data to help answer questions about performance, such as:
This detailed performance information appears in the ExecutionLog2 view. For more information, see ExecutionLog2 View – Analyzing and Optimizing Reports.
SQL Server 2008 is a significant release that delivers new functionality not found in previous versions and numerous performance and usability enhancements that are specifically designed for business intelligence solutions. For a full list of features and detailed descriptions, see SQL Server Books Online and the SQL Server Web site.
Pingback from Top 10 Performance and Productivity Reasons to Use SQL Server 2008 … | BH-Server
Pingback from Top 10 Performance and Productivity Reasons to Use SQL Server 2008 … « Productivity
Neue und verbesserte Funktionen im SQL Server 2008 in deutsch und auf einem Blick? Immer wieder gern
Pingback from Top 10 Performance and Productivity Reasons to Use SQL Server 2008 for Your Business Intelligence Solutions | Kasper de Jonge BI Blog
Pingback from Optimizing Performance in SQL Server?? Analysis Services « [Business Intelligence].[ALL]
Pingback from Using Trace to Monitor and Audit Analysis Services | MarkTab Data Mining
Pingback from Resource Monitoring | MarkTab Data Mining
Pingback from Compare sql server 2005 and 2008 by SharePoint 2010 « Bugra Postaci's Blog
Pingback from SQL Server 2008 « Prayag Upd
Pingback from BISQL #21:All White Paper For SQL Server R2 Review and Better Understanding « SQL Server Mentalist