Author: Robert Dorr, Keith Elmore, Lindsey Allen
Have you ever asked the question: “If I upgrade or apply a fix how will it affect my applications?” Based on the experiences cumulated from years of supporting customers, the Microsoft SQL Server support team developed several utilities to help you answer this question and much more. The utilities are collectively called the RML Utilities for Microsoft SQL Server. Prior to upgrade or apply a fix, you can capture and compare the necessary information and achieve predictable performance. The RML Utilities can help you evaluate a new application release or deployment. The “Precision Performance for Microsoft SQL Server” is the companion document which provides you the processes, knowledge and tool set to identify performance bottlenecks and to optimize your SQL Server application. The complete version of “Precision Performance for Microsoft SQL Server” can be downloaded here along with the latest RML utilities 9.0: http://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx.
This release of the utilities supersedes the previous version, as described in KB 887057, and requires SQL Server 2005. This release contains important bug fixes, new features and performance enhancements over the prior release. The utilities are provided as free downloads, and there is no obligation or guarantee that a fix or change will be made for a reported issue. Formal support is not available but an email alias dsdbtool@microsoft.com is available for questions and problem reports.
Anyone who has tackled a performance problem knows it is an iterative process. Just when you think you have every possible variable considered someone makes a change. The trick is to understand the environment and the variables that affect the performance of your SQL Server. Establishing baselines and known workloads enables you to test changes in a controlled environment before you implement the changes. By establishing a baseline you can predict and discover the affects of parameter adjustment. A performance life cycle is depicted here.
Capture, Analyze, Adjust and Generate Load make up the evaluation cycle. The quick start reduces each of these actions into specific RML Utility commands to help you understand how the RML Utilities can be an effective tool set for precision performance.
The starting point of any performance related activity is baseline. The following checklist can be used to baseline your SQL Server database configuration. You will repeat the same performance data capture tasks (during the performance tuning phase).
√The output from sp_configure
The sp_configure output is important because it outlines the running values. SQLDiag.exe can be used to capture sp_configure and other data points needed for performance evaluation and replay.
When establishing a baseline for SQL Server 2008, you should include the following configuration data:
SQL Server 2008 introduces the Resource Governor feature letting the administrator to make additional configurations to resources
SQLServer 2008 introduces the FileStream feature.
Recommendation:
We recommend use of these actions to establish the same starting point for all testing.
√The output of MSINFO32.exe
This utility collects system details, including memory, cpu, driver versions and other hardware/software information about the system. With MSINFO32 you can establish an exact testing computer match.
√A Performance Monitor Capture
The performance monitor capture is important to establishing a baseline. Having key counters (all counters are preferred) collected during the data capture provides key data points for evaluation. When you perform a replay the same counters can be monitored to help evaluate various aspects of the testing.
Make sure that you keep the size of the individual performance monitor log files around 300MB. Files larger than this frequently require significant resources and time to evaluate.
√User database backups
Capturing a backup provides part of the baseline needed to accommodate replay and stress activities. The database usually has primary keys, identities and other unique entities. Having a known state is required for most replay scenarios. The ultimate goal is to obtain a database backup on the server before capture is started. Making sure that no other activity occurs on the SQL Server before the capture begins. This is great goal but frequently not practical.
Instead capture a backup and start the trace right after the backup is complete. This can result in a jagged capture start. In a perfect world the capture would ensure that no activity or connections exist when the trace is started and before the trace is stopped all connections are closed. In reality the trace has to be started and stopped with existing activity on the server, making the ends jagged.
Another option is to start a trace and then capture the backup. Then filter the events that occurred before the backup completed.
The database backup is also useful to help secure the current metadata state, such as objectid’s. A stored procedure that is dropped and created is assigned a new objected. Without the backup, performance evaluation and other activities can be cumbersome.
These backups can also be very helpful for doing ‘what if’ analysis with the Database Engine Tuning Advisor (DTA). About 18% of performance issues reported to Microsoft SQL Server Support are corrected by adding index.
√User database clones
The SQL Server Management Studio Clone is a powerful tool. Cloning a database takes all database metadata. This includes statistical information and not the actual data.
The clone can be used with the captured trace. Performance evaluation, such as plan changes between SQL Server builds, can easily be achieved using the clone.
√Backup of master database
The master database contains instance wide data. The master database is frequently needed to replay.
√Backup of msdb database
In order to replay various SQLAgent job invocations the job definitions must be present. These are stored in the MSDB database. These jobs may also require extended entities such as DTS/SSIS packages to successfully execute.
√Representative Load Samples
Capture a trace that truly represents the common workload. One of the biggest mistakes made is capturing data during non-peak hours. This information is important but overlooks very important data points.
1. SQL Server trace capture should always occur with a server side trace and the files should be written to a high speed I/O path. The trace output should be treated with the same performance needs as the database log files (.ldf). When tracing for replay, every command is output to the trace file. Putting the trace location on a UNC path slows the whole server as the events coming from 100s or 1000s of clients have to be streamed to the UNC path file. When you use the SQL Profiler GUI, the trace streams all events over a TDS connection similar to the UNC path. All trace output paths, except those written by the server to a file, can drop events. Dropping events invalidates performance data evaluation and replay.
2. SQL Profiler trace:
3. Server side trace:
4. Start Time, End Time and Duration
√The output from the T-SQL command: SELECT * FROM SYS.DATABASES
The database names and ids may be needed to create the DBID_Map.txt file so OStress can establish the correct database context. SQLDiag.exe will collect the sys.databases output.
√An architecture rendering of the environment
The architecture document should include machine locations, network topology and database file placement.
This section introduces you to each of the tools by walking through a brief exercise and describes the correct way to evaluate performance data. It also shows you how to compare the data with replay, stress, database clones and other techniques. Extensive details for each component of RML Utilities are in the complete document of “Precision Performance for Microsoft SQL Server”.
The following files are installed by the RML setup.
As shown in the diagram below, after the data is captured the trace data is processed by ReadTrace. ReadTrace Produces .RML files for OStress replay and loads the PerfAnalysis database. After the database is loaded Reporter can be used to review the performance data. You can examine various trends in the data, find the most costly queries and perform other activities.
The quick start exercises use the database PrecisionPerformance.
Capture secures the environment and collects the data needed for evaluation. Make sure that you secure output locations. Captures may contain sensitive data. Traces and other data generated by SQL Server and SQL Server tools have ACLs applied. These may match the SQL Server service account and prevent you from direct access to the files. You may have to take ownership of .TRC files in order to process them.
You cannot user SD_Detailed.xml configuration to capture the trace, because the SQLDiag, sd_detailed.xml file was included with SQL Server 2005. It no longer contains the event capture configuration needed by the RML Utilities.
You need to wait to see the message ‘collection started’ before continuing. You should review the files that are added to c:\temp\PPSecure. They include SQL Server error logs, MSINFO32 and other configuration information. The key files are as follows: <<MACHINE NAME>><<INSTANCE NAME>>sp_sqldiag_shutdown.out <<MACHINE>>MSINFO32.TXT The collection of MSINFO data can take several minutes.
You need to wait to see the message ‘collection started’ before continuing. You should review the files that are added to c:\temp\PPSecure. They include SQL Server error logs, MSINFO32 and other configuration information. The key files are as follows: <<MACHINE NAME>><<INSTANCE NAME>>sp_sqldiag_shutdown.out <<MACHINE>>MSINFO32.TXT
The collection of MSINFO data can take several minutes.
The objective of the initial analysis is to understand the tools and the captured data. Review the queries with the most resource usage, the queries that executed the most, the application or logon that uses the most SQL Server resources and other key data points.
The captured .TRC files can be loaded with ReadTrace for performance analysis and broken out into .RML for replay.
When ReadTrace finishes processing the data Reporter will be started so that you can evaluate the data. You can also use SQLNexus to load the trace data and other information that was collected by the SQLDiag. SQLNexus provides additional reporting capabilities supporting the extended data collection. The data is loaded into the PPProd database and the Performance Overview report is automatically displayed. The following exercises will help you become familiar with the data Reporter reports. You should leave the tabs open, makes it easy to return to the main tab when you perform the various actions:
As you can clearly see there are various ways to analyze and evaluate the data. You can also open the performance monitor files (.blg) captured and review counters such as batches per second submitted to the SQL Server.
During the initial analysis ReadTrace will generate the .RML files. OStress uses these as input to replay the captured workload.
/* restore filelistonly from disk = 'c:\temp\PrecisionPerformance.bak' */ restore database PrecisionPerformance from disk = 'c:\temp\PrecisionPerformance.bak' with move 'PrecisionPerformance' to 'C:\temp\PrecisionPerformance.mdf', move 'PrecisionPerformance_log' to 'C:\temp\PrecisionPerformance_log.LDF', stats
/* restore filelistonly from disk = 'c:\temp\PrecisionPerformance.bak' */
restore database PrecisionPerformance from disk = 'c:\temp\PrecisionPerformance.bak'
with move 'PrecisionPerformance' to 'C:\temp\PrecisionPerformance.mdf',
move 'PrecisionPerformance_log' to 'C:\temp\PrecisionPerformance_log.LDF', stats
dbcc freeproccache dbcc dropcleanbuffers
These commands remove all plans and clean buffers from cache. We do not recommend this on a production server as it can have immediate performance impact. For replay purposes they provide a common starting point.
ostress –E –S(replay instance) –ic:\temp\PPBreakout\*.rml –cc:\temp\sample.ini –oc:\temp\PPBreakout\Replay –mreplay –T88 –dPrecisionPerformance –T33
Follow the analyze steps to load the newly captured .TRC with ReadTrace. Load the data in database named PPReplay. Reporter will be started to display the PPReplay data when the load finishes.
One window will show the PPProd data and the other will show the PPReplay data. The side by side display allows for comparison of the data to see how well the replay performed.
At this point you have now used all the RML Utilities to process, replay and evaluate the data. Use the clone to perform activities that do not require a full database backup.
1. Update the clone script by adding the following at the end of the script and altering the database name to PPReplayClone. These commands are necessary so SQL Server does not generate or update statistics from empty tables. The updates, inserts and deletes will not affect any rows so duration, CPU, reads and writes are not interesting data points. However, the plan selection operations are interesting.
ALTER DATABASE PPReplayClone set AUTO_CLOSE OFF ALTER DATABASE PPReplayClone set AUTO_CREATE_STATISTICS OFF ALTER DATABASE PPReplayClone set AUTO_UPDATE_STATISTICS OFF ALTER DATABASE PPReplayClone set AUTO_UPDATE_STATISTICS_ASYNC OFF
ALTER DATABASE PPReplayClone set AUTO_CLOSE OFF
ALTER DATABASE PPReplayClone set AUTO_CREATE_STATISTICS OFF
ALTER DATABASE PPReplayClone set AUTO_UPDATE_STATISTICS OFF
ALTER DATABASE PPReplayClone set AUTO_UPDATE_STATISTICS_ASYNC OFF
Merely set the clone to read-only will not work here, because it would fail modification commands and the statistics profile events are not produced.
2. Run the script and create the database PPReplayClone.
3. Begin a SQLDiag and Trace Event capture for the PPReplayClone activity.
4. Follow the steps in generate Replay section to replay the scenario against the clone and capture the outcome.
The Ostress workload command should change the database to –dPPReplayClone
5. Perform an analysis data load into PPClone1.
Follow the same steps outlined in Capture#1 but add the following step and load the captured trace data into PPClone2.
2a. Execute the following Transact-SQL command which updates all the statistics in the database. Because no data exists in the clone database, the statistics are empty. Different statistics should cause the query plans to change.
exec sp_updatestats
You now have PPClone1 and PPClone2 performance data loaded and expect to find plan changes. This section shows you how to find those changes. Cross database queries are used to discover plan changes. The normalized hash id values are used to match between captures. The hash id may change when you use a different version of ReadTrace. Execute the following Transact-SQL command to determine the unique batch hash id's which have plan variances. The full outer join is used so we can produce the 'exception' rows.
select * from (select p.PlanHashID as [PlanHashID_1], b.HashID as [HashID_1], count(*) as [Executes_1] from PPClone1.ReadTrace.tblPlans p inner join PPClone1.ReadTrace.tblBatches b on b.BatchSeq = p.BatchSeq group by p.PlanHashID, b.HashID) as vwClone1 FULL OUTER join (select p.PlanHashID as [PlanHashID_2], b.HashID as [HashID_2], count(*) as [Executes_2] from PPClone2.ReadTrace.tblPlans p inner join PPClone2.ReadTrace.tblBatches b on b.BatchSeq = p.BatchSeq group by p.PlanHashID, b.HashID) as vwClone2 on vwClone1.HashID_1 = vwClone2.HashID_2 where vwClone1.PlanHashID_1 is NULL or vwClone2.PlanHashID_2 is NULL or vwClone1.PlanHashID_1 <> vwClone2.PlanHashID_2 The resulting output resembles the following, showing the same batch hash id but a variant plan hash id. PlanHashID_1 HashID_1 Executes_1 PlanHashID_2 HashID_2 Executes_2
select * from
(select p.PlanHashID as [PlanHashID_1], b.HashID as [HashID_1], count(*) as [Executes_1] from PPClone1.ReadTrace.tblPlans p
inner join PPClone1.ReadTrace.tblBatches b on b.BatchSeq = p.BatchSeq
group by p.PlanHashID, b.HashID) as vwClone1
FULL OUTER join
(select p.PlanHashID as [PlanHashID_2], b.HashID as [HashID_2], count(*) as [Executes_2] from PPClone2.ReadTrace.tblPlans p
inner join PPClone2.ReadTrace.tblBatches b on b.BatchSeq = p.BatchSeq
group by p.PlanHashID, b.HashID) as vwClone2
on vwClone1.HashID_1 = vwClone2.HashID_2
where vwClone1.PlanHashID_1 is NULL
or vwClone2.PlanHashID_2 is NULL
or vwClone1.PlanHashID_1 <> vwClone2.PlanHashID_2
The resulting output resembles the following, showing the same batch hash id but a variant plan hash id.
PlanHashID_1 HashID_1 Executes_1 PlanHashID_2 HashID_2 Executes_2
------------------ ----------------- ---------- ---------------- --------------- ----------- 4302612084712464844 8878247664037717474 2000 4835681870838061185 8878247664037717474 2000 1. Start Reporter. With the unique batch, hash id open two windows in Reporter. One window connected to PPClone1 and the other PPClone2. 2. Drill through to any hash id's unique batch details report. (The hash id is not important to this step.) 3. Click the 'Hash Id:' link located in parameters area and type the unique batch hash id identified by the query. (Repeat this for both windows to do a side by side comparison of the same unique batch.) Various links on the reports could be helpful for accessing the extended capabilities provided by the Reporter: The following shows that the value link copies the Hash ID to the clipboard. Clicking the Hash ID label enables you to enter any valid Hash ID and open that specific batch details report. As soon as you have the side by side comparison, expand the 'Distinct Plans' region of the report to review the details. The plan hash ids are different because of the change in location for the sort operation. PPClone1 Plan PPClone2 Plan Cross database query technique can be used in many ways. For example, you could take a capture from the first day of the month and use it as your baseline checkpoint. Every day after that you could capture trace data from the same time window and load it into secondary databases. With a cross database query against ReadTrace.tblUniqueBatches you could locate new queries that did not execute on first day. By modifying this you could plot query patterns over time. For example, the unique hash id could be used to plot the cpu, reads, writes, execution count and other usage data points to discover trends. References and Links SQL Nexus: http://www.codeplex.com/sqlnexus SQL Server 2005 Service Pack 2, Performance Dashboard: http://www.microsoft.com/downloads/details.aspx?FamilyID=1d3a4a0d-7e0c-4730-8204-e419218c1efc&DisplayLang=en Troubleshooting Performance Problems in SQL Server 2005: http://download.microsoft.com/download/1/3/4/134644fd-05ad-4ee8-8b5a-0aed1c18a31e/TShootPerfProbs.doc RML Utilities 8.x Download: http://www.microsoft.com/downloads/details.aspx?FamilyID=5691ab53-893a-4aaf-b4a6-9a8bb9669a8b&DisplayLang=en RML Utilities 9.0 Download: http://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx
------------------ ----------------- ---------- ---------------- --------------- -----------
4302612084712464844 8878247664037717474 2000 4835681870838061185 8878247664037717474 2000
1. Start Reporter.
With the unique batch, hash id open two windows in Reporter. One window connected to PPClone1 and the other PPClone2.
2. Drill through to any hash id's unique batch details report. (The hash id is not important to this step.)
3. Click the 'Hash Id:' link located in parameters area and type the unique batch hash id identified by the query. (Repeat this for both windows to do a side by side comparison of the same unique batch.) Various links on the reports could be helpful for accessing the extended capabilities provided by the Reporter:
As soon as you have the side by side comparison, expand the 'Distinct Plans' region of the report to review the details. The plan hash ids are different because of the change in location for the sort operation.
PPClone1 Plan
PPClone2 Plan
Cross database query technique can be used in many ways. For example, you could take a capture from the first day of the month and use it as your baseline checkpoint. Every day after that you could capture trace data from the same time window and load it into secondary databases. With a cross database query against ReadTrace.tblUniqueBatches you could locate new queries that did not execute on first day. By modifying this you could plot query patterns over time. For example, the unique hash id could be used to plot the cpu, reads, writes, execution count and other usage data points to discover trends.
SQL Server 2005 Service Pack 2, Performance Dashboard: http://www.microsoft.com/downloads/details.aspx?FamilyID=1d3a4a0d-7e0c-4730-8204-e419218c1efc&DisplayLang=en
Troubleshooting Performance Problems in SQL Server 2005: http://download.microsoft.com/download/1/3/4/134644fd-05ad-4ee8-8b5a-0aed1c18a31e/TShootPerfProbs.doc
RML Utilities 8.x Download: http://www.microsoft.com/downloads/details.aspx?FamilyID=5691ab53-893a-4aaf-b4a6-9a8bb9669a8b&DisplayLang=en
RML Utilities 9.0 Download: http://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx
Pingback from Precision Performance for Microsoft SQL Server using RML Utilities 9.0 « IP Solutions
Pingback from sqlcat.com/.../precision-performance-for-microsoft-sql-server-using-rml-utilities-9-0.aspx
Following are links to whitepapers on SQL Server 2005 Best Practices and performance troubleshooting
Last May, I blogged about free troubleshooting tools for SQL Server and today I received a comment that
Here are some notes on “SQL Server 2008 DR, Testing, Waits” I took while attending an advanced class
Pingback from Precision Drill Press - drillpress.mnwifi.org
Pingback from Approach to SQL Performance issues « TroubleshootingSQL
Can you provide any additional detail around sp_resource_governor_configure, ostensibly a stored procedure to dump resource governor configuration? I can't find it anywhere online. Thanks