Precision Performance for Microsoft SQL Server using RML Utilities 9.0

Rate This
  • Comments 10

Author: Robert Dorr, Keith Elmore, Lindsey Allen

Introduction

Have you ever asked the question: “If I upgrade or apply a fix how will it affect my applications?” Based on the experiences cumulated from years of supporting customers, the Microsoft SQL Server support team developed several utilities to help you answer this question and much more. The utilities are collectively called the RML Utilities for Microsoft SQL Server. Prior to upgrade or apply a fix, you can capture and compare the necessary information and achieve predictable performance. The RML Utilities can help you evaluate a new application release or deployment. The “Precision Performance for Microsoft SQL Server” is the companion document which provides you the processes, knowledge and tool set to identify performance bottlenecks and to optimize your SQL Server application. The complete version of “Precision Performance for Microsoft SQL Server” can be downloaded here along with the latest RML utilities 9.0: http://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx.

This release of the utilities supersedes the previous version, as described in KB 887057, and requires SQL Server 2005. This release contains important bug fixes, new features and performance enhancements over the prior release. The utilities are provided as free downloads, and there is no obligation or guarantee that a fix or change will be made for a reported issue. Formal support is not available but an email alias dsdbtool@microsoft.com is available for questions and problem reports.

The Performance Cycle

Anyone who has tackled a performance problem knows it is an iterative process. Just when you think you have every possible variable considered someone makes a change. The trick is to understand the environment and the variables that affect the performance of your SQL Server. Establishing baselines and known workloads enables you to test changes in a controlled environment before you implement the changes. By establishing a baseline you can predict and discover the affects of parameter adjustment. A performance life cycle is depicted here.

image

 

Capture, Analyze, Adjust and Generate Load make up the evaluation cycle. The quick start reduces each of these actions into specific RML Utility commands to help you understand how the RML Utilities can be an effective tool set for precision performance.

Baseline Checklist

The starting point of any performance related activity is baseline. The following checklist can be used to baseline your SQL Server database configuration. You will repeat the same performance data capture tasks (during the performance tuning phase).

The output from sp_configure

The sp_configure output is important because it outlines the running values. SQLDiag.exe can be used to capture sp_configure and other data points needed for performance evaluation and replay.

When establishing a baseline for SQL Server 2008, you should include the following configuration data:

  • sp_resource_governor_configure

SQL Server 2008 introduces the Resource Governor feature letting the administrator to make additional configurations to resources

  • sp_filestream_configure

SQLServer 2008 introduces the FileStream feature.

Recommendation:
    1. A frequently overlooked factor is parallel query execution. When testing we recommend setting the max degree of parallelism to 1. The reason for this is to catch issues before they reach production. For example, a Q&A test may run within performance guidelines but only does so because of parallel activities. The table should have had an index but because it did not SQL Server decided to do a sort with multiple workers. The presence of the parallel activity hides a tuning opportunity. When it is run in production the resources to allow for parallel execution may not be available and the serial plan is used. This quickly shows a performance problem as a serial sort is slower. By setting the parallelism to a value of 1 these types of issues are caught in testing and corrected before production installation. For testing and evaluation you should also consider the following configuration activities.
    2. Fix the server memory to a known min and max server memory setting.
    3. Run activity on the server to make sure that the SQL Server buffer pool as grown large enough to use the min server memory.
    4. Before a test execute dbcc freeproccache to remove all plans and clear key caches from memory.
    5. Before a test execute dbcc dropcleanbuffers to remove all clean buffers from memory (cold data cache)

We recommend use of these actions to establish the same starting point for all testing.

The output of MSINFO32.exe

This utility collects system details, including memory, cpu, driver versions and other hardware/software information about the system. With MSINFO32 you can establish an exact testing computer match.

√A Performance Monitor Capture

The performance monitor capture is important to establishing a baseline. Having key counters (all counters are preferred) collected during the data capture provides key data points for evaluation. When you perform a replay the same counters can be monitored to help evaluate various aspects of the testing.

Recommendation:

Make sure that you keep the size of the individual performance monitor log files around 300MB. Files larger than this frequently require significant resources and time to evaluate.

√User database backups

Capturing a backup provides part of the baseline needed to accommodate replay and stress activities. The database usually has primary keys, identities and other unique entities. Having a known state is required for most replay scenarios. The ultimate goal is to obtain a database backup on the server before capture is started. Making sure that no other activity occurs on the SQL Server before the capture begins. This is great goal but frequently not practical.

Instead capture a backup and start the trace right after the backup is complete. This can result in a jagged capture start. In a perfect world the capture would ensure that no activity or connections exist when the trace is started and before the trace is stopped all connections are closed. In reality the trace has to be started and stopped with existing activity on the server, making the ends jagged.

Another option is to start a trace and then capture the backup. Then filter the events that occurred before the backup completed.

The database backup is also useful to help secure the current metadata state, such as objectid’s. A stored procedure that is dropped and created is assigned a new objected. Without the backup, performance evaluation and other activities can be cumbersome.

These backups can also be very helpful for doing ‘what if’ analysis with the Database Engine Tuning Advisor (DTA). About 18% of performance issues reported to Microsoft SQL Server Support are corrected by adding index.

√User database clones

The SQL Server Management Studio Clone is a powerful tool. Cloning a database takes all database metadata. This includes statistical information and not the actual data.

The clone can be used with the captured trace. Performance evaluation, such as plan changes between SQL Server builds, can easily be achieved using the clone.

√Backup of master database

The master database contains instance wide data. The master database is frequently needed to replay.

√Backup of msdb database

In order to replay various SQLAgent job invocations the job definitions must be present. These are stored in the MSDB database. These jobs may also require extended entities such as DTS/SSIS packages to successfully execute.

Recommendation:
    1. Any queries from T-SQL job steps will already be present in the trace and RML files. If you start the SQL Agent service these same jobs may also be scheduled by agent resulting in duplicate actions. You may want to use the application filter in ReadTrace to avoid SQLAgent output to the .RML files.

 

√Representative Load Samples

Capture a trace that truly represents the common workload. One of the biggest mistakes made is capturing data during non-peak hours. This information is important but overlooks very important data points.

Recommendation:

1. SQL Server trace capture should always occur with a server side trace and the files should be written to a high speed I/O path. The trace output should be treated with the same performance needs as the database log files (.ldf). When tracing for replay, every command is output to the trace file. Putting the trace location on a UNC path slows the whole server as the events coming from 100s or 1000s of clients have to be streamed to the UNC path file. When you use the SQL Profiler GUI, the trace streams all events over a TDS connection similar to the UNC path. All trace output paths, except those written by the server to a file, can drop events. Dropping events invalidates performance data evaluation and replay.

2. SQL Profiler trace:

    • If you trace to SQL Profiler and not directly to a local subsystem, events can be lost. If the network pipe to the SQL Profiler utility becomes full events are dropped. There are also limitations on the size of the events that can be streamed to SQL Profiler.
    • Profiler also limits use of .TRC format compression. Server side traces may reduce the size of the .TRC files.

3. Server side trace:

    • The server side file trace is C2 compliant and events cannot be dropped. The full array of trace events is captured. The downside of the server side trace is that the trace output can become a bottleneck.
    • Make sure that you locate the trace output files on a high speed disk subsystem. Generally, we recommend that the output capabilities for trace resemble that of the database log file. Do not send the output to a UNC path or other slow device of this kind or the SQL Server performance can decrease.

4. Start Time, End Time and Duration

    • The timing values are invaluable to performance analysis. They are also important to replay. The timing is output to the .RML files for use by OStress. OStress uses the timing values to determine when to cancel a query, the duration for the query and various other activities.
    • The duration may seem redundant but SQL Server 2005 introduced micro-second durational timings. The precision between the start and end time used to be limited to approximately 8 milliseconds. Queries that ran faster than this resulted in 0ms durations in SQL Server 2000 traces. In SQL Server 2005 the SQL Profiler display shows the value in milliseconds but the .TRC format contains the microsecond values.
    • The RML Utilities internally use the microsecond values. This provides better data analysis capabilities. No longer does a query that runs 8ms but executes 1 million times fall out of the performance analysis view. The microsecond values are based on the RDTSC counter in SQL Server 2005. This can result in incorrect results. ReadTrace is aware of the condition and reverts to using (End – Start time) calculations when the duration column appears to be incorrect. This does cause loss of precision and a ReadTrace warning. For more information, see: http://blogs.msdn.com/psssql/archive/2006/11/27/sql-server-2005-sp2-will-introduce-new-messages-to-the-error-log-related-to-timing-activities.aspx and http://blogs.msdn.com/psssql/archive/2007/08/19/sql-server-2005-rdtsc-truths-and-myths-discussed.aspx

 

√The output from the T-SQL command: SELECT * FROM SYS.DATABASES

The database names and ids may be needed to create the DBID_Map.txt file so OStress can establish the correct database context. SQLDiag.exe will collect the sys.databases output.

√An architecture rendering of the environment

The architecture document should include machine locations, network topology and database file placement.

 

Quick Start with RML Utilities

This section introduces you to each of the tools by walking through a brief exercise and describes the correct way to evaluate performance data. It also shows you how to compare the data with replay, stress, database clones and other techniques. Extensive details for each component of RML Utilities are in the complete document of “Precision Performance for Microsoft SQL Server”.

The following files are installed by the RML setup.

File Description
OStress.exe An ODBC based application that is used to process .RML and .SQL files for stress and replay.
ReadTrace.exe Transforms SQL Server 2000 and SQL Server 2005 profiler trace files (.TRC) into .RML format. It provides filtering, mirroring and performance analysis capabilities.
ORCA.exe OStress Replay Control Agent is a singleton COM object that is used to coordinate sequencing between one or more OStress instances. Provides replay sequencing in addition to DTC import and export functionality.
ZipPipe.exe Streams .TRC(s) files from a .ZIP over a secure named pipe so ReadTrace can process it directly from memory.
CabPipe.exe Streams .TRC(s) files from a .CAB over a secure named pipe so ReadTrace can process it directly from memory.
Reporter.exe Report Viewer based application providing interactive performance analysis.
XCeedzip*.dll Support for Xceed zip file compression
Samples.cab Sample RML Utility files.
Control.ini Template control.ini file for OStress replay needs.
RMLHelp.xps RML Utilities Help documentation.

As shown in the diagram below, after the data is captured the trace data is processed by ReadTrace. ReadTrace Produces .RML files for OStress replay and loads the PerfAnalysis database. After the database is loaded Reporter can be used to review the performance data. You can examine various trends in the data, find the most costly queries and perform other activities.

 

image

 

Setup

The quick start exercises use the database PrecisionPerformance.

  • Execute the setup.sql with Ostress to create a database named PrecisionPerformance.
    • ostress –E –S(local) –isetup.sql –oc:\temp\PPSetup
    • Review the c:\temp\PPSetup\setup.out and ostress.log files.
      The script will return an error about dropping the database when it is executed the first time. This is expected and demonstrates the OStress error output.
      [Microsoft][SQL Native Client][SQL Server]Cannot drop the database 'PrecisionPerformance', because it does not exist or you do not have permission.

Capture

Capture secures the environment and collects the data needed for evaluation. Make sure that you secure output locations. Captures may contain sensitive data. Traces and other data generated by SQL Server and SQL Server tools have ACLs applied. These may match the SQL Server service account and prevent you from direct access to the files. You may have to take ownership of .TRC files in order to process them.

  • Modify the SD_Detailed.xml configuration file. (If you have not executed SQLDiag.exe you may have to do this to extract the xml configuration files.)
    In the SQLDiag directory copy the SD_detailed.xml to PPConfig.xml.
    • copy "C:\Program Files\Microsoft SQL Server\90\Tools\Binn\SD_Detailed.xml" "C:\Program Files\Microsoft SQL Server\90\Tools\Binn\PPConfig.xml"
      If these do not exist execute 'SQLDiag.exe /P %cd%' to create them.
    • Open the PPConfig.xml in notepad.exe and disable the trace event collection.
      Locate the following: <ProfilerCollector enabled="true"
      Update to the following: <ProfilerCollector enabled="false"

You cannot user SD_Detailed.xml configuration to capture the trace, because the SQLDiag, sd_detailed.xml file was included with SQL Server 2005. It no longer contains the event capture configuration needed by the RML Utilities.

  • Capture a clone of the PrecisionPerformance database. You can do this with the built-in SQL Server 2005, Management Studio functionality.
  • Secure the current database backups by using SecureState.sql. (This step should always be done as close to the time of trace capture as possible to eliminate jagged data activities.)
    • ostress –E –S(local) –iSecureState.sql –oc:\temp\PPSecure
  • Start capturing with the information by using SQLDiag.exe. The PPConfig.xml file will collect computer and SQL Server configuration information in addition to performance monitor counters.
    • sqldiag.exe /Ioc:\temp\PPSecure /IPPConfig.xml

You need to wait to see the message ‘collection started’ before continuing.
You should review the files that are added to c:\temp\PPSecure. They include SQL Server error logs, MSINFO32 and other configuration information. The key files are as follows:
<<MACHINE NAME>><<INSTANCE NAME>>sp_sqldiag_shutdown.out
<<MACHINE>>MSINFO32.TXT

The collection of MSINFO data can take several minutes.

  • Establish the trace data capture.
    • Open a query window in SQL Server Management Studio and load the TraceCaptureDef.sql file. You should keep this window active until you are finished with the quick start exercise.
    • In the script, alter the sp_trace_create file location to the c:\temp\PPSecure directory. Save the TraceID reported so that you can stop the tracing as soon as the workload is captured.
  • Generate a workload with the workload.sql file. The following simulates two clients, executing the workload 1000 times.
    • ostress –E –S(local) –iWorkload.sql –oc:\temp\PPWorkload –q –n2 –r1000
  • As soon as the workload has completed stop the trace and the SQLDiag capture.
    • For SQLDiag use CTRL+C.
    • Stop the trace with the Transact-SQL, sp_trace_setstatus commands provided when the trace was started.

Analyze

The objective of the initial analysis is to understand the tools and the captured data. Review the queries with the most resource usage, the queries that executed the most, the application or logon that uses the most SQL Server resources and other key data points.

The captured .TRC files can be loaded with ReadTrace for performance analysis and broken out into .RML for replay.

  • Execute the following command to process the captured trace data into the database PPProd.
    • ReadTrace –Ic:\temp\PrecisionPerformance.trc –oc:\temp\PPBreakout –S(local) –dPPProd

When ReadTrace finishes processing the data Reporter will be started so that you can evaluate the data. You can also use SQLNexus to load the trace data and other information that was collected by the SQLDiag. SQLNexus provides additional reporting capabilities supporting the extended data collection. The data is loaded into the PPProd database and the Performance Overview report is automatically displayed. The following exercises will help you become familiar with the data Reporter reports. You should leave the tabs open, makes it easy to return to the main tab when you perform the various actions:

  1. Click a Show/Hide button to view the chart display changes.
  2. Click 'Application Name' and find the application that took the most cpu.
  3. Click Parameters and reduce the time window.
  4. Click a Show/Hide button to view the chart display changes.
  5. Click 'Application Name' and find the application that took the most cpu.
  6. Click Parameters and reduce the time window.
  7. Click Batch Top N to drill through to Unique Batch Top N details.
  8. From the Unique Batch Top N click a query to see the Unique Batch Details.
  9. Use the Tabs menu to Export all tabs.

As you can clearly see there are various ways to analyze and evaluate the data. You can also open the performance monitor files (.blg) captured and review counters such as batches per second submitted to the SQL Server.

Reply to Generate Workload

During the initial analysis ReadTrace will generate the .RML files. OStress uses these as input to replay the captured workload.

  • Make sure that the replay instance does not have access to the production resources to prevent unexpected data damage.
  • Restore the user database PrecisionPerformance
    • Alter the file locations in the following commands as necessary and restore the database

/* restore filelistonly from disk = 'c:\temp\PrecisionPerformance.bak' */

restore database PrecisionPerformance from disk = 'c:\temp\PrecisionPerformance.bak'

with move 'PrecisionPerformance' to 'C:\temp\PrecisionPerformance.mdf',

move 'PrecisionPerformance_log' to 'C:\temp\PrecisionPerformance_log.LDF', stats

  • Review the SQLDiag capture, sp_configure settings, and make sure that the new instance has a matching configuration.
  • Locate the sample.ini OStress control file in the RML Utilities samples compressed file.
  • Repeat the steps that you performed for SQLDiag and Trace Event capture against this replay instance. Make sure not to overwrite the initial capture files by changing the output location. Rename the 'c:\temp\PrecisionPerformance' to 'c:\temp\PrecisionPerformance2' and 'c:\temp\PPsecure' to 'c:\temp\PPsecure\SecondCapture'.
  • Issue the following Transact-SQL commands before you start the replay workload.

dbcc freeproccache
dbcc dropcleanbuffers

These commands remove all plans and clean buffers from cache. We do not recommend this on a production server as it can have immediate performance impact. For replay purposes they provide a common starting point.

  • Issue the following command to generate the replay workload. OStress will connect to the replay instance and execute the workload. This example uses the in-process ORCA and forces the database name provided on the command line.

ostress –E –S(replay instance) –ic:\temp\PPBreakout\*.rml –cc:\temp\sample.ini –oc:\temp\PPBreakout\Replay –mreplay –T88 –dPrecisionPerformance –T33

  • As soon as the replay is complete stop the capture.

Analyze (Replay Workload)

Follow the analyze steps to load the newly captured .TRC with ReadTrace. Load the data in database named PPReplay. Reporter will be started to display the PPReplay data when the load finishes.

  • Open the PPProd reports for side by side comparison.
    • Select the 'Report | Load …' command.
    • Update the database name to PPProd and press OK.
    • Use the Window | Tile menu to display the information side by side.

One window will show the PPProd data and the other will show the PPReplay data. The side by side display allows for comparison of the data to see how well the replay performed.

Using the Clone

At this point you have now used all the RML Utilities to process, replay and evaluate the data. Use the clone to perform activities that do not require a full database backup.

Capture 1

1. Update the clone script by adding the following at the end of the script and altering the database name to PPReplayClone.
These commands are necessary so SQL Server does not generate or update statistics from empty tables. The updates, inserts and deletes will not affect any rows so duration, CPU, reads and writes are not interesting data points. However, the plan selection operations are interesting.

ALTER DATABASE PPReplayClone set AUTO_CLOSE OFF

ALTER DATABASE PPReplayClone set AUTO_CREATE_STATISTICS OFF

ALTER DATABASE PPReplayClone set AUTO_UPDATE_STATISTICS OFF

ALTER DATABASE PPReplayClone set AUTO_UPDATE_STATISTICS_ASYNC OFF

Merely set the clone to read-only will not work here, because it would fail modification commands and the statistics profile events are not produced.

2. Run the script and create the database PPReplayClone.

3. Begin a SQLDiag and Trace Event capture for the PPReplayClone activity.

4. Follow the steps in generate Replay section to replay the scenario against the clone and capture the outcome.

The Ostress workload command should change the database to –dPPReplayClone

5. Perform an analysis data load into PPClone1.

Capture 2

Follow the same steps outlined in Capture#1 but add the following step and load the captured trace data into PPClone2.

2a. Execute the following Transact-SQL command which updates all the statistics in the database. Because no data exists in the clone database, the statistics are empty. Different statistics should cause the query plans to change.

exec sp_updatestats

Compare Analysis

You now have PPClone1 and PPClone2 performance data loaded and expect to find plan changes. This section shows you how to find those changes. Cross database queries are used to discover plan changes. The normalized hash id values are used to match between captures. The hash id may change when you use a different version of ReadTrace. Execute the following Transact-SQL command to determine the unique batch hash id's which have plan variances. The full outer join is used so we can produce the 'exception' rows.

select * from

(select p.PlanHashID as [PlanHashID_1], b.HashID as [HashID_1], count(*) as [Executes_1] from PPClone1.ReadTrace.tblPlans p

inner join PPClone1.ReadTrace.tblBatches b on b.BatchSeq = p.BatchSeq

group by p.PlanHashID, b.HashID) as vwClone1

FULL OUTER join

(select p.PlanHashID as [PlanHashID_2], b.HashID as [HashID_2], count(*) as [Executes_2] from PPClone2.ReadTrace.tblPlans p

inner join PPClone2.ReadTrace.tblBatches b on b.BatchSeq = p.BatchSeq

group by p.PlanHashID, b.HashID) as vwClone2

on vwClone1.HashID_1 = vwClone2.HashID_2

where vwClone1.PlanHashID_1 is NULL

or vwClone2.PlanHashID_2 is NULL

or vwClone1.PlanHashID_1 <> vwClone2.PlanHashID_2

The resulting output resembles the following, showing the same batch hash id but a variant plan hash id.

PlanHashID_1         HashID_1        Executes_1   PlanHashID_2     HashID_2      Executes_2

------------------ ----------------- ---------- ---------------- --------------- -----------

4302612084712464844 8878247664037717474 2000 4835681870838061185 8878247664037717474 2000

1. Start Reporter.

With the unique batch, hash id open two windows in Reporter. One window connected to PPClone1 and the other PPClone2.

2. Drill through to any hash id's unique batch details report. (The hash id is not important to this step.)

3. Click the 'Hash Id:' link located in parameters area and type the unique batch hash id identified by the query. (Repeat this for both windows to do a side by side comparison of the same unique batch.)
Various links on the reports could be helpful for accessing the extended capabilities provided by the Reporter:

  • The following shows that the value link copies the Hash ID to the clipboard.
  • Clicking the Hash ID label enables you to enter any valid Hash ID and open that specific batch details report.

image

As soon as you have the side by side comparison, expand the 'Distinct Plans' region of the report to review the details. The plan hash ids are different because of the change in location for the sort operation.


PPClone1 Plan

image

PPClone2 Plan

image

Cross database query technique can be used in many ways. For example, you could take a capture from the first day of the month and use it as your baseline checkpoint. Every day after that you could capture trace data from the same time window and load it into secondary databases. With a cross database query against ReadTrace.tblUniqueBatches you could locate new queries that did not execute on first day. By modifying this you could plot query patterns over time. For example, the unique hash id could be used to plot the cpu, reads, writes, execution count and other usage data points to discover trends.

 

References and Links

SQL Nexus: http://www.codeplex.com/sqlnexus

SQL Server 2005 Service Pack 2, Performance Dashboard: http://www.microsoft.com/downloads/details.aspx?FamilyID=1d3a4a0d-7e0c-4730-8204-e419218c1efc&DisplayLang=en

Troubleshooting Performance Problems in SQL Server 2005: http://download.microsoft.com/download/1/3/4/134644fd-05ad-4ee8-8b5a-0aed1c18a31e/TShootPerfProbs.doc

RML Utilities 8.x Download: http://www.microsoft.com/downloads/details.aspx?FamilyID=5691ab53-893a-4aaf-b4a6-9a8bb9669a8b&DisplayLang=en

RML Utilities 9.0 Download: http://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post
Page 1 of 1 (10 items)
Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post