Initializing a Transactional Replication Subscriber from an Array-Based Snapshot

Rate This
  • Comments 5

Author: Benjamin Wright-Jones
Contributors: Ken England, Prem Mehra, Mike Ruthruff, Greg Yvkoff, Qun Guo and Gopal Ashok
Technical Reviewers: Joseph Sack, Jimmy May, Paul Randal (SQLskills.com), Dennis Tighe and Glenn Berry (SQL Server MVP)

Overview

This article describes how to initialize a transactional replication Subscriber from an array-based snapshot rather than using the native SQL Server snapshot mechanism. Initializing the Subscriber using a SAN-based restore solution is particularly beneficial for very large databases. In this context, I use the term VLDB to mean a database that is typically multi-terabyte and requires specialized administration and management This is primarily because the standard transactional replication initialization process, which is typically restricted by either the network or storage I/O bandwidth, could take longer than the business service-level agreement (SLA) permits because of the time needed to initialize or recover the Subscriber. In contrast, initializing a Subscriber using an array-based snapshot utilizes the Virtual Device Interface (VDI) freeze and thaw mechanism, thereby minimizing recovery time. This procedure is also particularly beneficial in non-production environments that use transactional replication and require repeatable tests with large volumes of data.

Scope

This procedure was performed using Microsoft SQL Server 2005 Enterprise Edition IA64 with Service Pack 2 (SP2) and cumulative update 9 running on Windows Server 2003 Datacenter Edition IA64. The procedure is expected to be identical in SQL Server 2008; however, this was not tested during the exercise. The storage array was provided by Hitachi Data Systems (HDS), and HDS Split Second was used to manage the array-based snapshots (backup and restore of the databases). VERITAS Storage Foundation HA software was used for volume management.

It should be emphasized that even though the hardware listed above was used, the principle of initializing a Subscriber from an array-based snapshot can be performed using other storage array network (SAN) vendor technologies – specific implementation details will vary. Microsoft recommends that customers attempting this procedure work closely with an engineer from the storage vendor to ensure the solution are implemented correctly.

When Is This Technique Useful?

There may be situations where the Publisher, Distributor, and Subscriber need to be restored after data loss. This technique:

·

Minimizes the transactional replication setup time for the Subscriber through the use of the underlying Virtual Device Interface (VDI) storage mechanisms, which reduce the time required to back up and restore large volumes of data.

· Supports repeated benchmark tests to re-establish a test baseline.

· Provides rapid recovery of the Subscriber if data loss has occurred and the database(s) need to be recovered from a point outside of the distribution retention period.

Background

Transactional replication has been available as a feature in SQL Server since version 6.0. Available functionality has grown since this time to include tracer tokens to measure latency, concurrent snapshot processing, and peer-to-peer replication. However, the general premise has remained the same: to replicate a copy or subset of the data to another database. The Publisher, Distributor, and Subscriber terminology is used to describe the nodes within the topology. For more information about replication, see SQL Server Books Online.

A transactional replication Subscriber can be initialized using one of the following mechanisms:

a) Transactional replication concurrent snapshot processing

b) Database snapshot (this requires SQL Server 2005 Enterprise Edition with Service Pack 2)

c) Initialize from log sequence number (LSN) (SQL Server 2008 only)

d) SQL Server backup (initialize from backup)

e) Copy of the data or array-based restore

A summary of the pros and cons for each technique is presented in a table below.

Option (a) – Transactional replication concurrent snapshot processing does not require an outage, and it allows production activity to occur on the Publisher while the initialization process is copying the schema and data to the Subscriber. Concurrent snapshot processing does not hold shared locks during snapshot generation, thereby allowing production activity to continue. However, a schema modification lock (Sch-M) is taken for a brief period. In contrast to option (d) and (e), this procedure also has the benefit of only copying the schema and data that is required rather than making a complete copy of the database.

Option (b) – SQL Server 2005 Enterprise Edition with Service Pack 2 introduced a new snapshot mechanism that permits the initialization of the Subscriber from a database snapshot. The database snapshot functionality was introduced in SQL Server 2005 to provide a read-only static view of the database using sparse files. Before a page is updated in the source database, the original page is copied to the sparse file. Subsequent updates to the same modified page do not prompt a repeat of this procedure. That is, the pre-change copy of a particular page only pushes into the database snapshot once after the database snapshot is created. It is possible to initialize a transactional replication Subscriber from a database snapshot as described above. This procedure is similar to the concurrent snapshot processing described in option (a) in that it permits transactional activity during the initialization procedure. However, although a database snapshot utilizes the sparse file capability of NTFS, with a lot of concurrent updates, the database snapshot may grow. It is important to ensure that sufficient storage space is allocated for the database snapshot sparse files to store data pages that have been updated during the Subscriber initialization period, because the database snapshot stores the pre-updated page images.

Option (c) – Initialize from LSN was introduced in SQL Server 2008 to aid the configuration of peer-to-peer transactional replication topologies. Initializing from an LSN can also be used in disaster recovery scenarios; instead of performing a full re-initialization of the Subscriber database, you can initialize a Subscriber from an LSN. This means that the Distribution Agent will apply transactions (after the supplied LSN), from the distribution database to the Subscriber, as long as the distribution retention period has been set appropriately.

Option (d) - A Subscriber can also be initialized from a SQL Server backup. For more information, see “Initializing a Transactional Subscription without a Snapshot” in SQL Server Books Online (http://msdn.microsoft.com/en-us/library/ms151705.aspx). This approach restores the complete dataset on the Subscriber and does not require an outage for the initialization, as long as the distribution database stores the in-flight transactions that were taken after the backup. Post-configuration administrative procedures may be required to remove unwanted objects and data on the Subscriber database.

Option (e) - It is also possible to initialize the Subscriber using a copy of the data. The database can be provisioned using any mechanism that will copy the Publisher schema and data to the Subscriber, such as a manual network file copy of the data and log files, a native SQL Server restore, or alternatively, an array-based restore. While this approach does require an outage during the initialization process, an array-based restore is particularly beneficial for very large databases (VLDBs) in either production or benchmark environments where the setup time can be minimized by using the VDI mechanisms. This technique was used to initialize the Subscriber from a HDS-array based snapshot, which is the focus of this article, and is discussed in more detail in the rest of this article. For more information about VDI, see “SQL Server 2005 Virtual Backup Device Interface (VDI) Specification” (http://www.microsoft.com/downloads/details.aspx?familyid=416f8a51-65a3-4e8e-a4c8-adfe15e850fc&displaylang=en

The table below summarizes the pros and cons of the transactional replication snapshot techniques.

Initialization technique

 

 

 

Online

 

 

 

Pros

 

 

 

Cons

 

 

 

Use when

 

 

 

Concurrent snapshot processing

 

 

 

Yes

 

 

 

Available in Workgroup, Standard, and Enterprise editions, allows online processing, generates a snapshot only for published objects.

 

 

 

Duration of Subscriber initialization is impacted by size of database, network, and so on. DML statements can be expensive.

 

 

 

Online processing is a requirement, storage space is limited on the Subscriber.

 

 

 

Database snapshot

 

 

 

Yes

 

 

 

No locking on the Publisher database, in comparison to concurrent snapshot processing.

 

 

 

Enterprise Edition only. Requires sufficient storage for the snapshot sparse files.

 

 

 

Online processing is a requirement, concurrency is essential, sufficient storage exists for the sparse file.

 

 

 

Initialize from LSN

 

 

 

Yes

 

 

 

Full snapshot not required, allows online processing.

 

 

 

SQL Server 2008 Enterprise only, must be able to determine correct LSN.

 

 

 

Only consider if a full snapshot is not feasible and not necessary. May also be suitable for environments that use database mirroring and transactional replication.

 

 

 

SQL Server Backup  (initialize with backup)

 

 

 

Yes

 

 

 

Full snapshot not required, allows online processing.

 

 

 

Restores a complete copy of the database on the Subscriber (for example, requires more storage space) and persists copies of all objects unless removed.

 

 

 

Data volume is manageable and offline initialization is not an option. Speed of backup and restore is preferable to use of the BCP utility to copy data.

 

 

 

Copy of data /
array-based restore

 

 

 

No

 

 

 

Rapid backup/restore of large data volumes, replication snapshot is not required.

 

 

 

Offline processing, must stop transactional activity, requires a complete copy of the data on the Subscriber.

 

 

 

Volume of data is in the terabyte range and backup and restore times are a priority.

 

 

 

Pros and Cons for Transactional Replication Initialization Techniques

Setup Procedure – Initializing the Subscriber from an Array-Based Snapshot

The procedure to initialize the Subscriber from a copy of the data is similar to using the native SQL Server restore mechanism. However, the array-based approach is more useful for large databases, because hardware implementations of the VDI freeze and thaw mechanism allow large amounts of data to be copied quickly and in a transactionally consistent manner, thereby reducing the restore time.

High-Level Steps

The following procedure was used to initialize the Subscriber during a high-end benchmark with a multi-terabyte database. The backup and restore of the database was performed using the HDS array capabilities. This includes the use of HDS Split Second, which is a command-line utility developed by Hitachi Consultancy Services. We also tested a full database backup operation with SQL Server 2005. However, this was simply to benchmark the operation for a 17-terabyte database.

It is important to note that application activity must be paused during this operation, because any change in data at the Publisher will result in data inconsistency. This inconsistency will be raised as an alert in Replication Monitor. One can use tablediff, a command line utility that returns detailed information about the differences between two tables. It can also generate a Transact-SQL script to bring a subscription into convergence with data at the Publisher. This utility can be used to correct data inconsistencies; however, it is recommended that precautionary steps be put in place to ensure that the application or any other activity cannot write to the Publisher or Subscriber databases during this procedure. These steps are discussed later. Alternatives to this approach that allow transactional activity during the setup procedure include concurrent snapshot processing, initialization from backup, and initialization from a database snapshot. Generating a replication snapshot for this volume of data would take many hours, so we elected to initialize the Subscriber using an array-based restore in order to minimize the movement of data.

Initializing the Subscriber

The following steps were used to initialize the Subscriber from the array-based restore. The majority of the time was consumed by the backup and restore procedures using HDS Split Second. This was approximately two hours. The other activities were DBA operations to modify and validate the replication metadata.

1. Pause application activity.

2. Disable the application login(s) using ALTER LOGIN <login> DISABLE.

3. Ensure the database is in RESTRICTED_USER WITH ROLLBACK IMMEDIATE mode to clear user connections that may still be in the database. Please note that RESTRICTED_USER does not prevent access to applications run under the context of sysadmin, dbcreater, or sysadmin.

4. Back up the primary (Publisher) database using a storage array-based mechanism.

5. Restore the primary (Publisher) database on a separate instance using the storage array-based mechanism. This database will become the Subscriber.

6. Recover the Subscriber database and check the SQL Server error log to verify completion of the recovery operation.

7. Set the RESTRICTED_USER mode on the subscription database to prevent any user activity. This is simply an additional precautionary step to ensure that applications or users cannot access the database. Please refer to step 3 above, because the same caveat applies.

8. Enable the NOT FOR REPLICATION value on the Subscriber tables (and any other objects that will be published for replication). The NOT FOR REPLICATION option enables you to specify which database objects are treated differently if a replication agent performs a transactional operation. For example, the identity column value is not incremented if a replication agent performs an insert operation. For more information, see “Controlling Constraints, Identities, and Triggers with NOT FOR REPLICATION” in SQL Server 2005 Books Online (http://msdn.microsoft.com/en-us/library/ms152529(SQL.90).aspx) or SQL Server 2008 Books Online (http://msdn.microsoft.com/en-us/library/ms152529.aspx).

9. Create the publication(s) on the Publisher.

10. Drop any redundant columns on the subscription database using ALTER TABLE DROP COLUMN. For our tests, binary large object (BLOB) columns were dropped, because we did not replicate columns of this data type (primarily to reduce the storage requirement on the Subscriber). We did not drop any other objects, because there were scenarios where we wanted to be able to view or query copies of nonpublished tables on the Subscriber.

11. Optional: Reclaim storage space on the Subscriber by using DBCC CLEANTABLE in the subscription database and by specifying a batch size to reduce the impact on the transaction log. In previous smaller volume tests, we also reclaimed the BLOB storage space, using DBCC CLEANTABLE with a batch size of 100,000. If a batch size is not specified, DBCC CLEANTABLE processes the whole table in one transaction and the table is exclusively locked during the operation. This can require considerable transaction log space for very large tables.

12. Create the subscriptions using the replication support only sync_type of sp_addsubscription. This is a key component of the process, because it indicates that the Subscriber already has a copy of the schema and data and does not require initialization with a replication snapshot.

13. Perform DBA checks. For this exercise, we used Transact-SQL scripts to ensure that the number of objects marked for publication was consistent with the number of objects marked for replication at the Subscriber. For example:

-- Count the number of columns with the NOT FOR REPLICATION option set to 1.

-- Execute this on the Publisher and Subscriber to ensure the count is consistent.

-- Investigate further if there is a difference.

USE <Publisher or Subscriber>

GO

SELECT COUNT(*) NOT_FOR_REPL_IDENT_Tables

FROM sys.identity_columns

WHERE is_not_for_replication = 1

AND OBJECTPROPERTY(OBJECT_ID, 'IsMSShipped') = 0;

Similar statements can also be executed for foreign keys, triggers, and constraints that may require this property to be set.

14. Enable the publication and subscription databases for MULTI_USER activity.

15. Enable the application login(s); for example, use ALTER LOGIN <login> ENABLE.

It is advisable to launch Replication Monitor (Sqlmonitor.exe) to verify that the publications are healthy and that the Log Reader and Distribution agents are not reporting any problems. Similarly, posting a tracer token will also provide a good idea of latency between the databases.

Restoring the Databases for Repeated Benchmark Tests – Initializing the Subscriber from an Array-Based Snapshot

There may be situations where the transactional replication databases need to be restored to a previous point in time, for example, in a benchmark environment to re-establish the baseline for repeated tests. This eliminates the need to re-create the replication objects and also synchronise the Publisher and Subscriber, which can be time-consuming for large databases.

The procedure is similar to that shown above, with the added advantage that SQL Server transactional replication has already been configured and is functioning correctly.

The following steps were used to initialize the transactional replication Publisher, Distributor, and Subscriber from the array-based restore prior to each test cycle.

1. Pause application activity.

2. Disable the application login(s) by using ALTER LOGIN <login> DISABLE; on both the publication and subscription databases.

3. Ensure that the Publisher database is in RESTRICTED_USER WITH ROLLBACK IMMEDIATE mode to clear user connections that may still be in the database. Connections under the context of sysadmin, dbcreator, or db_owner will still be allowed.

4. Stop the SQL Server Agent services on both the Publisher and Subscriber, or disable the Log Reader and Distribution agent jobs to ensure that the array-based backup and restore mechanisms have exclusive access to the database.

5. Set RESTRICTED_USER mode on the subscription database to prevent any user activity.

6. Use the HDS array-based restore mechanisms to:

      a. Restore the Publisher database.

      b. Restore the distribution database.

      c. Restore the Subscriber database.

Note: For large OLTP databases with high transaction volume, you may observe intermittent I/O in System Monitor or Performance Monitor (perfmon.exe) after the restore and recovery. This is typically due to the GHOST_CLEANUP process, which is removing records that have been marked for deletion. This process can be observed in the sys.dm_exec_requests catalog view.

7. Start the SQL Agent Service on the Publisher, Distributor, and Subscriber.

8. Enable the Publisher and Subscriber databases for MULTI_USER activity.

9. Enable the application login(s); for example, use ALTER LOGIN <login> ENABLE.

As with the previous procedure, it is recommended that Replication Monitor be launched to verify the health of the publications. The Log Reader and Distribution agents should also be launched before the application logins are enabled.

The restore of the Publisher, Distributor, and Subscriber databases can be performed in parallel using the HDS array-based restore mechanism. This approach was used to re-establish the baseline for repeated benchmark tests.

Observations and Data Points

A number of observations were documented during the initialization of the Subscriber during the benchmark. This relates to both the initial setup as described above and also to repeated tests conducted during the benchmark. Figure 1.0 below provides a graphical illustration to accompany the following data points. Numbers denote phases in figure 1.0.

1. The data was first loaded into the primary OLTP database without using transactional replication.

2. During this data load phase, the data was shadowed to a second set of volumes. Owing to the volume of data, this was deemed the most appropriate mechanism to provision the Subscriber database due to the volume of data. Loading data with transactional replication enabled was not appropriate, because the operation is fully logged regardless of the database recovery model.

3. After the data load had completed, the volumes were dismounted from the Publisher and mounted on the Subscriber. Transactional replication was then configured using the ‘replication support only’ sync_type parameter to avoid the need for a replication snapshot.

4. The 17-terabyte Subscriber database was then paired with a set of volumes as a backup. This process took approximately 17 hours.

A full backup of the 17-terabyte OLTP database using the native SQL Server 2005 backup mechanism took approximately four hours using eight streams. Please note that this was conducted on SQL Server 2005 Enterprise Edition. SQL Server 2008 may offer additional gains due to the native backup compression. For the native SQL Server 2005 backup procedure, the following Transact-SQL script was executed.

USE [master]

GO

DECLARE

@DateStr nvarchar(50),

@Stmt nvarchar(1000),

@Filepath nvarchar(50),

@Quote nvarchar(1),

@Databasename nvarchar(50)

SET @Quote = char(39)

SET @Filepath = 'DISK=' + @Quote + '<directory>'

SET @Databasename = '<database>'

SELECT @DateStr = N'_' +

CONVERT(nchar(8),

getdate(), 112) +

N'_' +

RIGHT(N'0' + rtrim(CONVERT

(nchar(2), datepart(hh, getdate()))), 2) +

RIGHT(N'0' + rtrim(CONVERT

(nchar(2), datepart(mi, getdate()))), 2) +

RIGHT(N'0' + rtrim(CONVERT

(nchar(2), datepart(ss, getdate()))), 2)

SELECT @Stmt =

'BACKUP DATABASE ' + @databasename + ' TO ' +

@Filepath + @databasename + @DateStr + '_1.BAK' + @Quote + ', ' +

@Filepath + @databasename + @DateStr + '_2.BAK' + @Quote + ', ' +

@Filepath + @databasename + @DateStr + '_3.BAK' + @Quote + ', ' +

@Filepath + @databasename + @DateStr + '_4.BAK' + @Quote + ', ' +

@Filepath + @databasename + @DateStr + '_5.BAK' + @Quote + ', ' +

@Filepath + @databasename + @DateStr + '_6.BAK' + @Quote + ', ' +

@Filepath + @databasename + @DateStr + '_7.BAK' + @Quote + ', ' +

@Filepath + @databasename + @DateStr + '_8.BAK' + @Quote +

' WITH NOFORMAT, NOINIT, SKIP, NOREWIND, NOUNLOAD, STATS = 5'

EXEC (@Stmt)

You should modify the @Filepath and @Databasename variables if script reuse is intended.

5. New gold (baseline) backups were occasionally required due to schema changes like indexes or partitioning for performance reasons. This HDS Split Second backup took approximately 30 minutes to complete; this was also a function of how much data had changed since the last restore, because the data had changed during the test cycle. While this backup process typically occurs almost instantaneously in the background, we opted to wait for the completion signal from the array before commencing a new test. SQL Server 2005 differential database backup timings were comparable and took approximately the same amount of time.

6. Following a test run, a restore of the Publisher, Distributor, and Subscriber databases took approximately two hours. This was also a symptom of the data that had changed during the test. However, approximately one hour and thirty minutes of this duration was due to the import and export of the VERITAS volumes, which was sequential in nature.

The Publisher and Subscriber databases were restored in parallel, thereby reducing the restore time for the repeated benchmark tests. The Distributor could be restored in parallel; however, we opted to conduct this step separately, because we wanted more control over the procedure.

  repl

Figure 1.0. Benchmark Data Points

As previously mentioned, before commencing a test, ensure that the GHOST_CLEANUP process has completed on both the Publisher and Subscriber databases (following a restore), because it may affect I/O measurements taken during the test.

Summary

Initializing a transactional replication Subscriber from an array-based restore is beneficial when the data volume is very large and the restore and configuration time need to be minimized. Though this can still be achieved using any other restore or data copy mechanism, the benefit of an array-based VDI freeze and thaw, which allows large amounts of data to be copied quickly and in a transactionally consistent manner, significantly reduces backup and restore time.

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post
Page 1 of 1 (5 items)
Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post