While some companies have an abundance of bandwidth between their production and disaster recovery sites, many others have limited site-to-site bandwidth. As such, many implement data replication technologies that also provide data compression, de-duplication, and even fast-write capabilities over IP and Fibre Channel.
In this particular case study, I’m working with a customer that has two data centers, New York and Washington DC, connected by a 50 Mb/s line. EMC RecoverPoint is the replication technology of choice, and the customer is doing bi-directional replication. The Washington DC site has about 4TB of data that needs to replicate to New York, and the New York site has about 10TB of data that needs to replicate to Washington DC.
In a perfect world (no latency, no packet loss, 100% utilization of the link), it would take roughly 7 days to replicate the 4TB and roughly 18 days to replicate the 10TB. That’s almost a month to move the data with the link fully saturated. Unfortunately, that link was also used for other business uses, e.g. VoIP traffic, internal application traffic, server monitoring traffic, etc. Thus the CIO mandated that we find another method to perform the initial synchronization, as using the link (even throttled) was not an option for this duration.
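As a rough sanity check of those figures, here is a minimal Python sketch of the arithmetic. It assumes decimal units (1 TB = 10^12 bytes, 1 Mb/s = 10^6 bits per second) and a link running at 100% utilization with no protocol overhead. The function name is just illustrative.

```python
def ideal_transfer_days(size_tb: float, link_mbps: float) -> float:
    """Days needed to move size_tb terabytes over a link_mbps link at full utilization."""
    bits = size_tb * 1e12 * 8             # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6)    # megabits per second -> bits per second
    return seconds / 86_400               # seconds -> days


# The two replication directions from this case study, each on a 50 Mb/s link.
for direction, size_tb in (("Washington DC -> New York", 4),
                           ("New York -> Washington DC", 10)):
    print(f"{direction}: {ideal_transfer_days(size_tb, 50):.1f} days")
# Prints about 7.4 days for the 4TB and about 18.5 days for the 10TB.
```

Added together, that is just under 26 days if the two copies run one after the other, which is where the "almost a month" comes from.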
The EMC RecoverPoint Release 3.4 Administrator Guide (P/N 300-012-256) documents a method for performing first-time initialization from backup. The primary kicker here, though, is that the backup must be a block-level backup, not a file-level backup. This is because the target RecoverPoint image will be seeded with that block-level backup, and RecoverPoint will then perform a full volume sweep to synchronize the incremental changes since the block-level backup.
Most companies, however, do not perform block-level backups of their servers. Rather, they perform file-level backups, which then get catalogued for easy restores. Below is a summary of the process I used to perform the RecoverPoint initial synchronization using dd as the block-level backup.
- Downloaded dd on a Windows utility server
- This is the tool we will use for the block-level backups.
dd if=[vol_source] of=[vol_target] bs=512k
- Note: I did some very rudimentary performance tests to see what block size would be optimal for these backups. I found 512k to be the sweet spot.
- Configured clones for all volumes that will be seeded with RecoverPoint. The reasons for this are twofold:
- I didn’t want to impact the performance of the production volume while dd reads from the source volume to create the backup.
- dd cannot operate against volumes with open files. Without clones, we’d need to bring down the applications for the duration of the dd backup. When running dd against a mounted clone via its PHYSICALDRIVE address, I did not get open file errors. Below is an example of the errors you will see with dd if there are open files.
C:\Utilities>dd if=\\.\H: of=z:\testvolume.img bs=512k
rawwrite dd for windows version 0.6beta3.
Written by John Newbigin <email@example.com>
This program is covered by terms of the GPL Version 2.
Error opening input file: 32 The process cannot access the file because it is being used by another process
- The source volumes were on 15k drives and the target volumes were on 7.2k SATA drives. I was able to copy roughly 5 GB/min (+/-0.5) with this process.
- For new source volumes, confirm that all the data has first been migrated before proceeding.
- Configure the consistency group(s) for the volumes in scope
- When finishing the consistency group, do not start the transfer. Leave the transfer paused.
- Right-click the consistency group, select “Clear Markers”
- This tells RecoverPoint to treat the remote copy as identical to its corresponding production volume, so a full volume sweep is not required.
- When the dialog box pops up, select both copies.
- Note: I had to do this via the command line because the GUI was only letting me clear the markers in the DR location. The command line without the copy=XYZ option allows you to clear all markers on both sides.
- Create the block-level copy with dd
- Transfer the copy to the secondary site. In our case, we shipped the USB drives to the secondary site.
- Enable image access on the secondary volume
- Select the latest image
- After access goes to logged access, enable direct access
- Restore the backup to the secondary volume
- Remember, you already cleared the markers before you made the first dd copy. Do not clear them again, or RecoverPoint will lose track of where the replication should resume.
- No need to give the drive a drive letter or format it, as you can access it via the \\.\PHYSICALDRIVE2 address.
- Disable image access and start the transfer
- Check the “Start data transfer immediately” checkbox to resume replication
- Monitor the consistency group. The traffic you see will be the changes made to the source volume since the block-level dd copy was taken. The duration should be significantly shorter than a full copy, depending on how much data has changed since the original dd backup.
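To plan the dd steps in the process above, it helps to estimate how long each image will take. The sketch below is a minimal Python example that uses the roughly 5 GB/min (+/-0.5) rate I observed. It assumes that rate holds for your disks and for both the backup and the restore, which may not be the case in your environment. The function name is illustrative.

```python
def dd_copy_window(size_gb: float, rate_gb_per_min: float = 5.0,
                   tolerance: float = 0.5) -> tuple[float, float]:
    """Return (fastest, slowest) expected dd duration in minutes for one volume."""
    fastest = size_gb / (rate_gb_per_min + tolerance)   # best case: 5.5 GB/min
    slowest = size_gb / (rate_gb_per_min - tolerance)   # worst case: 4.5 GB/min
    return fastest, slowest


# Example: one 330GB volume, the size of the consistency groups in this case study.
fastest, slowest = dd_copy_window(330)
print(f"330GB image: between {fastest:.0f} and {slowest:.0f} minutes per dd pass")
```

Remember that each volume needs two dd passes (one to create the image, one to restore it at the secondary site), plus whatever time the physical transport takes.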
Below are some of the results from the initial synchronization process. Note that between the dd on the source and the reverse dd on the secondary volume, roughly two days elapsed.
- 330GB consistency group 1
- At 50 Mb/s, it would have taken roughly 15 hours to perform a full sync.
- Initial synchronization took 58 minutes, transferring roughly 21GB (6.36%).
- We saved roughly 14 hours and 309GB of transfer.
- 330GB consistency group 2
- Initial synchronization took 43 minutes, transferring roughly 10GB (3.03%).
- We saved a little over 14 hours and 320GB of transfer.
- 330GB consistency group 3
- Initial synchronization took 50 minutes, transferring roughly 18GB (5.45%).
- We saved a little over 14 hours and 312GB of transfer.
- 330GB consistency group 4
- Initial synchronization took 53 minutes, transferring roughly 19GB (5.76%).
- We saved a little over 14 hours and 311GB of transfer.
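The numbers above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming decimal units (1 GB = 10^9 bytes) and an ideal, fully utilized 50 Mb/s link as the baseline. The sync times and transferred amounts are the ones listed above; the function names are illustrative.

```python
LINK_MBPS = 50
GROUP_GB = 330

# consistency group -> (initial sync minutes, GB transferred), as reported above
RESULTS = {1: (58, 21), 2: (43, 10), 3: (50, 18), 4: (53, 19)}


def full_sync_hours(size_gb: float, link_mbps: float) -> float:
    """Hours for a full copy over an ideal, fully utilized link."""
    return size_gb * 1e9 * 8 / (link_mbps * 1e6) / 3600


def summarize(group_gb: float, sync_minutes: float, delta_gb: float,
              link_mbps: float = LINK_MBPS) -> dict:
    """Percent of the volume actually sent, plus GB and hours saved versus a full sync."""
    return {
        "percent_transferred": round(100 * delta_gb / group_gb, 2),
        "gb_saved": group_gb - delta_gb,
        "hours_saved": full_sync_hours(group_gb, link_mbps) - sync_minutes / 60,
    }


for group, (minutes, delta_gb) in RESULTS.items():
    s = summarize(GROUP_GB, minutes, delta_gb)
    print(f"group {group}: {s['percent_transferred']}% sent, "
          f"{s['gb_saved']}GB saved, {s['hours_saved']:.1f} hours saved")
```

With the unrounded baseline of about 14.7 hours, the savings work out to roughly 13.7 to 14 hours per group, which is consistent with the rounded figures above.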
plink -l admin -pw admin 192.168.10.10 "enable_group group=RPSyncGroup start_transfer=no"
plink -l admin -pw admin 192.168.10.10 "clear_markers group=RPSyncGroup"
dd if=\\.\[PHYSICALDRIVE##] of=z:\[PHYSICALDRIVE##.img] bs=512k
[transfer the images to the secondary site via USB drive]
plink -l admin -pw admin 192.168.10.10 "enable_image_access group=RPSyncGroup copy=DR_RPSyncGroup image=latest"
plink -l admin -pw admin 192.168.10.10 "set_image_access_mode group=RPSyncGroup copy=DR_RPSyncGroup mode=direct"
dd if=z:\[PHYSICALDRIVE##.img] of=\\.\[PHYSICALDRIVE##] bs=512k
plink -l admin -pw admin 192.168.10.10 "disable_image_access group=RPSyncGroup copy=DR_RPSyncGroup start_transfer=no"
plink -l admin -pw admin 192.168.10.10 "start_transfer group=RPSyncGroup"
[monitor initial synchronization traffic]
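With several consistency groups to seed, it can be handy to generate the command sequence above per group rather than editing it by hand. The following Python sketch only builds the command strings and does not execute anything. The RecoverPoint command names and options are copied verbatim from the sequence above and are not checked against the CLI reference. The function names, parameters, and the example drive and image values are hypothetical.

```python
DEVICE_PREFIX = "\\\\.\\"   # the Windows raw device prefix, i.e. \\.\


def plink(host: str, user: str, password: str, rp_command: str) -> str:
    """Wrap one RecoverPoint CLI command in a plink call, as in the sequence above."""
    return f'plink -l {user} -pw {password} {host} "{rp_command}"'


def seed_commands(group: str, copy: str, source_drive: str, target_drive: str,
                  image: str, host: str = "192.168.10.10", user: str = "admin",
                  password: str = "admin", block_size: str = "512k") -> list[str]:
    """Return the ordered commands for seeding one consistency group from a dd image."""
    def rp(cmd: str) -> str:
        return plink(host, user, password, cmd)

    return [
        rp(f"enable_group group={group} start_transfer=no"),
        rp(f"clear_markers group={group}"),
        f"dd if={DEVICE_PREFIX}{source_drive} of={image} bs={block_size}",
        "[transfer the image to the secondary site]",
        rp(f"enable_image_access group={group} copy={copy} image=latest"),
        rp(f"set_image_access_mode group={group} copy={copy} mode=direct"),
        f"dd if={image} of={DEVICE_PREFIX}{target_drive} bs={block_size}",
        rp(f"disable_image_access group={group} copy={copy} start_transfer=no"),
        rp(f"start_transfer group={group}"),
    ]


# Example values are placeholders; substitute your own group, copy, and drive names.
for line in seed_commands("RPSyncGroup", "DR_RPSyncGroup",
                          "PHYSICALDRIVE2", "PHYSICALDRIVE2",
                          "z:\\PHYSICALDRIVE2.img"):
    print(line)
```

Keeping the source and target drive as separate parameters is deliberate, since the PHYSICALDRIVE number on the utility server at the secondary site will not necessarily match the one at the source.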