Experiences with Azure Site Recovery

As a provider of cloud solutions, our earliest use of Azure Site Recovery (ASR) wasn’t protection at all but rather migration.  We used ASR to replicate small sets of servers from customer premises to the cloud.  For customers running Hyper-V or VMware with supported server images, ASR makes migration to the cloud almost trivially easy.

Strategies for Delivering Disaster Recovery

After our success with this limited use of ASR, we were interested in more challenging engagements.  We wanted to use ASR to deliver Disaster Recovery as a Service to customers.  Our partners at Microsoft were happy to help us plan our strategy.  For example, they told us that other business partners were focusing on the onboarding phase of a typical ASR engagement.  As a result, we designed offerings in which onboarding and ongoing monitoring and management are separate options.  For ongoing services, we opted to give our customers a choice between monitoring their recovery solution themselves or paying us a modest fee to do so.

An Example DR Engagement

This strategy proved fortuitous.  One customer, Florida Surplus Lines Service Office, had a budget for the initial work but wanted to keep ongoing expenses down by doing all monitoring themselves.  We bid the onboarding a bit lower than initially planned.  That’s a decision I would occasionally second-guess over the next few weeks.  In the end, even if our effective hourly rate was a bit less than we would have preferred, it was a valuable opportunity to learn, and we delivered a solid solution.

A Successful Outcome

We did all the configuration on the Azure side, and worked closely with the customer on activities that had to be done on-premises.  The critical on-premises tasks were to run the deployment planner; set up a VPN gateway between the local and Azure networks; set up a configuration server for their VMware environment; and update a critical Oracle/Linux server to a version supported by ASR.

Based on the output from the deployment planner, we divided their servers into three batches to initiate protection.  Each batch took a day or two to reach protected status.  On the Azure side, we had two networks set up: a VPN-joined network hosting a secondary domain controller, and an isolated network for test failovers.  Our very first test was to fail over a domain controller to the isolated network.  We then promoted it to primary to have domain services available on that network.
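The batching step above can be sketched in a few lines of Python.  This is only an illustration: the article does not say how the three batches were chosen, so the sketch assumes the goal is to spread estimated daily data churn evenly across batches.  The server names, the churn figures, and the split_into_batches helper are all hypothetical and are not taken from the deployment planner's actual report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    daily_churn_gb: float  # hypothetical estimate, e.g. copied by hand from a planner report


def split_into_batches(servers, batch_count=3):
    """Greedy balancing: assign each server, largest churn first, to the lightest batch."""
    batches = [[] for _ in range(batch_count)]
    totals = [0.0] * batch_count
    for server in sorted(servers, key=lambda s: s.daily_churn_gb, reverse=True):
        lightest = totals.index(min(totals))  # first batch with the smallest running total
        batches[lightest].append(server)
        totals[lightest] += server.daily_churn_gb
    return batches


# Made-up inventory for illustration only.
servers = [
    Server("app-01", 40.0),
    Server("db-01", 120.0),
    Server("web-01", 15.0),
    Server("web-02", 15.0),
    Server("file-01", 60.0),
    Server("dc-02", 5.0),
]

batches = split_into_batches(servers)
for number, batch in enumerate(batches, start=1):
    names = ", ".join(server.name for server in batch)
    total = sum(server.daily_churn_gb for server in batch)
    print(f"Batch {number}: {names} ({total:.0f} GB/day)")
```

Balancing by churn is just one plausible criterion.  In practice, batches might instead be grouped by application dependency or by business priority, so that related servers reach protected status together.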

Next, we did a test failover of all protected infrastructure to the test network.  Test failovers do not impact protected workloads and can be used for non-disruptive DR readiness testing.  The customer confirmed that interactions between different parts of the failed-over infrastructure performed correctly.

Our final test was a true failover and failback of a test machine on their production network.  Their servers communicated effectively across the VPN gateway, and the failed-over server retained its changes after failback.

At this point, we and our customer were satisfied that their servers were properly protected.  Although they opted to monitor the solution themselves, we keep an engineer on their alert notification list.  We review the notifications from time to time, and as Microsoft continues to improve its monitoring tools, we plan to keep the customer updated on features and practices that may be of use to them.