Have a ‘recovery’ plan before disaster strikes

March 29, 2010 By sanovi

Early in the morning, after a cup of hot coffee at home, when I meet Lakshman Narayanaswamy in Nageswara Rao park, the topic of discussion is not fitness or finance, but disaster recovery. Lakshman is the co-founder and VP-Products of Bangalore-based Sanovi Technologies (www.sanovi.com), which helps organisations ‘proactively manage disaster recovery (DR) environments.’


For starters, ‘disaster recovery’ is all about an organisation’s ability to recover and operate core processes/services that enable it to transact business with its customers and partners, explains Narayanaswamy during the course of a subsequent e-mail interaction with eWorld.


Excerpts from the interview.

What are the misconceptions and myths to watch for, in the DR space?


There are several misconceptions that managers must be aware of when planning for their organisation’s DR:

We have not been hit by a disaster, these things don’t happen here.

We are replicating our data, that is our DR plan.

My team of experts can bring up our infrastructure in case of need.

DR is for big companies; my company can postpone the decision.


It is a well-researched and documented fact that 90 per cent of outages are caused by banal happenings, such as someone tripping on a cable, a wrong configuration, a cable plugged into the wrong port, or a patch upgrade not working as expected. Less than 10 per cent of outages are caused by fire, flood, etc. The everyday, ordinary happenings may be low impact but they cause business disruption nevertheless, resulting in loss of productivity and customer satisfaction.


Is there a quick test to find out if a business needs to consider DR? Also, is there an easy way to do DR?

A quick test that determines the need for DR can be summed up in two questions:

What are the critical business processes and related IT systems?

What is the impact to business if critical IT systems become unavailable?


Approximate answers to the above questions can reveal how dependent the business is on its critical processes and systems. A more thorough analysis can help understand the financial exposure and any other exposures, such as regulatory exposure, loss of reputation and loss of productivity, that adversely impact the business.


In terms of IT recovery readiness, what do you see as lessons from successful organisations?

Organisations that have deployed DR have the following common traits:

DR readiness has executive level sponsorship.

Recovery readiness is well integrated into their business processes; it is not an add-on afterthought.

Focus is on inspecting, measuring and reporting on recovery metrics on a regular basis.

Reduced dependency on people; increased level of automation and processes that can be followed.


Are there levels of maturity in disaster recovery preparedness?

Gartner has described a DR maturity model that is useful to understand where a company stands and how it can progress towards a more mature DR capability. The lowest on the maturity chart is when there is no documented DR capability in the organisation.


Stage I is when DR is taken up as a project: some of the critical business processes and IT applications are identified, and a DR capability is built and demonstrated. Because this is a one-off effort, the DR capability cannot be relied on once the project is completed.


Stage II is when DR readiness is implemented as a business process. This means that critical processes and IT systems have been identified and DR for these processes has been set up. Further, DR readiness is tested, and the areas that failed are identified and fixed. In this stage companies also work on including business users in the testing routine, to ensure that business processes and people, along with their needs and roles, are accounted for in the continuity plan.


The final stage of maturity is when DR becomes integral to key business processes. The organisation has a strong focus on DR, and DR is part of a larger risk management group in the organisation.


Monitoring, testing and regular reporting on compliance with key recovery metrics are part of the reporting to the executive management. Further, the scope of testing includes business users and key partners and vendors.


The organisation as a whole approaches recovery readiness as part of its planning and operations; it is no longer an add-on that is done after the business processes have been completed.


What are the areas of DR that attract research and innovation?

DR readiness touches several facets of an organisation and has a life-cycle that it goes through. The key stages of the DR life-cycle are DR planning, DR solution design and provisioning, DR monitoring and validation, recovery, testing and reporting.


Typically, a DR plan fits into a larger business continuity plan for the organisation. There are two key metrics that dictate recovery readiness. The recovery point (commonly expressed as the recovery point objective, or RPO) is the amount of data that an organisation is willing to lose in case of an outage; and the recovery time (the recovery time objective, or RTO) is the maximum amount of time an organisation can wait for an IT system/application to come up.
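To make the two metrics concrete, here is a minimal sketch in Python of how they might be checked after an outage or a drill. It assumes the recovery point is expressed as a window of time (everything written after the last replicated change is lost), which is a common convention but not something the interview specifies. The class name, function names and target values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTargets:
    """Hypothetical targets agreed with the business."""
    rpo: timedelta  # maximum tolerable data loss, expressed as a time window
    rto: timedelta  # maximum tolerable time for the system to come back up


def recovery_point(last_replicated: datetime, outage_start: datetime) -> timedelta:
    """Data written between the last replicated change and the outage is lost."""
    return max(outage_start - last_replicated, timedelta(0))


def recovery_time(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Elapsed time between the outage and the service being available again."""
    return service_restored - outage_start


def meets_targets(targets: RecoveryTargets,
                  last_replicated: datetime,
                  outage_start: datetime,
                  service_restored: datetime) -> dict:
    """Compare the achieved recovery point and time against the targets."""
    achieved_point = recovery_point(last_replicated, outage_start)
    achieved_time = recovery_time(outage_start, service_restored)
    return {
        "recovery_point": achieved_point,
        "recovery_time": achieved_time,
        "rpo_met": achieved_point <= targets.rpo,
        "rto_met": achieved_time <= targets.rto,
    }
```

For example, with a 15-minute RPO and a four-hour RTO, an outage that begins 10 minutes after the last replicated change and is resolved three hours later would meet both targets.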


There are many options for, and much ongoing research into, reducing the recovery point and recovery time. Traditionally, as the recovery point approached zero, the cost of the DR solution, including the hardware and software, rose.


New technologies and data protection methods have reduced cost and made DR solutions with a close-to-zero recovery point quite viable. Examples of enabling technologies are virtual tape libraries, continuous data protection (CDP) solutions, and adaptive asynchronous replication.


Given the complexity and heterogeneous nature of the data centre, depending on people to recover complex applications requires diverse skill sets.


Recovery and failover automation tools that are DR-aware and can coordinate the bring-up of the various dependencies of an application are now available, and can deliver a reduction of over 80 per cent in recovery time.
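Coordinating the bring-up of an application's dependencies is, at its core, an ordering problem: each component can start only after everything it depends on is running. The sketch below illustrates that idea with a topological sort from Python's standard library (graphlib, available from Python 3.9). It is not how any particular automation product works, and the component names are hypothetical.

```python
from graphlib import TopologicalSorter


def bring_up_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every component starts after its dependencies.

    `depends_on` maps each component to the set of components that must be
    running before it. graphlib raises CycleError if the dependencies are
    circular, which is itself worth knowing before a real recovery.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical application stack at the DR site.
    stack = {
        "web-frontend": {"app-server"},
        "app-server": {"database"},
        "database": {"storage"},
        "storage": set(),
    }
    for step, component in enumerate(bring_up_order(stack), start=1):
        print(step, component)
```

Writing the dependencies down as data, rather than keeping them in the heads of individual experts, is what allows the recovery sequence to be repeated and rehearsed.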


Monitoring and validation are recent innovations that help IT managers increase their DR readiness. Customers now have the tools to monitor, in real time, recovery metrics such as the recovery point; this is very powerful, as it gives the user a real-time view of recovery readiness, as opposed to having to run a drill to measure it.


Along with the monitoring of recovery metrics, validation of the primary and DR environments is a huge proactive step that can make the difference between successful recovery and failure. IT managers find keeping the primary and DR environments in sync to be a constant challenge.


A tool that alerts them to changes in the various layers of the stack therefore makes a dramatic difference to recovery readiness. (For instance, Sanovi DRM is DR management software that takes a life-cycle approach to DR readiness; it provides capabilities to provision industry best-practice DR solutions, monitor RPO and RTO, automate recovery and DR drills, and obtain comprehensive reports on compliance.)
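The idea of validating the primary against the DR environment can be illustrated with a small, generic sketch in Python. It assumes each environment's configuration has already been collected into a flat dictionary of settings; the setting names and values are hypothetical, and this is not a description of how Sanovi DRM or any other product performs validation.

```python
MISSING = "<missing>"  # marker for a setting present on one side only


def find_drift(primary: dict, dr: dict) -> dict:
    """Return the settings whose values differ between primary and DR.

    Each entry maps a setting name to a (primary_value, dr_value) pair.
    An empty result means the two snapshots agree.
    """
    drift = {}
    for key in sorted(primary.keys() | dr.keys()):
        primary_value = primary.get(key, MISSING)
        dr_value = dr.get(key, MISSING)
        if primary_value != dr_value:
            drift[key] = (primary_value, dr_value)
    return drift


if __name__ == "__main__":
    # Hypothetical snapshots of the two environments.
    primary = {"os_patch_level": "SP3", "db_version": "10.2", "listener_port": 1521}
    dr = {"os_patch_level": "SP2", "db_version": "10.2"}
    for setting, (p, d) in find_drift(primary, dr).items():
        print(f"{setting}: primary={p} dr={d}")
```

Run on a schedule, a comparison of this kind could surface a patch applied only to the primary long before a drill or a real outage exposes it.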


Would you like to talk about the impact of cloud computing and other newer developments on DR?

Virtualisation and cloud computing are major developments that impact and influence how DR is done. Server virtualisation can eliminate some of the challenges in a traditional DR setup, such as keeping the OS environment in sync between the primary and DR sites, since the complete machine is replicated to the DR site on a periodic basis.


The definition of, and the offerings in, a cloud model are a large enough topic to warrant a dedicated discussion. In summary, cloud computing approaches the DR challenge in a different manner. Irrespective of the underlying technologies, an infrastructure cloud is assumed to offer an in-built DR capability that can meet specific recovery metrics. This is a fast-evolving area and is sure to offer innovative solutions at very attractive cost points.