By Carl Mazzanti, CEO, eMazzanti Technologies
Business disasters are usually classified in three categories: natural, such as hurricanes and earthquakes; technological failures; and human, either on purpose or by accident. But no matter what causes a disaster, the need to recover and get a business back up and running is paramount.
Disaster recovery and business continuity today are often thought of in terms of recovery point objectives (RPO) and recovery time objectives (RTO). In other words, how much data is a company willing to lose if its network goes down, and how long can it afford to wait before systems are restored?
For example, a company that continuously replicates all backups to separate data centers that are actively up and running 24/7 has created an architecture with a tight RPO and RTO. A business that allows data to be replicated off-premises asynchronously, or backed up only to tape, expects to lose some of the data being transmitted at the time of failure and assumes it will take longer to restore systems.
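To make the two objectives concrete, the short Python sketch below checks a protection scheme against an RPO and RTO target. The class name, field names and sample figures are illustrative assumptions, not values taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class RecoveryProfile:
    """Hypothetical description of how one system is protected."""
    backup_interval_hours: float  # time between successive backups or replications
    restore_hours: float          # estimated time to bring the system back online


def meets_objectives(profile: RecoveryProfile, rpo_hours: float, rto_hours: float) -> bool:
    """Return True when worst-case data loss and restore time fit the targets.

    Worst-case data loss equals the backup interval: a failure just before
    the next backup loses everything written since the previous one.
    """
    return (profile.backup_interval_hours <= rpo_hours
            and profile.restore_hours <= rto_hours)


# Illustrative figures only.
nightly_tape = RecoveryProfile(backup_interval_hours=24, restore_hours=48)
replicated = RecoveryProfile(backup_interval_hours=0.25, restore_hours=1)

print(meets_objectives(nightly_tape, rpo_hours=4, rto_hours=8))  # False
print(meets_objectives(replicated, rpo_hours=4, rto_hours=8))    # True
```

With a four-hour RPO and an eight-hour RTO, the nightly tape scheme in this example falls short on both counts, while the replicated one fits.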
As recent events such as Hurricane Sandy have shown, catastrophic events can not only devastate the natural environment but also leave a company’s IT infrastructure in ruins. That is why it is critical for a business to initiate or enhance a comprehensive disaster recovery plan (DRP) to protect the business and the livelihoods of its employees in the event of a disaster.
1. Focus on prevention
Assess your risks and potential business impacts to determine ways you can minimize the potential for disasters in advance. Conduct regular audits and system checks of your fire prevention and safety systems.
2. Disaster Recovery Plan Needed
The Disaster Recovery plan needs to represent all functional areas within IT prior to, during, and after a disaster. It needs to include applications, networks, servers and storage. Contingencies, such as “what-if” scenarios, should be considered as part of the planning process. Decisions need to be made regarding the levels of disruption that will constitute a disaster, as well as downtime and loss tolerances.
As an example, a typical Disaster Recovery Plan (DRP) might include the following:
- Information Technology Statement of Intent — This sets the stage and direction for the plan.
- Policy Statement — Very important to include an approved statement of policy regarding the provision of disaster recovery services.
- Objectives — Main goals of the plan.
- Key Personnel Contact Information — Very important to have key contact data near the front of the plan. It’s the information most likely to be used right away, and should be easy to locate.
- Plan Overview — Describes basic aspects of the plan, such as updating.
- Emergency Response — Describes what needs to be done immediately following the onset of an incident.
- Disaster Recovery Team — Members and contact information of the DR team.
- Emergency Alert, Escalation and DRP Activation — Steps to take through the early phase of the incident, leading to activation of the DR plan.
- Media — Tips for dealing with the media.
- Insurance — Summarizes the insurance coverage associated with the IT environment and any other relevant policies.
- Financial and Legal Issues — Actions to take for dealing with financial and legal issues.
- DRP Exercising — Underscores the importance of DR plan exercising.
- Appendix A — Technology Disaster Recovery Plan Templates — Sample templates for a variety of technology recoveries; useful to have technical documentation available from select vendors.
- Appendix B — Suggested Forms — Ready-to-use forms that will help facilitate the plan completion.
3. Keep the Disaster Recovery Plan Current
Disaster Recovery planning needs to be part of the everyday operations of the IT environment. Once the Disaster Recovery plan is created, it needs to be maintained and updated every time an element within the IT environment changes, for example when key personnel or insurance coverage change. The dynamic nature of the IT environment ensures that the Disaster Recovery plan will fail if the management of the plan is not part of change management.
4. Inventory all IT assets
Having a complete picture of what you have is essential. Inventorying all your assets allows you to prioritize your systems and ensure that each server has been recovered. Literally being able to cross them off as they are restored is a simple, systematic way to approach getting a business back up and running.
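The cross-off approach can be as simple as comparing the inventory with the list of systems restored so far. Here is a minimal Python sketch; the asset names are made up for illustration.

```python
def remaining_assets(inventory, restored):
    """Return inventoried assets not yet restored, preserving inventory order."""
    done = set(restored)
    return [asset for asset in inventory if asset not in done]


# Hypothetical inventory, listed in the order you intend to recover it.
inventory = ["mail-server", "file-server", "accounting-db", "web-server"]
restored = ["file-server", "accounting-db"]

print(remaining_assets(inventory, restored))  # ['mail-server', 'web-server']
```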
5. Appoint a disaster recovery team
Create a team of employees who know exactly what to do during an emergency and can assess damages and implement recovery plans in the aftermath of a disaster. Make sure you include someone from all areas of the business. Appoint a leader to be in charge of developing, managing and updating your disaster recovery plan.
6. Store System Passwords
Passwords should be stored in at least two separate secure locations, only one of which is in the same building as your IT equipment. At least two staff members should have access to them.
7. Document, document, document!
Make sure that the whole recovery process to get you up and running again is documented, and includes the locations of system recovery and other critical discs. Make sure key staff are familiar with these items.
8. Start Documenting Downtime Events
Most of these downtime events are minor and easily or quickly corrected. Think of it as an on-going snapshot of what’s going on with your network. All events provide valuable lessons. A record needs to be kept to help assess the status quo and to enable a fact-based discussion with management.
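A downtime record does not need special software. The Python sketch below shows one possible shape for it, with a per-system total that can feed a fact-based discussion with management. The field names and sample entries are assumptions for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DowntimeEvent:
    system: str
    start: datetime
    end: datetime
    cause: str

    @property
    def minutes(self) -> float:
        """Length of the outage in minutes."""
        return (self.end - self.start).total_seconds() / 60


def minutes_by_system(events):
    """Total downtime in minutes for each system that appears in the log."""
    totals = defaultdict(float)
    for event in events:
        totals[event.system] += event.minutes
    return dict(totals)


# Made-up sample entries.
events = [
    DowntimeEvent("mail-server", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 20), "disk full"),
    DowntimeEvent("mail-server", datetime(2024, 2, 2, 14, 0), datetime(2024, 2, 2, 14, 10), "service restart"),
    DowntimeEvent("file-server", datetime(2024, 2, 9, 11, 0), datetime(2024, 2, 9, 12, 30), "failed switch"),
]

print(minutes_by_system(events))  # {'mail-server': 30.0, 'file-server': 90.0}
```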
9. Automated Text Notification
Explore DRP capabilities that allow the disaster team to be automatically notified via text messaging. These staff members should be thoroughly trained so that they can perform basic disaster recovery/back-up tasks unsupervised. You may be able to do this through an arrangement with a third-party service provider.
10. Practice your disaster recovery plan
DRPs should be tested at least quarterly. Regular testing not only sharpens your disaster recovery team’s skills but also familiarizes new staff with the procedure, and it keeps your disaster recovery strategy up to date by revealing any issues with new equipment or software. Above all, it confirms that the business can recover its operations successfully and in a timely fashion. Disaster Recovery testing is a major challenge for most IT departments, and although a test is a major operational disruption, it shouldn’t be treated as a pro forma exercise; it needs to include true end-to-end testing all the way to production. The focus needs to be on recovering applications rather than servers: with today’s complex client-server and web-based multi-tier applications, the components reside on multiple servers, so there are inter-dependencies between them. If disaster recovery has not been tested all the way to the application level, it is very likely that problems will occur.
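Because the components of a multi-tier application depend on one another, it can help to write those dependencies down and derive a recovery order from them. The sketch below does this with Python's standard graphlib module (Python 3.9 or later). The component names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each component maps to the set of components it depends on.
# These names are hypothetical placeholders for your own application tiers.
dependencies = {
    "web-frontend": {"app-server"},
    "app-server": {"database", "auth-service"},
    "auth-service": {"database"},
    "database": set(),
}


def recovery_order(deps):
    """Return components ordered so that every dependency is recovered first."""
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(order)  # ['database', 'auth-service', 'app-server', 'web-frontend']
```

If two components depend on each other, TopologicalSorter raises a CycleError, which is itself a useful finding to surface during a test rather than during a real recovery.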
11. Back it up
No matter how good your disaster recovery plan, it cannot recover data that’s not there because you neglected to back it up. Make sure there is a routine for backing up data regularly, and ensure it is done. Use at least RAID level 5 (RAID level 10 if the budget allows) so that redundant storage provides fault tolerance. Build as much redundancy into your system as possible to remove any single points of failure. This includes a multi-path data route to the system, so that you can still access your data if one or more paths fail.
12. Maintain offsite data backups
A comprehensive tape archive strategy is critical. To minimize recovery times in situations where the physical assets of the primary data center are still operational, backup data has to be available on locally stored tapes.
In addition, it’s crucial to protect business operations from the risk of the destruction of the data center. That means backup tapes have to be available at a secondary location. Maintaining an up-to-date copy of backup data at an offsite location is worth almost any price. A local fireproof vault is not an adequate alternative because, depending on the circumstances, the vault may not offer sufficient protection or may not be accessible quickly after a disaster.
Another important point: pure online backup solutions can be helpful but only if there is an internet connection during the disaster and only if it’s fast enough to restore data in a timely fashion.
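Whether an online backup can be restored in a timely fashion is simple arithmetic: data size divided by usable bandwidth. The Python sketch below makes the estimate. The 80 percent efficiency factor and the sample figures are assumptions, so substitute measurements from your own connection.

```python
def restore_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to download data_gb gigabytes over a link.

    link_mbps is the nominal download speed in megabits per second, and
    efficiency is the assumed fraction of that speed achieved in practice.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Example: 2 TB of backup data over a 100 Mbps connection.
print(round(restore_hours(2000, 100), 1))  # 55.6
```

In this example the restore would take more than two days, which may or may not fit the recovery time objective.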
13. Prioritize Data and Applications
Not all data and applications are created equal. Some will be indispensable in reestablishing the business and need to be restored first. Recovery of secondary applications and data can be deferred until the critical applications and data are restored. The data recovery plan should explicitly state the recovery order of data and applications to reflect these priorities.
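One way to state the recovery order explicitly is to assign each application a priority tier and sort on it. This is a minimal Python sketch; the application names and tier numbers are hypothetical.

```python
# Tier 1 is restored first. Names and tiers are illustrative only.
recovery_plan = [
    {"name": "company intranet", "tier": 3},
    {"name": "order database", "tier": 1},
    {"name": "email", "tier": 2},
    {"name": "payroll", "tier": 1},
]


def restore_sequence(plan):
    """Return application names sorted by tier, then by name for a stable order."""
    ordered = sorted(plan, key=lambda item: (item["tier"], item["name"]))
    return [item["name"] for item in ordered]


print(restore_sequence(recovery_plan))
# ['order database', 'payroll', 'email', 'company intranet']
```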
14. Don’t omit standalone data from the recovery plan
Increasingly, business-critical data and documents are stored on laptop and desktop computer disk drives. The data recovery plan should include details on how this data will be backed up and recovered if lost.
15. Hot Spare Hard Drives
Arrange to have hot spare hard disk drives already in the system, or at least have spare drives physically available in the same room as your storage system.
16. A tape archive strategy is crucial
Tapes used on a daily basis should be replaced every six to nine months to avoid deterioration, because backups are no use if they cannot be recovered. Other tapes should be replaced on a regular, less frequent schedule based on how often they are used. Being able to back up to a remote location is worth almost any price; a fireproof vault is not an alternative to an off-site location.
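A replacement schedule is easy to track if you record the date each tape went into service. The Python sketch below flags daily tapes that have reached roughly six months of use. The tape labels, dates and the 180-day threshold are assumptions for illustration.

```python
from datetime import date

DAILY_TAPE_MAX_DAYS = 180  # roughly six months, the low end of the suggested range


def tapes_due(tapes, today, max_days=DAILY_TAPE_MAX_DAYS):
    """Return labels of tapes that have been in service for max_days or more."""
    return [label for label, in_service in tapes.items()
            if (today - in_service).days >= max_days]


# Hypothetical labels mapped to the date each tape entered service.
tapes = {
    "MON-01": date(2024, 1, 2),
    "TUE-01": date(2024, 5, 14),
    "WED-01": date(2023, 11, 20),
}

print(tapes_due(tapes, today=date(2024, 7, 15)))  # ['MON-01', 'WED-01']
```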
17. Try a restore
The biggest problem in DR is when you recover all your backup data only to discover that you don’t have everything you need to bring your application back to life. Try a complete restore.
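A trial restore is most useful when you check that everything actually came back. One possible approach, sketched below in Python, is to compare checksums recorded at backup time with checksums of the restored files. The file paths and contents are hypothetical, and in practice the checksums would be computed from files on disk rather than from in-memory bytes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def restore_problems(original, restored):
    """Compare two {path: checksum} manifests.

    Returns (missing, changed): paths absent from the restore, and paths
    whose restored checksum differs from the original.
    """
    missing = sorted(p for p in original if p not in restored)
    changed = sorted(p for p in original
                     if p in restored and original[p] != restored[p])
    return missing, changed


# Hypothetical manifests: one recorded at backup time, one built after the restore.
original = {
    "app/config.ini": sha256_hex(b"port=8080\n"),
    "app/license.key": sha256_hex(b"ABC-123\n"),
    "db/schema.sql": sha256_hex(b"CREATE TABLE t (id INT);\n"),
}
restored = {
    "app/config.ini": sha256_hex(b"port=8080\n"),
    "db/schema.sql": sha256_hex(b"CREATE TABLE t (id INTEGER);\n"),
}

missing, changed = restore_problems(original, restored)
print(missing)  # ['app/license.key']
print(changed)  # ['db/schema.sql']
```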
18. Secure the Best UPS
Get yourself the best, longest-life, most reliable uninterruptible power supply (UPS) you can. Then get an additional battery back-up for your cache to go with it.
19. Test your plan
Do an “ad hoc” tabletop exercise. Put some sticky notes on various pieces of hardware in your data center or on the monitors of your personnel indicating software or hardware failures. Then call your DR team into a meeting room and walk through the procedures to address the mock disaster scenario. This is a lot cheaper than scheduling a formal DR test event, and it allows you to test procedures in a sequential way that provides a great rehearsal for recovery team participants.
20. Protect Against Random Threats
Don’t neglect to protect yourself from random theft, vandalism and employee malice, as they can be just as disastrous as anything else. At the very least ensure that the door to your data/server room is locked, day and night.
21. Operational Fire Doors
An automatically closing fire door to the data/server room will keep fire and smoke out of the room for a surprisingly long time.
22. Maintain Multiple Communication Channels
When staff has to be notified of a DR event, normal communication channels, such as email and phone, may be disrupted. Consider text messaging, personal email addresses and alternate phone numbers as alternative communication vehicles. In addition, there are third-party companies that can handle disaster communications.
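The idea of falling back from one channel to the next can be sketched in a few lines of Python. The sender functions here are stand-ins: a real deployment would plug in its own email and text-messaging integrations, and the contact details shown are placeholders.

```python
def notify(contact, message, senders):
    """Try each (channel, send) pair in order.

    Returns the name of the first channel that succeeds, or None if the
    contact could not be reached on any channel.
    """
    for channel, send in senders:
        address = contact.get(channel)
        if not address:
            continue  # no address on file for this channel
        try:
            send(address, message)
        except ConnectionError:
            continue  # channel is down, fall through to the next one
        return channel
    return None


# Stand-in senders for illustration only.
def failing_send(address, message):
    raise ConnectionError("mail server unreachable")


sent = []


def recording_send(address, message):
    sent.append((address, message))


contact = {"work_email": "pat@example.com", "sms": "+1-555-0100"}
senders = [("work_email", failing_send), ("sms", recording_send)]

used = notify(contact, "DR plan activated", senders)
print(used)  # sms
```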
23. Go Beyond Technology
Businesses both large and small often miss one major disaster recovery point. Disaster recovery involves more than just technology. You absolutely need to have extra copies of data, systems that keep running when a disk fails, and offsite backup. But all of that won’t do much good in the event of a major earthquake or hurricane if there is no water or power for five days.
Planning for that might include having a diesel generator on site with enough fuel for several days, or even something creative like having the boss’s (or an employee’s) motor home double as an Emergency Center. Such vehicles typically include their own generator and a wireless internet connection. The disaster recovery plan, therefore, could entail having key employees come to work at that location and get the computer systems up and running from there.
24. Hire a Managed Services Provider
For most small-to-medium sized businesses, implementing a strong DRP is not only cost-prohibitive, but the right technical skills are also absent. Managed Service Providers (MSPs) have emerged within the past decade to help businesses perform these tasks. MSPs have the technical personnel to design, implement, and manage complex DR projects. Additionally, they have the server, storage and network infrastructure to carry out a DRP. An MSP will also work closely with your business to keep costs at a minimum. That said, if your business is mortally wounded because of a disaster, there is no price tag that can make up for failure to plan.
Below are other articles that might be helpful as you consider disaster recovery solutions:
eMazzanti Sandy Disaster Recovery Story
About eMazzanti Technologies
eMazzanti’s team of trained, certified experts provides 24×7 outsourced IT support to help ensure business productivity and to address the challenges of growth, mobility, cloud computing and critical business continuity and disaster recovery demands. The consultant has special expertise in financial, architectural, engineering, construction, government, educational, legal services, accounting, marketing communications, and healthcare market segments. Flexible support plans range from fixed-fee, around-the-clock network management, where eMazzanti functions as an extension of a business’s IT staff, to custom solutions provided on an as-needed basis. eMazzanti serves the Hoboken, NJ and NYC area markets as well as regional, national and international business support requirements. The IT firm is Microsoft’s 2012 Partner of the Year and on-going Gold Partner, a four-time recipient of WatchGuard’s Partner of the Year award and has made the Inc. 5000 list for the third year in a row. For more information contact: Carl Mazzanti 201-360-4400 or emazzanti.net. Twitter: @emazzanti, Facebook: Facebook.com/emazzantitechnologies.