Disaster recovery planning (DRP) starts with a discussion that involves key management employees. It is important to get their support for any disaster recovery initiative. Explain what disaster recovery is and why it is required for business continuity, cost reduction, generating revenue and improving productivity. Disaster scenarios such as fire, flood, earthquake, cold weather and employee sabotage should be discussed. Alternate vendors should be discussed as well, since the loss of a primary vendor is a potential business continuity issue.

Risk Assessment
The Risk Assessment is a "what if" analysis that describes the amount of risk associated with the current state of the network. The following are some things to consider before formulating any disaster recovery strategy.
• Average cost per minute that your network is unavailable.
• Cost of replacing servers, applications, circuits and devices.
• What disaster recovery plan, if any, exists and how extensive it is.
• Whether alternate vendors have been identified should primary vendors have their own disaster recovery problems.

Disaster Recovery Strategy
The disaster recovery strategy describes operational changes, design changes and failover strategies for business continuity. An action plan document is created that describes all those strategies and a detailed escalation procedure should the network become unavailable. It should document employees, responsibilities, time frames, event sequence, vendors and processes.

The following describes recommended operational changes:
1. Network Documentation
Automate the network documentation process. It is difficult to restore a network without current documentation of its state before it became unavailable. Running a network assessment will collect some information; however, you need application and device configurations as well. Find a tool that will automate this process!
Document these items:
• Current Topology
• Infrastructure
• Security Policies
• Management Strategy
• Application Configurations, Versions and Patches
• Device Configurations, IOS Versions and Firmware
2. Regular Backups rotated off-site and tested for data integrity

The following describes recommended design changes:
Review and modify design, infrastructure, configuration, security and management for improved network resiliency and availability. It is my contention that running a network assessment is an effective strategy for determining what changes should be made to your network. The argument could be made that all assessment groups have some effect on network availability and resiliency. The availability assessment will collect most of the key information; however, the security assessment must be considered since problems with company security will expose your network to attacks. When your network is being attacked it isn't available!

Management strategy assessments are key as well, since the absence of effective management policies and applications will create a tenuous situation. For instance, without any change management policies you will have employees changing application and device configurations (assuming they have security authorization) without prior approval and at any time of the day. The configuration change doesn't work as expected and it is 10 am while employees are starting their day. Guess what, your day just got longer. Proactive fault and performance monitoring strategies will indicate when a device or server is not operational or near capacity. Those situations will obviously affect network availability. The performance assessment will describe how well the network is performing, whether there are any capacity issues and what offices are affected. The infrastructure assessment will focus on issues such as media mismatches, switch port capacity, IOS version problems, router memory shortages, application software versions and protocols. Facilities are considered as part of an availability assessment, which focuses on rack space, temperature controls, power availability and raised floors.

Select Failover Strategies
1. On-line data synchronization between the production Data Center and a remote Data Center facility. The cutover or convergence time should be transparent to employees and all current data would be available. This requires the cost of a remote facility with routers, switches and matching servers and applications to synchronize the offices. Cisco DistributedDirector technology can be utilized to configure both Data Centers for concurrent operation if that is required.
2. Configure the DistributedDirector to redirect sessions to the alternate Data Center once a certain percentage of TCP sessions are running at the primary Data Center. It is still a good idea to consider standby sites as described below, since both on-line Data Centers could be unavailable.
3. Configure a 48 hour standby site for the company, which is a remote facility that has all the equipment necessary for restoring a specified service level within 48 hours. This is a temporary strategy for continuing network service for a short time frame until the problems are fixed or service is cut over to a 10 day site. This can be provisioned by company employees or contracted to a third party DRP vendor.
4. Configure a 10 day standby site for the company, which is a remote facility that has all the equipment necessary for restoring all specified services within 10 days. This would be utilized in a situation where restoration of Data Center services would require months. This can be provisioned by company employees or contracted to a third party DRP vendor.

Contingency Testing
Test your disaster recovery (business continuity) strategy utilizing the action plan document from the strategy phase. There should be a meeting with specific employees and vendors to discuss responsibilities, time frames, test event sequence and processes. The company strategy and action plan should be changed as problems are identified during the testing phase or as company requirements change. Plan on testing the disaster recovery plan regularly, 3 - 4 times per year.

Recommendations
The results from contingency testing will be utilized to make sound recommendations for improving the disaster recovery strategy and the testing process. The complexity of your organization will affect how difficult it is to build a workable disaster recovery plan. The recommendations will streamline your DRP and ensure it works when it is required. The on-demand circuit is homed to the remote DR facility router, where it converges with the company network so that employees can utilize the mainframe applications. The DR mainframe should be synchronized with the company mainframe for transactions during that period before service is restored.
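The risk assessment's "average cost per minute that your network is unavailable" can be turned into a rough figure for each standby option. The following is a minimal sketch, not a definitive costing model: the function name `outage_cost` and the cost of 500 per minute are hypothetical values chosen for illustration, and the only inputs taken from the text are the 48 hour and 10 day recovery windows.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR


def outage_cost(cost_per_minute: float, outage_minutes: int) -> float:
    """Estimate the cost of an outage as cost per minute times duration."""
    if cost_per_minute < 0 or outage_minutes < 0:
        raise ValueError("cost and duration must be non-negative")
    return cost_per_minute * outage_minutes


# Hypothetical average cost per minute of network unavailability.
# Replace with the figure from your own risk assessment.
COST_PER_MINUTE = 500.0

# Worst-case recovery windows for the two standby site strategies.
recovery_windows = {
    "48 hour standby site": 48 * MINUTES_PER_HOUR,
    "10 day standby site": 10 * MINUTES_PER_DAY,
}

for site, minutes in recovery_windows.items():
    cost = outage_cost(COST_PER_MINUTE, minutes)
    print(f"{site}: up to {minutes} minutes down, estimated cost {cost:,.0f}")
```

Comparing these worst-case figures against the cost of the remote facility is one simple way to judge whether on-line synchronization or a standby site is justified.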