Disaster Planning

The CU*Northwest Disaster Recovery Plan is a complex document that includes policies, procedures, and checklists to assist us in handling operations in the event of a catastrophic disaster.

Why have a plan?

Let us demystify the plan a bit by providing a simple, broad overview of what our plan covers and what it doesn't, and by explaining some common terms. Let's face it: if a disaster happens, things will be chaotic with or without a plan. Plans allow us to organize our thoughts and actions before the disaster strikes, which should make responses smoother, more predictable, and hopefully less chaotic than if no plan existed.

Chart showing the 'optimal balance' for resources devoted to DR planning.

Everyone tries to make their disaster plan as comprehensive as possible, but of course there's no way anyone can plan for every contingency. Had anyone in New York ever planned what to do in case terrorists flew fully fueled jumbo jets into the World Trade Center, thereby destroying numerous city blocks? Few even imagined it would happen, let alone prepared for it with all the expense required to develop, audit, and test a plan. When the probability of an event is that low and the cost of planning for it that high, such a plan isn't a good use of taxpayer money.

Similarly, the best use of your disaster recovery dollars is to anticipate and evaluate potential risks, plan for the most likely disasters, then audit and test those plans accordingly. A disaster plan is like an insurance policy: although a flood could conceivably happen anywhere, it is probably not cost-effective to take out flood insurance if you don't live near a floodplain!
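One way to picture this kind of risk evaluation is to weigh how likely each disaster is against what it would cost. The sketch below is only an illustration: the risk names, probabilities, and dollar amounts are made-up assumptions, not figures from our Plan, and expected annual loss (probability times cost) is just one common way to rank risks.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """A hypothetical risk entry; every value used below is illustrative only."""
    name: str
    annual_probability: float  # chance of the event in a given year (0.0 to 1.0)
    estimated_loss: float      # estimated cost in dollars if the event occurs

    @property
    def expected_annual_loss(self) -> float:
        # Probability times cost: a rough yearly "price tag" for the risk.
        return self.annual_probability * self.estimated_loss


def rank_risks(risks):
    """Return the risks sorted from highest to lowest expected annual loss."""
    return sorted(risks, key=lambda r: r.expected_annual_loss, reverse=True)


# Made-up numbers for illustration; substitute your own estimates.
risks = [
    Risk("extended power outage", 0.20, 50_000),
    Risk("server hardware failure", 0.10, 80_000),
    Risk("flood (site not near a floodplain)", 0.001, 500_000),
]

for risk in rank_risks(risks):
    print(f"{risk.name}: expected annual loss ${risk.expected_annual_loss:,.0f}")
```

With numbers like these, the rare but expensive flood lands at the bottom of the list, which matches the insurance analogy: planning dollars go first to the risks that are both likely and costly.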

Regulations also require us, as a data processing CUSO, to have a plan. Ours is designed to be reusable for your own credit union.

Does your credit union have its own disaster recovery plan?

What is covered by the CU*Answers Disaster Plan?

There are really two ways to approach an answer to this question. First, a good disaster plan should cover everything you can think of that would or might need to be restored in the event of a catastrophe. The second answer has more to do with when a particular thing would be restored, relative to everything else on the list.

For example, if you lose power at your house, your first concern is getting flashlights and candles so no one trips and falls. You don't start worrying about the food in the fridge until a little later! When you discover your basement is filling with water from a rainstorm, your first priority is to get everything that could be ruined by water off the floor. After your family photo album and prized collection of antique postcards are out, you can turn your attention to the plastic buckets and beach chairs. Prioritizing is necessary because you can't do everything at once.

In the Plan, we attempt to cover as many of our business-critical services as possible, including iSeries, ARU, home banking, office networks, phone systems, etc. The Plan also encompasses regular testing of iSeries recovery at one of IBM's disaster recovery sites (or hot sites). Part of our disaster planning involves contracting with IBM for access to the alternate site, so that if disaster ever does strike, we have a reserved iSeries with communications lines at IBM's Chicago facility on which we can recover our production system. The order in which other services would be restored has been prioritized to allow for a realistic timetable for all tasks to be completed with the resources available.

Of course, to really appreciate the breadth and depth of our Plan, you'll want to order a copy on CD-ROM and get the full story for yourself.

What Is Covered By the Annual Disaster Recovery Test?

During these annual hot site recovery tests, we take backup tapes from our production system and load them onto a standby iSeries computer at the IBM site. We configure this computer to communicate with our MPLS network using our standby core router. GOLD sessions are proxy-tested by credit unions participating in the test.

Basically this means that we make the hot site computer look and act like our production iSeries. We check all of the communications connections to ensure that the standby computer can communicate with the same network components that talk to your credit union workstations. Then we work with credit unions participating in the test to verify that CU*BASE GOLD workstations can power up and connect to the standby iSeries.

The tests performed by our credit union test participants are meant to be a “proxy” for all other credit unions on the network, so that it is not necessary for every computer in every branch office to be involved.

Tests are conducted once per year. Additional tests, or expansion of the test criteria, may be scheduled for your credit union with at least 120 days' advance notice. Additional testing will be bid out on a case-by-case basis.

What is not covered in the annual test?

We do not test It’s Me 247, CU*SPY, CU*CHECKS, web or email hosting, DNS recovery, phone system recovery, or internal LAN server and office network recovery. Our primary focus, as reflected in the Plan, has been to restore iSeries and CU*BASE/GOLD access for all of our clients in as timely a manner as possible.

This strategy has been sufficient thus far, but CU*Answers and its clients recognize the need for ensuring a higher level of service availability, even in disaster situations. This is why we are aggressively pursuing a redundant data center capable of supporting applications in a redundant production environment.

Why Does CU*Answers Do These Tests?

There are many reasons we test our host system recovery plan, and it’s not just to satisfy regulatory compliance requirements. Tests allow us to identify areas in the plan where we can improve performance, or weaknesses in the plan where we need to make adjustments.

Just as important, the test lets us verify the ability and reliability of trusted partner vendors—such as the phone company, IBM Business Recovery support staff, and others—to respond. We use this information to fine-tune and optimize our processes to improve performance for the next test, or in the event of real disaster.

Is This Like A Real Disaster?

It’s important to note that this is not a disaster simulation: we will not be taking our production computer or communications systems offline during the test. This is a test of our host and related communications recovery processes.

Understanding CU*Answers' Availability Goals

Our goal is to have credit union critical services and applications available as close to 100% of the time as possible, even in the event our primary production facilities are unavailable. We realize that having everything available 100% of the time isn't always cost-effective, or even worthwhile. Building, implementing, and managing highly available distributed networks is complicated and expensive. Because our clients own us, we need to be sure we invest every dollar responsibly and in a fashion that provides positive returns the majority of the time.
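To give a feel for what "close to 100%" means in practice, here is a small sketch that converts an availability target into the downtime it allows per year. The percentages shown are generic examples chosen for illustration, not CU*Answers commitments, and the function name is ours.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Example targets only; these are not figures from the Plan.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.0f} minutes of downtime per year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is a large part of why the last fraction of a percent is so expensive to achieve.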

Building highly available systems and networks requires a holistic approach. It isn't something that can be “slapped” on after the fact. It starts with the design of individual network components and moves out and up from there:

  • Redundant power supplies in servers mean that if a power supply fails, the server doesn't crash
  • Redundant hard disk drives provide extra protection for the data, and help prevent server crashes
  • System backups, stored online and off site, provide extra data protection and faster service recovery in the event of system or data center failure
  • Meshed network architecture provides redundant paths
  • Redundant and hot standby communications links
  • Redundant hardware for select critical systems provides hot swappable standby or live load-balanced systems
  • Uninterruptible power supply battery backup unit provides uninterrupted power to the data center
  • Automated natural-gas generator provides long-term power to the data center in the event of a long-term power failure
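The value of the redundancy listed above can be sketched with a little arithmetic. The example assumes each redundant copy fails independently of the others, which is a simplification (a shared power feed or a common software bug breaks that assumption), and the 99% figure is a made-up example rather than a measured number for any of our equipment.

```python
def redundant_availability(single_availability: float, copies: int) -> float:
    """Availability of a group of redundant components, assuming independent failures.

    The group is unavailable only when every copy is down at the same time.
    """
    if not 0.0 <= single_availability <= 1.0:
        raise ValueError("single_availability must be between 0.0 and 1.0")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - single_availability) ** copies


# Hypothetical component that is available 99% of the time on its own.
for copies in (1, 2, 3):
    combined = redundant_availability(0.99, copies)
    print(f"{copies} copy/copies: {combined:.6f} combined availability")
```

Under these assumptions, a second copy of a 99% component brings the pair to roughly 99.99%, which is why redundancy is designed in component by component rather than added afterward.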

As an integral part of our redundancy plan, we have implemented and have been conscientiously testing and improving our High Availability rollover system. Read more about our High Availability rollover activities.

Will There Be Any Service Interruptions?

With the arrival of CU*Next Net, there will be no service interruptions during disaster recovery testing. We are now able to route communications without impact to our online clients.

What do you mean by rollover?

You may hear us use this term when reviewing our availability capabilities. We have two identical iSeries systems that house our data and run our core application. These systems mirror each other in near real time (less than one second of latency), and we are capable of redirecting (or rolling over) live GOLD traffic to the standby (or High Availability) system.

Rolling over is a labor-intensive process that takes about 30 to 45 minutes to perform, during which time GOLD and other applications are unavailable. Because of the time involved in performing a rollover, it isn't always a silver bullet for host system or application failure. Service interruptions are now evaluated on a case-by-case basis to determine if rolling over is a viable solution. If service can be restored on the production system in less than 90 minutes, it isn't worth it to perform a rollover. Why? Two rolls (there and back), at up to 45 minutes each, require as much as 90 minutes of downtime anyway, so a rollover isn't worthwhile if the problem can be fixed in that time.
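The rule of thumb above comes down to simple arithmetic. As a rough sketch, assuming the worst-case 45 minutes per roll (the function names and default value are ours, purely for illustration), the comparison looks like this:

```python
def round_trip_downtime_minutes(roll_minutes: float = 45.0) -> float:
    """Downtime for rolling to the standby system and later rolling back."""
    return 2 * roll_minutes


def rollover_is_candidate(estimated_repair_minutes: float,
                          roll_minutes: float = 45.0) -> bool:
    """True when the repair is expected to take at least as long as two rolls.

    This captures only the downtime comparison; it is not a full decision procedure.
    """
    return estimated_repair_minutes >= round_trip_downtime_minutes(roll_minutes)


# Illustrative repair estimates, in minutes.
for repair in (30, 90, 240):
    verdict = "consider a rollover" if rollover_is_candidate(repair) else "repair in place"
    print(f"estimated repair {repair} min: {verdict}")
```

In practice each interruption is still evaluated case by case; the sketch only shows why a fix expected to take under 90 minutes does not justify two rolls.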

How often do you update your Plan?

If all this talk of disaster testing is making you think of your credit union’s own disaster planning and testing, start thinking about the last time you updated your plan. Is your team aware of its own responsibilities for disaster planning? When was the last time your team did a test of your plan? Does your plan provide for CU*Answers’ recovery of your internally-hosted applications (such as your own Maxxar ARU)? If so, are we aware of your plans?

Your credit union’s disaster plan is a lot more than just your CU*BASE workstations, and your staff should be performing regular tests to fine-tune your plan – just as we do.

