Application vulnerabilities and risks must be weighed to identify the resources, performance requirements, and service level objectives needed to ensure business continuity. Using real-world case studies, this white paper examines Information Lifecycle Management (ILM) best practices for disaster preparedness.
July 2007
A White Paper by Stratus Technologies
All Data is Not Equal
Popular opinion maintains that backup data doesn't have the same importance as the original information. But is that assumption true? Perhaps the answer is "it depends on the business situation." That is, sometimes backup data is indeed just another form of business information, and sometimes it is mission-critical.

Classifying the value of information to a business, understanding information service objectives, and then steering information to resources to achieve those objectives is what Information Lifecycle Management (ILM) is all about. Nowhere is ILM more relevant than in availability management and disaster preparedness.

The difference between these two concepts is small but significant. Availability management focuses on all of the IT-related elements of keeping a business workflow going. A corrupted file (e.g., caused by a logic error in a program) or lost data (e.g., caused by a hardware error or, more likely, an operational error) is an issue of operational availability. The line between a "simple" availability problem and a disaster is usually the magnitude of the problem and its expected duration. Disaster preparedness identifies the large threats and vulnerabilities that affect a data center and ameliorates their effects through countermeasures.

As planners, our objective is to develop countermeasures to as many vulnerabilities as possible, for as many workloads as possible, prioritized by each vulnerability's probability of occurrence and its effect on the business. ILM, with its emphasis on service level objectives, enables planners to establish priorities and anticipate resource requirements during contingencies. Without a thorough understanding of the application and the information it uses throughout its lifecycle, it is virtually impossible to set recovery point and recovery time objectives or to meet availability requirements.

Time is the Enemy

Installations are frequently tempted to divide work into "mission-critical" and "other" categories.
No one wants their application relegated to the "other" category, because that implies lesser business criticality. However, this split also ignores the fact that as time passes and an outage lengthens, the pressure to restore normal service escalates to the point where even test and development work becomes "mission-critical."

The polar opposite of the two-category philosophy is to treat all information the same. The problem here is that the "80/20 rule" really does apply to most installations: in most cases, only 20% (or less) of the information is likely to be time-critical. Treating everything the same way is expensive and wastes valuable time. Time truly is the enemy of availability. Where access to data is especially time-critical, business continuity measures may even be established to sustain the processing environment throughout the disaster. It is no accident that the metrics associated with availability are all time-oriented.
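The classification idea above (rank information by how quickly it must be recovered, rather than lumping it into two buckets or treating it all the same) can be sketched in a few lines. This is a minimal illustration, not a Stratus method: it assumes each data set has already been assigned a recovery time objective (RTO) and a recovery point objective (RPO) in hours, and the tier names, the 1-hour and 24-hour limits, and the `Dataset`, `assign_tier`, and `restore_order` names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tier limits in hours, checked from strictest to loosest.
# Real limits would come from each installation's business-impact analysis.
TIER_LIMITS = [
    ("tier-1 (continuous availability)", 1.0),
    ("tier-2 (rapid restore)", 24.0),
]
DEFAULT_TIER = "tier-3 (standard backup)"


@dataclass
class Dataset:
    name: str
    rto_hours: float  # recovery time objective: longest tolerable outage
    rpo_hours: float  # recovery point objective: most data loss tolerated


def assign_tier(ds: Dataset) -> str:
    """Place a data set in the first tier able to meet its stricter objective."""
    strictest = min(ds.rto_hours, ds.rpo_hours)
    for tier, limit_hours in TIER_LIMITS:
        if strictest <= limit_hours:
            return tier
    return DEFAULT_TIER


def restore_order(datasets: list[Dataset]) -> list[Dataset]:
    """Return data sets sorted so the shortest RTO is recovered first."""
    return sorted(datasets, key=lambda ds: ds.rto_hours)


if __name__ == "__main__":
    # Hypothetical inventory; the names and numbers are illustrative only.
    inventory = [
        Dataset("test-dev", rto_hours=72, rpo_hours=48),
        Dataset("order-entry", rto_hours=0.5, rpo_hours=0.0),
        Dataset("billing", rto_hours=12, rpo_hours=4),
    ]
    for ds in restore_order(inventory):
        print(f"{ds.name}: {assign_tier(ds)}")
```

Run as a script, this prints order-entry (tier 1) first, then billing (tier 2), then test-dev (tier 3). Using the stricter of RTO and RPO is a simplifying assumption; an installation might instead weigh the two objectives separately, and a long outage could justify promoting lower tiers over time, as the text notes.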
Given the incidents of recent years (tsunamis, hurricanes, terrorist bombings, and monsoons), it hardly seems necessary to justify spending on disaster preparedness. However, studies continue to show that data are not routinely backed up, stored offsite, and located well away from potential disaster areas. At least one offsite storage facility was submerged during Hurricane Katrina, and it is all too common for earthquakes to affect data centers, offsite storage locations, and the transportation resources needed to get media to a recovery location.

Assessing Threats and Vulnerabilities

Disasters take many different forms. Often the threat is the easy part to assess; the vulnerability is generally much more difficult to evaluate. For example, most data centers have some warning of an impending hurricane. Several clients of the submerged storage site wisely recognized that recovery media in the path of the storm were no safer than media kept on-site, and therefore pre-positioned their media at distant recovery sites, avoiding the "compound/cascading disaster." Vulnerabilities such as collateral damage to the storage location are frequently difficult to anticipate. Even those data centers that survived Katrina relatively unscathed suffered when roads we...