Popular opinion maintains that backup data doesn’t have the same importance as the original information. But is that assumption true? Perhaps the answer is “it depends on the business situation.” That is, sometimes backup data is indeed just another form of business information – and sometimes it is mission-critical. Classifying the value of information to a business, understanding information service objectives, and then steering information to the resources that achieve those objectives is what Information Lifecycle Management (ILM) is all about.

Nowhere is ILM more relevant than in availability management and disaster preparedness. The difference between these two concepts is small but significant. Availability management focuses on all of the IT-related elements of keeping a business workflow going. A corrupted file (e.g. because of a logic error in a program) or lost data (e.g. because of a hardware error or, more likely, an operational error) is an issue of operational availability. The line between a “simple” availability problem and a disaster is usually drawn by the magnitude of the problem and its expected duration. Disaster preparedness identifies the large threats and vulnerabilities that affect a data center and ameliorates their effects through countermeasures.

As planners, our objective is to develop countermeasures to as many vulnerabilities as possible … for as many workloads as possible … based on the probability of each occurring and its effect on the business. ILM, with its emphasis on service level objectives, enables planners to establish priorities and anticipate resource requirements during contingencies. Without a thorough understanding of the application and the information it uses throughout its lifecycle, it is virtually impossible to set recovery point objectives and recovery time objectives or to meet availability requirements.

Time is the Enemy

Installations are frequently tempted to divide work into “mission-critical” and “other” categories.
No one wants to allow their application to occupy the “other” category because that implies lesser importance with respect to business criticality. However, that division also ignores the fact that, as time passes and the outage lengthens, the pressure to restore normal service escalates to the point where even test and development work becomes “mission-critical.”

The polar opposite of the two-category philosophy is to treat all information the same. The problem here is that the “80/20 rule” really does apply to most installations: in most cases, only 20% (or less) of the information is likely to be time-critical. Treating everything the same way is expensive and wasteful of valuable time.

Time truly is the enemy of availability. In cases where access to data is especially time-critical, business continuity measures may even be established to sustain the processing environment throughout the disaster. It is no accident that the metrics associated with availability are all time-oriented.

Given all of the incidents that have occurred in recent years – tsunamis, hurricanes, terrorist bombings and monsoons – it hardly seems necessary to justify spending on disaster preparedness. However, studies continue to show that data are not routinely backed up, stored offsite and located well away from a potential disaster area. At least one offsite storage facility was submerged during the Katrina disaster, and it is all too common for earthquakes to affect data centers, offsite storage locations and the transportation resources needed to get media to a recovery location.

Assessing Threats and Vulnerabilities

Disasters take many different forms. Oftentimes the threat is the easy part to assess. The vulnerability is generally much more difficult to evaluate. For example, most data centers have some warning of an impending hurricane.
And several clients of the submerged storage site wisely noted that recovery media in the path of the storm were no safer than media kept on-site, and therefore pre-positioned their media at distant recovery sites, avoiding the “compound/cascading disaster.” Vulnerabilities (e.g. the collateral damage associated with the storage location) are frequently difficult to recognize, or even to imagine. Even those data centers that survived Katrina relatively unscathed suffered when roads were impassable for days – preventing the delivery of diesel fuel for generators – and when telecommunications networks were down.

Probably one of the least recognized vulnerabilities to large threats like Katrina is the loss of IT personnel. In Katrina’s case, staff were unavailable because they were needed to take care of their families. However, in other situations, such as a terrorist bombing, the IT personnel may be permanently incapacitated.
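
The “compound/cascading disaster” described above can be screened for with a simple distance check. The following is a minimal sketch, not a planning tool: it assumes a threat can be approximated by a circular footprint around the primary data center, and it flags any media storage or recovery location inside that footprint. The site names, the approximate coordinates and the 150 km radius are all hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Site:
    name: str
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def distance_km(a: Site, b: Site) -> float:
    """Great-circle distance between two sites, using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def sites_in_threat_zone(primary: Site, candidates: list[Site],
                         threat_radius_km: float) -> list[Site]:
    """Return the candidate sites that share the primary site's threat footprint."""
    return [s for s in candidates if distance_km(primary, s) <= threat_radius_km]


# Hypothetical example: a Gulf Coast data center, a nearby media vault,
# and a distant recovery site. Coordinates are approximate.
primary = Site("primary-data-center", 29.95, -90.07)
nearby_vault = Site("nearby-vault", 29.98, -90.15)
remote_site = Site("remote-recovery-site", 32.78, -96.80)

# Assumed storm footprint of 150 km around the primary site.
at_risk = sites_in_threat_zone(primary, [nearby_vault, remote_site], 150.0)
for site in at_risk:
    print(f"{site.name} lies inside the assumed threat zone")
```

A circular footprint is a deliberate simplification. Flood plains, fault lines, shared road access and shared telecommunications routes do not follow circles, so a check like this can only serve as a first screen ahead of a fuller vulnerability assessment.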