Primary storage volumes may be growing at a rate of 30% or more each year, but it is often secondary storage volumes that cause organizations the greatest pain, devouring IT resources (both manpower and technology) along the way. For years, organizations could do little, if anything, to control this problem. Today, users have new options. Capacity optimized protection (COP) technologies that attack the secondary storage “capacity bloat” at its roots have emerged. Data de-duplication is one example.
At a very high level, data de-duplication enables organizations to reduce back-end capacity requirements by minimizing the amount of redundant data that is ultimately written to disk backup targets. The actual amount of data reduction can vary significantly from organization to organization or from application to application, depending on the granularity of the data de-duplication technology being used (i.e., whether the de-duping is done at the file, block or byte level) or the type of data being de-duped (e.g., Word .doc, .mpeg file or .dbf file). However, ESG has found that, on average, 10x to 20x reduction is realistic and greater than 40x reduction is definitely achievable.
At these rates, data de-duplication has the power to change the economics of disk backup, making disk backup a much more affordable and compelling alternative to tape, and even eliminating the long-standing cost delta between tape and disk (see Figure 1). Factor in the operational efficiencies of not having to move, store and manage redundant data, as well as not having to deal with the management headaches common with tape, and you’ve got a very compelling story in favor of disk backup. For these reasons and others, we believe data de-duplication is one of this decade’s most important, and hence most talked-about, new technologies.
It has the power to revolutionize data protection from both a technology and an end-user adoption standpoint by simply making disk-based backup and recovery, as well as remote replication, much more efficient than it is today.
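
To make the mechanism concrete, the following is a minimal sketch of block-level de-duplication. It is an illustration under stated assumptions, not a description of any vendor's product: the fixed 4 KB block size, the use of SHA-256 fingerprints, and the names `dedupe_blocks`, `restore` and `reduction_ratio` are all hypothetical choices made for this example. Commercial implementations may work at the file or byte level, or use variable-size blocks.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one copy of each unique block.

    Assumption: fixed-size blocks fingerprinted with SHA-256.
    """
    store = {}   # fingerprint -> block contents (unique blocks only)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # write the block only if it is new
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the unique blocks and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


def reduction_ratio(data: bytes, store) -> float:
    """Logical bytes backed up divided by bytes actually kept on disk."""
    stored = sum(len(block) for block in store.values())
    return len(data) / stored if stored else 0.0


if __name__ == "__main__":
    # Ten distinct 4 KB blocks stand in for one full backup.
    full_backup = b"".join(bytes([i]) * 4096 for i in range(10))
    # Five identical full backups, e.g. repeated fulls of unchanged data.
    backups = full_backup * 5
    store, recipe = dedupe_blocks(backups)
    assert restore(store, recipe) == backups
    print(reduction_ratio(backups, store))  # 5.0
```

In this toy case, five identical full backups reduce to one stored copy, a 5x reduction. Real ratios depend on how much the data changes between backups and on the granularity of the comparison, as noted above.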
The Benefits of De-Dupe
Data de-duplication has several significant and immediate benefits for users:
- It can lower disk costs. Just consider the ability to store 20 TB of backup data on 1 TB of disk. The cost savings are significant, not only in terms of actual disk costs but also in terms of power and cooling: fewer disks mean lower power and cooling costs.
- It allows users to store more data on fewer disks for longer periods of time. While the actual capacity reduction will vary from organization to organization depending on a number of variables (e.g., the type of data that is being backed up, the change rate and the frequency of the backup), de-duplication will reduce back-end capacity requirements significantly. Organizations can use this “newfound” space to 1) protect other backup data (i.e., data that wasn’t previously protected by disk) or 2) lengthen the retention periods of the data that is backed up to disk to better meet regulatory, corporate governance, eDiscovery or data protection SLAs.
- It can improve RTOs and reliability. Simply put: the more data users back up to disk, the better able they are to meet RTOs and, hence, data protection SLAs. The more they leverage disk, the less they rely on tape alternatives.
- It enables and expands WAN-based remote replication options. By reducing the amount of data that is backed up, data de-duplication actually enables WAN-based remote replication. Data de-duplication can lower the “cost” and/or “bandwidth” barrier of entry to WAN-based remote replication for many organizations, making it possible for some to do WAN-based remote replication for the first time and for others to cast a “wider net” of data protection around their remote data.
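
The capacity arithmetic behind the first two benefits can be sketched in a few lines. The helper names `physical_capacity_tb` and `percent_saved` are hypothetical and introduced only for illustration; the 20 TB figure and the 10x, 20x and 40x ratios are the ones quoted in this paper.

```python
def physical_capacity_tb(logical_tb: float, reduction_ratio: float) -> float:
    """Disk capacity needed to hold logical_tb of backup data at a given ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_tb / reduction_ratio


def percent_saved(logical_tb: float, reduction_ratio: float) -> float:
    """Percentage of back-end capacity avoided compared with no de-duplication."""
    physical = physical_capacity_tb(logical_tb, reduction_ratio)
    return 100 * (logical_tb - physical) / logical_tb


if __name__ == "__main__":
    logical_tb = 20  # backup data to protect, in TB
    for ratio in (10, 20, 40):
        physical = physical_capacity_tb(logical_tb, ratio)
        saved = percent_saved(logical_tb, ratio)
        print(f"{ratio}x: {physical:.2f} TB on disk, {saved:.1f}% less capacity")
```

The 20x case reproduces the example above: 20 TB of backup data fits on 1 TB of disk. The freed capacity is what an organization could redirect to additional backup data or longer retention.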
Getting Beyond Semantics
Data de-duplication can be performed in various ways and at various points of origin. But, generally speaking, there are two distinct types of data de-duplication today: in-line and post-process. ESG defines in-line de-duplication as data de-duplication that occurs during the backup process before any data is written to the backup target (the backup target could be a VTL or some type of near-line disk) and post-process as data de-duplication that occurs after data is written to, or ingested by, the backup target.
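
The difference between the two types can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the `BackupTarget` class, its method names, the fixed 4 KB block size and the SHA-256 fingerprints are hypothetical and do not represent any product's design. The only point it demonstrates is where de-duplication happens relative to the write to the target.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def split_blocks(stream: bytes):
    """Yield fixed-size blocks from a backup stream."""
    for offset in range(0, len(stream), BLOCK_SIZE):
        yield stream[offset:offset + BLOCK_SIZE]


class BackupTarget:
    """Toy disk target that tracks current and peak capacity usage."""

    def __init__(self):
        self.unique = {}     # fingerprint -> block (de-duplicated store)
        self.landing = []    # raw blocks waiting for post-process de-duplication
        self.peak_bytes = 0  # highest capacity used at any point

    def used_bytes(self) -> int:
        stored = sum(len(b) for b in self.unique.values())
        staged = sum(len(b) for b in self.landing)
        return stored + staged

    def _track_peak(self):
        self.peak_bytes = max(self.peak_bytes, self.used_bytes())

    def _store_unique(self, block: bytes):
        self.unique.setdefault(hashlib.sha256(block).digest(), block)

    def backup_inline(self, stream: bytes):
        """In-line: de-duplicate during the backup, before data is written."""
        for block in split_blocks(stream):
            self._store_unique(block)  # redundant blocks never reach disk
            self._track_peak()

    def backup_post_process(self, stream: bytes):
        """Post-process: ingest everything first, de-duplicate afterwards."""
        for block in split_blocks(stream):
            self.landing.append(block)  # written as-is, duplicates included
            self._track_peak()
        while self.landing:
            self._store_unique(self.landing.pop())
        self._track_peak()


if __name__ == "__main__":
    full_backup = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
    stream = full_backup * 5  # five identical full backups

    inline, post = BackupTarget(), BackupTarget()
    inline.backup_inline(stream)
    post.backup_post_process(stream)
    print(inline.used_bytes(), inline.peak_bytes)
    print(post.used_bytes(), post.peak_bytes)
```

In this model both targets end up holding the same unique blocks, but the post-process target briefly needs enough capacity to hold the full, un-reduced stream. The sketch does not model ingest speed or processing overhead during the backup.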