Data de-duplication has the power to revolutionize the data protection process by significantly reducing capacity requirements. Plagued by media hype and vendor FUD, this message can be easily lost. This paper serves as a primer on data de-duplication.
DATA PROTECTION
BRIEF
Understanding the Power of Data De-Duplication
Date: October, 2007
Author: Heidi Biggar, Analyst
Easing the Pain of Backup
Primary storage volumes may be growing at a rate of 30% or more each year, but it is often secondary storage volumes that are causing organizations the greatest pain, devouring IT resources (both manpower and technology) along the way. For years, organizations could do little, if anything, to control this problem. Today, users have new options. Capacity optimized protection (COP) technologies that attack the secondary storage "capacity bloat" at its roots have emerged. Data de-duplication is one example.
FIGURE 1. DETERRENTS TO DISK-BASED BACKUP
What factors do you believe would prevent your organization from replacing enterprise tape libraries with large-scale near-line disk solutions? (Percent of respondents, N = 94, multiple responses accepted)
Cost of new disk-based solution 74%
Lack of mature products available 47%
Too much investment in existing tape infrastructure 46%
Lack of staff resources to evaluate, select and implement solutions 34%
Concerns with reliability of low-cost disk technologies (i.e., SATA) 32%
Lack of media portability 27%
Current leasing agreement or depreciation cycle on tape infrastructure 26%
Concerns about solution's ability to ensure regulatory compliance (e.g., WORM capability, off-site data ...) 24%
Believe it will take additional staff to manage 16%
Concerned that disk-based solutions are difficult to scale 16%
Source: Enterprise Strategy Group, 2007
Copyright © 2007, The Enterprise Strategy Group, Inc. All Rights Reserved.
From a very high level, data de-duplication enables organizations to reduce back-end capacity requirements by minimizing the amount of redundant data that is ultimately written to disk backup targets. The actual amount of data reduction can vary significantly from organization to organization or from application to application, depending on the granularity of the data de-duplication technology being used (i.e., whether the de-duping is done at the file-, block- or byte-level) or the type of data being de-duped (e.g., Word .doc, .mpeg file or .dbf file). However, ESG has found that, on average, 10x to 20x reduction is realistic and greater than 40x reduction is definitely achievable [1].
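To make the effect of granularity concrete, here is a minimal Python sketch. It is not any vendor's method: the fixed-size blocks, the SHA-256 fingerprints, the function names and the synthetic workload (ten nightly full backups in which one block changes each night) are all assumptions chosen for illustration.

```python
import hashlib


def dedupe(backups, block_size):
    """Keep one copy of each distinct block; return (logical_bytes, stored_bytes)."""
    unique_blocks = {}  # SHA-256 digest -> length of the block kept on disk
    logical_bytes = 0
    for data in backups:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            logical_bytes += len(block)
            unique_blocks.setdefault(hashlib.sha256(block).digest(), len(block))
    return logical_bytes, sum(unique_blocks.values())


def make_nightly_fulls(nights=10, blocks=100, block_size=4096):
    """Hypothetical workload: full backups where one block changes each night."""
    base = [bytes([i]) * block_size for i in range(blocks)]
    fulls = []
    for night in range(nights):
        image = list(base)
        image[night] = bytes([200 + night]) * block_size  # the nightly change
        fulls.append(b"".join(image))
    return fulls


fulls = make_nightly_fulls()
# Treating each whole backup as a single block stands in for file-level de-duping.
for label, size in [("block-level", 4096), ("file-level", len(fulls[0]))]:
    logical, stored = dedupe(fulls, size)
    print(f"{label}: {logical / stored:.1f}x reduction")
```

On this synthetic workload the block-level pass stores 110 unique blocks for 1,000 logical blocks (roughly 9x), while the file-level pass finds no two backups identical and saves nothing. Real ratios depend on the data type and change rate, as noted above.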
At these rates, data de-duplication has the power to change the economics of disk backup, making disk backup a much more affordable (and compelling) alternative to tape, and even eliminating the long-time cost delta between tape and disk (see Figure 1). Factor in the operational efficiencies of not having to move, store and manage redundant data, as well as not having to deal with the management headaches common with tape, and you've got a very compelling story in favor of disk backup.
For these reasons and others, we believe data de-duplication is one of this decade's most important, and hence most talked-about, new technologies. It has the power to revolutionize data protection from both a technology and an end-user adoption standpoint by simply making disk-based backup and recovery, as well as remote replication, much more efficient than it is today.
The Benefits of De-Dupe
Data de-duplication has several significant, and immediate, benefits for users:
• It can lower disk costs. Just consider the ability to store 20 TB of backup data on 1 TB of disk. The cost savings are significant, not only in terms of actual disk costs, but also in terms of power and cooling. Fewer disks mean lower power and cooling costs.
• It allows users to store more data on fewer disks for longer periods of time. While the actual capacity reduction will vary from organization to organization depending on a number of variables (e.g., the type of data that is being backed up, the change rate and the frequency of the backup, etc.), de-duplication will reduce back-end capacity requirements significantly. Organizations can use this "newfound" space to 1) protect other backup data (i.e., data that wasn't previously protected by disk) or 2) lengthen the retention periods of the data that is backed up to disk...
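The capacity arithmetic behind these two benefits can be sketched in a few lines of Python. The function names are hypothetical, the 10 TB disk and 2 TB backup size in the retention example are made-up figures, and the model assumes a constant reduction ratio, which real workloads will not strictly follow.

```python
def disk_needed_tb(backup_tb, reduction_ratio):
    """Disk capacity needed to hold backup_tb of backup data at a given ratio."""
    return backup_tb / reduction_ratio


def backups_retained(disk_tb, backup_tb, reduction_ratio):
    """Whole backups of size backup_tb that fit on disk_tb at a constant ratio."""
    return int(disk_tb * reduction_ratio // backup_tb)


# The brief's example: 20 TB of backup data at a 20x reduction.
print(disk_needed_tb(20, 20))

# Hypothetical retention example: 2 TB backups on a 10 TB disk target,
# first without de-duplication (1x), then at 20x.
print(backups_retained(10, 2, 1))
print(backups_retained(10, 2, 20))
```

Under these assumptions the same disk holds 100 backups instead of 5, which is the "newfound" space that can go toward protecting more data or keeping it longer.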