White Paper
Comparing Deduplication Approaches: Technology Considerations for Enterprise Environments
Deduplication is becoming an essential tool to help data managers control exponential data growth in the backup environment. The methods used to accomplish deduplication vary widely, as do the levels of capacity optimization they can provide. Some techniques are well suited to small-to-medium sized backup environments, and others are optimal for enterprise-class environments. This article describes the techniques being used today to deduplicate data and highlights considerations for choosing the best technology for your environment.
400 Nickerson Road, Marlborough, MA 01752 Phone: 866.SEPATON or 508.490.7900 | www.SEPATON.com
Table of Contents
Understanding Data Deduplication
Defining Your Needs
Approaches to Deduplication
Technology Considerations for Enterprise Environments
Conclusion
Understanding Data Deduplication
The volume of data generated by companies today is growing explosively. More powerful computing technology and the evolution to an information-based economy are causing companies to generate more data than ever before. Backing up all of this data creates a new set of challenges. Companies typically back up the same data many times over its lifecycle. As a result, a single terabyte of new data can require 50 to 60 times that capacity to store it over its lifetime.
In addition, laws such as the Health Insurance Portability and Accountability Act (HIPAA) and the Sarbanes-Oxley Act require some types of data to be stored for many years. They also require companies to be able to retrieve that data quickly and completely upon request.
To deal with this overwhelming data growth and related storage requirements, many companies are evaluating the use of data deduplication technology. Data deduplication technology is software that compares data in new backup streams to data that has already been stored to identify and remove duplicates. For example, if only 5% of the data in a current backup stream has changed since the previous backup, the deduplication technology will only store that 5%. A record is kept of the duplicate data so the files can be reassembled for data restores.
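The store-only-what-changed behavior described above can be illustrated with a minimal Python sketch. It assumes one common approach, fixed-size chunks identified by SHA-256 digests, which is only one of several deduplication techniques and is not presented here as any particular product's method. The names `DedupStore`, `CHUNK_SIZE`, `backup`, and `restore` are hypothetical and chosen for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes (an assumption)


class DedupStore:
    """Toy in-memory store that keeps each unique chunk only once."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes, stored once
        self.recipes = {}  # backup name -> ordered list of digests

    def backup(self, name, data):
        """Store a backup stream and return the number of new bytes written."""
        new_bytes = 0
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only chunks not seen before consume capacity.
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            # The recipe is the record used to reassemble the stream later.
            recipe.append(digest)
        self.recipes[name] = recipe
        return new_bytes

    def restore(self, name):
        """Reassemble a backup stream from its recorded chunk digests."""
        return b"".join(self.chunks[d] for d in self.recipes[name])


# Twenty distinct chunks; the second backup changes one of them (5%).
monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(20))
tuesday = monday[:-CHUNK_SIZE] + bytes([200]) * CHUNK_SIZE

store = DedupStore()
first_written = store.backup("monday", monday)
second_written = store.backup("tuesday", tuesday)
print(first_written, second_written)
```

In this sketch the second backup writes only the single changed chunk, yet both backups can still be restored in full because each one keeps its own ordered list of chunk digests.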
Changing the Economics of Data Protection
Virtual tape libraries (VTLs) provide a level of performance and reliability that traditional physical tape systems cannot match. VTLs enable companies to back up data many times faster than tape, restore data quickly, and eliminate a variety of time-consuming manual tasks. However, without data deduplication, the cost of disk is higher than that of tape, forcing companies to use disk space carefully by keeping online retention times short and moving data to tape archive as quickly as possible. With data deduplication, this process is not necessary. When used with hardware compression on a VTL, deduplication can deliver as much as 50:1 capacity reduction, making disk-based secondary storage and longer online data retention times cost-effective for the enterprise.
The methods used to accomplish deduplication vary widely, as do the levels of capacity optimization they can provide. Some techniques are well suited to small-to-medium sized backup environments, and others are optimal for enterprise-class environments. This article describes the techniques being used today to deduplicate data on VTLs and summarizes the backup environments and data protection objectives for which each technology is best suited.
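As a rough worked example of the capacity figures quoted in this paper, the short Python sketch below converts a logical backup volume into the physical disk it would occupy at a given reduction ratio. The function name `physical_capacity_tb` is hypothetical, and the 50:1 ratio is the best-case figure cited above; the ratio achieved in practice depends on the data and the backup policy.

```python
def physical_capacity_tb(logical_tb, reduction_ratio):
    """Return the physical capacity (TB) needed for a logical backup volume."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_tb / reduction_ratio


# One terabyte of new data retained as 50 TB of backups over its lifetime,
# stored at the best-case 50:1 reduction cited in the text.
lifetime_logical_tb = 1 * 50
needed_tb = physical_capacity_tb(lifetime_logical_tb, 50)
print(needed_tb)  # 1.0
```

Under these assumptions, the 50 TB of lifetime backup data for a single terabyte of new data would occupy about 1 TB of disk.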
Understanding Your Needs
Amid the hype and...