The year is 2007, and data continues to proliferate. By some estimates, the creation of a typical business file sets off a chain of events that causes the file to be copied well over 1,000 times in its lifetime. If the file is an image of a popular entertainer or a video clip of a sports figure making a heroic play, tens of thousands of copies may be quickly distributed around the globe.
How do these examples affect storage in enterprise data centers? Have your users downloaded images and videos and completely forgotten about them? Do your users refuse to delete old files because “you never know if you might need them”? Are you, the system administrator, reluctant to purge data volumes because no one is quite sure who owns the data or what it is used for? If you answered yes to any of these questions, you are not alone. Most system administrators today are grappling with the constant creep of data throughout the data center. Unfortunately, there is no convenient trash can into which you can throw your data rubbish.
There are, however, many ways to attack the problem of data proliferation. First, you could demand that your accountants, engineers, managers, technicians, and executive staff immediately delete all their old, unused data files. Hmmm, that went over well, didn’t it? Second, you could implement a search-and-classify mechanism in your data center to automatically move “stale” files to a disk archival system. This frees up room on your primary storage arrays, but all that data still resides somewhere and still consumes large quantities of disk space. Finally, you could take advantage of new and interesting data virtualization techniques to manage the growth of data, allowing you to postpone the purchase of a new storage system or to purchase a smaller system to begin with. This paper analyzes how NetApp can enable you to do just that.
In its purest form, data virtualization enables you to represent a single data object as many different objects. NetApp Snapshot™ technology, introduced in 1992, was arguably the first widely adopted form of data virtualization in enterprise storage arrays. Snapshot copies enabled system administrators to create many point-in-time copies of their entire data volumes while consuming only a fraction of the space that multiple full backup copies of those volumes would normally have required. Snapshot copies were a disruptive technology: they changed the behavior of system administrators by allowing them to back up their volumes more frequently than ever before. Once per minute, once per hour, once per day; it didn’t matter, because these backups were simply virtual copies that consumed very little disk space. Today, 15 years later, NetApp Snapshot technology has matured into an extensive suite of virtualized tools that enable system administrators to provision their primary storage effectively, manage test and development clone copies, reduce the size of point-in-time backup copies, replicate these copies across LANs and WANs, and reduce overall volume requirements by eliminating redundant data blocks.
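The reason a point-in-time copy can be so small is easier to see in a toy model. The Python sketch below is not NetApp’s implementation. It assumes a simplified volume in which a snapshot is only a frozen copy of the logical-to-physical block map, and in which changed blocks are written to new physical locations instead of overwriting old ones. The class `ToyVolume`, its method names, the snapshot names, and the block counts are all hypothetical and chosen only for illustration.

```python
class ToyVolume:
    """Illustrative volume: logical blocks point at shared physical blocks."""

    def __init__(self):
        self._store = {}      # physical block id -> data
        self._next_id = 0     # next free physical block id
        self.active = {}      # logical block number -> physical block id
        self.snapshots = {}   # snapshot name -> frozen logical-to-physical map

    def write(self, logical_no, data):
        # Never overwrite in place: allocate a new physical block so that
        # any snapshot still pointing at the old block keeps its old data.
        pid = self._next_id
        self._next_id += 1
        self._store[pid] = data
        self.active[logical_no] = pid
        self._reclaim()

    def snapshot(self, name):
        # A snapshot copies only the pointer map, never the block data.
        self.snapshots[name] = dict(self.active)

    def read(self, logical_no, snapshot=None):
        block_map = self.active if snapshot is None else self.snapshots[snapshot]
        return self._store[block_map[logical_no]]

    def physical_blocks_used(self):
        return len(self._store)

    def _reclaim(self):
        # Free physical blocks that neither the active map nor any snapshot references.
        live = set(self.active.values())
        for block_map in self.snapshots.values():
            live.update(block_map.values())
        for pid in list(self._store):
            if pid not in live:
                del self._store[pid]


def demo(blocks=100, snapshots=3, changed_per_interval=5):
    """Fill a volume, then alternate snapshots with small batches of changes."""
    vol = ToyVolume()
    for n in range(blocks):
        vol.write(n, f"block-{n}-v0".encode())
    for i in range(snapshots):
        vol.snapshot(f"snap{i}")
        start = i * changed_per_interval
        for n in range(start, start + changed_per_interval):
            vol.write(n, f"block-{n}-v{i + 1}".encode())
    # Space needed if every snapshot were a full copy, plus the live volume.
    full_copy_cost = (snapshots + 1) * blocks
    return vol, vol.physical_blocks_used(), full_copy_cost


if __name__ == "__main__":
    _, used, full = demo()
    print(f"physical blocks used: {used}, full-copy equivalent: {full}")
```

With these assumed numbers (100 blocks, three snapshots, five changed blocks per interval), the toy volume holds 115 physical blocks, where the live volume plus three full copies would need 400. Each snapshot still reads back the data exactly as it stood when the snapshot was taken.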
Your data may not have a definitive time of death, but it certainly does have a birth certificate. All enterprise data begins its life on primary storage. Whether it is a database entry, user file, software source code file, or email attachment, this data consumes physical space on a disk drive somewhere within your primary storage environment. One of the first problems a storage system administrator faces is quota allocation: how much physical storage space should be assigned to each particular user or application? Knowing that an overflowing data volume has many unpleasant side effects, system administrators commonly overprovision their disk quotas. If they think that an application will require a single terabyte, they might decide to allocate 2 TB to accommodate growth over time, or to adjust for a miscalculation of the storage space actually consumed by the application. But what if the application does not grow as expected, or its actual consumption turns out to be lower than estimated? The result is wasted space: space that cannot be used by any other application. By some estimates, an average of 60% of primary disk storage remains unused simply because of this type of overprovisioning.
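The arithmetic behind that kind of estimate is simple enough to check against your own environment. The Python sketch below sums allocated and consumed space across a set of quotas and reports how much is stranded. The `Allocation` type, the `stranded_space` function, and the sample figures are hypothetical illustrations, not measurements from this paper.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    name: str            # user or application that owns the quota
    allocated_tb: float  # space reserved up front
    used_tb: float       # space actually consumed


def stranded_space(allocations):
    """Return (unused_tb, unused_fraction) across all allocations."""
    allocated = sum(a.allocated_tb for a in allocations)
    used = sum(a.used_tb for a in allocations)
    if allocated == 0:
        return 0.0, 0.0
    unused = allocated - used
    return unused, unused / allocated


if __name__ == "__main__":
    # Hypothetical figures: each application was given 2 TB "to be safe".
    quotas = [
        Allocation("app-a", allocated_tb=2.0, used_tb=1.0),
        Allocation("app-b", allocated_tb=2.0, used_tb=0.5),
    ]
    unused, fraction = stranded_space(quotas)
    print(f"{unused} TB unused ({fraction:.1%} of allocated space)")
```

In this made-up case, 2.5 TB of the 4 TB allocated sits idle, and because each quota is reserved for a single application, none of that idle space is available to the other.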