In the past 24 months, we have seen disk-based data protection and archiving become an essential component of most enterprises. With massive 100 TB+ disk-based repositories becoming increasingly common on near-term enterprise roadmaps, finding ways to optimize those disk capacities is now a top-of-mind concern. Even with cost-effective disk widely available, our research finds that capacity still accounts for over 50% of most data storage budgets, and disk-based secondary storage very commonly grows by more than 50% year over year. Aggressively managing these capacities to keep them “lean and mean” is absolutely critical.

The good news is that new technologies exist that can radically slim down the physical storage requirements for secondary disk storage. Taneja Group refers to these technologies as Capacity Optimization (CO) technologies. Every enterprise now needs a CO strategy, but beware, for not all solutions are created equal.

In our opinion, one company on the very cutting edge of this CO wave is Diligent Technologies. Specifically, we call attention to Diligent’s HyperFactor technology. It stands out as extremely scalable, efficient, high-performance data de-duplication software that merits a deeper understanding. As the “secret sauce” in Diligent’s ProtecTier VTL solution, HyperFactor has achieved efficiencies that represent a true leap over first-generation optimization approaches. Beyond the mind-bending 25:1 reduction ratio, the efficiency of this technology’s indexing architecture (and therefore, its performance) is second to none.

In this technology brief, we will examine what IT teams should be looking for in a capacity optimization solution, and then share our perspective on the Diligent HyperFactor technology. This is a core new technology and we’re very excited by what it can mean for storage ROI, data protection performance, and reliability.

Somewhat ironically, the latest growth boom in the storage industry is all about reducing capacities.
Specifically, this innovation boom is about software technologies that promise to radically reduce the amount of physical storage capacity required to store a given amount of information. In fact, some of the most exciting companies in storage today are laser-focused on tackling precisely this challenge. Taneja Group has categorized all of these offerings under the umbrella of Capacity Optimization (CO) technologies. This entire CO category has gone from marginal to essential in the past 24 months.

Why the intense interest in optimizing storage capacities? Two words: economics and manageability. IT teams need to reduce the amount they are spending to store a given terabyte of information, and they need to find ways to manage the insanely high growth rates common across the enterprise today. Note that the biggest culprits of capacity consumption have been data protection and archiving environments (collectively, “secondary storage”). For that reason, significant vendor R&D attention has been focused on finding ways to optimize those capacities in order to improve storage ROI and ease of management for large repositories.

Indeed, these CO technologies have already become very powerful. As we will explore below, the most advanced offerings can achieve reduction ratios of 25:1 or more on disk-based storage platforms, scaling into the petabyte range for single solutions, all with no impact on the pre-existing data protection workflow and no additional management requirements. With capabilities like this available, the question rightfully becomes: Why wouldn’t you optimize your secondary storage capacities?

The Rise of De-Duplication

Behind all advanced capacity optimization initiatives is some manner of de-duplication technology. At the highest level, all de-duplication technologies provide an intelligent means of organizing data being stored such that redundant data elements need not be stored more than once. We call this process “factoring”.
To achieve efficient data factoring, some manner of indexing capability must maintain a running tally of all data being stored in a repository. The index is akin to a “table of contents” of the entire data repository, keeping track of where unique data elements reside. The existence of this index then frees the repository to store only a single instance of each data element. Different de-duplication approaches support varying levels of granularity for these data elements. In general, the finer the granularity, the more efficient the entire repository can become. All of this requires an ongoing, dynamic dialog between the repository and the index, as every new piece of data stored immediately changes what will need to be stored in the future.
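To make the interplay between repository and index concrete, the following is a minimal sketch of factoring in Python. It assumes fixed-size data elements and SHA-256 fingerprints purely for illustration; these choices, along with the names `FactoringStore` and `CHUNK_SIZE`, are hypothetical and are not a description of how HyperFactor or any other product works internally.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical granularity of a "data element", in bytes


class FactoringStore:
    """Toy repository that stores each unique data element only once."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.index = {}         # fingerprint -> position in self.chunks
        self.chunks = []        # unique data elements, each stored once
        self.logical_bytes = 0  # total bytes clients asked us to store

    def write(self, data):
        """Store data and return a 'recipe' of chunk positions to rebuild it."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).digest()
            position = self.index.get(fingerprint)
            if position is None:
                # New element: store it and record it in the index.
                position = len(self.chunks)
                self.chunks.append(chunk)
                self.index[fingerprint] = position
            recipe.append(position)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe):
        """Reassemble the original data from its recipe."""
        return b"".join(self.chunks[position] for position in recipe)

    def physical_bytes(self):
        return sum(len(chunk) for chunk in self.chunks)

    def reduction_ratio(self):
        physical = self.physical_bytes()
        return self.logical_bytes / physical if physical else 1.0


if __name__ == "__main__":
    store = FactoringStore()
    # Contrived workload: the same 16 KB "full backup" written 25 times.
    backup = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
    recipes = [store.write(backup) for _ in range(25)]
    print(store.reduction_ratio())
```

In this contrived workload every write after the first finds all of its elements already in the index, so the repository holds 16 KB of physical data for 400 KB of logical data. Real backup streams change between runs, so the ratio a given solution achieves depends on the data, the granularity of its elements, and how its index is built.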