There is a lot of discussion around data deduplication for backup these days. (I wish I could deduplicate all the turkey I ate last week.) In fact, Gartner claims that “…by 2012, deduplication will be applied to 75% of backups.” And when asked “Why?” the response was “…deduplication is too compelling to ignore.” But I say “prove it.” So I put together some backup capacity numbers for storing data on tape (non-compressed and compressed) versus storing deduplicated data (fixed block and variable block) on disk. The numbers show dramatic savings in backup space, which translate into cost savings.
As with any ‘analysis’, numbers can be ‘spun’ to make them say what you want. That said, I tried to be as straightforward as possible, so let me also show my methodology so you can see how my numbers were derived.
- I charted the amount of capacity created using a retention policy of:
- 14 Dailies
- 4 Weeklies
- 12 Monthlies
- I selected 10TB of primary storage capacity
- I did this for file system backups only
- I charted the data for 30%, 40%, 50% and 60% primary storage growth rates
- I charted traditional tape based backup (non-compressed)
- I charted traditional tape based backup (compressed, 2:1)
- I charted fixed block disk based deduplicated backup
- I charted variable block disk based deduplicated backup (3 to 5 times more efficient than fixed block deduplication)
The first thing to think about is the sheer number of full backup copies that must be maintained under the above retention schedule. The policy leads to 17.2 copies of the primary storage (12 monthlies + 4 weeklies + the equivalent of 1.2 copies from the dailies = 17.2 copies). Translation: one terabyte of primary storage becomes 17.2 terabytes of tape storage. This means backup administrators need to pay for the physical tapes as well as the offsite transport and storage costs. Now, 17.2 terabytes of tape doesn’t sound like much, but keep in mind that is for 1TB of primary capacity. Ten TB of primary capacity yields 172 TB of tape capacity. Now add in year-over-year storage growth. At 30% primary storage growth, backup storage grows 23%; at 40%, it grows 29%; at 50%, it grows 33%; and at 60%, it grows 38%.
Figure 1 below shows 10 TB of primary capacity growing at 30%, 40%, 50% and 60% along the x-axis, and the corresponding capacity of tape or disk consumed along the y-axis.
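The copy-count arithmetic can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above: it uses the 12 monthly and 4 weekly copies from the retention policy, takes the 1.2 full-copy equivalent for the dailies as given rather than deriving it, and the function names are made up for this example.

```python
# Retention policy from the methodology list above.
MONTHLIES = 12
WEEKLIES = 4
# The article counts the 14 dailies as roughly 1.2 full copies;
# that figure is taken as given here, not derived.
DAILIES_AS_FULL_COPIES = 1.2


def full_copy_equivalents() -> float:
    """Number of full-copy equivalents kept under the retention policy."""
    return MONTHLIES + WEEKLIES + DAILIES_AS_FULL_COPIES


def tape_capacity_tb(primary_tb: float, compression_ratio: float = 1.0) -> float:
    """Tape capacity (TB) needed to hold every retained copy of the primary data."""
    return primary_tb * full_copy_equivalents() / compression_ratio


if __name__ == "__main__":
    print(round(full_copy_equivalents(), 1))     # 17.2
    print(round(tape_capacity_tb(10), 1))        # 172.0 (uncompressed)
    print(round(tape_capacity_tb(10, 2.0), 1))   # 86.0  (2:1 compression)
```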
The graph shows that compressed backup to tape yields a 50% capacity improvement over non-compressed tape, as one would expect at 2:1 compression. It also shows that fixed block deduplicated disk is only about 48% more efficient than uncompressed tape storage, yet variable block deduplication is 81% more storage efficient than uncompressed tape storage.
Interestingly, the chart also reveals that fixed block deduplication is 3% less efficient than compressed tape, whereas variable block deduplication is 62% more efficient than compressed tape. Typically, with the same data change rates and equivalent data sets, variable block deduplication is 3 to 5 times more efficient than fixed block deduplication.
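To see how the "versus compressed tape" figures relate to the "versus uncompressed tape" figures, here is a small sketch. It assumes that "X% more efficient" means "uses X% less capacity", which is my reading of the chart rather than a stated definition, and the function name is hypothetical. With the rounded 48% input the fixed block result comes out near 4% rather than the 3% quoted, which is what you would expect if 48% is itself a rounded value.

```python
def savings_vs_compressed_tape(savings_vs_uncompressed: float,
                               tape_compression_ratio: float = 2.0) -> float:
    """Convert capacity savings versus uncompressed tape into savings
    versus compressed tape.

    A positive result means less capacity than compressed tape;
    a negative result means more capacity than compressed tape.
    """
    capacity = 1.0 - savings_vs_uncompressed        # fraction of uncompressed tape
    compressed_tape = 1.0 / tape_compression_ratio  # 0.5 for 2:1 compression
    return 1.0 - capacity / compressed_tape


if __name__ == "__main__":
    print(round(savings_vs_compressed_tape(0.81), 2))  # 0.62  (variable block)
    print(round(savings_vs_compressed_tape(0.48), 2))  # -0.04 (fixed block)
```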
The moral of the story: if you’re going to do deduplication, variable block is the way to go. From a cost perspective, there is essentially no difference in the $/TB price; however, there is much more value in the long run with variable block deduplication. Vendors typically charge a $/TB price for their deduplication solutions. The difference between fixed and variable block deduplication comes down to the capacity of data that is stored in the backups, which directly translates into costs. If you take a look at Figure 2, starting with 1TB of primary capacity growing at 25% over the course of one year, IT will need almost 2TB of backup capacity with fixed block deduplication versus less than 1TB of capacity using variable block deduplication (this assumes fixed block is 5x less efficient, based on empirical data that has been collected in the field). The most important part of this graph is the slope of the blue and red lines. The steeper the slope (red line), the more frequently IT will need to purchase capacity to protect the given data set, and the more it will pay for deduplication software licensing. IT wants the smaller slope.
*Note: Some companies will position their fixed block technologies as variable block by stating that you (the user) have the ability to set the block size to whatever you want; however, once set, it stays that way for all of your data. The difference is that true variable block technologies adjust the block size on the fly, using their algorithms to ensure maximum efficiency with no management.
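To make the fixed versus variable distinction concrete, here is a toy sketch in Python. It is not any vendor's algorithm: the window size, the boundary mask, the 4 KiB fixed block size, and all function names are assumptions chosen for illustration. The variable chunker cuts wherever a checksum of the last few bytes matches a pattern, so boundaries follow the content; real products typically use an efficient rolling hash and also enforce minimum and maximum chunk sizes, which are left out here.

```python
import hashlib
import random
import zlib
from typing import Callable, List

WINDOW = 48          # trailing bytes examined at each position (assumed value)
MASK = 0x3FF         # cut when the low 10 bits are zero, ~1 KiB average chunk (assumed)
FIXED_BLOCK = 4096   # fixed block size in bytes (assumed value)


def fixed_chunks(data: bytes) -> List[bytes]:
    """Split at fixed offsets, regardless of content."""
    return [data[i:i + FIXED_BLOCK] for i in range(0, len(data), FIXED_BLOCK)]


def variable_chunks(data: bytes) -> List[bytes]:
    """Split wherever a checksum of the trailing WINDOW bytes hits a pattern.

    Boundaries depend only on nearby content, so they move with the data
    when bytes are inserted or removed earlier in the stream.
    """
    chunks, start = [], 0
    for end in range(WINDOW, len(data)):
        # Recomputed per position for simplicity; a rolling hash would be faster.
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    chunks.append(data[start:])
    return chunks


def new_bytes(chunker: Callable[[bytes], List[bytes]],
              stored: bytes, incoming: bytes) -> int:
    """Bytes of `incoming` whose chunks are not already in the store."""
    seen = {hashlib.sha256(c).digest() for c in chunker(stored)}
    return sum(len(c) for c in chunker(incoming)
               if hashlib.sha256(c).digest() not in seen)


def demo():
    """Back up 64 KiB of pseudo-random data, then the same data with one byte inserted."""
    rng = random.Random(0)
    base = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
    edited = b"X" + base  # a single byte inserted at the front
    return (new_bytes(fixed_chunks, base, edited),
            new_bytes(variable_chunks, base, edited))


if __name__ == "__main__":
    fixed_new, variable_new = demo()
    print("fixed block, new bytes to store:   ", fixed_new)
    print("variable block, new bytes to store:", variable_new)
```

With fixed blocks, the one-byte insert shifts every block boundary, so the second backup shares nothing with the first. With content-defined boundaries, only the chunk containing the insert is new and the rest is recognized as duplicate.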
Bang for the Buck
The most important benefit, as with most things in IT, is overall cost savings. Deduplicated disk solutions are anywhere from 2.5X to 3X more expensive than tape per terabyte; however, with the overall capacity savings, there can be significant cost savings. Figure 3 is representative of the overall costs of new deduplicating disk systems and traditional tape backup systems (including tapes and off-site storage costs). I will caveat this by saying every TCO and ROI has a ton of ‘what ifs’ that factor into overall costs, including things like FTE for backup engineers and long-term retention costs. For the most part, though, disk systems reduce a good deal of these costs (with the exception of power and cooling) and increase the reliability, security and performance of backups and recoveries.
1 The chart above is based on a rough cost of $8,000 per terabyte for a tape backup system (including media and off-site storage) and a rough cost of $20,000 per terabyte for a deduplicated disk backup system, for the period of one year. Prices will vary depending upon your configuration, and these estimates do not include space, power, cooling or human costs.
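As a rough illustration of how the $/TB prices and the capacity figures interact, the sketch below combines the per-terabyte costs from the footnote with the 17.2 full-copy equivalents and the capacity savings quoted earlier. This is my own back-of-the-envelope combination and not necessarily how Figure 3 was built; the option labels and function name are hypothetical, and growth, space, power, cooling and staff costs are ignored.

```python
# Rough one-year system cost per terabyte stored, from the footnote.
COST_PER_TB = {"tape": 8_000, "disk": 20_000}

# Full-copy equivalents kept under the 14/4/12 retention policy (from the article).
FULL_COPIES = 17.2

# option -> (media, capacity savings versus uncompressed tape), figures from the article.
OPTIONS = {
    "tape, uncompressed": ("tape", 0.00),
    "tape, compressed 2:1": ("tape", 0.50),
    "disk, fixed block dedup": ("disk", 0.48),
    "disk, variable block dedup": ("disk", 0.81),
}


def backup_cost(primary_tb: float, option: str) -> float:
    """Approximate one-year backup system cost in dollars for one option."""
    media, savings = OPTIONS[option]
    stored_tb = primary_tb * FULL_COPIES * (1.0 - savings)
    return stored_tb * COST_PER_TB[media]


if __name__ == "__main__":
    for name in OPTIONS:
        print(f"{name:28s} ${backup_cost(10, name):,.0f}")
```

With these inputs and 10TB of primary data, variable block deduplicated disk comes out below both tape options, while fixed block deduplicated disk comes out above them, because its capacity savings do not offset the higher $/TB price.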
As I stated above, only a few factors are involved in this very raw calculation. There are a number of other factors in a backup process, including WAN costs (if replacing tape with disk), remote office facilities, installation (professional services), and software and hardware maintenance, to name a few. But no matter how you look at it, disk-based backup with variable block deduplication wins over tape.
Backing data up to deduplicated disk not only reduces the amount of backup capacity that is used, it also has other implications for a data protection environment. First, backing up to disk instead of tape helps to reduce the reliance on tape and the inherent limitations, security concerns and reliability issues surrounding it. Recovery of data from disk reduces operational costs and decreases the recovery time objective. Additionally, the reliability of disk with RAID is much higher than the reliability of tape.
New data protection technologies are evolving backup to a degree where the entire data protection process is getting easier to manage by removing multiple points of management (backup servers, media servers, tape libraries and physical tape). As backup continues to evolve, this can help simplify the overall process and:
- Increase reliability of backups
- Increase reliability of recoveries
- Decrease backup times
- Decrease the time to recover data
The Bottom Line
New challenges in protecting information are arising every day. Whether it is data growth, remote office data protection or virtualization, backup is getting harder, not easier. Data deduplication is providing backup administrators with tremendous benefits around backup processes and cost savings. It is important to keep in mind that everybody’s environment is different and utilizes different methods and processes for managing and protecting information. It is also important to take a look at your data protection environment today and understand the use cases where it is time to make new investments. I encourage you to look at new technologies to help you with emerging challenges and to weigh the overall solution, including costs as well as the benefits of disk-based recovery. New backup technologies that leverage data deduplication can save IT a lot of money and put you back on the Road to Recovery.