Archive

The most fundamental part of developing a good data protection architecture starts at the base of the triangle with Archive. Archive is an often overlooked component of data protection; it's not just for regulated businesses anymore. Archive essentially gives you 100% data deduplication efficiency: you can remove 'stale' data (and by 'stale' I don't mean unimportant data, just data that is not accessed frequently) completely from your backup stream so you don't keep backing it up. Let's face it: the two most important commodities in backup are time and capacity, and the two are interdependent. The more capacity you have, the longer it takes to back up and the more it costs to store. The longer your backups take, the less likely you are to meet your business objectives. Data capacities aren't shrinking, they are growing. According to the latest IDC data, capacity is growing at a staggering 65% year over year, and the digital pack rat in all of us is too afraid to get rid of anything, compromising backup windows and hence the business. By archiving data that hasn't been touched in some period of time and removing it from the backup stream, you can relieve some of the pressure on your backups and possibly avoid significant changes to your backup infrastructure.
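
To make the idea of 'stale' concrete, here is a minimal sketch (in Python, with a hypothetical path and a 180-day threshold as assumptions, not a recommendation) of how you might flag archive candidates by last-access time before pulling them out of the backup stream:

```python
import time
from pathlib import Path

# Hypothetical example: scan a share for files not accessed in 180 days.
# The path and threshold are assumptions for illustration only.
STALE_DAYS = 180
SOURCE = Path("/data/projects")

def find_stale_files(root: Path, stale_days: int):
    """Yield files whose last access time is older than the threshold."""
    cutoff = time.time() - stale_days * 86400
    for path in root.rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            yield path

if __name__ == "__main__":
    candidates = list(find_stale_files(SOURCE, STALE_DAYS))
    total_gb = sum(p.stat().st_size for p in candidates) / 1e9
    print(f"{len(candidates)} archive candidates, {total_gb:.1f} GB "
          "that could leave the backup stream")
```

Keep in mind that last-access times can be unreliable (many file systems are mounted with noatime), so modification time or a policy engine may be a better trigger in practice.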

Also, you don't have to back up to a special-purpose device or appliance for archive. You can archive data to any file system. Keep in mind, however, that you want to archive to a platform that keeps costs low. Remember, this data is not unimportant, just not frequently used. Take your RTO into account and store the data on the most cost-effective platform that also aligns with the business objectives. That may be tape, optical, or disk. If it is disk, you want disk that is optimized for this type of data: optimized for capacity (deduplication, compression, single instancing), with low power and cooling costs, able to replicate for availability, and highly reliable. You will also want to make sure it is integrated to some extent with an application that lets you find the data quickly when you need it and puts you further down the Road to Recovery.
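
As a rough illustration of that "find it quickly" point, here is a sketch (again Python, with hypothetical paths) of moving candidates onto a cheaper archive file system while recording each move in a simple CSV manifest. A real archive application maintains a far richer index, but the principle is the same:

```python
import csv
import shutil
import time
from pathlib import Path

# Hypothetical example: move an archive candidate to a cheaper file system
# and note where it went in a CSV manifest so it can be located at restore
# time. All paths here are assumptions for illustration only.
SOURCE_ROOT = Path("/data/projects")
ARCHIVE_ROOT = Path("/archive/projects")
MANIFEST = Path("/archive/manifest.csv")

def archive_file(src: Path) -> Path:
    """Move a file into the archive tree, preserving its relative path."""
    dest = ARCHIVE_ROOT / src.relative_to(SOURCE_ROOT)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest

def record_move(src: Path, dest: Path) -> None:
    """Append the original and archive locations to a simple lookup manifest."""
    is_new = not MANIFEST.exists()
    with MANIFEST.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["original_path", "archive_path", "archived_at"])
        writer.writerow([str(src), str(dest), time.strftime("%Y-%m-%d")])
```

Driven by the stale-file scan from the earlier sketch, this keeps retrieval simple: look the file up in the manifest, then copy it back.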

In my next post we will talk about what I call the 'fat middle'. In this area, most of the data has a 24-hour RPO, and it is where traditional and next-generation backup applications play. There are many use cases for data protection in this area, and RTOs tend to drive the medium data is backed up to (disk or tape). Stay tuned for Part 3.

Tags:

Backup, Data Deduplication, Data Protection, Recovery, Restore