Well, now that the dust has settled and all of the predictions have been made regarding NetApp's acquisition of Data Domain, it is time to talk about two topics that have a great deal to do with data deduplication. The first is backup and the second is capacity optimization (which leverages data deduplication technology).

Backup

"Everyone hates backup." "Tape is unreliable." "Tape is dead." "Archive first, then backup." "Do more with less." "It's not about the backup, it's about the recovery." "The backup process has to change." We have all heard these words many, many times and while most all of them are true, backup, its processes, and the medium we very frequently write it to has been consistent for decades. So if these phrases are true, and most people believe them to be true, what will it take to change the existing backup environment?

The answer starts with a question: what do customers want from their backups? Customers want simple copies of data, on inexpensive storage, that are easy to find and easy to recover anywhere and everywhere. Basically, they want their data where they want it, when they want it.

The next question is: how can this be accomplished? Without trying to reinvent the wheel, it starts by shrinking the data as close to the source as possible. Data deduplication is a game-changing technology that enables this capability. Given the amount of data growth year over year, and the percentage of that data that is duplicate, if the data can be reduced at the source before it is moved, so that less of it moves, there is a significant impact on two things (illustrated in the sketch after this list):

  1. Backup times, because less data is moved
  2. Backup capacity, because there is less data to store
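
To make the idea concrete, here is a minimal sketch of source-side deduplication in Python. It is purely illustrative, not any vendor's implementation: the fixed chunk size, the SHA-256 fingerprints, and the in-memory index are all simplifying assumptions on my part.

    import hashlib

    CHUNK_SIZE = 4096  # fixed-size chunks for simplicity; real products vary

    def backup_stream(data, seen):
        # Split the stream into chunks and "send" only chunks whose
        # fingerprint has not been stored before. Returns the recipe of
        # fingerprints and the number of bytes actually moved.
        recipe, sent = [], 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in seen:        # new data: move it and index it
                seen.add(fp)
                sent += len(chunk)
            recipe.append(fp)         # the recipe reconstructs the stream
        return recipe, sent

    seen = set()
    monday = b"A" * 8192 + b"B" * 4096
    _, moved1 = backup_stream(monday, seen)
    tuesday = b"A" * 8192 + b"C" * 4096   # only one chunk changed overnight
    _, moved2 = backup_stream(tuesday, seen)
    print(moved1, moved2)                 # 8192 4096: the repeat backup moves half

The second backup moves half the bytes of the first because two of its three chunks were already seen; at real-world duplicate rates, the savings compound across every backup window.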

Data deduplication helps reduce capital expenditures (driving the cost of disk closer and closer to that of tape, and lowering the impact on the network) as well as operational costs. It also helps facilitate the main objectives of backup: getting the data where I want it, when I need it, and making it more accessible (disk vs. tape).

The fundamental flaw, as I pointed out in Betamax Redux, is that if an IT vendor does not own any IP at the front of the backup process (to optimize capacity before, during, and after data is sent over the wire), then its choices of solutions that address both the process and the infrastructure diversity in the customer's environment are limited, and they really don't earn the vendor 'a seat at the backup buffet.'

So if we agree that the hardest thing to change in IT is process, not technology, it stands to reason that current attempts to solve backup challenges have been modest and incremental. Customers often change out their target backup device, from tape, to disk that emulates tape, to deduplication targets, keeping their existing backup software infrastructure (which they have spent a good deal of money on) in place and having only incremental impact on the overall process. A new technology carries a burden of proof: it must deliver dramatic improvement before customers will adopt a disruptive approach to data protection. Data deduplication does just that.

Capacity

Capacity optimization also plays a significant role in the data center, specifically for primary storage, and it has a significant impact on secondary IT processes such as backup. As we have learned over the last 24 months, there is no 'one size fits all' strategy for deduplicating data, and there are many techniques that can be used to deduplicate it. Hence there is a very complex matrix relating deduplication technique, data type, and performance requirements; the sketch below contrasts two chunking techniques to show why.
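
As an illustration of why one technique does not fit all data, here is a hedged comparison, in the same toy Python style as above, of fixed-size chunking against a simplified content-defined chunking scheme. Real systems use tuned rolling hashes (Rabin fingerprints and the like); the window size and boundary test here are arbitrary assumptions of mine.

    import hashlib, random

    def fixed_chunks(data, size=4096):
        # Boundaries at fixed offsets: cheap, but a one-byte insert shifts
        # every later chunk and destroys all matches.
        return [data[i:i + size] for i in range(0, len(data), size)]

    def content_defined_chunks(data, window=16):
        # Boundaries wherever a hash of the last `window` bytes hits a
        # target value, so they follow the content and realign after an
        # insert. (Toy boundary test; real systems use rolling hashes.)
        chunks, start = [], 0
        for i in range(window, len(data)):
            if hashlib.sha256(data[i - window:i]).digest()[0] == 0:
                chunks.append(data[start:i])
                start = i
        chunks.append(data[start:])
        return chunks

    base = bytes(random.Random(1).randrange(256) for _ in range(20000))
    edited = b"X" + base                  # one byte inserted at the front
    for name, chunker in (("fixed", fixed_chunks),
                          ("content-defined", content_defined_chunks)):
        a = {hashlib.sha256(c).digest() for c in chunker(base)}
        b = {hashlib.sha256(c).digest() for c in chunker(edited)}
        print(name, "chunks still matching:", len(a & b), "of", len(a))

On this synthetic data, a single inserted byte leaves the fixed-size chunker with no matching chunks, while the content-defined chunker still matches everything but the first chunk; the content-defined pass costs more CPU, which is exactly the technique-versus-performance trade-off in the matrix above.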

Investments and acquisitions in technology are key to the growth of any technology company, but they are only one component of a successful growth strategy. The real trick is what can be achieved with the IP (short term and long term), vision, and a lot of hard work. Developing a capacity optimization strategy that rationalizes all of the capacity optimization technologies into a set of services that can be leveraged by each device in a portfolio adds a great deal of value for customers. Optimizing storage capacity as close to the source as practical, to achieve the proper balance between optimization and performance, lets users see benefits in primary storage right away and save on space (storage and footprint), power, and cooling. Next, if the devices in the environment all speak the same capacity optimization 'language' (or a dialect of it), then passing data from one device to another can reduce the impact on the network and open up new use cases for the technology, such as reducing the reliance on tape within a backup process. Finally, if the devices at the next tier in the storage infrastructure can receive optimized data, and further optimize the capacity according to the SLA requirements of that tier, then users can achieve maximum value. The sketch below shows what that shared 'language' might look like between two tiers.
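
Here is a minimal, hypothetical sketch of two tiers sharing a deduplication 'language': a recipe of fingerprints plus a missing-chunk exchange. Every name in it (DedupTier, replicate, the SHA-256 fingerprinting) is my own illustration of the concept, not a description of any product's protocol.

    import hashlib

    class DedupTier:
        # A storage tier that keeps chunks keyed by fingerprint and can
        # accept already-optimized data from the tier above it.
        def __init__(self):
            self.chunks = {}                   # fingerprint -> bytes

        def missing(self, fingerprints):
            # Tell the sender which fingerprints this tier lacks.
            return [fp for fp in set(fingerprints) if fp not in self.chunks]

        def store(self, chunk):
            self.chunks[hashlib.sha256(chunk).hexdigest()] = chunk

    def replicate(recipe, source, target):
        # Move a backup (a recipe of fingerprints) down a tier, sending
        # only the chunks the target is missing. Returns bytes on the wire.
        needed = target.missing(recipe)
        for fp in needed:
            target.store(source.chunks[fp])
        return sum(len(source.chunks[fp]) for fp in needed)

    primary, backup = DedupTier(), DedupTier()
    for c in (b"A" * 4096, b"B" * 4096):
        primary.store(c)
    recipe = [hashlib.sha256(c).hexdigest()
              for c in (b"A" * 4096, b"B" * 4096, b"A" * 4096)]
    print(replicate(recipe, primary, backup))  # 8192: duplicates never cross
    print(replicate(recipe, primary, backup))  # 0: a repeat pass moves nothing

Because the target answers with only the fingerprints it lacks, duplicate chunks never cross the wire and a repeat replication moves zero bytes; the same handshake generalizes tier by tier down the infrastructure.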

This is not easy to achieve. It takes a lot of vision, planning, buy-in, and hard work. EMC has a two-year head start on this vision. The goal is to provide a pervasive and architecturally consistent set of capacity optimization services across the storage ecosystem, hardware and software alike. If you happened to go to EMC World, this was the premise of the presentation I delivered there. Capacity optimization leverages a set of technologies that can enable new business requirements and accommodate changing ones, such as new data protection requirements, changing recovery point objectives, cloud storage, and security.

As you speak with the vendors who supply technology for your infrastructure, it is important, especially in these hard economic times, that you ask them the right questions, such as "What is your integrated capacity optimization strategy?" It will be interesting to see how NTAP rationalizes all of their technologies in this space. I know they can't say anything at this point, but I will be paying close attention when they do.

IT is always fighting the latest fire. From a strategic perspective, however, if you really want to 'do more with less' and want to protect your infrastructure investments, think about how important technology features, such as data deduplication, can play a much larger role in your environment and help you achieve a much better TCO and ROI. Data deduplication, and hence capacity optimization, can enable new processes that have a dramatic impact on the overall backup environment, especially when they are comprehensive and cumulative in nature. Then, and only then, can they take backup beyond and put you on the road to recovery.

Tags:

Backup, Data Protection, Deduplication