There was an interesting announcement today regarding Permabit, which is now providing primary storage optimization through OEMs by having its solution embedded into the storage system. This further drives home the point of where capacity optimization should live. I do have a couple of questions, however:

  1. What is the performance like? I see phrases such as “High Performance Data Optimization Software” but no performance metrics – such as ‘no performance degradation’ for customers using the solution – and no testing metrics from their ‘partners’ (as it probably isn’t in production yet), which brings up another question:
  2. Why were none of the ‘design win’ partners quoted in this announcement?
  3. Rehydration – Mr. Floyd states:

    Permabit's Floyd claims Albireo can maintain data integrity because data written to disk isn't altered, and the reduction takes place out of the data path. When parallel processing is used, deduped data doesn't have to be rehydrated when it's accessed.

    The question is – if it doesn’t need to be rehydrated, then how does the application read it? I can only assume that Mr. Floyd means the data doesn’t have to be rehydrated on disk, which is fine. The question then becomes: a) how does the application know what the data is? (Ocarina uses an agent to help it understand the data, but that is one more thing to manage.) And b) what is the performance cost of the system looking up all of the hash keys to reassemble the data on the fly, and how much more in storage resources will that consume? (See the sketch after this list for what that lookup path looks like.)

  4. Back to performance – Permabit states:

    When done inline, data will flow to the Albireo library before going to disk. Post-process deduplication will write data to disk first, then scan and eliminate duplicated data. The parallel option sends data to disk while still in memory, and applies updates the same way as post-processing without having to read data off disk. Each method has different amounts of latency and reduction efficiencies.

Here the question is: what is the difference between ‘inline’ and ‘parallel’? Additionally, if you review the description of how parallel deduplication works, it seems as if it does not optimize writes and probably writes over blocks that were recently added and are redundant. That does not save write activity the way compression does (by writing less to disk).
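
To make the rehydration and inline questions concrete, here is a minimal sketch of how a generic chunk-level deduplicating store handles writes and reads. It is purely illustrative – fixed 4 KB chunks, SHA-256 fingerprints, and an in-memory index – and is not a description of Permabit's Albireo; the `DedupStore` class and its methods are hypothetical.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once on 'disk',
    and each logical object is just an ordered list of chunk fingerprints."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}      # fingerprint -> chunk bytes (the unique data)
        self.block_maps = {}  # object name -> ordered list of fingerprints

    def write(self, name, data):
        fingerprints = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:      # inline dedup: store new chunks only
                self.chunks[fp] = chunk
            fingerprints.append(fp)
        self.block_maps[name] = fingerprints

    def read(self, name):
        # The rehydration question in practice: every read walks the block
        # map and looks up each chunk by fingerprint before the application
        # ever sees the original bytes.
        return b"".join(self.chunks[fp] for fp in self.block_maps[name])


store = DedupStore()
payload = b"A" * 8192 + b"B" * 4096          # two identical 4 KB chunks of "A"
store.write("file1", payload)
store.write("file2", payload)                # a fully redundant second copy

assert store.read("file1") == payload        # the app still gets original data
unique = sum(len(c) for c in store.chunks.values())
print(f"logical bytes: {2 * len(payload)}, unique bytes stored: {unique}")
```

Whether those fingerprint lookups land in RAM, SSD, or spinning disk is exactly what determines the read-side cost raised in question 3 above.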

I commend Permabit for being very up front about the performance issues that come with doing something very complex. Deduplication for storage is very difficult. To do it inline and in real time is even more of a challenge. As Floyd says, "It's been amazing to witness how fast deduplication went from a 'science experiment' to mainstream in the backup use case, and primary storage -- while perhaps not being adopted at the same rate -- is being considered more and more. Performance will get better and become less and less of an issue as most of the algorithms are limited by CPU, which is getting very inexpensive. But even in cases where memory and disk spindles play a role in performance, those issues are increasingly getting more cost-effective to overcome." I couldn’t agree more.

The one thing that customers should consider today, however, is primary storage compression in real time, without any performance degradation. This technology is available today from Storwize. Deployed today as an appliance in front of NAS environments, it is saving customers anywhere from 50% to 90% of their capacity without performance degradation and without changing anything about their environment, including applications, networking, storage, or downstream processes such as snapshots, replication, and backup.
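
As a rough illustration of the earlier point that compression also reduces write activity, the short sketch below compresses a buffer before it would be written to disk. It is a generic example built on Python's zlib with a deliberately repetitive sample payload, not Storwize's appliance or its compression engine; real-world savings depend entirely on how compressible the data actually is.

```python
import zlib

# Toy illustration of "writing less to disk": compress the buffer before it
# would be written. zlib stands in for any compression engine here.
payload = b"2010-06-14 12:00:00 INFO request served in 12ms\n" * 2000

compressed = zlib.compress(payload, 6)
saved = 100 * (1 - len(compressed) / len(payload))

print(f"original:   {len(payload):>7} bytes")
print(f"compressed: {len(compressed):>7} bytes ({saved:.0f}% fewer bytes written)")
```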

Technology is evolutionary. The first step starts today with Storwize. To see more about the Storwize solution, or to spend 15 minutes learning how to save 50% or more of your storage capacity, go to Storwize/ROI.

Tags:

Capacity Optimization, Compression, data compression, Data Deduplication, Dedupe, Deduplication, Ocarina, Permabit, real-time compression, Storage, Storwize