During the past three to four months, the storage industry has seen a spike in the number of reports, white papers and news articles surrounding the evolution of primary storage capacity optimization (it is 2010’s hottest storage technology). The reason this technology is getting so much ‘air play’ these days is that it is critical to controlling the growth and cost of storage. In 2010 the EMC-sponsored IDC report The Digital Universe Decade - Are You Ready? was released and stated that:

  • In 2009, amid the “Great Recession,” the amount of digital information grew 62% over 2008 to 800 billion gigabytes (0.8 Zettabytes).
  • The amount of digital information created annually will grow by a factor of 44 from 2009 to 2020…

The folks at Wikibon also released an infographic that illustrates the true explosion of data.

[Infographic: Information Explosion & Cloud Storage, via Wikibon]

When you combine storage capacity (and the footprint it takes up) with the power it takes to run and cool it, as well as the human resources it takes to manage it, you soon realize we cannot keep ‘just adding more cheap disk’ in an effort to meet storage demands. High-tech companies with high-tech labs are also telling IT that they are ‘out of tricks’ when it comes to continuing to deliver disk drives that double in capacity every 18 months. It is for these reasons that primary storage optimization technologies have stepped into the ‘limelight’: they serve as a means to help control the growth of primary storage, including the footprint, power, cooling and manpower required to manage it.

However, as we all know in IT, no two environments are the same, and what may be good for one may not be good for another. When looking at primary storage optimization, there are a number of available technologies and ways to deploy them, and the key question is what is right for ‘my’ environment.

The first things to consider are:

1) What is the primary objective of the storage system(s) in my environment (it may be different for different systems)?

2) What are the primary characteristics I look at when I purchase a storage system?

3) What are my current business objectives surrounding my storage?

It is important to remember why you acquired your storage in the first place and to revisit the decision-making process that surrounded that acquisition. For example, if performance was a key reason for acquiring a particular storage system, then it stands to reason that performance can’t be sacrificed when adding capacity optimization into the mix.

There are a number of ways to optimize storage capacity. The two core technologies are compression and data deduplication, and each can be deployed either inline (in real time) or post-process.

Compression technologies can reduce the storage footprint anywhere from 50% to 90% depending upon the data type. Compression has been around for decades and is a trusted technology. It can be deployed as a post-process (think of WinZip – zipping a file once it has been stored), or it can be deployed as a real-time application that compresses data on the fly – this is the Storwize model.
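To make that data-type dependency concrete, here is a minimal Python sketch (using the standard zlib library rather than any vendor’s engine; the sample data and compression level are illustrative assumptions) showing how a repetitive text buffer shrinks dramatically while random bytes barely compress at all:

```python
import os
import zlib

# Illustrative only: compare how well different kinds of data compress.
samples = {
    "repetitive text": b"the quick brown fox " * 500,  # highly compressible
    "random bytes": os.urandom(10_000),                # nearly incompressible
}

for name, data in samples.items():
    compressed = zlib.compress(data, 6)  # level 6 is a common speed/size trade-off
    saved = 100 * (1 - len(compressed) / len(data))
    print(f"{name:15s}: {len(data):6,d} -> {len(compressed):6,d} bytes "
          f"({saved:.0f}% smaller)")
```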

Data deduplication technologies can reduce the primary storage footprint anywhere from 10% to 50%, but their effectiveness depends on how the data is used rather than on the data type. In environments without much repetitive data, the deduplication ratio will be low; in environments with a lot of repetitive data, the optimization ratio will be high. Today’s data deduplication solutions for primary storage all happen post-process. (This is primarily due to the performance limitations of trying to deduplicate data in real time on primary storage.) There have been announcements in the past few weeks of data deduplication technologies becoming embedded into storage systems (which is where this technology belongs), and this will significantly help with performance.
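As a rough illustration of why deduplication ratios track how repetitive the data is, here is a toy fixed-block deduplication sketch in Python (real products use far more sophisticated variable-length chunking, indexing and metadata handling; the 4 KB block size and SHA-256 hashing here are assumptions for illustration only):

```python
import hashlib

def dedupe(data: bytes, block_size: int = 4096):
    """Toy fixed-block deduplication: store one copy of each unique block
    plus an ordered list of hashes that describes the original data."""
    unique_blocks = {}  # sha256 digest -> block contents
    recipe = []         # ordered digests needed to rebuild the data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        unique_blocks.setdefault(digest, block)
        recipe.append(digest)
    return unique_blocks, recipe

# Highly repetitive data (think dozens of near-identical VM images) dedupes
# well; completely unique data would store almost as many blocks as it reads.
data = (b"A" * 4096) * 100 + (b"B" * 4096) * 100
unique_blocks, recipe = dedupe(data)
print(f"logical blocks: {len(recipe)}, unique blocks stored: {len(unique_blocks)}")
```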

Now that there is a basic foundation for what these technologies can do, the real question is how they fit into the overall requirements for your specific storage needs. Take a look at the storage within your environment and consider the impact each of these technologies, and each deployment model, would have on it. For example, if you don’t have any system resources left over in a day to perform a post-process operation, then a real-time deployment (as long as it does not degrade performance) is the logical solution. If you have a great deal of repetitive data (VMware .vmdk files – without the data stored in the file), then a deduplication solution is the best fit. If transparency within your environment is important (not having to rearchitect applications, networks or storage), then a solution that lets you optimize capacity without changing any of these is the right choice. Conversely, if you have plenty of time to compress data once it is written, and no concerns about the human resources needed to compress or decompress it (like WinZip), then that is a perfectly viable solution as well. I could go on, but I think you get the picture.
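For readers who like to see the trade-offs laid out explicitly, the following hypothetical helper condenses the decision points above into code; the input flags and rules are simplifications of this paragraph, not a vendor sizing tool:

```python
def suggest_optimization(has_batch_window: bool,
                         highly_repetitive_data: bool,
                         transparency_required: bool) -> str:
    """Hypothetical decision helper condensing the points above;
    the rules are illustrative only, not a sizing or selection tool."""
    if highly_repetitive_data:
        return "data deduplication (post-process on primary storage today)"
    if not has_batch_window or transparency_required:
        return "real-time (inline) compression, transparent to applications"
    return "post-process compression (zip-style, after the data is written)"

# Example: an environment full of near-identical VM images
print(suggest_optimization(has_batch_window=True,
                           highly_repetitive_data=True,
                           transparency_required=False))
```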

The other key variable is cost. As my grandfather once told me, “You get what you pay for in life” and “Nothing is free.” Each of the technologies outlined above comes at a different price, and one thing to keep in mind is that the value of a solution is directly proportional to its cost. Some solutions may claim to be ‘free’; however, when you consider the horsepower it takes to run them, they aren’t. If it takes buying a bigger system to handle the optimization workload, reconfiguring your environment to enable optimization to work properly, or replacing a recently acquired backup technology to make primary storage optimization effective throughout the entire process, then the solution really isn’t free. These are all things to consider when evaluating an optimization technology.

Remember, data deduplication for backup sounded too good to be true five years ago, and now if you don’t use it for backups you’re missing out. Don’t let the same thing happen with your primary storage. Get ahead of the curve, and if you have any questions – please ask away.

Tags:

blueprint, Compression, data compression, Data Deduplication, Data Protection Management, Dedupe, Deduplication, in line, Permabit, post process, random access compression, real-time compression, Storage, Storwize, Virtualization