Say the word “infrastructure” at the Strata conference and you hear people start talking mostly about software. One of the biggest lessons I took away was that no one really cares about the hardware that “big data” sits on. Now, I am probably the biggest proponent of the view that “spinning rust” doesn’t make a company money; it is the software that extracts value from the data, and the insights it produces, that make corporations money. That said, at some point the infrastructure matters. If you don’t have any storage, you don’t have any data, and then you have no competitive advantage.

I see the world of “Big Data” today much like the world of “Cloud” five years ago. Five years ago, only companies with a lot of money could actually build a cloud. Then they found out how difficult it really was, and because everyone implemented their own version, the definition got so murky that today there is no real definition of cloud. Cloud is nothing more than a robust, flexible, scalable, easy-to-manage infrastructure. Isn’t that what you were trying to build when you built your original IT shop? Or were you trying to build a non-robust, inflexible, non-scalable, hard-to-manage infrastructure? I don’t think so.

There is a real opportunity today to define what Big Data actually is. The reality is, Big Data starts small. Big Data is NOT PBs and PBs of data; Big Data can be 5 TB of data. If your organization is not used to dealing with that much capacity, but you are now going to leverage the more than 7,000 open APIs available to bring data into your data center, analyze it, and make better decisions, that is Big Data (a minimal ingestion sketch follows the list below). That said, while at Strata I did learn about the infrastructure requirements for big data environments. Consumers of big data need a few things when it comes to storage:

  • Scale – at an unpredictable rate
  • Performance – to keep up with both the incoming data stream and the analytics
  • Data Protection – increasingly, snapshots, clones, replication, backup, and so on
  • Low cost
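
To make that concrete, here is a minimal sketch of what pulling one of those open API feeds into local storage might look like. It is illustrative only: the endpoint URL, field names, and landing directory are hypothetical placeholders rather than any particular provider’s API.

    import json
    import urllib.request
    from pathlib import Path

    # Hypothetical open API endpoint -- substitute any public data API
    # your business wants to tap into.
    API_URL = "https://api.example.com/v1/transactions?limit=1000"

    def ingest(url: str, landing_dir: str = "landing") -> Path:
        """Pull one batch of JSON records and land it on storage for analysis."""
        Path(landing_dir).mkdir(exist_ok=True)
        with urllib.request.urlopen(url) as resp:
            records = json.load(resp)  # assumes the API returns a JSON list
        out_file = Path(landing_dir) / "transactions_batch.json"
        out_file.write_text(json.dumps(records))
        print(f"Landed {len(records)} records in {out_file}")
        return out_file

    if __name__ == "__main__":
        ingest(API_URL)

A single batch like this is tiny, but feeds like it arrive continuously, and that is exactly what pushes an organization past the capacity and data protection practices it is used to.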

Most big data environments today are servers clustered together with internal storage. However, we have all seen this picture play out before. Direct-attached storage was “good enough” until you needed storage services to deliver these four fundamental characteristics. A network-attached storage environment is the predominant way to ensure businesses are getting the most out of their data.

In order to scale without consuming additional CPU (which, by the way, is the biggest complaint), networked storage is the only way to go. To ensure the best performance, not only for the analysis but for the influx and movement of data as well, networked storage is a must. Lastly, to get the data protection services businesses need, such as snapshots and clones, at the right performance levels, again you need network-attached storage.
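
To put the data protection point in concrete terms, here is a minimal sketch of scripting a snapshot before each bulk load. It assumes the landing area happens to sit on a ZFS dataset (the pool and dataset names are placeholders); enterprise network-attached storage exposes the same snapshot and clone capabilities through its own management interfaces.

    import subprocess
    from datetime import datetime, timezone

    # Placeholder dataset -- adjust to your own pool/dataset layout.
    DATASET = "tank/bigdata/landing"

    def snapshot_before_load(dataset: str = DATASET) -> str:
        """Take a point-in-time snapshot so a bad load can be rolled back."""
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        snap_name = f"{dataset}@preload-{stamp}"
        subprocess.run(["zfs", "snapshot", snap_name], check=True)
        return snap_name

    if __name__ == "__main__":
        print("Created snapshot:", snapshot_before_load())

If a load goes wrong, the dataset can simply be rolled back to the snapshot, which is precisely the kind of service a pile of internal disks inside a compute node does not give you.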

Other storage services you gain by leveraging network-attached storage include:

  • Compression
  • Tiering
  • Thin provisioning
  • Virtualization – allowing you to have larger storage pools

By leveraging these capabilities, you also hit the fourth and most important characteristic: low cost.
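
As a rough illustration of why these services translate into lower cost, the sketch below compresses a sample of repetitive, machine-generated records and reports the space saved. Storage arrays apply compression inline and transparently; this host-side example simply shows the scale of savings on the table.

    import gzip
    import json

    # Generate a sample of repetitive, machine-generated records -- the kind
    # of semi-structured data that open API feeds tend to produce.
    records = [
        {"sensor_id": i % 50, "status": "OK", "reading": round(20.0 + (i % 7) * 0.5, 2)}
        for i in range(100_000)
    ]

    raw = json.dumps(records).encode("utf-8")
    compressed = gzip.compress(raw)

    print(f"Raw size:        {len(raw) / 1024:.0f} KiB")
    print(f"Compressed size: {len(compressed) / 1024:.0f} KiB")
    print(f"Savings:         {100 * (1 - len(compressed) / len(raw)):.0f}%")

Actual ratios depend entirely on the data, but repetitive, semi-structured records routinely compress several-fold, and that is capacity you do not have to buy.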

So, while a number of folks talk as if “Big Data” means hundreds of terabytes to petabytes, the reality is that it does not. If your business wants to start analyzing data to become more competitive, start planning your infrastructure now. Don’t make the same mistakes you made a decade ago. You’ll find that just a little bit of data can transform your business, and a networked storage infrastructure can give you all the benefits you will require as you grow.

Tags:

Backup, big data, cloning, Cloud, data, Data Protection, IBM, network attached storage, open api, PB, Replication, snapshots, Storage, strata, TB, Virtualization