I have decided that information today is like a group of friends. If you look at my LinkedIn page or my Facebook page, you will see that I have over 600 connections and over 180 friends, respectively. What does this really mean? Obviously I don't stay in touch with all of these people. So why do we have these connections? I think it is because we believe that, in the future, each one of these connections will offer some kind of value to us. They may be a friend to us, share common experiences that help us through a personal issue, or help us find a mate or even a job. We just don't know, so we hang on to the connection.

This is not unlike information. We are all tired of hearing that "data is growing at an exponential rate," but we never look at why. It is simple. We believe that 'someday' we may need that 'valuable' piece of content, so we had better not delete it. More importantly, the people who are accountable for managing that data (IT) are usually one step removed from the 'value' discussion, so rather than delete anything and be responsible for "losing data," they save and protect everything.

Recently I spent four hours on my Facebook page 'categorizing' my friends. I created a number of categories: friends from high school, friends from college, colleagues from work (current), colleagues from work (past), industry connections and relatives. As you can imagine, some friends belong in more than one category, so how do I choose which one they should go in? Also, what happens if I change jobs? Where do the 'colleagues from work (current)' friends go? When do I move them? Will I remember to move them?

I have often said when presenting to customers, "EMC can help you with all aspects of your data except for one thing. EMC will never know the value of a piece of your content to you. You have to tell us, and then we can manage it properly." Typically when customers hear that statement, they agree, but they also agree that the process of classifying data is a daunting task. You can see the challenge in just organizing friends in Facebook. There are so many permutations of how data can be classified that IT chooses the path of least resistance: store and protect everything.

While storing and protecting everything is easy, it also runs straight into the three biggest challenges IT is faced with: cost, complexity and compliance. These three are the toughest to balance because not only are they important in their own right, they are also interdependent. As data grows, so does the difficulty of protecting it, which means IT either needs to spend more money or fall out of compliance.

The cycle is only broken when new processes are introduced. These processes are part of a key message when it comes to data protection: assess (classify), archive, backup, manage. Only when customers accept that keeping cost, complexity and compliance in check requires a new process can the cycle be broken. Once new processes are in place, the data center can become more efficient.

Consider this analogy: In July 1936 Henry Phillips received a patent on a new type of screw and screwdriver. This new "technology" changed the world of mass production and machine repair.

He didn't set out to make life easier for people using hand tools; he was trying to solve an industrial problem. The new screw and screwdriver were designed for use with power tools, and more specifically power tools on an assembly line.

The slot in the screw allowed the driver to seat itself automatically when contact was made, which saves a second or two per screw. If you have hundreds or thousands of screws, as in cars or airplanes, that adds up to a great deal of time.

In 1938 Henry was able to get the American Screw Company to spend $500,000 to develop a manufacturing process around the new screw. By 1940 nearly all of the American manufacturers had switched to the new process and the new screws. It made the assembly of military aircraft and jeeps much more efficient. Having these vehicles made faster and more efficiently contributed to a competitive advantage.

So, it's like I say when talking to customers: "The hardest thing to change in the data center is not technology, it is process." Once the psychological inertia of dealing with a new process is overcome, progress can be made.

Once customers start to classify their information (assign value to it), they can begin to archive their 'old' data. They will still have access to it, just not as quickly. Once this data is removed from the backup stream, backups will run much more efficiently. Additionally, deploying new technologies such as deduplication for specific data types (identified during a proper classification effort) allows IT to back up those data types more efficiently and at much lower cost. Now that all the work has gone into establishing a new set of processes, IT will want to keep managing them to ensure that the hard work delivers tangible business value. New processes can help IT attack cost, complexity and compliance, but it all starts with information classification.
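
To make that flow a little more concrete, here is a rough sketch in Python of what a classification rule might look like. Everything in it is an assumption for illustration only: the one-year archive threshold, the list of file types that deduplicate well, and the `business_critical` flag (which stands in for the value the business has to tell IT about) are all hypothetical, not a description of any EMC product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy values. In practice these come from the business, not IT.
ARCHIVE_AFTER = timedelta(days=365)
DEDUP_FRIENDLY_TYPES = {"vmdk", "docx", "pptx"}


@dataclass
class Item:
    name: str
    file_type: str
    last_accessed: date
    business_critical: bool  # the "value" only the data owner can supply


def classify(item: Item, today: date) -> dict:
    """Decide how one item is handled under the assumed policy."""
    is_old = (today - item.last_accessed) > ARCHIVE_AFTER
    # Old, non-critical data leaves the backup stream and goes to the archive.
    archive = is_old and not item.business_critical
    return {
        "archive": archive,
        "backup": not archive,
        # Deduplication is applied only to backed-up data of suitable types.
        "deduplicate": (not archive) and item.file_type in DEDUP_FRIENDLY_TYPES,
    }


def plan(items: list[Item], today: date) -> dict:
    """Build a handling plan keyed by item name."""
    return {item.name: classify(item, today) for item in items}
```

The point of the sketch is that the rules themselves are simple once the value has been assigned; the hard part is the process of getting someone to fill in that `business_critical` column in the first place.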

Posted by Steve Kenniston

Tags:

Archive, Backup, Classification, Data Deduplication, Data Protection