History truly does repeat itself, and the history of data storage is no exception. Every once in a while a new technology comes along that requires a new way to think about infrastructure. Notice I said “infrastructure”. I’d like to draw two analogies:

Analogy 1: RAID – Prior to RAID, users stored their data on disk and, if they could afford it, backed that data up to keep a protected copy. When RAID came out, users were able to store their data on multiple disks that appeared as one device. The benefits were increased data reliability and better performance. This new technology fundamentally changed how disk was sold, yet the questions stayed the same:

  1. How much capacity do you need?
  2. What type of performance does your application require?

The sales rep’s point of view changed, though. There were a number of new considerations to take into account. First was the age-old question, “Will I sell less storage ‘stuff’?” Remember that the person selling the disk at the time was probably also selling the backup tape and software to protect that information. If the disks are more reliable, maybe the customer won’t need as much tape? Second, when the capacity question came up, the seller also needed to know what type of RAID the customer wanted, to ensure they sold enough drives. It was no longer as simple as taking the capacity requirement and dividing it by the drive capacity of the day. Depending on the RAID level, there was a new set of math to be done. Third was performance: more spindles meant more performance, so once the capacity equation was solved, you also needed to know the I/O requirements to make sure the right number of drives was sold to cover capacity as well as performance.
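To make that “new set of math” concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a vendor sizing tool: the function names, the simplified parity rules (one RAID group, no hot spares, no minimum group sizes, no write penalty) and the per-drive IOPS figure are all hypothetical.

```python
import math

# Extra drives consumed by parity in a single RAID group.
# Simplified assumption: one group, no hot spares, no minimum group size.
PARITY_DRIVES = {"raid0": 0, "raid5": 1, "raid6": 2}


def drives_for_capacity(usable_tb, drive_tb, level):
    """Drives needed to deliver usable_tb of capacity at a given RAID level."""
    data_drives = math.ceil(usable_tb / drive_tb)
    if level == "raid10":
        return 2 * data_drives  # every data drive has a mirror
    return data_drives + PARITY_DRIVES[level]


def drives_for_iops(required_iops, iops_per_drive):
    """Spindles needed to meet the I/O requirement (write penalty ignored)."""
    return math.ceil(required_iops / iops_per_drive)


def drives_to_sell(usable_tb, drive_tb, level, required_iops, iops_per_drive):
    """The larger of the capacity answer and the performance answer wins."""
    return max(
        drives_for_capacity(usable_tb, drive_tb, level),
        drives_for_iops(required_iops, iops_per_drive),
    )
```

With these assumptions, 10 TB usable on 2 TB drives in RAID 5 needs 6 drives for capacity, but a 3,000 IOPS requirement at an assumed 150 IOPS per drive needs 20 spindles, so `drives_to_sell(10, 2, "raid5", 3000, 150)` comes out at 20. The same two questions are asked; only the arithmetic behind the answer changed.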

Guess what? We figured it out, and the industry never looked back. RAID is a de facto standard in all storage subsystems today; I even run RAID in my home. The business benefits of having RAID far outweighed the costs. In fact, it is probably one of the first times in storage history that the question “How can you afford not to have it?” came up.

Analogy 2: Virtual Machines – When VMware came out, the value proposition was: do more work with less physical infrastructure. Again, the business benefits far outweighed the technology hurdle of implementing the new solution.

Keeping in mind that it is much harder to change process in IT than it is to change technology, IT decided that this new way of serving up processing power to applications was well worth all of the process changes it would require. One example: backup would need to change when implementing virtual server technology. The data would grow 4x, and processing that information for backup would take longer, in a world where time was all too valuable. However, the business benefit justified the change.

Again, the seller’s questions were consistent:

  1. How many virtual servers do you need? (Capacity)
  2. What type of performance do you need for each virtual server?

The answers to these questions allowed a sales rep to configure the right number of physical systems to handle the right number of virtual systems to make the line of business successful. Some of the same considerations came up as well: “Will I sell fewer servers and make less money?” Now that there was new server technology (more processors, the ability to handle more memory), systems could be bigger and more expensive. Sellers also needed to know a bit more about “capacity”: how many virtual systems could a physical system run successfully? They also needed an understanding of performance. Sellers were now configuring systems to run the equivalent of 20 to 100 servers on one system.
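The seller’s sizing exercise for virtual servers can be sketched the same way. This is a rough illustration only: `hosts_needed` and its parameters are hypothetical names, the density limit and resource figures are made-up inputs, and real sizing would also account for headroom and failover.

```python
import math


def hosts_needed(vm_count, max_vms_per_host, per_vm_demand, host_capacity):
    """Physical hosts required for a set of virtual servers.

    per_vm_demand and host_capacity share one unit of the seller's choosing
    (for example GB of memory); both are assumed inputs, not measured values.
    """
    # "Capacity" question: how many virtual systems can one host run at all?
    hosts_by_density = math.ceil(vm_count / max_vms_per_host)
    # "Performance" question: how much of the host does each one consume?
    hosts_by_demand = math.ceil(vm_count * per_vm_demand / host_capacity)
    return max(hosts_by_density, hosts_by_demand)
```

For example, 200 virtual servers with an assumed limit of 40 per host would fit on 5 hosts by count alone, but if each needs 4 units of a resource and a host offers 128, the demand side calls for 7 hosts, so `hosts_needed(200, 40, 4, 128)` gives 7.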

Today I would suggest that we are at a crossroads in history. New technology has come along that will have a significant impact on the storage world. Research from IBM indicates that disk drives can no longer keep getting twice as dense for half the cost, as they did throughout the late ’90s and early 2000s. The technology doesn’t exist today to make the drives spin faster, stay cool and not lose data. Until now. Real-time compression is a game-changing technology that will add significant value to the storage industry without changing the way IT thinks about deploying its storage.

Data is growing at a significant pace, and given the latest IBM research about disk capacities, something needs to change. Data centers are running out of space, and more customers want to keep more data online for reasons such as competitive edge or compliance. Whatever the reason, they want access to their information. Enter real-time compression. There is a fundamental difference between real-time compression and other compression technologies and implementations. I am not going to get into it here, but it is safe to say that post-process and in-line compression are very different from real-time compression, and that users can’t get the benefits of improved primary storage capacity, transparently and with no performance impact, with anything but real-time compression technology.

Again, real-time compression, like other game-changing technologies, doesn’t require any new questions; there is simply a new set of math equations.

  1. How much capacity is required?
  2. What is the performance requirement?
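As a hedged sketch of what that new math might look like, the snippet below folds a compression ratio into the capacity side of the drive count. The ratio is an assumed planning input (actual ratios depend on the data), the function name and per-drive IOPS figure are hypothetical, and RAID overhead is left out to keep the arithmetic visible.

```python
import math


def drives_with_compression(usable_tb, drive_tb, compression_ratio,
                            required_iops, iops_per_drive):
    """Drive count when each drive stores drive_tb * compression_ratio of data.

    compression_ratio is an assumption supplied by the planner (1.0 means
    no compression); RAID overhead is deliberately ignored in this sketch.
    """
    effective_tb_per_drive = drive_tb * compression_ratio
    by_capacity = math.ceil(usable_tb / effective_tb_per_drive)
    by_performance = math.ceil(required_iops / iops_per_drive)
    return max(by_capacity, by_performance)
```

With an assumed 2:1 ratio, 100 TB on 2 TB drives needs 25 drives for capacity instead of 50. If the workload also needs 6,000 IOPS at an assumed 150 IOPS per drive, the performance side asks for 40 spindles, so `drives_with_compression(100, 2, 2.0, 6000, 150)` comes out at 40 and the second question decides the configuration.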

In time, real-time compression will be as ubiquitous as RAID, and just as users don’t think much about RAID, they won’t need to think about compression. Compression will become an expected feature of the array. It doesn’t matter that it now takes fewer drives to satisfy the original questions around capacity and performance. With data growing as fast as it is and with disks unable to keep up their growth pace, something needs to change, and that something is real-time compression. Soon it won’t matter what the physical capacity of a disk drive is; what will matter is a disk’s virtual capacity, meaning what it is capable of storing. It is time we all started thinking this way.
