I don’t know about you all, but I am really tired of hearing about tape being “dead”. As the picture shows, folks have been trying to kill off tape since 1984; that’s 30 years ago. I know for a fact that customers have tapes that are 30 years old that they can still read data from. Bring me an HDD that is 30 years old and “show me the data”. You can’t!
In reality, the real apocalypse to be concerned about is the data apocalypse. Since the days when I was an analyst (2001) we have used every conceivable adjective that means “large” and “growing” to describe the size and growing volume of data. Today we hear things like “The growth of the digital universe continues to outpace the growth of storage capacity”, meant to scare us into believing we are so overwhelmed with data that, well, it’s just out of control. We hear about new capabilities and new technologies that allow us to store more and more capacity in an ever-shrinking footprint, things like compression, data deduplication, tiering, virtualization, and copy data services. We are even adding helium to drives (“6 TB helium-filled hard drives take flight, bump capacity 50%”) to cram more data into the same footprint. At the end of the day, adding software capabilities to these disk drives makes data management more complex and, more importantly, the chance of being able to read these drives for years to come keeps getting smaller.
I see two real issues. First is the data management problem. It is kinda funny to think that we are storing all of this data on HDDs, or even HDDs with special software, and we probably won’t even be able to read it in years to come. This raises the question, “Why are we storing it anyway?” I mean, if we aren’t going to be able to read it, why keep it? It is very, very expensive to do so. In addition, migrating the data to another storage device every 3 to 5 years, just to have it around when you don’t use it, is very costly. So what is the challenge? Is it that we just really don’t know how to throw data away? Are the tools for data management so bad they don’t allow us to get rid of stuff? Is it that we don’t know how to use the data we have efficiently enough to be able to throw data away? Here is an example that I know I am guilty of. I am doing a presentation for an executive. Of course they are going to change their mind 900 times before it is right. This means I will have saved quite possibly 30 copies of the presentation (that’s what it was the last time I did this exercise). So, I say to myself, “well, don’t throw these others away because there is a slide in one of these decks that has a picture or some text I may want”. So, if I were to go back to the presentations I have done, there are probably 20 sets of presentations, all with 10 or more copies. All for what? One slide that I may, or more likely may never, use. There are similar cases where analytics is run on a set of data; once the analytics have been run, the data could be thrown away. However, this data is kept, and what for? Analysts will say "in case I want to run a new set of analysis on the data". Who knows if that ever happens?
Deduplication helps with this on backup storage, but how does it help primary storage? Okay, so we say we can dedupe primary storage now (let’s say it is even useful and doesn’t cause performance issues on the primary array). Great. Then the disk system (all of my files are on a share) gets “old” (3 to 5 years) and I need to migrate all of that data to a new system, because of course I am going to tell the admin that I must have all my data. It is going to take time to rehydrate all that data, move that data (the most expensive thing to do in the data center), and then dedupe that data again. And at the end of the day, who knows IF I will be able to read it when I need it, or IF I will ever even need it again? Seems like a lot of expense for a lot of IFs.
The second issue is the notion of data copies. Now, the example I just described does happen, but that scenario is not really related to data protection. Data copies for data protection come from just that: the need to feel comfortable that the business is properly protected. The reality is that end users still want multiple copies of their data. It is a way to safeguard themselves in the event of any data loss. The whole notion of belts and suspenders exists for a reason. The comfort you get from knowing that at any time there is a copy of your data available to recover from helps folks sleep better at night. Having copies of your data is important and necessary. You may not need 40 copies, but you will need a few. So if that is the case, you probably want the most cost-effective and most reliable way of storing that data. The Clipper Group has done some outstanding work discussing the sheer cost savings tape can provide.
There was an excellent article in The Economist at the end of November 2013 titled “Magnetic Tape to the Rescue”. The article pointed out that it is crystal clear that:
- Reading data from tape is 4x faster (once in the library) than reading from disk
- Reliability / Data Integrity – with a snapped tape you may only lose a few GB of data; with a broken disk drive you could lose a TB of data
- Power / Cooling and Floor space are much more economical with tape
- Tape is a cheaper medium by over 2x – especially as capacities grow
- Tape has greater longevity than disk
In addition, there are more and more use cases that are becoming applicable to tape. Utilizing tape for the cloud is really the only way cloud providers can meet the cost objectives they need to hit in order to:
1) Meet customer cost objectives
2) Meet their own financial margins
There is another new concept that is getting a lot of traction these days, called “FLAPE”. It is a combination of flash (SSD) and tape. The point is, customers need the performance of flash for their most immediate data analysis processing. Then, as the data ages or gets “stale”, it can be pushed off to tape, a low-cost medium for storing the data because, as we said before, who knows IF we are going to need it again? Or when or how we will use it?
Wikibon has done a piece that talks about the "Rebirth of Tape" due to a new technology called LTFS (Linear Tape File System), which allows IT to leverage tape in a much easier way, especially in the FLAPE scenario. Users can simply migrate data off to tape without the cost of an expensive backup solution and without the challenges that come from restoring data through a backup system when you need the data. With a simple policy based on, say, the last touch time of the data, you can migrate the data to tape, where it sits on a file system that is easily accessible.
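To make the "simple policy" idea concrete, here is a minimal sketch of a last-touch-time migration. It is not how any particular LTFS product does it; it only assumes that the tape is mounted through LTFS as an ordinary directory, so plain file operations work. The function names, the 365-day threshold, and the paths are all made up for illustration. One caveat: many file systems are mounted so that access times are updated lazily or not at all, so a real policy might need to use modification time instead.

```python
import os
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(source_root, max_idle_days, now=None):
    """Yield files under source_root whose last access is older than max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip anything that is not a regular, readable file (e.g. broken links).
            if path.is_file() and path.stat().st_atime < cutoff:
                yield path


def migrate_stale_files(source_root, tape_root, max_idle_days, now=None):
    """Move stale files to tape_root (assumed to be an LTFS mount), keeping the layout."""
    source_root = Path(source_root)
    tape_root = Path(tape_root)
    moved = []
    # Collect the candidates first so the tree is not modified while it is being walked.
    for path in list(find_stale_files(source_root, max_idle_days, now)):
        target = tape_root / path.relative_to(source_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        # Across file systems, shutil.move copies the file and then removes the original.
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

With hypothetical paths, a nightly job could call `migrate_stale_files("/srv/projects", "/mnt/ltfs", max_idle_days=365)`. Because the directory layout is preserved, a user who needs an old file later can find it under the same relative path on the tape mount, with no backup application in the restore path.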
There is a new category of data storage that is popping its head up, and that is this notion of “long-term data retention (preservation)”. (I don’t necessarily like the word “archive” because when you say archive, it instantly conjures up the word “lawyer” in any IT conversation.) But the reality is customers need to be able to store a lot of data for very long periods of time, cost-effectively, AND have access to it as new analytics are used. Tape is an excellent solution for this new use case. IDC believes that by 2017, we will need 39 exabytes of capacity to store this data. Today, they believe that this data would be stored on disk. That said, they have also said that as LTFS matures, a larger percentage of this data will be, and should be, stored on tape. Santa Clara Consulting Group (a consulting / analyst group in Santa Clara, CA that does a lot of research on tape) also believes that the growth of tape (media) will be approximately 26% in 2013. This is primarily due to the added use cases and the evolving role of tape in the data center, from backup to things like:
- Long-term Data Retention/Preservation
- Scale-out NAS
So, as the data zombies continue to eat away at your IT budget, in 2014, tape can be your secret weapon to combat them.