Today I listened to a podcast by one of my absolute favorite industry influencers, Mark Twomey (@storagezilla). The title of the podcast was The Copy Data Cortex. It featured another industry insider with a great deal of knowledge, Stephen Manley, CTO of Core Technologies at EMC, and they were talking about “Copy Data”. As Mark put it, “…it isn’t the wave of the future, it is a present thing…”.
Stephen had just come from speaking to a customer who wanted to talk about where they see the value of copy data and where it is going. Mark talked about the $44B market space called copy data (which has just been upgraded by ‘those industry analysts’ to $51B by 2018). Stephen went on to say that the customer understood the value of new technologies. They talked about the value they were able to extract from deduplication, for example. He also mentioned that the customer is buying into the premise that leveraging data copies for different recovery methodologies (operational recovery, version recovery and DR) makes sense. Where it breaks down is in how to leverage these copies for other business use cases such as Dev/Test or Analytics.
Mark and Stephen go on to make some good points about workflows, security and applications, but let’s take a quick step back for a moment to see how the different lines of business obtain access to data copies to meet their business needs today.
Typically, a line of business owner or application developer will reach out to IT to ask for a copy of production data in order to do their job. Let’s take Dev/Test for example. Development asks IT for the copy of production data they need. IT then goes to their backup catalog to see if they have the data that is being requested. If they do, they go to the backup and perform a recovery and make that data available in order to meet the data request from the line of business. The business is using these backup data copies in order to meet their needs today.
The challenge with this is that performing this recovery has many steps, is labor intensive, leads to data sprawl (as it is never cleaned up) and takes a long time. There has to be a better way.
The podcast talked about an important piece of the whole process highlighted above: the catalog. Having a full catalog of the data in your environment, including your snaps and replicas, with the ability to correlate the data to the application and understand the lineage of that data, unlocks the power of your data. Using the catalog as a metadata store gives you visibility and insight into the data in your environment, which helps you take better advantage of it.
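To make the catalog idea a little more concrete, here is a minimal sketch in Python of a catalog used as a metadata store. It is purely illustrative: the `CopyRecord` and `Catalog` names, the fields, and the sample copy IDs are all my own assumptions and do not come from the podcast or from any product. It only shows the two things discussed above, correlating copies to an application and walking the lineage of a copy back to its source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CopyRecord:
    """One entry in a hypothetical copy-data catalog."""
    copy_id: str
    application: str           # application the data belongs to
    kind: str                  # e.g. "production", "snapshot", "replica", "backup"
    parent_id: Optional[str]   # copy this one was derived from; None for a source


class Catalog:
    """Illustrative in-memory metadata store, not a real product API."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.copy_id] = record

    def copies_for(self, application):
        """Correlate data to an application: every known copy for it."""
        return [r for r in self._records.values() if r.application == application]

    def lineage(self, copy_id):
        """Walk parent links from a copy back to its original source."""
        chain = []
        seen = set()
        current = self._records.get(copy_id)
        while current is not None and current.copy_id not in seen:
            seen.add(current.copy_id)  # guard against accidental cycles
            chain.append(current.copy_id)
            current = self._records.get(current.parent_id) if current.parent_id else None
        return chain


catalog = Catalog()
catalog.register(CopyRecord("prod-db", "orders", "production", None))
catalog.register(CopyRecord("snap-01", "orders", "snapshot", "prod-db"))
catalog.register(CopyRecord("replica-01", "orders", "replica", "snap-01"))
catalog.register(CopyRecord("hr-backup", "payroll", "backup", None))

print(catalog.lineage("replica-01"))
print([r.copy_id for r in catalog.copies_for("orders")])
```

A real catalog would be populated by scanning storage arrays, hypervisors and backup software rather than by hand, but the questions it answers are the same: which copies exist for this application, and where did each one come from?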
The one topic that didn’t come up as a part of this piece was automation and orchestration. If users have a catalog of all the data in the environment, and can have that catalog talk to the storage and hypervisor APIs, the catalog could then do the copy creation for you. Take that to the next level and have those copies orchestrated to be made available for any business use case: recovery, automated DR, Dev/Test, analytics or any other data access need. These workflows, created in conjunction with the application owners, enable IT to meet the data access SLAs of the business and put the power of the data in the hands of the business. Stephen points out that there should be a relationship between the IT team and the app team, and if they can leverage storage tools, “great”! Today, you actually can do this.
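Here is a rough sketch of what that orchestration step could look like. Everything in it is hypothetical: `StorageDriver` stands in for whatever storage or hypervisor API you actually have, `FakeArray` is an in-memory stand-in so the example runs on its own, and `provision_copy` is a name I made up. The point is only the shape of the workflow: look up the newest snapshot for an application, clone it, and expose the clone to the host that needs it.

```python
from typing import Protocol


class StorageDriver(Protocol):
    """Hypothetical interface; a real array or hypervisor API will differ."""

    def clone(self, source_id: str, name: str) -> str: ...
    def expose(self, copy_id: str, host: str) -> None: ...


class FakeArray:
    """In-memory stand-in for a storage array, for illustration only."""

    def __init__(self):
        self.clones = {}    # clone name -> source snapshot id
        self.exports = []   # (copy id, host) pairs

    def clone(self, source_id, name):
        self.clones[name] = source_id
        return name

    def expose(self, copy_id, host):
        self.exports.append((copy_id, host))


def provision_copy(snapshots, driver, application, use_case, host):
    """Create a copy for a business use case without a manual restore.

    `snapshots` maps an application name to its snapshot ids, ordered
    oldest to newest (the kind of answer a catalog would give you).
    """
    candidates = snapshots.get(application, [])
    if not candidates:
        raise LookupError(f"no snapshots cataloged for {application!r}")
    source = candidates[-1]                 # newest snapshot
    name = f"{application}-{use_case}"
    copy_id = driver.clone(source, name)    # copy creation via the storage API
    driver.expose(copy_id, host)            # make it available to the requester
    return copy_id


array = FakeArray()
known_snapshots = {"orders": ["snap-01", "snap-02"]}
copy_id = provision_copy(known_snapshots, array, "orders", "devtest", "dev-host-1")
print(copy_id, array.clones[copy_id], array.exports)
```

Compare that with the manual process described earlier: the request from Development becomes a single call, and the same function serves Dev/Test, analytics or a recovery test just by changing the `use_case` and `host` arguments.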
The other interesting note Stephen mentioned was the fact that clients are buying technologies like XtremIO to “flip up” as many data copies as you want, get the performance you need, and then tear them down. A couple of notes. First, no one ever tears them down; this is a big problem and causes copy data sprawl. (Mark, you wonder how ‘copy data’ becomes a $51B problem for IT? Fill your expensive flash storage with too many “stale” data copies.) Second, I would 100% agree with Stephen that the great thing about flash is the ability to spin up multiple copies and have application developers leverage the performance of flash, but again, at the end of the day, how is that data made available? With a copy data management solution that can leverage your existing assets (storage hardware, services and hypervisors), you can spin up application-consistent copies of only the data you need to develop (test) against or run analytics against, and then have the same technology tear them down, eliminating data sprawl on that expensive storage. (Yes, flash prices are coming down, but not THAT much.)
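The tear-down half is the part that usually gets skipped, so here is a small sketch of one way to automate it, assuming each copy is given an expiry time when it is created. The `tear_down` function, the dictionary of expiry times, and the copy names are hypothetical, and `delete` is a placeholder for whatever call actually removes a copy from your storage.

```python
from datetime import datetime, timedelta


def expired_copies(expiry_by_copy, now):
    """Return the ids of copies whose expiry time has passed, in sorted order."""
    return sorted(cid for cid, expires in expiry_by_copy.items() if expires <= now)


def tear_down(expiry_by_copy, now, delete):
    """Delete every expired copy and drop it from the tracking table.

    `delete` is a placeholder callable for the real storage removal call.
    """
    removed = expired_copies(expiry_by_copy, now)
    for copy_id in removed:
        delete(copy_id)
        del expiry_by_copy[copy_id]
    return removed


created = datetime(2015, 1, 1, 9, 0)
tracked = {
    "orders-devtest": created + timedelta(days=7),      # one-week test copy
    "orders-analytics": created + timedelta(days=30),   # one-month analytics copy
}

deleted = []
removed = tear_down(tracked, created + timedelta(days=10), deleted.append)
print(removed, sorted(tracked))
```

Run on a schedule, something like this means a stale Dev/Test copy disappears on its own instead of sitting on flash until someone remembers it exists.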
This was a great podcast that takes the next step in highlighting some of the key challenges in data management as data continues to grow. It also highlights the coming together of the IT teams and the application teams to drive a more competitive business. Copy data management goes a long way toward helping the business get more out of its data. I look forward to more podcasts from @storagezilla on The Data Cortex. Thanks Mark, nice work.