One of my favorite shows is Criminal Minds. I think most people like me enjoy those sorts of crime shows. What I like about Criminal Minds in particular, being in the data business, is how Penelope Garcia has access to a TON of data that helps her team solve crimes. (And yes, I know it is a TV show.) I am curious to know what the infrastructure looks like that sits behind all of her fancy applications (I’d like to know what those are too) and stores the data so she can help her team catch the bad guys. You don’t see that in the show. Lots of screenshots of data flying by, but no storage arrays. (I’d say no servers either, but they are all virtual of course.)

In the latest episode I saw, the team was looking for a killer who would jump trains to different cities in Northern California, and in each city he would kill someone. Penelope was able to figure out that the locations where the killer stopped and the crimes were committed were the same locations where certain crops were harvested, meaning that this killer traveled with a group of people who harvested crops up and down this “train alley”. By tying harvest dates to kill dates, she was able to narrow down “who” may be doing the killing. Pretty amazing. Then the team just swooped in, figured out the rest of the story, and captured the killer. Good stuff.

However, I do want to go back to the big question for me. I really would like to know what the infrastructure looks like that supports all of this “real-time” data analytics. This week I am at Strata, a conference that is all about “Big Data”. So far, there hasn’t been a lot of conversation around the infrastructure piece. Sure, there are lots of application vendors talking about using their applications to look for answers in piles of information, but I’d really like to see the data flow and where it ends up living. We keep talking about this tsunami of data, but if it really does just live “in the web”, do we need to store it? If we do need to store it, what does the infrastructure need to look like so we can access the data quickly, whenever we want, and get real value out of it?


The reality is, as I always say, that the infrastructure that supports the application, in this case Big Data, is much less important than the value Big Data can bring your business. So, I pose this question to you: since the storage and access mechanisms that one uses for Big Data are secondary to what one actually DOES with the data, what workloads or business processes in your organization use Big Data, and what exactly do they do with that data? Is there something about what they do that requires using object access protocols rather than block or file access protocols?

As you can tell, since there isn’t a lot of conversation about the infrastructure, the real question comes down to how the data is used: does it make sense to have true object-based storage, and if so, why? I’d be very interested in your opinions.

Tags: analytics, big data, criminal minds, data, IBM, penelope garcia, Storage, strata