Welcome back my friends to the show that never ends

Kudos if you recognise the lyric. It already seems that the buzz around big data never ends, and I’ve only been writing about it for two years!

Anyway, what do you think of appliance-based approaches to Hadoop? If the choice is between something that’s robust, enterprise-ready and pre-built, with no complex set-up process, and something that is flaky, has single points of failure and takes considerable resources to configure in the first place, then there’s a good case to be made for an appliance. Yes, you’ll probably spend more money on the appliance, but you’ll have something that is ready to run straight away, that delivers value much earlier and that, in any case, needs less manpower to manage and run.

However, distributions of Hadoop are becoming more robust and single points of failure are being eliminated. And the trend is towards providing more management capabilities as built-in functions. So where does that leave the appliance? Effectively it comes down to ease of installation versus cost. How much does the former save you and how much extra does the box cost? Does anyone have any figures? I’d like to see some and I’m sure the market would too.

In a sense, though, that doesn’t matter, because the big boys who put together the appliances will make them cost-effective. Of course, the market is in its early days and right now we really only have Oracle and Teradata in this space, but what price EMC, IBM, HP and SAP? Get enough of the big boys introducing appliances, or at least something that makes DIY just too much bother, and that will be that.

The other implication of this is that the distributions of Hadoop that don’t get picked up by the appliance providers will die, along with any of the rest of the Hadoop menagerie that those providers decide not to use or choose to replace. Given that several potential suppliers have not yet emerged with appliances, it’s too early to work out who the winners and losers will be, but my guess is that we’ll certainly know by the end of the year and, quite possibly, within six months or so.

Another interesting question is whether we’ll see any of the major vendors take a similar approach with other NoSQL databases such as Cassandra. I think that remains to be seen: it’s not as if there aren’t plenty of people interested in and/or deploying Cassandra (let alone MongoDB), but I don’t get the feeling that the 800lb gorillas sense the same buzz around these as they do around Hadoop. Which is interesting: it suggests that the market for Hadoop may commoditise before that of the other NoSQL databases.