Cray and Hadoop: not (at first sight) an obvious partnership

In over 20 years as an analyst I don’t think I have written about Cray before. You don’t normally think of supercomputers and information management in the same breath. But big data and the Internet of Things potentially change all that. Anyway, Cray has made a couple of interesting announcements recently.

The first is that YarcData, the graph database/appliance vendor, which was a Cray subsidiary, has now been folded back into the main company and the product re-branded as Urika-GD with, of course, the Cray logo. Cray’s position is that YarcData proved the viability of the big data analytics market and that Cray now wants to engage in that space directly.

The second announcement is that Cray has launched Urika-XA. This is a Hadoop platform.

Think about that for a moment. What comes to mind when you think about Hadoop? Low-cost commodity hardware is what comes to my mind (plus a few other things). Tell me, do you associate low-cost commodity hardware with Cray? No, I thought not. And it isn’t what they are offering. What Cray is offering is competitively priced, high-quality hardware running Hadoop, with Cray as the single point of support for the bundled hardware and software components. Cray designed the Urika-XA system to handle multiple types of analytics, meaning that workloads that would typically be spread across multiple clusters can be consolidated onto a single platform, without the data movement and replication that would otherwise be required. These efficiencies mean that you can have a smaller cluster providing the same power, capacity and performance (or better) at roughly the same price but using up less floor space and with lower overheads. One caveat though: Cray is targeting clusters beyond the piloting and proof-of-concept (POC) stage, likely in the 50-node range.

One thing to note is that Urika-XA is not an appliance, even though the software is pre-installed for you. The company argues that if you license an appliance and Cloudera, for example, comes out with an update, then you will have to wait until your appliance provider supplies you with that update before you can implement it. More critically, the big data landscape is constantly evolving, and if your business needs dictate the use of a complementary analytics tool that isn’t already installed as part of the vendor’s stack, then you are also out of luck until your appliance vendor relents. Cray, on the other hand, takes the view that you should be able to implement that new release or analytics tool of choice whenever you want to and should not be constrained by your supplier. I actually think this may be taking a harsh view of (at least some) appliance vendors, but I take the point made by Cray, especially when it comes to using analytics and other tools that are not part of the appliance vendor’s stack.

The very existence of Urika-XA raises interesting debating points. One of the initial ideas behind Hadoop was that you could leverage out-of-date hardware that just happened(?) to be lying around somewhere in the organisation. And that’s valid if you are just trying things out or are in a skunkworks group. But is that really satisfactory if analysing all of this data is supposed to be so important for your business? In production systems, do you really want to be relying on cheap hardware with a short mean time to failure that doesn’t necessarily fit into the enterprise data centre? I don’t think you do. In which case Cray’s approach, backed by Cray’s reputation and customer support, makes a lot of sense. But it also transforms one of the major concepts behind Hadoop. And when even Cray can build something that is competitively priced for what is no more than a medium-sized cluster, then I think that changes the way that we have to look at and think about Hadoop.