The warehouse has gone missing – damn!

We will shortly be publishing my latest opus on data warehousing or, rather, on analytic warehousing. More about that when the time comes. But, of course, there’s always something you discover when it’s too late: even though we’re covering over 20 products in the report, others always turn up. Two, in particular, escaped our attention. The first, XtremeData, I can forgive myself for, as it only launched last month (well after our research began), but the other, Vectornova, I should probably have known about, as it launched into the general market last year. Oh well, no point crying over spilt milk.

Vectornova appears to be based in Mexico and its products are only available in Europe and Latin America. It is open source, with developers in a number of other places such as Brazil. It is currently available in two versions: a software-only offering and an appliance with 2TB of data. There is a fault-tolerant version of the software-only product in beta and another appliance, with 8TB of data, scheduled for release in Q1 2010. The product is described as having a vectorised columnar architecture that can be deployed in either a shared-nothing or a shared-disk (SAN) configuration. The interesting thing here is the support for vectors: you can use either SQL or VectorSQL to access the data. Bearing in mind the latest announcement by VectorWise and Ingres (see further article to follow), I suspect we are going to hear a lot more about vectors over the next few years.

The other notable point about Vectornova is its support for the statistical language R and the multi-dimensional array processing language J.

XtremeData, by contrast, is specifically an appliance vendor, with its DB^x appliance. It is based in the States with a development office in India. Its appliance uses a shared-nothing MPP-based architecture with a head node and multiple data nodes. Each data node has a multi-core CPU, twelve 1TB disks, and an FPGA (field programmable gate array) that XtremeData describes as an in-socket accelerator. That is, the FPGA attaches to the motherboard and has direct access to its resources. The company claims that its solution will scale from 8 to 1,024 nodes, but initial offerings support 30, 60, 105 and 225TB of user data, going up to 60 nodes in a four-rack system. On the software side, the company has based its development on PostgreSQL.
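To put those figures in proportion, here is a rough back-of-the-envelope sketch of raw disk capacity against quoted user data. It rests on two assumptions that the vendor's description does not confirm: that all 60 nodes in the largest initial configuration are data nodes, and that the 225TB figure corresponds to that 60-node system. The function names are purely illustrative.

```python
# Back-of-the-envelope capacity arithmetic using the figures quoted above.
# ASSUMPTION: every node counted is a data node (the head node is ignored).
# ASSUMPTION: the 225TB user-data offering is the 60-node, four-rack system.

DISKS_PER_DATA_NODE = 12   # "twelve 1TB disks" per data node
DISK_SIZE_TB = 1


def raw_capacity_tb(data_nodes: int) -> int:
    """Raw disk capacity in TB for a given number of data nodes."""
    return data_nodes * DISKS_PER_DATA_NODE * DISK_SIZE_TB


def user_data_fraction(user_data_tb: float, data_nodes: int) -> float:
    """Share of raw disk capacity that is quoted as user data."""
    return user_data_tb / raw_capacity_tb(data_nodes)


if __name__ == "__main__":
    nodes = 60
    user_tb = 225
    print(f"Raw capacity at {nodes} nodes: {raw_capacity_tb(nodes)}TB")
    print(f"User data as share of raw: {user_data_fraction(user_tb, nodes):.1%}")
    print(f"Raw capacity at the claimed 1,024-node limit: {raw_capacity_tb(1024)}TB")
```

On these assumptions, quoted user data comes to a little under a third of raw disk. The vendor's description does not say what accounts for the remainder, so the ratio should be read as indicative only.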

Of these two vendors, it seems to me that Vectornova is better positioned. It has some existing customers, has been around for some time and is operating in a geography that is not the first port of call for most competitors (the biggest exception being illuminate). XtremeData, on the other hand, is going to have a hard time convincing people that it is not just a Netezza clone. Even if it isn’t, there are a lot of similarities (same underlying architecture, same underlying database, both using FPGAs), especially now that Netezza has launched TwinFin. At least Vectornova is doing something interestingly different with its vector processing, which gives it a story to tell; in an already overcrowded market, I am not sure that XtremeData has the same sort of differentiation.