Data Warehousing update, part 1
A lot has been happening on the data warehousing front lately. Earlier this summer, Oracle announced its Information Appliance initiative and, in particular, a partnership with PANTA Systems. Then, more recently, ParAccel has announced a product re-positioning, Calpont is finally talking about its entry into the market, and both DATAllegro and Netezza have announced new product releases. In this article I am going to discuss Oracle, ParAccel and Calpont (for reasons which will become obvious) while in the second part I will direct my attention towards DATAllegro and Netezza.
Let me start with ParAccel. The company offers a software-only, column-based approach to data warehouse appliances and had initially focused on the fact that you could deploy this in conjunction with Microsoft SQL Server. However, the product could always be used in stand-alone fashion as well, and experience in the market has led the company to put equal emphasis on this offering alongside the SQL Server accelerator. In addition, the company has found significant demand for supporting Oracle data warehouses, and the product is now also offered as an accelerator for Oracle environments.
Calpont is also targeting the Oracle market in the first instance (though it can also operate stand-alone). However, the product, which has not yet been released, is significantly different from when I last wrote about it. Long-time readers may recall that Calpont was originally designed as “SQL on a chip” and then evolved into a hybrid hardware/software solution using, again, a column-based approach. However, during the development process the software team built a hardware emulator so that its work would not be slowed down by the hardware guys and, lo and behold, the emulator ran so fast that the company decided it didn’t need to bother with the hardware. So the product is now software-only. Without going into too many details at this point, the big difference between Calpont and other appliance vendors is that it has placed a major emphasis on concurrency, which can be a limiting factor for other suppliers, as well as on configuration flexibility. In the first release the product will offer a shared-everything environment, but the company also intends to add a shared-nothing capability in a future release. It will also add support for DB2. Note that, like Dataupia, Calpont will allow existing Oracle applications to run without change against the Calpont database.
As far as Oracle itself is concerned, it announced its Information Appliance initiative back in April. This is somewhat similar to IBM’s balanced warehousing approach (pre-tuned, pre-configured database pre-installed on the relevant hardware) except, of course, that Oracle has to work with various hardware partners. Perhaps more interesting is its partnership with PANTA Systems.
PANTA Systems is a five-year-old company that originally focused on the high-performance computing market as a server and storage provider. However, it identified data warehousing as a potential market and started working with Oracle in early 2006. This led to a performance-busting result on the TPC-H benchmark towards the end of last year. This was impressive, but with three caveats. First, regular readers will know that I have never been very sanguine about benchmarks in general. Second, the system used for the benchmark was only 1TB, which is pretty small for a data warehouse. Third, while TPC-H may (arguably) be representative across a breadth of data warehousing requirements, I am not at all sure how useful it is for measuring the sort of workloads (typically more depth than breadth) that data warehouse appliances are usually deployed for. Nevertheless, credit where credit is due: this was a genuinely impressive performance by PANTA Systems.
As far as the system itself is concerned, what you get is a Linux-based appliance built on RAC (Real Application Clusters), which uses PANTA’s own blade servers, each supporting up to 6 InfiniBand planes, as well as a native InfiniBand-attached storage array. The idea behind this is to scale aggregate sustained I/O bandwidth into the tens of GB/s range. Standard appliances are available up to 96TB (but bear in mind Oracle’s compression capabilities) and bespoke appliances are available for requirements greater than this.
Perhaps the most interesting thing to note is just how many vendors are offering add-on acceleration capabilities for Oracle. This is not surprising given Oracle’s leadership position in the database market, but it does pose an interesting question. If you decide not to replace an Oracle data warehouse with one from someone else, and you figure that native Oracle capabilities are not enough for you, then whom do you choose to extend your Oracle environment: PANTA, Calpont, Dataupia or ParAccel? I don’t know the answer to that question yet, but I hope to find out.