Investing in an Oracle data warehouse

Content Copyright © 2007 Bloor. All Rights Reserved.

Regular readers will know that I am a
fan of the data warehouse appliance approach and also,
historically, an advocate of column-based processing. However, none
of that means that there aren’t good reasons for investing in data
warehousing technology from traditional vendors. In this article I
will discuss some recent innovations in Oracle’s data warehousing
and some good reasons for considering the use of its
technology.

The first thing to note is Oracle’s
Optimized Warehouse Initiative, which has involved the company
working with the various hardware manufacturers to produce
reference architectures which are then pre-configured, pre-tuned
and pre-installed in much the same way that IBM has done with its Balanced Configuration Unit. The difference in Oracle’s
case, however, is that it is working with Dell/EMC, Sun, HP and IBM
to this end, whereas IBM’s offering only runs on IBM hardware
(either AIX or Linux based). To date, Oracle has published
reference architectures for each of these vendors, and the first
Optimized Warehouse currently available is running on Dell servers
with EMC storage. This comes in building blocks of 1TB
each.

In terms of reasons why one might
consider the use of Oracle for a data warehouse there are perhaps
five that I would like to pick out.

The first is with respect to operational
BI. Such environments require constant updating of the database and
the ability to support many short (often just look-up) queries
alongside more conventional warehouse requirements. This sort of
environment has many of the characteristics of OLTP (as well as the
need for strong workload management), so the merchant database
vendors have an advantage in this area when compared to appliance
vendors, though the likes of Sybase and Teradata do have these
sorts of capabilities, as does HP Neoview.

Secondly, there is the fact that Oracle
embeds data mining within its database. The big advantage of this
is that you don’t have to move data out of the warehouse to process
it, so there is a significant performance benefit. This looks like
it may only be a temporary advantage, however, as SAS, for example,
has recently announced that it will be implementing comparable
capabilities in conjunction with Teradata and it is already working
on this sort of functionality alongside Netezza.

Thirdly, there is the introduction of
advanced compression in Oracle Database 11g. This will
significantly reduce the size (and cost) of data warehouses using
Oracle in the short term and should also improve performance.
Again, however, this is a diminishing advantage as a number of
other vendors already have this capability and those that don’t soon
will.

Fourthly, there is Oracle’s use of
partitioning. One of the aspects of this is that it enables support
for multi-tier storage configurations. That is, you can have one
partition on one type of storage and another partition on a
different type of storage. So you can put, say, your most recent
data on your fastest disks for rapid retrieval but older data on
slower disks in order to support information lifecycle management
policies and also to enable more cost effective retention of online
data.
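
To make the idea concrete, here is a minimal sketch in Python of the sort of age-based placement rule described above. It is not Oracle syntax (in Oracle the placement is declared as part of the partitioning definition rather than computed in application code), and the tier names, age thresholds and partition names are all hypothetical, chosen purely for illustration.

```python
from datetime import date

# Hypothetical tiers: (maximum partition age in days, storage tier name),
# listed from fastest to slowest. These thresholds are assumptions.
TIERS = [
    (90, "fast_disk"),
    (365, "mid_disk"),
]
ARCHIVE_TIER = "slow_disk"  # anything older than the last threshold


def tier_for_partition(partition_end: date, today: date) -> str:
    """Pick a storage tier from the age of the newest data in a partition."""
    age_days = (today - partition_end).days
    for max_age, tier in TIERS:
        if age_days <= max_age:
            return tier
    return ARCHIVE_TIER


def plan_placement(partitions: dict, today: date) -> dict:
    """Map each partition name to a tier, given its end date."""
    return {
        name: tier_for_partition(end, today)
        for name, end in partitions.items()
    }


if __name__ == "__main__":
    # Hypothetical quarterly partitions of a sales table.
    example = {
        "sales_2007_q4": date(2007, 12, 31),
        "sales_2007_q1": date(2007, 3, 31),
        "sales_2005_q4": date(2005, 12, 31),
    }
    for name, tier in plan_placement(example, date(2007, 12, 31)).items():
        print(name, "->", tier)
```

The point of the sketch is simply that the placement decision depends only on the partition key (here a date), which is why partitioning lends itself so naturally to information lifecycle management.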

Finally, there is Real Application
Clusters (RAC). One common use of this is in
the Oracle Daily Business Intelligence (DBI) solution, which allows
you to do transactional processing on one set of nodes while a
second set not only acts as back-up for the first but is otherwise
used for query and reporting against the transactional data. Now,
this has lots of performance (and other) advantages when compared
to doing both of these things on a single system, but it is not
data warehousing. However, Oracle now has customers going one step
further. Talk America, for example, has a RAC implementation that
has four nodes dedicated to OLTP and two nodes dedicated to data
warehousing. This has significant benefits. For example, it means
that the customer table in the OLTP part of the system is directly
accessible from the data warehouse. It also means that you do not
require ETL processes to move the data from one place to another.
Of all these features it is perhaps the last that is most exciting.
While the Talk America warehouse is relatively small at less than
10TB, if this sort of capability can scale up and still provide
good enough performance then it could provide a significant new
direction for the market.