Resurgence of the database
Our background, years ago, includes a spell as an IMS mainframe DBA, so we were interested to hear Steve Brobst (Teradata's CTO) talking about partitioning, query prioritisation and so on at Teradata Partners in Las Vegas. He was talking about designing a data warehouse to cope, at the same time, with conventional strategic queries on huge volumes of data and with near-real-time "active data warehouse" style queries (on a single customer, say).
This is all hugely familiar from those old mainframe IMS days, but Brobst is talking about decision support; with IMS, we were dealing with OLTP (On-line Transaction Processing) and the problem of delivering consistent response times for short, random-access update transactions and long-running, database-scanning reporting queries at the same time. To complete the picture, we also had a Nomad data warehouse, loaded overnight in batch, with few of the governance and resilience characteristics of the OLTP database.
The obvious thought arising from this is whether Teradata's current approach – hardware virtualisation, prioritisation, partitioning and so on – could eventually deliver a single database that could handle data warehousing and OLTP at the same time.
Teradata is entirely uninterested in this idea (so Brobst says), not least because it has optimised its architecture for read rather than write access – and has built a nice EDW (Enterprise Data Warehouse) business on this. Nevertheless, it seems to be where IBM and Oracle are headed (and noted BI consultant and writer Mark Whitehorn tells us that it was part of the late Jim Gray's vision for SQL Server). IBM, in particular, potentially has the history and technology to put together a database that supports analytic and OLTP views of a single datastore without excessively compromising either. Oracle even has a customer (Talk America) that seems to be doing just that today (albeit on a smaller scale); and InterSystems is doing something similar with Caché, although that product probably won't scale to compete with Teradata at the extreme high end either.
But perhaps there's another way of looking at this. As Donald Feinberg (VP and Distinguished Analyst at Gartner) pointed out to the audience at Teradata Partners, Gartner has already predicted the "death of the database", with transaction-processing data existing only for the duration of the transaction before being persisted in a data warehouse. As usual, the devil is probably in the detail, but this does tie in with an established approach to transaction processing tried in financial systems – using an OODBMS (Object-Oriented Database Management System) as, essentially, a cache for data kept in a relational "persistent data store" for long-term storage (e.g. at Nomura).
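Neither Gartner's prediction nor the Nomura system comes with published detail, but the shape of the pattern is easy to sketch. The Python fragment below is an illustration only, with sqlite3 standing in for the relational persistent store and every name invented: transaction data lives in memory for the duration of the transaction and is persisted on commit.

```python
import sqlite3

# A minimal sketch of the pattern, not any vendor's product: in-flight
# transaction data lives only in memory (standing in for the OODBMS cache)
# and is written to a relational persistent store on commit. All names
# here (TransactionCache, persisted, etc.) are invented for illustration.

class TransactionCache:
    """Holds in-flight transaction data; persists it only on commit."""

    def __init__(self, store: sqlite3.Connection) -> None:
        self.store = store
        self.pending: dict[str, dict[str, str]] = {}  # per-transaction working set

    def write(self, txn_id: str, key: str, value: str) -> None:
        # Data written here exists only for the duration of the transaction.
        self.pending.setdefault(txn_id, {})[key] = value

    def commit(self, txn_id: str) -> None:
        # On commit, the working set moves to the persistent relational
        # store and disappears from the cache.
        rows = self.pending.pop(txn_id, {})
        with self.store:  # sqlite3 connection as context manager: commits
            self.store.executemany(
                "INSERT INTO persisted (txn_id, key, value) VALUES (?, ?, ?)",
                [(txn_id, k, v) for k, v in rows.items()],
            )

store = sqlite3.connect(":memory:")  # stands in for the persistent data store
store.execute("CREATE TABLE persisted (txn_id TEXT, key TEXT, value TEXT)")

cache = TransactionCache(store)
cache.write("t1", "account", "12345")
cache.write("t1", "amount", "99.50")
cache.commit("t1")
print(store.execute("SELECT * FROM persisted").fetchall())
# [('t1', 'account', '12345'), ('t1', 'amount', '99.50')]
```

The point of the pattern is that the transactional side never carries long-lived data at all; durability is entirely the persistent store's job.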
In a world with pre-emptive, prioritised scheduling of transactions, physical and logical partitioning of different data record types, virtualisation of hardware and optimisers that really work, we could have fast, low-overhead OLTP databases that move committed data records, in low-priority background transactions, to a normalised relational data store; this store then keeps a time series of data changes for near-real-time and point-in-time query, as appropriate (a sketch of the idea follows this paragraph). "Master data" is just shared company data (as it always was) and the "master data management" silo then goes away (although data quality issues are as important as ever, of course). Accessing a "single source of truth" for use in OLTP and query is now straightforward and data duplication is eliminated (apart from automated replication for performance, if appropriate). However, we should point out that we don't really expect to see the complete merger of OLTP and data warehouse processing databases in the near future – it may never happen, although not necessarily for technical reasons. Even so, some organisations, perhaps with relatively undemanding applications, are finding that some transaction-processing databases can support enough decision support and analytic processing to meet their needs with a single database, even today.
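To make that data movement concrete, here is a minimal sketch under the same caveats as before: sqlite3 stands in for both stores, the table and column names are invented, and a real system would drive the movement from the commit log (change data capture) as a genuinely low-priority background transaction rather than re-copying the table.

```python
import sqlite3
from datetime import datetime, timezone

# Illustrative only: a background sweep copies committed OLTP state into a
# normalised history table, which keeps a time series of changes and so
# supports both near-real-time and point-in-time queries.

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE oltp_customer (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE customer_history (
        id INTEGER,
        balance REAL,
        valid_from TEXT   -- one row per observed state: a time series
    );
""")

def background_sweep() -> None:
    """Copy committed OLTP state into the time-series store."""
    now = datetime.now(timezone.utc).isoformat()
    with db:
        db.execute(
            "INSERT INTO customer_history "
            "SELECT id, balance, ? FROM oltp_customer",
            (now,),
        )

def balance_as_of(cust_id: int, as_of: str):
    """Point-in-time query: the balance in force at timestamp as_of."""
    row = db.execute(
        "SELECT balance FROM customer_history "
        "WHERE id = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (cust_id, as_of),
    ).fetchone()
    return row[0] if row else None

db.execute("INSERT INTO oltp_customer VALUES (1, 100.0)")
background_sweep()
db.execute("UPDATE oltp_customer SET balance = 250.0 WHERE id = 1")
background_sweep()
print(balance_as_of(1, datetime.now(timezone.utc).isoformat()))  # -> 250.0
```

The history table is the "time series of data changes": its latest row answers near-real-time queries, while filtering on valid_from answers point-in-time ones.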
But, if it does happen, would merging OLTP databases and the data warehouse really be "the death of the database"? We don't think so, because this hypothetical persistent data store with the logical characteristics of both an OLTP transaction store and a data warehouse is, surely, just a database as originally envisaged (whether it happens to be made by a data warehousing specialist such as Teradata or a more general database supplier such as Oracle). It is simply a resurrection of the single corporate database, but now including time-series data archives – with the underlying physical storage partitioned appropriately and only accessed through virtualised views. The old database virtues of data abstraction, normalisation, access path analysis and so on, often neglected these days, will regain their importance for the general developer. The split into specialised OLTP databases and data warehouses in the 1990s can then be seen as simply a glitch in the evolution of the database, caused by the limitations of the technology available at the time. Death of the database? We think we might be seeing the resurgence of the database and the death of specialised data warehouses and OLTP database products, more like!