The impact of MDM on data warehousing
There hasn’t been a lot of discussion about the impact of MDM (master data management) on data warehousing. But there should be.
What has been discussed is where to house MDM implementations. The argument is that if you need to synchronise operational applications such as CRM systems, then most of what you are doing is creating, reading and updating records. These are essentially transactional activities, which are best hosted on an OLTP database rather than in a data warehouse. Conversely, if all you want to do is support analytics and business intelligence, as opposed to operational applications, then the warehouse would be fine. If you want to do both then it is probably better (unless you are Teradata or Kalido, in which case you have an axe to grind) to host the system outside the warehouse and then federate the master data with the warehouse by means of an EII (enterprise information integration) platform.
However, this doesn’t really impact on the data warehouse per se. What does have an impact is treating MDM as the “system of record”. Traditionally, this has been one of the ideas behind an enterprise data warehouse (EDW): that it represented the system of record and that data could safely be extracted from there to populate data marts against which you could run various forms of analytics. However, once MDM is implemented, the MDM system becomes the system of record instead of the EDW. Even if the master data is physically held within the warehouse, the system of record is still logically the MDM system and not the EDW.
So, what is the implication of this for the EDW? Put simply: you don’t need one. Being the system of record was the raison d’être of the historic concept of the EDW, although there is a movement in favour of what is known as EDW 2.0 (ugh!). However, that’s another story.
So, if you don’t need an EDW, how do you do data warehousing? The simple answer is that you have federated data marts, with a virtual platform such as that of Composite Software linking the data marts together so that you can run queries across them and, for that matter, queries that span operational systems and data marts. Nor is this pie in the sky: I know a couple of major organisations that are doing, or planning to do, exactly this.
But, you may ask, how does this environment link to the MDM system of record? And the answer is again simple: you use the same federated platform for linking between the MDM system and the warehouse environment.
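To make the idea concrete, here is a toy sketch of a single query that spans an MDM store and two data marts. It uses SQLite's ATTACH DATABASE from Python purely as a stand-in for a federation layer. A real EII platform works against separate, heterogeneous systems and this is not how any particular product is implemented. The schema names (mdm, sales_mart, support_mart), the tables and the data are all invented for illustration.

```python
import sqlite3


def build_federated_connection() -> sqlite3.Connection:
    """Create one connection that can see three separate (hypothetical) stores."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates an independent in-memory database; together they
    # stand in for the MDM system and two data marts.
    for name in ("mdm", "sales_mart", "support_mart"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")

    conn.execute("CREATE TABLE mdm.customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE sales_mart.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE support_mart.tickets (customer_id INTEGER, ticket_id INTEGER)")

    # Invented sample data.
    conn.executemany("INSERT INTO mdm.customer VALUES (?, ?)",
                     [(1, "Acme Ltd"), (2, "Globex plc")])
    conn.executemany("INSERT INTO sales_mart.orders VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0), (2, 75.0)])
    conn.executemany("INSERT INTO support_mart.tickets VALUES (?, ?)",
                     [(1, 10), (2, 11), (2, 12)])
    return conn


def customer_overview(conn: sqlite3.Connection) -> list:
    """One query across all three stores, keyed on the master customer record."""
    sql = """
        SELECT c.name,
               (SELECT COALESCE(SUM(o.amount), 0)
                  FROM sales_mart.orders AS o
                 WHERE o.customer_id = c.customer_id) AS total_sales,
               (SELECT COUNT(*)
                  FROM support_mart.tickets AS t
                 WHERE t.customer_id = c.customer_id) AS open_tickets
          FROM mdm.customer AS c
         ORDER BY c.customer_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in customer_overview(build_federated_connection()):
        print(row)
```

The point of the sketch is that the master customer record lives in only one place, and each mart carries just the customer key. The federated layer is what brings them together at query time.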
Funnily enough I have run across a number of companies recently that have federated MDM deployments (usually either by domain or geography), all of which makes it look as if the federated platform is becoming an essential part of the whole environment. Indeed, it is not difficult to come up with an architecture in which the federated software is the hub for the whole environment.
Further, MDM systems can cater for structural change to data, which means that the warehouse doesn’t require the same level of capability in this regard (for example, support for slowly changing dimensions). And this doesn’t just apply to analytic MDM providers like Kalido but also to more general-purpose vendors such as Siperian.
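One way to picture this, as a rough sketch rather than a description of how Kalido, Siperian or anyone else actually does it: the MDM side keeps dated versions of each master record, and the mart stores only keys and dates, resolving attributes "as of" the transaction date. Every name and value below (MasterVersion, region_as_of, the sample history and facts) is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MasterVersion:
    """One dated version of a master record, held on the MDM side."""
    customer_id: int
    region: str
    valid_from: date
    valid_to: Optional[date]  # None means this is the current version


# Invented history: customer 1 moved from North to South on 1 January 2008.
MDM_HISTORY: List[MasterVersion] = [
    MasterVersion(1, "North", date(2007, 1, 1), date(2008, 1, 1)),
    MasterVersion(1, "South", date(2008, 1, 1), None),
]

# Invented mart facts: (customer_id, transaction_date, amount). No region here.
FACTS: List[Tuple[int, date, float]] = [
    (1, date(2007, 6, 30), 100.0),
    (1, date(2008, 3, 31), 40.0),
]


def region_as_of(customer_id: int, when: date) -> Optional[str]:
    """Return the region that applied to the customer on the given date."""
    for version in MDM_HISTORY:
        if version.customer_id != customer_id:
            continue
        still_valid = version.valid_to is None or when < version.valid_to
        if version.valid_from <= when and still_valid:
            return version.region
    return None


def sales_by_region() -> Dict[Optional[str], float]:
    """Total sales per region, using the region valid at transaction time."""
    totals: Dict[Optional[str], float] = {}
    for customer_id, when, amount in FACTS:
        region = region_as_of(customer_id, when)
        totals[region] = totals.get(region, 0.0) + amount
    return totals


if __name__ == "__main__":
    print(sales_by_region())
```

Here the history of the change sits with the master data, so the mart does not have to model it as a slowly changing dimension itself.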
The uptake of MDM may have many implications for data warehousing, but two stand out: a move away from traditional EDW-based architectures, at least by some companies, and an upsurge in the market for federated query capabilities.