Figure 1 – The data warehouse lifecycle
Compose for Data Warehouses is a data warehouse automation solution: it uses a variety of wizards to let you create and populate data warehouses and data marts, automating every part of the data warehouse lifecycle, as shown in Figure 1. Options for customisation are available at every step, from creating your data model to populating your data marts, and visualisation tools allow you to graphically inspect, for example, the structure of your data model. All of this is done without any manual coding or scripting.
Compose for Data Lakes, on the other hand, is designed to let you create a governed data lake that provides curated, analytics-ready data. Curation here can include standardisation, formatting, and subsetting, among other things. This happens in two stages: first, Compose merges all ingested data into a continuously updated historic data store; second, it allows you to provision and enrich that data before exposing it for consumption. The historic data store retains a full change history and can replay that history at any time. In addition, Compose maintains a centralised ‘mini’ data catalogue for your data lake. This is not intended to be used as a standalone catalogue – although that’s certainly possible – but to support dedicated data cataloguing products by feeding your metadata to them via a REST API.
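As a rough illustration of how such a metadata feed might be consumed, the Python sketch below pulls dataset metadata over REST so that it can be forwarded to a downstream catalogue. Note that the endpoint, authentication scheme, and payload fields shown here are assumptions made for the sake of the example, not the product’s documented API.

```python
import requests

# NOTE: the base URL, route, and response fields below are hypothetical;
# consult the product documentation for the actual REST API.
BASE_URL = "https://compose.example.com/api/v1"

def fetch_dataset_metadata(api_token: str) -> list:
    """Pull dataset-level metadata from the 'mini' catalogue so that it
    can be forwarded to a dedicated data cataloguing product."""
    response = requests.get(
        f"{BASE_URL}/catalog/datasets",
        headers={"Authorization": f"Bearer {api_token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

for dataset in fetch_dataset_metadata("YOUR_API_TOKEN"):
    print(dataset.get("name"), dataset.get("schema"))
```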
Figure 2 – An Attunity data pipeline
Once you have created and curated your data lake, Attunity offers a variety of options for utilising the data within it. For starters, it provides specialised solutions for integrating Compose with Apache Hive, as well as Amazon Redshift. In addition, Compose for Data Lakes is compatible with Compose for Data Warehouses. This enables the architectural pattern shown in Figure 2, in which Replicate and both versions of Compose combine to ingest data into a data lake, curate and provision it, and then expose it for consumption in a selection of data marts.
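To make the shape of this pipeline concrete, here is a minimal conceptual sketch, in Python, of the three stages in Figure 2: ingestion, curation, and data mart provisioning. The function names and data shapes are illustrative stand-ins, not actual Attunity APIs.

```python
# Conceptual stand-ins for the Figure 2 pipeline stages; none of these
# names correspond to real Attunity interfaces.

def ingest(source_rows: list) -> list:
    """Stage 1 (Replicate): capture rows from the source system."""
    return list(source_rows)

def curate(raw_rows: list) -> list:
    """Stage 2 (Compose for Data Lakes): standardise and format the data."""
    return [
        {"id": row["id"], "region": row["region"].upper(),
         "sales": float(row["sales"])}
        for row in raw_rows
    ]

def build_data_mart(curated_rows: list) -> dict:
    """Stage 3 (Compose for Data Warehouses): aggregate into an
    analytics-ready mart, here keyed by region."""
    mart = {}
    for row in curated_rows:
        mart[row["region"]] = mart.get(row["region"], 0.0) + row["sales"]
    return mart

source = [{"id": 1, "region": "emea", "sales": "100"},
          {"id": 2, "region": "apac", "sales": "250"}]
print(build_data_mart(curate(ingest(source))))
# {'EMEA': 100.0, 'APAC': 250.0}
```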
This architecture extends the advantages offered by Replicate and Compose, such as a high degree of automation, to the entirety of your data pipeline. Notably, all three constituent products update continuously: Replicate via Change Data Capture, and the two Compose products via the continuously updated historic data store. This means that the whole of your data pipeline stays synchronised with its data source, keeping your data marts up to date, with all the benefits that implies for timely business insights.
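To illustrate the continuous-synchronisation idea in isolation, the sketch below shows a minimal CDC-style apply loop: each change event captured at the source (insert, update, or delete) is applied to the target as it arrives, so the target stays current without batch reloads. The event shape is a deliberate simplification; real CDC tools such as Replicate emit richer metadata, including transaction identifiers and timestamps.

```python
from typing import Iterable

def apply_change_events(target: dict, events: Iterable) -> None:
    """Keep `target` (key -> latest row image) in step with a CDC stream.

    The event format here is a simplified assumption for illustration."""
    for event in events:
        if event["op"] == "delete":
            target.pop(event["key"], None)
        else:  # "insert" and "update" both upsert the latest row image
            target[event["key"]] = event["row"]

# Usage: the target reflects every source change without a full reload.
mart = {}
apply_change_events(mart, [
    {"op": "insert", "key": 42, "row": {"id": 42, "sales": 100}},
    {"op": "update", "key": 42, "row": {"id": 42, "sales": 120}},
])
print(mart)  # {42: {'id': 42, 'sales': 120}}
```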