Fig 02 - Dashboard for the Talend Trust Score
There are several noteworthy features of the individual components within the Talend Data Fabric. Perhaps the most fundamental is that its data integration technology is based on a code-generating engine (it generates Java). This has both advantages and disadvantages. On the one hand, it should perform better than using SQL, and you don’t need to use, or pay for, an intermediate server. On the other hand, it means you need to regenerate your code whenever something changes, and even though much of this process can be automated, it will still require manual intervention at least some of the time. More generally, Talend is well endowed with native connectors and other components – more than a thousand of these altogether. However, automation is generally limited, and built-in machine learning capabilities for things such as recommendations, metadata discovery and identifying sensitive data are still at an early stage of development: some capabilities are provided, but they need building out, which is what the company plans to do. Indeed, the company’s ultimate goal is to provide a platform that is (almost) completely autonomous, as illustrated in Figure 2.
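To make the code-generation trade-off concrete, the hypothetical sketch below shows the general shape of such an approach: the integration logic is emitted as a plain, self-contained Java program, so it runs on any JVM without an intermediate server, but the program must be regenerated whenever the mapping changes. This is purely illustrative and is not Talend's actual generated output.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: a code-generating engine emits a standalone job like
// this. The mapping is compiled in (fast, no runtime server needed), which
// is why a change to the mapping requires regenerating the code.
public class GeneratedJob {
    // A hard-coded transformation step, as a generator might emit it:
    // trim each name field and normalise it to upper case.
    public static List<String> run(List<String> names) {
        return names.stream()
                .map(n -> n.trim().toUpperCase())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("  alice", "Bob ")));
    }
}
```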
Fig 03 - Talend Data Fabric Architecture
One innovative idea the company has introduced is what it calls the “Talend Trust Score”. This is presented as a single score – see Figure 3 – calculated from data quality and popularity metrics plus any user-defined criteria. It is a nice concept for helping business users understand the trustworthiness of their data, based on the “5 Ts”: that data should be thorough, timely, transparent, tested, and traceable. The Trust Score is a function of Talend’s Data Inventory module, which runs on either AWS or Azure. We haven’t seen anything else quite like Data Inventory, which provides a “single pane of glass” to support collaboration, self-service, and exploration across datasets. Where other vendors offer comparable features, these tend to be spread across multiple products.
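Talend does not publish the Trust Score formula, but the idea of blending quality, popularity, and user-defined metrics into a single figure can be sketched as a simple weighted average. The weights below are assumptions chosen purely for illustration, not Talend's actual values.

```java
// Hypothetical sketch of a composite trust score: three component metrics
// (each on a 0-100 scale) are blended with assumed weights and clamped to
// the 0-100 range. Real products would derive the inputs from profiling.
public class TrustScore {
    public static double compute(double quality, double popularity, double userDefined) {
        // Assumed weights for illustration only.
        double score = 0.5 * quality + 0.3 * popularity + 0.2 * userDefined;
        return Math.max(0.0, Math.min(100.0, score));
    }

    public static void main(String[] args) {
        // Blends quality 90, popularity 70, user-defined 80 into one score.
        System.out.println(compute(90, 70, 80));
    }
}
```

The appeal of a single number is that business users can compare datasets at a glance without interpreting each underlying metric themselves.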
Finally, we should mention that the company plans to implement data masking as a capability alongside data integration (currently it is available only within Data Preparation), and that it also intends to further build out its policy management capabilities to support data governance.
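As a reminder of what data masking does in a pipeline, here is a minimal illustrative sketch: a sensitive value is replaced with a non-identifying but still recognisably shaped one. The method name and approach are hypothetical, not Talend APIs.

```java
// Illustrative only: one common masking pattern for e-mail addresses keeps
// the first character and the domain visible while hiding the rest of the
// local part, so the data stays useful for testing without exposing identity.
public class Masker {
    public static String maskEmail(String email) {
        int at = email.indexOf('@');
        if (at <= 0) {
            return "***"; // not a usable address; mask it entirely
        }
        return email.charAt(0) + "***" + email.substring(at);
    }

    public static void main(String[] args) {
        System.out.println(maskEmail("jane.doe@example.com")); // j***@example.com
    }
}
```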