Figure 2 – Connectivity and integration
The data catalog offers a broad set of functionality, including a business glossary, connectors to capture metadata, data discovery tools, data lineage and an AI context engine. There is also some data quality and observability functionality, gained through the vendor's acquisition of Mighty Canary in May 2023. At the heart of the product is a semantic model and knowledge graph architecture.
Users are presented with a shopping-like experience, with data assets grouped into collections of related material; for example, there might be a collection of data around “customer information”. The product has a full search and discovery capability, including keyword search, and can show the data quality scores of the data being presented, as well as the sources of that data and its relationships to other data. This search capability can be embedded in other tools via an API so that, for example, a Tableau user could invoke discovery without leaving Tableau. The meaning of terms such as “net revenue” can be looked up in the business glossary within the tool.
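To make the discovery experience more concrete, the following is a minimal sketch of keyword search over a small in-memory catalog, with results ranked by data quality score and a glossary lookup alongside. It is not the data.world API: the `Asset` class, the `search` and `define` functions, the sample assets, the 0 to 1 quality scale and the glossary definition are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A hypothetical catalog entry; the field names are illustrative only."""
    name: str
    collection: str
    description: str
    quality_score: float  # assumed scale: 0.0 (poor) to 1.0 (excellent)
    sources: list[str] = field(default_factory=list)


# Illustrative sample data, not taken from any real catalog.
CATALOG = [
    Asset("customer_master", "customer information",
          "Golden record for each customer", 0.92, ["CRM"]),
    Asset("customer_orders", "customer information",
          "Orders placed by each customer", 0.81, ["ERP"]),
    Asset("net_revenue_monthly", "finance",
          "Monthly net revenue by region", 0.88, ["ERP", "billing"]),
]

# Illustrative glossary entry; the wording of the definition is an assumption.
GLOSSARY = {
    "net revenue": "Gross revenue minus returns, allowances and discounts.",
}


def search(assets, keyword):
    """Return assets whose name or description contains the keyword,
    ordered from highest to lowest quality score."""
    needle = keyword.lower()
    hits = [
        a for a in assets
        if needle in a.name.lower() or needle in a.description.lower()
    ]
    return sorted(hits, key=lambda a: a.quality_score, reverse=True)


def define(term):
    """Look up a business term, ignoring case; returns None if unknown."""
    return GLOSSARY.get(term.lower())


if __name__ == "__main__":
    for asset in search(CATALOG, "customer"):
        print(asset.name, asset.collection, asset.quality_score, asset.sources)
    print(define("Net Revenue"))
```

A real catalog would index far more than names and descriptions, but the sketch shows the basic shape of the result a user sees: matching assets, their collection, their quality score and their sources.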
The technology uses proprietary AI to help users populate descriptions of data objects. Since this AI has been trained on the internal data catalog, its descriptions are more accurate than those a general-purpose public large language model would produce. The model can explain itself, including showing the SQL that it generated and the tables that it accessed to build up the basis for its descriptions. There are also tools to automate the creation and import of metadata, and the completeness of the metadata can be actively monitored.
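As an illustration of what monitoring metadata completeness might involve, the sketch below scores each asset by the fraction of required metadata fields that are filled in and lists the assets that fall short. It does not reflect how the product computes completeness: the required field names, the sample records and the `completeness` and `incomplete` functions are all assumptions made for the example.

```python
# Hypothetical set of fields that every asset is expected to document.
REQUIRED_FIELDS = ("description", "owner", "source")


def completeness(record):
    """Fraction of required fields that hold a non-empty value."""
    filled = sum(
        1 for name in REQUIRED_FIELDS
        if str(record.get(name) or "").strip()
    )
    return filled / len(REQUIRED_FIELDS)


def incomplete(records, threshold=1.0):
    """Return (asset name, score) pairs scoring below the threshold,
    least complete first."""
    scored = [(name, completeness(rec)) for name, rec in records.items()]
    below = [(name, score) for name, score in scored if score < threshold]
    return sorted(below, key=lambda pair: pair[1])


# Illustrative sample metadata, not taken from any real catalog.
RECORDS = {
    "customer_master": {"description": "Golden record", "owner": "data-office", "source": "CRM"},
    "customer_orders": {"description": "", "owner": "sales-ops", "source": "ERP"},
    "web_clicks": {"description": None, "owner": None, "source": "clickstream"},
}

if __name__ == "__main__":
    for name, score in incomplete(RECORDS):
        print(f"{name}: {score:.0%} complete")
```

Run on a schedule, a check of this kind would flag newly imported assets that still lack a description or an owner.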
As well as its software, data.world has built a large open data community with over 2 million users, dedicated to sharing publicly available datasets from, for example, governments and NGOs.