data.world was founded in 2015 and is based in Austin, Texas. It is venture-capital funded and has raised $50 million; its most recent round, a Series C, was in April 2022.
Its customers include Penguin Random House, Associated Press and WPP. Bloor estimates its 2023 revenues at more than $20 million, primarily from the US market, with some customers in Europe.
Company Info
Headquarters: Capital Factory, 701 Brazos St. Suite 519, Austin, TX 78701
Telephone: +1 (512) 697 4897
Figure 1 – Tech partners and Ecosystem: 100+ integrations
The data.world technology has a semantic database and knowledge graph at its core, on top of which sit a data catalog and a data governance application, as well as an artificial intelligence capability. This is a cloud-native product, and the company partners with a range of complementary technologies, such as Snowflake, Monte Carlo Data and Matillion. The core product is aimed at business users rather than technologists, and it is notable that many of its customers have unusually deep penetration of the software: several have thousands of end-users actively using the product, rather than just a small number of super-users and administrators, as can happen in many data catalog implementations. One client, a well-known global consultancy, has 35,000 users of the tool.
The data.world technology competes directly with leading data catalog vendors such as Collibra, Alation and Informatica.
Customer Quotes
“Our business needs to be ready for change during these turbulent times, …we will be able to apply graph analytics to find bottlenecks and achieve operational excellence.” Luke Slotwinski VP, Data & Analytics, Prologis
“We are thrilled to have the opportunity to innovate with data.world. By taking full advantage of the knowledge graph capabilities of data.world’s data catalog, we are able to accelerate metadata enrichment and recommend complementary datasets, inspiring the creative uses of data.” Vip Parmar, Global Head of Data Management, WPP
The data catalog has a fairly complete set of functionality, including a business glossary, connectors to capture metadata, data discovery tools, data lineage and an AI context engine. There is also some data quality and observability functionality, via the company's acquisition of Mighty Canary in May 2023. At the heart of the product is a semantic model and knowledge graph architecture.
Users are presented with a shopping-like experience, with data assets grouped in collections of related material; for example, there might be a collection of data around “customer information”. The product has a full search and discovery capability with keyword search, and can also show the data quality scores of the data being presented, as well as the sources of that data and its relationships to other data. This search capability can be embedded via an API in other tools so that, for example, a user of Tableau could invoke this discovery capability without leaving Tableau. The meaning of phrases like “net revenue” can be accessed from the business glossary within the tool.
The technology uses proprietary AI to allow users to populate descriptions of data objects. Since this AI has been trained on the internal data catalog, its descriptions are more accurate than a general-purpose public large language model would be. The model can explain itself, including showing the SQL that it generated and tables that it accessed in order to build up the basis for its descriptions. There are also tools to automate the creation and import of metadata, and the completeness of the metadata can be actively monitored.
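The idea of monitoring metadata completeness can be illustrated with a minimal sketch. This is not data.world's implementation or API: the `Asset` class, the `REQUIRED_FIELDS` list and both functions are hypothetical names chosen for illustration, and the sketch simply assumes that completeness means the share of required metadata fields that have been filled in for each data asset.

```python
from dataclasses import dataclass

# Hypothetical list of required fields; a real catalog would define its own.
REQUIRED_FIELDS = ("description", "owner", "tags")


@dataclass
class Asset:
    """A hypothetical catalog entry: a name plus free-form metadata."""
    name: str
    metadata: dict


def completeness(asset: Asset) -> float:
    """Return the fraction of required fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if asset.metadata.get(field))
    return filled / len(REQUIRED_FIELDS)


def incomplete_assets(assets, threshold=1.0):
    """Return the names of assets whose completeness is below the threshold."""
    return [a.name for a in assets if completeness(a) < threshold]


# Illustrative data only.
assets = [
    Asset("customers", {"description": "Customer master data",
                        "owner": "sales-ops", "tags": ["crm"]}),
    Asset("orders", {"description": "", "owner": "finance"}),
]

for asset in assets:
    print(f"{asset.name}: {completeness(asset):.0%} complete")
print("Needs attention:", incomplete_assets(assets))
```

A report of this kind, run on a schedule, is one plausible way that gaps in descriptions or ownership could be surfaced to data stewards before users encounter them.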
As well as software, data.world has built a large open data community with over 2 million users, dedicated to the sharing of publicly available datasets from, for example, governments and NGOs.
data.world has rapidly built up a base of prestigious customers and has achieved a deep level of penetration within many of those customers, something that is unusual among data catalog implementations. Its pioneering of openly available datasets via its active online community is a useful and complementary capability. The technology has a modern, appealing user interface and uses artificial intelligence in a controlled manner in order to improve productivity.
The bottom line
The data.world technology is a modern and differentiated approach to the world of data governance and data catalogs. It has been adopted by some prestigious companies and appears to be widely used within those customers, which is not always the case with data catalog technology. If you are looking for a data governance solution then you should carefully consider data.world as an option.