IBM Watson Knowledge Catalog
Updated on July 14, 2020

IBM Watson Knowledge Catalog, itself part of IBM Cloud Pak for Data, is a data catalogue and data governance solution that promises to help you deliver ‘business-ready’ data across your enterprise: data that is meaningful, trustworthy, accessible, secure, and of high quality. In other words, data that is ready for consumption. What’s more, it does so through a single, unified governance experience.

Fig 01 – Services provided through IBM Watson Knowledge Catalog
It achieves this by positioning the catalogue itself at the centre of a range of relevant services, including AI-driven metadata curation; automated governance, data quality and policy management; and self-service data access, data preparation, and collaboration. The full extent of these services is shown in Figure 1, and the overall effect is to transform what might otherwise just be a data catalogue into a fully fledged data governance solution.
Watson Knowledge Catalog exposes your data assets for consumption by your users. From a user’s perspective, the product offers search access, asset recommendations, data previews, lineage information, and collaborative features such as commenting, reviews and ratings. In addition, users are able to bring data assets into their own projects, refine and enrich them, then use them to create visualisations or new data sets that can be shared both inside and outside the catalogue itself.
The product also features data virtualisation, meaning that all of your enterprise data can be accessed in the same way regardless of its physical location. The catalogue thus provides a single view for all of your enterprise data. This effect is enhanced by the ability to bring business assets, such as BI reports and data models, into the catalogue as well.
On the administrative side, role management, role-based views, workflows and approval processes are all supported. Moreover, data assets can be exposed to your users through multiple catalogues, each with its own access controls. The intention is that each catalogue will contain a particular subset of your data assets, allowing you to provide information that is targeted at a particular selection of your users. This makes their lives easier by hiding information that is extraneous to them, and it improves data security by exposing information only as necessary.
However, before you deliver your data to your users, you will want to be able to curate and govern it. To this end, the product enables you to create and manage a foundational layer of business terms (in other words, a business glossary), rules, policies, data classes (a number of which are available out of the box) and reference data. These can be associated with your data assets, providing the latter with concrete business meaning and context that can be explored and searched on. Moreover, these associations are used to drive the product’s automated discovery and classification services, as well as its data protection rule framework.
A particularly notable example of the former is Watson Knowledge Catalog’s ability to automate the onboarding of new data. When the catalogue ingests new data, it can (at your option) automatically classify it into data classes, identify any common data quality problems within it, and assign business terms to it. This functionality is presently restricted to structured data, although there are certainly other IBM products that will handle discovery on unstructured data.
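To make the idea of automatic classification concrete, here is a minimal sketch in Python of how a column of values might be assigned to a data class. It is not IBM's implementation: the `DATA_CLASSES` table, the `classify_column` function and the 80% threshold are all assumptions made for illustration, and real data classes are likely to rely on more than pattern matching.

```python
import re

# Hypothetical data classes: a value belongs to a class if it fully matches the pattern.
DATA_CLASSES = {
    "email_address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "uk_postcode": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
}


def classify_column(values, threshold=0.8):
    """Return the best-matching data class name, or None if none reaches the threshold.

    `values` is a list of strings; empty strings and None are ignored.
    The threshold is an illustrative assumption, not a product setting.
    """
    non_empty = [v for v in values if v]
    if not non_empty:
        return None

    best_name, best_ratio = None, 0.0
    for name, pattern in DATA_CLASSES.items():
        matches = sum(1 for v in non_empty if pattern.fullmatch(v))
        ratio = matches / len(non_empty)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio

    return best_name if best_ratio >= threshold else None
```

Calling `classify_column(["a@example.com", "b@example.org"])` would label the column as `email_address`, while a column of free text would be left unclassified for a data steward to review.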
As for the data protection rule framework, Watson Knowledge Catalog divides your rules into two categories: governance rules and data protection rules. The former are purely informational, and hence primarily act as a way to describe and document your policies in concrete, easily understood terms. Data protection rules, on the other hand, are actionable: they can be enforced automatically by the catalogue wherever they apply. They are created using a rule builder, and are generally used to protect your sensitive data (for example, by masking it) before it is exposed to your users. They apply to data exposed within the catalogue as well as data exported from or otherwise shared beyond it.
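The enforcement of a data protection rule can be pictured with a short Python sketch. Everything here is hypothetical: `ProtectionRule`, `mask` and `apply_rules` are illustrative names rather than part of the product, and the masking style (redacting all but the last two characters) is just one assumed option. In the product itself, rules are defined through the rule builder rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionRule:
    """Hypothetical rule: mask every column of a given data class,
    unless the user holds one of the exempt roles."""
    data_class: str
    exempt_roles: frozenset


def mask(value):
    """Redact all but the last two characters of a string."""
    return "*" * max(len(value) - 2, 0) + value[-2:]


def apply_rules(row, column_classes, rules, user_roles):
    """Return a copy of `row` with protected columns masked for this user.

    `column_classes` maps column names to data class names;
    `user_roles` is a set of role names held by the requesting user.
    """
    # A data class is protected if some rule covers it and the user is not exempt.
    protected = {r.data_class for r in rules if not (r.exempt_roles & user_roles)}
    return {
        col: mask(val) if column_classes.get(col) in protected else val
        for col, val in row.items()
    }
```

With a rule on the `email_address` class that exempts only a `data_steward` role, an analyst would see the email column masked while a data steward would see the raw values. The same function could be applied on export, mirroring the point that rules follow the data beyond the catalogue.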

Fig 02 – Data quality dashboard in IBM Watson Knowledge Catalog
Watson Knowledge Catalog also uses a variety of different data quality dimensions (such as data duplication, data type violations, and so on) to measure the data quality of your data assets and assign to each of them a data quality score. This is displayed visually within a data quality dashboard (see Figure 2) that can be drilled down into to view more detailed data quality information pertaining to individual data assets.
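As a rough illustration of how several data quality dimensions might be combined into a single score, consider the Python sketch below. How the product actually weights and combines its dimensions is not described here, so the equal-weight average, the 0 to 100 scale and the function names are all assumptions.

```python
def duplicate_ratio(values):
    """Fraction of values that repeat an earlier value (values must be hashable)."""
    if not values:
        return 0.0
    return 1 - len(set(values)) / len(values)


def type_violation_ratio(values, expected_type):
    """Fraction of values that are not instances of the expected type."""
    if not values:
        return 0.0
    return sum(1 for v in values if not isinstance(v, expected_type)) / len(values)


def quality_score(values, expected_type):
    """Score a column from 0 to 100 by averaging its violation ratios.

    Equal weighting of the two dimensions is an illustrative assumption.
    """
    ratios = [
        duplicate_ratio(values),
        type_violation_ratio(values, expected_type),
    ]
    return 100 * (1 - sum(ratios) / len(ratios))
```

A column with no duplicates and no type violations scores 100, and each dimension that finds problems pulls the score down in proportion to the share of affected values. A dashboard could then show the per-dimension ratios when a user drills down into an individual asset.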
Watson Knowledge Catalog provides a comprehensive data governance solution built around the core capability of the data catalogue. It offers an impressive amount of automated functionality, with the fully automated discovery and onboarding procedure being particularly noteworthy. The data virtualisation provided by the product is also highly effective: combined with the aforementioned onboarding process, it lets you ingest numerous data sources in a curated fashion and access all of the resulting data in a single location.
Data protection rules are another notable example of automation in service of governance, and an especially useful one in light of GDPR and other regulations that demand an enhanced degree of data privacy and data security. What’s more, splitting rules into two categories based on their use is a sensible way of making them easier to understand as a whole.
The ability to create multiple fit-for-purpose catalogues may also be useful, especially in large organisations whose departments have radically different data needs (although it does run the risk of creating or reinforcing data silos).
The Bottom Line
IBM Watson Knowledge Catalog is a mature and highly competent data governance solution with an impressive variety of automated capabilities.