IBM InfoSphere Discovery
Update solution on May 8, 2013
IBM InfoSphere Discovery was known as Exeros before IBM acquired the company of the same name. It is a data profiling and discovery tool. While it has always technically been part of the InfoSphere portfolio, it was originally marketed alongside IBM’s Optim solution to support data archival, masking and similar functions. However, there is another InfoSphere product that also offers data profiling: Information Analyzer. In our view Discovery is much the superior product (because it offers extended discovery capabilities) for processes such as data migration and for supporting master data management (MDM) initiatives. IBM needs to make clear that Information Analyzer is really only suitable in data quality environments where it is simply a question of profiling individual data sources and there is no requirement for cross-source analysis.
The major elements of the Discovery platform are the Discovery Engine and Discovery Studio. The former is the component (or components, since you may have multiple engines for scalability purposes) that performs the actual discovery of business rule transformations, data relationships, data inconsistencies and errors, and so on. Where appropriate it generates cross-reference tables that are used within the staging database, it creates metadata reports in either HTML or Excel format, and it generates appropriate SQL, XML (for use with Exeros XML) and ETL scripts (for use in data migration and similar projects). Discovery Studio, on the other hand, is the graphical user interface employed by data analysts or stewards to view the information discovered by the engine (both data and metadata, as Discovery works at both levels), and to edit, test and approve relationships and mappings from a business perspective via guided analysis capabilities.
InfoSphere Discovery is an enabling tool rather than a solution in its own right, so it is horizontally applicable across all sectors. It will be particularly useful where it is necessary to understand business entities (for example, a customer together with their orders, delivery addresses, service history and so on) and to process those business entities as a whole. Notable environments that require such an approach include application and database-centric data migrations, master data management and archival.
We have no doubt that IBM has many successful users of InfoSphere Discovery. However, you wouldn’t know that to judge by its web site, which includes just two case studies where customers are using the product – CSX and FiServ – but in neither paper is the use of InfoSphere Discovery discussed; the product is simply listed as one of the IBM products in use.
In addition to providing conventional data profiling capabilities (finding and monitoring data quality issues) Discovery supports the discovery of orphaned rows, scalar relationships (simple mappings, substrings, concatenations and the like), arithmetic relationships between columns, relationships based on inner and outer joins, and correlations for which cross-reference tables are generated. Cross-source data analysis is available both to discover attribute supersets and subsets, and to identify overlapping and unique attributes. In the latter case there is a visual comparison capability that allows you to compare record values from two different sources on a side-by-side basis. In addition there are automatically generated source rationalisation reports that compare data sources to one another. Further features include support for filtering, aggregations and if-then-else logic, amongst others.
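To make two of the ideas above concrete, the following is a minimal Python sketch of orphaned-row detection and of superset/subset/overlap classification between two columns. It is an illustration of the general techniques only, not a description of how InfoSphere Discovery implements them; the function names, column names and sample data are all hypothetical.

```python
def find_orphans(child_rows, parent_rows, child_key, parent_key):
    """Return child rows whose key value has no matching parent row."""
    parent_keys = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row[child_key] not in parent_keys]


def classify_overlap(values_a, values_b):
    """Describe how the distinct values of column A relate to those of column B."""
    a, b = set(values_a), set(values_b)
    if a == b:
        return "identical"
    if a < b:   # every value of A appears in B, and B has more
        return "subset"
    if a > b:   # every value of B appears in A, and A has more
        return "superset"
    return "overlapping" if a & b else "disjoint"


# Hypothetical sample data drawn from two sources.
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 4},  # no customer 4, so this row is orphaned
]

orphans = find_orphans(orders, customers, "customer_id", "id")
relation = classify_overlap(
    [o["customer_id"] for o in orders],
    [c["id"] for c in customers],
)
```

Here `orphans` holds the single order that refers to a missing customer, and `relation` reports that the two key columns overlap without either containing the other. A real tool would run such comparisons across many column pairs and sources rather than a single hand-picked pair.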
There is also a Unified Schema Builder designed specifically to support new master data management, data warehousing and similar implementations that includes precedence discovery, and empty target modelling and prototyping. There are also facilities for cross-source data preview, automated discovery of matching keys (that is, a cross-source key for joining data across sources), automated discovery of business rules and transformations across two or more data sets with statistical validation, and automated discovery of exceptions to the discovered business rules and transformations.
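The notion of validating a discovered rule statistically and listing its exceptions can be sketched as follows. This is a simplified illustration under stated assumptions, not IBM's algorithm: the candidate rule is supplied by hand rather than discovered automatically, the matching key is assumed to be already known, and `validate_rule`, the column names and the data are hypothetical.

```python
def validate_rule(source_rows, target_rows, key, rule, target_column):
    """Test a candidate transformation rule against rows joined on a shared key.

    Returns the fraction of joined rows for which the rule holds, plus the
    key values of the rows that violate it.
    """
    target_by_key = {row[key]: row for row in target_rows}
    checked = 0
    exceptions = []
    for row in source_rows:
        target = target_by_key.get(row[key])
        if target is None:
            continue  # no matching target row, so the rule cannot be tested
        checked += 1
        if rule(row) != target[target_column]:
            exceptions.append(row[key])
    hit_rate = (checked - len(exceptions)) / checked if checked else 0.0
    return hit_rate, exceptions


# Hypothetical source and target data sets sharing the key "id".
source = [
    {"id": 1, "first": "Ann", "last": "Lee"},
    {"id": 2, "first": "Bob", "last": "Ray"},
    {"id": 3, "first": "Cy", "last": "Orr"},
    {"id": 4, "first": "Di", "last": "Fox"},
]
target = [
    {"id": 1, "full_name": "Ann Lee"},
    {"id": 2, "full_name": "Bob Ray"},
    {"id": 3, "full_name": "C. Orr"},  # does not follow the concatenation rule
    {"id": 4, "full_name": "Di Fox"},
]

# Candidate rule: full_name is first and last joined by a space.
hit_rate, exceptions = validate_rule(
    source, target, "id", lambda r: f"{r['first']} {r['last']}", "full_name"
)
```

With this sample data the concatenation rule holds for three of the four joined rows, and the row with `id` 3 is reported as the exception. A rule with a high hit rate and a short exception list is a plausible mapping for an analyst to review and approve.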
This is what we wrote in our 2012 Market Report on Data Profiling and Discovery: “since our last report into this market IBM has acquired Exeros, which was market leading for discovery purposes at that time. It should therefore come as no surprise that IBM offers the best understanding of relationships of any product that we have examined. For pure profiling capabilities IBM InfoSphere Discovery is good without being outstanding but it is clearly one of the market leaders when discovery is required alongside profiling.” That view remains unchanged: in support of MDM, migration, archival and similar environments Discovery is clearly the leading product in the market.
In addition to the normal sorts of training and support services you would expect from any vendor, IBM offers business services (application innovation, business analytics, business strategy, functional expertise, mid-market expertise), IT services (application management, business continuity and resiliency, data centres, integrated communications, IT strategy and architecture, security), outsourcing services (business process outsourcing and IT outsourcing and hosting), asset recovery, hardware and software financing, IT lifecycle financing and commercial financing.