Ab Initio
Last Updated:
Analyst Coverage: Philip Howard
Ab Initio is a privately owned software company with a 25-year pedigree. It was established in Boston in 1995, and still has its headquarters there. It boasts a global presence, with offices located around the world and customers in over 35 countries.
The company’s marketing strategy is almost solely focused on using customer recommendations to win new business. This means that its growth as a business depends heavily on generating success for its customers. The fact that it has successfully relied on this business model for 25 years should speak for itself.
Ab Initio Metadata Hub
Last Updated: 14th July 2020
The Ab Initio Metadata Hub acts as the data governance component for Ab Initio’s data management platform. It can be used as either a system of record or a system of reference, is able to govern technical, business, and even logical assets, and provides both business and technical lineage. It also offers data quality and reference data functionality, as well as role and responsibility management. Moreover, the Metadata Hub is closely integrated with Ab Initio’s other solutions, such as Semantic Discovery, which each provide significant additional capabilities.
The Metadata Hub separates your data assets into a number of categories. For data governance, the most important are business assets, technical assets, and logical assets, as well as reference data. In effect, these consist of business information surrounding your data (business assets), the physical reality of your data (technical assets), and logical data models that describe your data (logical assets). Each type of asset can be browsed through at your leisure, and this acts as the first of two primary means to access (and govern) your assets. The second is through one of the two lineage views that the product provides.
Both of these lineage views visualise the movement and impact of data and data assets through your system as a flowchart, and allow you to access your data assets directly by drilling down into them. Where they differ is in the perspective they take on this movement. Business lineage (shown in Figure 1) approaches it with a high-level, logical view, and illustrates how your business assets interact with and are processed by your system from a business perspective. In other words, it focuses on the interactions that matter to the business, without much regard for the physical reality. An initial view of your business lineage can be generated automatically from the relationships between your fields and your business terms (see below), but some manual effort will usually be required to make it suitable for consumption by the business.
Technical lineage, on the other hand, takes the opposite approach, examining how your technical, physical assets literally move through your system, how your files, fields and tables interact, and so on. What’s more, technical lineage is generated automatically from your metadata, and that metadata can be imported from a wide variety of third-party systems.
In fact, a sizable range of extractors is provided for importing metadata into the product, both for generating technical lineage and more generally. This includes support for a number of third-party products, including some direct competitors in the data governance space. It’s also possible to write your own extractors using the open documentation that Ab Initio provides. Extractors can be run either through Ab Initio’s UI or via the command line, and can therefore be scheduled by utilising the latter.
The product’s business glossary, as seen in Figure 2, allows you to centrally define and manage your business terms. Using the business glossary, terms can be given a number of (configurable) types, such as ‘critical element’, and can be equipped with classifications such as PII (with the latter tying into Ab Initio Semantic Discovery, the company’s sensitive data discovery solution). Your terms can be hierarchical or otherwise related to other terms, as well as to your physical data (again accelerated by Semantic Discovery) and other assets. These relationships are also available as a visualisation, which is generated automatically. Roles and responsibilities are configurable for each term individually (and can have their own hierarchies), and a configurable workflow approval process is used to facilitate this. Search access to each term is also provided.
Your business terms can also be used to measure your data quality from a business perspective. Data quality rules can be created and added to your business terms, which will then contribute to an associated, user-configurable quality metric (typical examples include ‘accuracy’ and ‘consistency’). Each rule provides a historical performance summary as well as a list of associated terms, assets, and so on. Configurable thresholds drive data quality warnings, email notifications, or other actions if your quality metrics fall too far. Data quality checking can be run manually or automatically via scheduling. Notably, data quality information can be overlaid onto your business lineage to form a data quality heat map, allowing you to visually understand the health of your system. This can be seen in Figure 1.
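To make the relationship between rules, metrics and thresholds concrete, here is a minimal Python sketch. It is purely illustrative and is not Ab Initio’s API: the names RuleResult, metric_scores and breaches, the use of a simple pass rate as the metric score, and the threshold values are all our own assumptions.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    """Outcome of one (hypothetical) data quality rule run."""
    metric: str   # quality metric the rule contributes to, e.g. "accuracy"
    passed: int   # records that satisfied the rule
    checked: int  # records the rule was evaluated against


def metric_scores(results):
    """Aggregate rule results into a pass rate per quality metric."""
    totals = {}
    for r in results:
        passed, checked = totals.get(r.metric, (0, 0))
        totals[r.metric] = (passed + r.passed, checked + r.checked)
    return {m: p / c for m, (p, c) in totals.items() if c > 0}


def breaches(scores, thresholds):
    """Return the metrics whose score has fallen below its threshold."""
    return sorted(m for m, s in scores.items() if s < thresholds.get(m, 0.0))


results = [
    RuleResult("accuracy", 95, 100),
    RuleResult("accuracy", 80, 100),
    RuleResult("consistency", 99, 100),
]
scores = metric_scores(results)
# accuracy is 175/200 = 0.875, which is below the assumed 0.9 threshold
alerts = breaches(scores, {"accuracy": 0.9, "consistency": 0.95})
```

In a real deployment the list of breached metrics would be what drives a warning or an email notification; here it is simply returned to the caller.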
The Metadata Hub offers a number of advantages as a platform for data governance. For instance, it positions lineage prominently as a part of its solution, and as you might therefore expect, its lineage capabilities are a significant draw. In particular, explicitly providing both business and technical lineage can prove very useful by allowing all of your users to easily comprehend and get what they need out of your lineage. The data quality heat map is also a notable feature. It’s unfortunate that preparing your business lineage for consumption is likely to take manual effort, but on the other hand, generating your technical lineage is completely automated, and can be accomplished using a selection of third-party – and even competitor – products, to boot.
Ultimately, though, the greatest strength of the Metadata Hub is not part of the product itself. Rather, it’s the product’s place in Ab Initio’s entire milieu that makes it so powerful. For instance, it will readily and closely integrate with Semantic Discovery, and hence add full-fledged data discovery and classification (and thereby sensitive data discovery and GDPR compliance) to your governance solution. What’s more, Semantic Discovery is far from the only integration available: Ab Initio offers a wide range of data management solutions, and in many ways the Metadata Hub acts first and foremost to bring those solutions together.
The Bottom Line
Ab Initio is a broad and highly regarded platform for managing your data. The Metadata Hub, as part of that platform, is an excellent way to bring different elements of it together in aid of data governance.
Ab Initio Semantic Discovery
Last Updated: 21st May 2020
Mutable Award: Gold 2020
Ab Initio Semantic Discovery is a data discovery solution offered as part of Ab Initio’s broader data management platform. It provides automated data discovery (including sensitive data discovery) against both structured and unstructured sources (although its capabilities on the latter are somewhat limited). What’s more, it can leverage this discovery process to drive downstream actions and outcomes, such as automated data quality and data masking.
The discovery process in Semantic Discovery begins by profiling your data using Ab Initio’s Data Profiler. This only needs to be done once on a given data set, no matter how many times you run the rest of the process, and Semantic Discovery will also augment this information via a classification process that provides additional analysis that is particularly useful for data discovery.
Next, the product uses four different metrics to test your data: business term matching, a metadata comparison of your field definitions against your business terms, which may include fuzzy matching, abbreviations, and synonyms; pattern tests, which compare your data against a selection of recognised patterns and values, using the classification mentioned above to determine which fields to test; keyword tests, which look for specified keywords within your data; and fingerprints, which compare your data against known values. Each of these is used to determine the likelihood that a given field belongs to a given business term. They are then corroborated against each other to provide a final estimate, which in turn is used to recommend an action to take on each field: match, where the field unambiguously belongs to a particular business term; recommend, where a probable match has been found, but there is enough ambiguity to require human intervention; ignore, where no related term has been found; and investigate, where the results as a whole are highly ambiguous (for example, if several possible terms were found, but each had only middling likelihood).
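The corroboration step described above can be sketched in Python as follows. This is our own illustration rather than Ab Initio’s algorithm, which is not documented here: the averaging rule in combine, the function name recommend_action, and the three cut-off values are all assumptions chosen only to show how four per-test likelihoods might be reduced to one of the four actions.

```python
def combine(scores):
    """Corroborate per-test scores (each 0..1, or None if the test did not run)
    into one likelihood. Assumption: a plain average of the available scores."""
    present = [s for s in scores.values() if s is not None]
    return sum(present) / len(present) if present else 0.0


def recommend_action(candidates, match_at=0.9, recommend_at=0.6, ignore_below=0.3):
    """Map one field's candidate terms to an action.

    candidates: {business term: combined likelihood} for a single field.
    The cut-offs are hypothetical, not product defaults.
    """
    plausible = {t: p for t, p in candidates.items() if p >= ignore_below}
    if not plausible:
        return ("ignore", None)          # no related term found
    best_term, best = max(plausible.items(), key=lambda kv: kv[1])
    rivals = [p for t, p in plausible.items() if t != best_term]
    if best >= match_at and all(p < recommend_at for p in rivals):
        return ("match", best_term)      # unambiguous
    if best >= recommend_at:
        return ("recommend", best_term)  # probable, but needs a human
    return ("investigate", None)         # only middling candidates


email_score = combine(
    {"term_match": 0.9, "pattern": 1.0, "keyword": None, "fingerprint": 0.95}
)
action = recommend_action({"Email Address": email_score, "Customer Name": 0.1})
```

With these assumed cut-offs, a field whose best candidates score 0.7 and 0.65 would be recommended rather than matched, and a field with several candidates around 0.4 would be flagged for investigation.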
The results of the discovery process are accessed via a metadata portal, allowing you to take action on each field that discovery was run against. For example, if a match was recommended, you can accept it, reject it, or investigate the field further (and potentially specify a different match). Choosing to investigate provides more information on the field in question, including a clickthrough to the Data Profiler view for that field. Barring investigate, these actions can be done in bulk, and they trigger approval workflows when instigated.
The metadata portal is not just for reviewing your discovery results. Among other things, it can also be used for role management, activity monitoring, and managing the metrics that are used to discover your data. For the latter, in particular, you can manage your domains, the sets of known data used during fingerprints; your pattern tests, as well as thresholds for determining how prevalent each pattern needs to be within a field before it is recognised; your keyword tests; and finally, your business terms. These are all extensible, are populated out of the box, and each entry within them can be enabled or disabled individually.
Semantic Discovery also provides dashboards for central monitoring of your discovery processes (as seen in Figure 1), a variety of reporting options such as a decision log, and a visualisation of the relationships between your business terms (see Figure 2) with the option to click through to a detailed view for each term. This detailed view also allows you to flag a term as PII (of varying levels, if necessary) and define which action to take (if any) when that term is discovered (for example, a masking function). If these options are used, any data that is discovered under that term will be a) flagged as PII and b) acted on automatically. On the topic of PII, Ab Initio also provides subject access requests via Query>It, its solution for querying distributed data sources. This is important for regulatory compliance (say, with GDPR).
Semantic Discovery has several qualities that recommend it. It provides a number of different metrics for discovering your data, which can be combined to help minimise false positives. The existence of the investigate and recommend categories provides nuance, allowing the product to request human intervention when appropriate. Investigate, in particular, can act as a way to identify undocumented business terms: if a field is flagged for investigation, it will quite often be because it doesn’t fit into any of your existing terms, which may suggest a new one should be created.
Semantic Discovery also benefits from Ab Initio’s wider product suite, and in particular its close ties to both the Metadata Hub (part of Ab Initio’s solution for data governance) and Ab Initio’s automated masking engine. This is what allows Semantic Discovery to act on your data automatically as soon as it is discovered. Offering this functionality is a significant advantage, and the ability to mask your data immediately after discovery is particularly helpful for protecting your sensitive data.
The Bottom Line
Ab Initio is a broad and highly regarded platform for managing your data. Semantic Discovery, as part of that platform, is an effective means for discovering the data, and particularly the sensitive data, within your system.
Ab Initio Test Data Management
Last Updated: 12th July 2021
Ab Initio Test Data Management (TDM) is a test data management application within the broader Ab Initio data management platform. Said platform can be deployed on-prem or in-cloud; operates on structured, unstructured, and semi-structured data; and features a highly portable ‘build once, run anywhere’ architecture. It also offers solutions for several additional areas within data, including data integration, data governance, data quality, and more. What’s more, all of these solutions are extensible and centrally managed.
TDM is presented as a simple, visual flowchart accessed within Ab Initio Express>It (the platform’s web interface, shown in Figure 1) that allows you to choose a data source (which could be anything from a file, to a database, to a data set derived from some other Ab Initio app: anything that the Ab Initio platform as a whole can read), augment it with generated data, and mask, subset and export it to create your test data set. You can also leverage Ab Initio’s data generation capabilities to create all of your test data from scratch. All of this can be done in bulk. We should also note that TDM as shown here is essentially a friendly UI layered on top of the platform’s underlying functionality for masking, subsetting and so on. This functionality is not restricted to TDM, and in fact can be used practically wherever you like within the Ab Initio platform.
Test data generation is accomplished either by manually specifying which fields you want to generate, which algorithm to use for each, and in what quantity, or by reading in an Excel file and choosing the rows within it that you want to generate data for (again choosing which algorithms to use). In either case, you can add overrides if you want to exert additional control over your generated data, perhaps forcing a field to take a specific value or ensuring that all generated values are unique.
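As a rough illustration of per-field generators combined with overrides, consider the Python sketch below. It does not reflect how TDM is configured (that is done visually in Express>It); the function generate_rows, its parameters, and the example fields are hypothetical and exist only to show the two kinds of override mentioned above: forcing a fixed value and enforcing uniqueness.

```python
import random


def generate_rows(spec, count, overrides=None, unique=(), seed=0):
    """Generate `count` rows.

    spec:      {field: function taking a random.Random and returning a value}
    overrides: {field: fixed value} forces a field to one value
    unique:    fields whose generated values must not repeat
    """
    rng = random.Random(seed)
    overrides = overrides or {}
    seen = {field: set() for field in unique}
    rows = []
    for _ in range(count):
        row = {}
        for field, gen in spec.items():
            if field in overrides:
                row[field] = overrides[field]
                continue
            value = gen(rng)
            attempts = 0
            # Redraw until the value is new, giving up if the domain looks exhausted.
            while field in seen and value in seen[field]:
                attempts += 1
                if attempts > 1000:
                    raise ValueError(f"could not find a unique value for {field!r}")
                value = gen(rng)
            if field in seen:
                seen[field].add(value)
            row[field] = value
        rows.append(row)
    return rows


spec = {
    "customer_id": lambda rng: rng.randint(1000, 9999),
    "country": lambda rng: rng.choice(["UK", "US", "DE"]),
    "status": lambda rng: rng.choice(["active", "closed"]),
}
rows = generate_rows(
    spec, count=50, overrides={"status": "active"}, unique=("customer_id",)
)
```

Every generated row has the forced status, and no customer_id repeats across the 50 rows.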
Masking is rules-based (see Figure 2), allowing you to apply out-of-the-box or user-created masking functions to your data. It is static, format-preserving, irreversible, and consistent across multiple systems and platforms. (Reversible) encryption is also offered. Instead (or in addition), you could shuffle – meaning randomly reassign – the values within your data set, possibly according to some constraints to maintain consistency or preserve the original distribution. You can also apply rules after shuffling.
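The two ideas in this paragraph, consistent format-preserving masking and shuffling, can be illustrated with a short Python sketch. This is not Ab Initio’s masking engine: the keyed-hash approach, the function names mask_digits and shuffle_column, and the example key are assumptions of ours, chosen because they are simple ways to obtain the properties described.

```python
import hashlib
import hmac
import random


def mask_digits(value, key):
    """Replace every ASCII digit while keeping length and punctuation intact.

    The same (value, key) pair always gives the same output, so the masking is
    consistent across systems that share the key. It cannot be reversed from
    the output alone because it is derived from a keyed hash.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    i = 0
    for ch in value:
        if ch in "0123456789":
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def shuffle_column(rows, field, seed=None):
    """Randomly reassign one field's values across rows.

    The set of values (and so their distribution) is unchanged; only the
    association between each value and its row is broken.
    """
    rng = random.Random(seed)
    values = [row[field] for row in rows]
    rng.shuffle(values)
    return [{**row, field: v} for row, v in zip(rows, values)]


masked = mask_digits("555-123-4567", b"demo-key")
people = [{"name": n, "salary": s} for n, s in [("Ann", 30), ("Bob", 45), ("Cy", 60)]]
shuffled = shuffle_column(people, "salary", seed=1)
```

A production implementation would need more care than this, for example around key management and the slight bias introduced by reducing a byte modulo 10.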
Data classification takes place within Metadata>Hub, the platform’s data catalogue, via Semantic Discovery, the platform’s data discovery solution. It is used to automatically attach appropriate business terms to your data (including terms that indicate it is PII) and thence to mask automatically on this basis. Ab Initio also provides access to the Ab Initio Data Profiler, which can be helpful for deciding how to approach masking, subsetting and so on, as well as for validating these processes (particularly masking) once they’ve been applied.
Finally, when subsetting you have the option to create virtual fields (representing curated or amalgamated versions of existing fields) and to highlight any data that must be included or that should definitely be excluded. You then have two methods to reduce the size of your data set: sampling your records, by either a percentage or a count, and/or defining ‘field groups’ that (unsurprisingly) group fields together. Moreover, they allow you to specify the maximum number of different records that should exist in your subset for each unique combination of values within any given group. This is 1 by default, meaning that each record in your subset will necessarily contain (and hence allow you to test) at least one unique set of field group values. In essence, field groups exist to limit the number of conceptually meaningful combinations of fields (and field values) within your test data. This helps to ensure that all important combinations of values are represented and therefore that your test data set itself is meaningfully representative.
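One possible reading of the field group mechanism is sketched below in Python. It is an interpretation, not a description of the product’s internals: we assume a record is kept whenever it contributes a combination of values that is not yet fully represented for at least one field group. The function name and the example data are hypothetical.

```python
from collections import Counter


def subset_by_field_groups(records, field_groups, max_per_combination=1):
    """Greedy subset: keep a record if, for at least one field group, its
    combination of values has been kept fewer than max_per_combination times."""
    seen = [Counter() for _ in field_groups]
    subset = []
    for record in records:
        combos = [tuple(record[f] for f in group) for group in field_groups]
        if any(seen[i][c] < max_per_combination for i, c in enumerate(combos)):
            subset.append(record)
            for i, c in enumerate(combos):
                seen[i][c] += 1
    return subset


records = [
    {"id": 1, "country": "UK", "tier": "gold", "status": "open"},
    {"id": 2, "country": "UK", "tier": "gold", "status": "open"},
    {"id": 3, "country": "UK", "tier": "silver", "status": "open"},
    {"id": 4, "country": "US", "tier": "gold", "status": "closed"},
    {"id": 5, "country": "US", "tier": "gold", "status": "closed"},
]
field_groups = [("country", "tier"), ("status",)]
kept = subset_by_field_groups(records, field_groups)
kept_ids = [r["id"] for r in kept]  # [1, 3, 4]
```

Records 2 and 5 are dropped because they add no combination that is not already covered, while every combination present in the source still appears in the subset.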
The above outlines the process for generating subsets that pertain to a single data source (a database table, for instance). You can also generate test data sets in acknowledgement of multiple sources (say, an entire database) by grouping these individual processes together as part of a subject area. This allows you to specify a root data set and effectively use it as a driver for creating subsets that are consistent and that maintain referential integrity, foreign key relationships and so on across every subset contained within the subject area. It can also be used to preserve bad data (non-matching key relationships, for example) if that is required for testing. This may well be the case, since bad data is a fact of life, and you will want to know whether your system can handle it appropriately.
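The idea of a root data set driving consistent subsets can be sketched as follows. Again, this is our own simplified Python illustration, limited to one level of parent and child relationships; the function subset_subject_area, its parameters, and the customers and orders example are assumptions rather than anything taken from the product.

```python
def subset_subject_area(root_subset, full_root, root_key, children, keep_orphans=False):
    """Filter child data sets so they stay consistent with a subsetted root.

    children: {name: (rows, foreign_key_field)}
    keep_orphans: also keep child rows whose key matches no root row at all,
                  preserving 'bad data' for testing.
    """
    selected = {row[root_key] for row in root_subset}
    known = {row[root_key] for row in full_root}
    result = {}
    for name, (rows, fk) in children.items():
        result[name] = [
            r for r in rows
            if r[fk] in selected or (keep_orphans and r[fk] not in known)
        ]
    return result


customers = [{"id": 1}, {"id": 2}, {"id": 3}]
customer_subset = [{"id": 1}, {"id": 3}]
orders = [
    {"order": 10, "customer_id": 1},
    {"order": 11, "customer_id": 2},
    {"order": 12, "customer_id": 3},
    {"order": 13, "customer_id": 99},  # orphan: no such customer exists
]
clean = subset_subject_area(
    customer_subset, customers, "id", {"orders": (orders, "customer_id")}
)
with_bad = subset_subject_area(
    customer_subset, customers, "id", {"orders": (orders, "customer_id")},
    keep_orphans=True,
)
```

In the first call only orders 10 and 12 survive, so every remaining order still points at a customer in the subset. In the second, the orphaned order 13 is retained as well, so that the system under test can be checked against non-matching keys.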
As a test data management solution, Ab Initio is competent without being exceptional. It provides everything you need – and most of what you could want – in a flexible and easy-to-use package. On the other hand, it lacks some of the more advanced functionality found in its competitor products, and setting up its processes (creating your field groups, for example) is less automated than we might like. That said, TDM is only one part of Ab Initio’s much broader platform, and it performs well in that context. TDM’s masking functionality, in particular, can be leveraged directly using other parts of the platform, such as its data integration component.
Taking a broader view, the Ab Initio platform provides powerful and flexible raw ingredients for a wide variety of applications that are, consequently, highly scalable, performant, and customisable. Test data management is no exception to this.
The Bottom Line
If Ab Initio’s overall offering appeals to you, or if you are already an Ab Initio customer, its test data management capability will more than likely meet your needs. Although we would not recommend TDM as a point product (at least to customers who are not already – and do not want to become – more deeply invested in Ab Initio) we would certainly recommend the platform as a whole.