Informatica
Last Updated:
Analyst Coverage: Daniel Howard
Informatica was founded in 1993 as a services company specialising in helping its customers to migrate to a client/server environment. It was not until 1996 that it introduced its first product, Informatica PowerMart, which was followed by Informatica PowerCenter in 1998. The following year the company floated on NASDAQ, but in 2015 it was taken private in a leveraged buyout in which Permira and the Canada Pension Plan Investment Board acquired Informatica for $5.3bn, with Microsoft and Salesforce also investing. In the year prior to the acquisition the company had revenues in excess of $1bn and net income of over $100m.
Since going private, Informatica has not only been aggressive in introducing new products but has also been transitioning away from a traditional on-premises licensing model to a more cloud-based, subscription-oriented approach. In a number of the markets it serves, Informatica is also having to evolve from a company that traditionally marketed itself to IT and technical departments into one that engages at business level. This is arguably more difficult to do than a transition to the cloud.
Axon Data Governance became part of the Informatica portfolio when the company acquired the original developer, Diaku, in early 2017. In the time since, Informatica has worked diligently to integrate it with other Informatica products, particularly Informatica Data Quality, Enterprise Data Catalog and Secure@Source.
Data Quality with Informatica
Last Updated: 28th March 2024
Mutable Award: Gold 2024
Informatica offers an integrated data management platform, from data access and integration to data governance and catalog, data quality, master data management and a data marketplace. These functional areas share a common AI-driven metadata layer called CLAIRE as well as connectivity to a wide range of data sources. In the data quality area, Informatica’s technology covers the full range of functionality that you would expect to see from a full-function data quality product. The technology carries out data profiling, anomaly detection, identification of potential duplicates, data validation and cleansing, merge/matching and data enrichment. The product is cloud-native and runs on AWS, Azure and GCP, as well as, most recently, the Oracle Cloud.
Customer Quotes
“We’ve seen a lot of return since we’ve implemented the solution. The business is happy and the data is flowing.”
Fauzan Ahmed, IT Manager – Application Development and Support, Marathon Oil
“The vision I describe to my colleagues is that they’ll be able to implicitly trust the data that informs them, no matter where in our organization it comes from.”
Robin Miller, Group Data Manager, Lowell Group
The CLAIRE software engine can automate a wide range of data management tasks, including data profiling. It can identify anomalies in data, auto-generate data quality rules, apply those rules and suggest corrective actions. This goes beyond basic exception management and includes detecting unusual distributions of data – for example, a data load that results in an unexpected number of records. Business users are notified when anomalies are detected. The recent acquisition of Privitar adds metadata-driven policy management.
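The kind of load-volume check described above can be illustrated with a minimal sketch: compare the record count of a new load against the distribution of previous loads and flag outliers. This is a hypothetical simplification for illustration, not Informatica's actual algorithm.

```python
import statistics

def load_volume_anomaly(history, new_count, z_threshold=3.0):
    """Flag a data load whose record count deviates sharply from history.

    `history` is a list of record counts from previous loads. A simple
    z-score test stands in for the learned distribution checks a product
    like CLAIRE would apply.
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return new_count != mean
    return abs(new_count - mean) / stdev > z_threshold

# A load of 9,800 records against a history near 10,000 is unremarkable...
history = [10_120, 9_980, 10_050, 9_910, 10_200]
assert load_volume_anomaly(history, 9_800) is False
# ...but a load of 2,000 records would trigger a notification.
assert load_volume_anomaly(history, 2_000) is True
```

In practice such thresholds would be tuned per dataset rather than fixed globally.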
The CLAIRE engine has recently been enhanced to go beyond identifying potential data failures and generating data quality rules. Data elements can be classified automatically, data schemas compared, and data structures detected and catalogued. One logistics customer used this to automatically associate business terms in 95% of cases in a file of over a million records, saving several months of effort.
A text interface now allows business users to discover and interact with data assets, explore metadata and the relationships between data, and create data pipelines. An end user could pose a question in English such as “Help me find the datasets needed for creating a customer churn report”, “Explain the lineage of the sales KPI report” or “What are the top viewed reports in our company?” This facility is currently in beta test with 150 customers.
Informatica can support both data fabric and data mesh architectures. It has a partnership with Microsoft to embed Informatica technology within Microsoft Fabric as a native application; for example, the profiling of a data table and its associated results appear as a native Fabric asset, with associated data quality rule definitions and executions.
Improving data quality should lead to improved quality of business decisions, and may also avoid regulatory and compliance problems. Recently, another imperative has appeared. The rise of interest in generative AI has led to many companies wishing to train large language models on their own corporate data. However, the success of implementing such AI models is heavily dependent on data quality since the AI model is only as good as the data that it is trained on. As a wise person said: “Everybody is ready for AI except your data”. Companies are finding that high data quality is a precursor to successful AI implementations.
The bottom line
Informatica has evolved from its ETL roots and now has a broad suite of data management capabilities, from data integration to master data management, from data quality to data governance and more. Its substantial investment in artificial intelligence, starting with the launch of CLAIRE in 2018, is paying off now, with significant new capabilities such as CLAIRE GPT. It competes with established broad-based data platform vendors like SAP, IBM and SAS as well as with pure-play data quality products. Informatica is clearly one of the leading players in the data quality market.
Mutable Award: Gold 2024
Informatica Axon Data Governance
Last Updated: 14th July 2020
Mutable Award: Gold 2020
Informatica Axon Data Governance provides browser-based, business level access to a variety of enterprise grade, highly automated and democratised data governance capabilities. These capabilities are tightly coupled with Informatica Enterprise Data Catalog, Informatica Data Quality and Informatica Data Privacy Management (formerly Secure@Source), which operate downstream of Axon Data Governance but seamlessly integrate with it using shared metadata-driven intelligence, as well as CLAIRE, Informatica’s Enterprise Unified Metadata Intelligence Engine. Taken together, these products offer a complete, unified, and intelligent solution for data governance.
Customer Quotes
“Having an automated, integrated solution from Informatica is making a difference in our data governance program – because you cannot manage what you cannot see.”
L.A. Care Health Plan
“Informatica helps us tackle data governance and management in new and more effective ways, giving us the tools to win more business and retain our existing customers.”
AIA Singapore
Axon Data Governance allows you to view and manage a variety of business and data assets within a single location, including data sets, business terms, policies, processes, and so on. A contextual graph search is provided, as are automated workflows, approval processes, and a variety of dashboards at both the system and local levels (one of which is displayed in Figure 1).
Assets come equipped with a variety of information, notably including connections with your other assets. This includes explicit associations, applicable rules and policies, impact analysis, and data lineage. For policies, this also includes their position in your overall policy hierarchy. The product thus provides a connected view of your overall system. Relevant stakeholders are also highlighted, including both direct stakeholders as well as the broader stakeholder community, and discussion features are provided to facilitate collaboration. The product also offers a view into your technical metadata, held within Enterprise Data Catalog, and allows you to create data sets within Axon Data Governance directly from that metadata.
Data lineage in Axon Data Governance is business-oriented and available at multiple levels. It is displayed visually, and can be filtered, explored, and so forth on the fly. You can also overlay a variety of metadata, such as data quality and risk, onto your lineage view. This is shown in Figure 2. Corresponding technical lineage is available within the Enterprise Data Catalog.
The product’s data discovery capabilities are quite extensive, leveraging CLAIRE to automatically sort your data into a variety of ‘Smart Domains’, a number of which are provided out of the box. CLAIRE can also be used for intelligent business term association, tagging relevant data assets with business terms based on data discovery rules equipped to each term, and thus connecting your business and technical assets.
For data quality, natural language processing is used via CLAIRE to automatically generate new (or recommend existing) data quality rules based on plain English descriptions of your quality requirements. Data quality checks are automated – in particular, newly ingested data is checked automatically – as is reporting. Several categories of data quality are also offered, allowing for nuanced quality measurements.
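To make the idea concrete, a plain-English requirement such as "every customer must have a valid email address" might compile down to a rule like the following. The rule representation, field names and scoring function here are hypothetical illustrations, not Informatica's actual internals.

```python
import re

# Hypothetical rule generated from a plain-English quality requirement.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def email_rule(row):
    """Passes when the email field is populated and plausibly formed."""
    value = row.get("email")
    return bool(value) and EMAIL_RE.match(value) is not None

def score(rows, rule):
    """Share of rows passing the rule -- one simple quality measurement."""
    return sum(1 for r in rows if rule(r)) / len(rows)

rows = [
    {"email": "ana@example.com"},
    {"email": "broken-at-example.com"},
    {"email": None},
    {"email": "sam@example.org"},
]
assert score(rows, email_rule) == 0.5  # half the rows pass the check
```

Automating the check against newly ingested data then amounts to re-running the scoring function on each load and reporting the result.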
Several features are provided to support data privacy and regulatory compliance. In addition to the data discovery described above, the product offers sensitive data discovery and classification across structured and unstructured data, support for subject access requests, and the ability to track policy violations and understand and assess risk. Policies can be managed just as any other asset, and can also be associated to business terms and automatically enforced on correspondingly tagged data assets. Integration with Informatica Data Privacy Management also allows your policies to drive risk analysis and privacy monitoring (for example, alerting) inside Axon Data Governance.
Finally, Axon Data Governance provides data democratisation via Informatica Axon Data Marketplace, a new, embedded feature. The Axon Data Marketplace solution allows data owners to publish their data directly from Axon Data Governance, as well as manage and track access to it. For data consumers, it provides a means to search for data – which is organised into meaningful business categories – and request access to it centrally via a checkout process. Axon Data Marketplace also provides a significant degree of automation, automatically notifying data owners of incoming requests, checking requests against existing policies, and delivering the relevant data when a request is approved.
Axon Data Governance provides a notably broad set of data governance capabilities, which are extended even further by its integration with Enterprise Data Catalog and Data Privacy Management. What’s more, these capabilities are, in general, highly automated, often owing to shared metadata-driven intelligence and other functionality provided by CLAIRE. In effect, Informatica provides a complete and often intelligent solution for data governance.
In fact, separating out business concerns within Axon Data Governance from technical concerns in Enterprise Data Catalog and Data Privacy Management provides its own benefits, by offering experiences and views tailored specifically for business and technical users, respectively. The privacy compliance and risk monitoring provided by Informatica Data Privacy Management, and especially the ability to bring the results of that monitoring back into Axon Data Governance for consumption by your business users, is a particular strength.
Axon Data Marketplace also provides significant advantages. By acting as a centralised ‘one stop shop’ for data approval, it makes it much easier for data consumers to find and request access to the data they need. Likewise, the Axon Data Marketplace benefits data owners by providing a single location to manage access requests, by automating much of the approval process, and by automatically delivering data assets to your data consumers upon the completion of said process. In short, it makes the lives of your data owners and consumers simpler and easier, allowing them to spend more of their time and energy on other concerns, such as creating value from the data within your organisation.
The Bottom Line
Informatica Axon Data Governance is the keystone for Informatica’s intelligent data governance solution. Said solution is eminently integrated and complete, highly automated and scalable, and well worth your consideration.
Mutable Award: Gold 2020
Informatica Data Catalog
Last Updated: 6th October 2023
Mutable Award: Gold 2023
Informatica Cloud Data Governance & Catalog is the data governance offering within the Informatica Intelligent Data Management Cloud (IDMC) software suite, and includes an integrated data catalog as a component. It has the full range of functionality that you would expect from such a product: a wide range of data source scanners to catalog metadata from source applications such as SAP, pre-built data classifications, automated data lineage, visual displays such as knowledge graphs for navigating a corporate data landscape, and integrated data delivery via Informatica’s data marketplace.
The product is integrated with the well-established data quality tools from Informatica, as well as the CLAIRE machine learning-driven metadata layer of the Informatica software, of which more anon. The software is cloud-native and can be deployed on hyperscalers such as AWS or Azure.
Informatica bolstered its policy management capabilities in July 2023 with the acquisition of Privitar. This allows it not merely to define data access policies for privacy and security compliance, but to control and enforce them directly.
The Data Catalog module sits alongside other Informatica offerings such as master data management and data quality, all based on a common metadata automation layer called CLAIRE. This layer, based on machine learning technology, allows such things as automated data discovery and classification, and enables business users to interact with their data via a natural language interface. This itself rests on a connectivity layer allowing access to a wide range of on-premises and cloud applications.
Metadata is accessible via knowledge graphs, and data can be automatically classified according to recommendations from the machine-learning suggestion of CLAIRE. Technical metadata is automatically given glossary definitions. The software can carry out extensive profiling and discovery of data using the data quality capabilities of the Informatica suite. For example, this might include schema matching and source-to-target lineage, all without the need for manual definition. Business users can customise additional data classifications as needed.
The software has extensive search capabilities, including visual browsing of business hierarchies and a natural language search capability. Business users can preview instances of detailed data for immediate insight as they navigate their data landscape. Knowledge graphs are an appealing way to visualise complex data relationships, discover related datasets and explore semantic or usage-based relationships.

The product can not only scan metadata from applications such as SAP and databases such as Snowflake and Databricks; it can also parse code and business intelligence reports to capture detailed information. Workflow automation can be set up to handle tasks such as approvals, for example changes of data ownership. The tool can even be used to govern AI models, along with their associated datasets, through a dedicated asset type, showing who is using each model and tracking the corporate data it was trained on.

Collaboration tools allow business users to comment on corporate data in a consumer-like manner, which helps build greater trust in the quality of data. Diagrams can show data quality scores overlaid on data lineage, monitoring how these change along a data pipeline. Lastly, a data marketplace lets data consumers shop for trusted data products and have them delivered with minimal curation by data stewards, by automating how consumers connect with the data assets they need to achieve business outcomes.
Customer Quotes
“Informatica AI and ML capabilities have greatly improved our intelligent data governance processes. As an example, one report which previously took more than 150 man-hours to create, now takes a fraction of the time.”
Jacky Cheong, Head of Enterprise Data Governance, Celcom Axiata Berhad
“This is truly a life-saving initiative for us. Informatica enables the entire continuum of care, allowing us to take care of our patients and their families the best way we can.”
Dharam Padhaya, Principal Architect, Data Science and Analytics, Hackensack Meridian Health
“Informatica gives us critical insight to help ensure that we are very accurate and understand the risks we’re underwriting.”
Louis DiModugno, SVP, Chief Data Officer, HSB
Data governance has emerged as a prerequisite for the successful management of corporate data. A decade ago, it was normal for large organizations to leave data quality and ownership to their IT departments, but this approach rarely works, since IT departments do not usually have the political clout to insist on data standardization, for example standardising on a common customer or product hierarchy across a company. Many ambitious master data management projects foundered due to a lack of business ownership of data. With a well-run data governance program, ownership of data is taken by business users, with data stewards embedded within business lines, supported by a data governance committee with business leaders who are senior enough to resolve issues.
Implementing such a data governance program is a great deal easier with a central data catalog to discover data, document business ownership of data, data relationships, policies and measurement of data quality. Data governance tools have emerged over the last decade as a way to support and cement data governance processes. This allows a company to gain greater trust in its data and to be able to confidently deal with regulatory and compliance requirements, while keeping data open for business with data democratization support that accelerates data delivery through a marketplace shopping experience. The Informatica Data Catalog has emerged as one of the major contenders in this important market.
The bottom line
Informatica’s Data Catalog is a capable, cloud-native offering with a full range of data governance functionality within Informatica’s unified Cloud Data Governance and Catalog service, easily extended with Cloud Data Marketplace for data delivery on the common IDMC platform. In particular, it has an exceptionally strong artificial intelligence story, drawing on its long-established machine-learning experience with CLAIRE. Customers looking for a data governance solution should carefully consider evaluating it to see whether it matches their specific needs.
Mutable Award: Gold 2023
Informatica Data Privacy Management
Last Updated: 21st May 2020
Mutable Award: Gold 2020
Informatica Data Privacy Management is a data-centric privacy, governance and security solution that is focused on discovering and classifying sensitive data to understand how it moves around the organisation, where it may be located from a geographical perspective, who owns the data, and which people and processes access that data. In short, it manages privacy risks in a comprehensive, integrated solution. The product shares a common metadata platform with Informatica Enterprise Data Catalog (EDC) as well as the Informatica Axon data governance offerings. These solutions have many of the same cataloguing capabilities as Informatica Data Privacy Management. Informatica also provides format preserving encryption as well as both static and dynamic data masking, which are used to protect sensitive data. The company also offers encrypted archival capabilities. For consent management, you can use the company’s MDM offering for consent mastering, and Informatica also partners with both OneTrust and TrustArc.
Specifically for discovering sensitive data, the product supports relational databases in the cloud or on-premises, applications such as Salesforce and SAP R/3, Amazon S3, ETL processes (limited to Informatica, Microsoft SSIS and Cloudera Navigator and Atlas), file systems and both SharePoint and OneDrive. However, it is less of a primary focus area – it is not alone in this – when it comes to unstructured and NoSQL data sources. For example, it is limited to supporting Hive and HDFS at present, though the company plans to support Cassandra and BigQuery in version 6.0 (the current release is 5.1).
Customer Quotes
“Before we embarked on this journey, we didn’t have a clear view of sensitive data. Now, with Informatica, we can see and manage our entire universe of information. This capability is a game-changer, and it’s enabling us to take a proactive approach to data protection that is helping to strengthen customer trust in our services.”
Financial Services Company
Considered holistically, Informatica’s approach is that you start by creating actionable data privacy policies (via integration with Axon – see Figure 1 – from which policies can be overlaid into Informatica Data Privacy Management). You then discover and classify your sensitive data; uncover and map “identities” to data (which can be used to support data subject rights requests under regulations such as the GDPR and CCPA); analyse the risks posed by sensitive data so that you can prioritise your protection plans; protect the data (via masking and other techniques); respond to rights and consent requests; and, finally, track and report on all of this.
The facilities for discovering sensitive data, which may be run against samples of the data if required, are extensive and can be automated using ML and AI. You can match on the metadata via patterns, regular expressions and rules, and you can also introspect SQL – both SQL queries and any SQL used for data movement purposes – though not stored procedures. Distance constraints (for example, a post code needs to be near a city name) can be used, and you can define white (always sensitive) and black (never sensitive) lists. In the latest version, the underlying engine can make recommendations about what should be in these lists. Leverage of primary/foreign key relationships is planned for the next release. For unstructured data the product uses AI to look for parts of speech and otherwise relies heavily on the use of reference data. When potentially sensitive data is discovered, you can set your system up to automatically agree that it is sensitive, automatically agree that it is not, or flag it for human validation, according to thresholds that you define.
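The combination of pattern matching, distance constraints and always/never-sensitive lists can be sketched as follows. All names, patterns and the distance threshold are hypothetical illustrations of the technique, not Informatica's implementation.

```python
import re

# Regex patterns matched against column names (metadata-level discovery).
PATTERNS = {
    "post_code": re.compile(r"post.?code|zip", re.I),
    "city": re.compile(r"city|town", re.I),
}
ALWAYS_SENSITIVE = {"national_id"}  # "white" list
NEVER_SENSITIVE = {"row_version"}   # "black" list

def discover(columns, max_distance=2):
    labels = {}
    for name in columns:
        if name in NEVER_SENSITIVE:
            continue
        if name in ALWAYS_SENSITIVE:
            labels[name] = "sensitive"
            continue
        for label, pat in PATTERNS.items():
            if pat.search(name):
                labels[name] = label
    # Distance constraint: a post code only counts when a city column
    # sits within max_distance positions of it.
    positions = {name: i for i, name in enumerate(columns)}
    for name, label in list(labels.items()):
        if label == "post_code":
            near_city = any(
                labels.get(other) == "city"
                and abs(positions[other] - positions[name]) <= max_distance
                for other in labels
            )
            if not near_city:
                del labels[name]
    return labels

cols = ["customer_city", "postcode", "row_version", "national_id", "zip_backup"]
found = discover(cols)
assert found["postcode"] == "post_code"   # kept: city column is adjacent
assert "zip_backup" not in found          # dropped: no city column nearby
assert found["national_id"] == "sensitive"
assert "row_version" not in found
```

A real engine would additionally sample the data values themselves and weigh multiple signals before assigning a confidence score.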
The use of identity mapping is interesting because it allows you to support rights requests: for example, where is Philip’s data? To discover and map identities, the product uses fuzzy matching, and it ships with various pre-built classification policies such as PCI, GDPR and so forth. This is augmented by support for domains (name, email address and so forth), which are provided out of the box and can be combined.
Finally, we should mention the risk scoring (see Figure 2). In addition to providing risk analytics and key performance indicators there is also risk simulation planning. This allows you to see the impact of using different approaches to, say, masking.
While sensitive data isn’t only about personal data, it is issues over complying with new privacy regulatory mandates that are driving the market for sensitive data discovery and the subsequent protection of that data. Informatica is well-known as a market leader in the data management space and this is where its strength lies. The company has very strong credentials when it comes to structured data and it has focused its discovery capabilities in this area, where it has significant strengths. We particularly like the company’s support for managing identities, which makes a lot of sense within the context of GDPR, CCPA and similar regulations to determine data access.
Conversely, Informatica has respectable rather than comprehensive capabilities when it comes to discovery in unstructured environments. But, and this is a big but, companies that have focused on discovery for unstructured data tend to have very limited structured capability, typically limited to just Oracle and SQL Server. Most large enterprises are not limited to just these providers which would mean having two different sensitive data discovery solutions which, to our minds, does not represent any sort of solution.
The Bottom Line
Organisations should be aiming to have a single solution for sensitive data discovery that enables a data privacy governance strategy across a global enterprise. Organisations with multiple heterogeneous database implementations, as well as file systems, that they need to secure, would do well to shortlist Informatica as one of only a few companies that offers significant structured data discovery along with unstructured support.
Mutable Award: Gold 2020
Informatica Data Privacy Management – Sensitive Data Discovery
Last Updated: 9th November 2022
Mutable Award: Gold 2022
Informatica Data Privacy Management (see Figure 1) is a solution for enterprise-spanning data privacy, governance and security. Among other things, it offers sensitive data discovery and classification in order to understand how your sensitive data moves around your organisation, where it is geographically located, who owns it, and which people and processes access it. In short, to manage privacy and security risks within a comprehensive, integrated solution.
The product sits within Informatica’s broader data management platform, the Informatica Data Management Cloud, and accordingly shares common metadata, AI and connectivity layers with a range of other Informatica products, including other governance-focused offerings such as Enterprise Data Catalog, Axon Data Governance and Cloud Data Masking. Notably, the platform uses a single data intelligence scan to facilitate data discovery, cataloguing, quality and automation, enabled by CLAIRE, the platform’s AI layer.
Connectivity in general is broad, extending to relational databases in the cloud or on-premises, NoSQL data sources such as MongoDB and Cassandra, applications like Salesforce and SAP S4/HANA, cloud platforms including AWS, Azure and Google Cloud, as well as various file systems and ETL processes. In total, over 100 connectors are provided.
Customer Quotes
“With Informatica, we know we can trust our data and protect sensitive information whether it’s on-premises or in the cloud. That’s critical as we continue our AWS and data modernization journey.”
Aravind “Jag” Jagannathan, Vice President and Chief Data Officer at Freddie Mac
Informatica’s approach is to first enable you to create actionable data privacy policies. Then, you discover and classify your sensitive data, analyse the risk posed by it in order to determine and prioritise further actions (most likely including masking or other anonymisation methods), carry out those actions, and track and report on all of the above. You’ll also uncover and map “identities” to your data as part of the discovery process that can be used to build a registry of data subjects and thereby respond expediently to rights and consent requests. The results of your discovery process (among other things) are presented in a data privacy dashboard.
The product’s facilities for discovering sensitive data, which can be run against samples of the data if required, are extensive, and can be automated using machine learning and AI. You can pattern match on the metadata (using either regular expressions or a data dictionary) and you can introspect SQL – both SQL queries and any SQL used for data movement purposes – though not stored procedures. Unstructured data is supported via NLP (Natural Language Processing) followed by the same sort of pattern matching.
Data is examined in its context, meaning that data that is only contextually identifying (which is to say, when combined with other information) will still be flagged. Proximity matching – in other words, distance constraints – can be used (for example, post code needs to be near city name) and you can define white (always sensitive) and black (never sensitive) lists. CLAIRE will make automated recommendations about what should be in these lists.
For unstructured data the product uses AI to look for parts of speech and otherwise relies heavily on the use of reference data. When potentially sensitive data is discovered, your system will either automatically agree that it is or is not sensitive, or decide that it needs human validation, according to configurable confidence thresholds. Discovery can also be actioned on images and documents (PDFs, for example) via optical character recognition, as well as compressed files and Outlook 365 emails (including attachments).
As mentioned, identity mapping supports rights requests, such as locating all of a given customer’s data. The product uses fuzzy matching to facilitate this, and it ships with various pre-built classification policies such as PCI, GDPR and so forth. This is augmented by out-of-the-box domain support (name, email address and so forth) as well as a handful of response templates for rights requests.
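Fuzzy matching for identity mapping can be illustrated with a short sketch: link records to a data subject even when names are spelled inconsistently across systems. The similarity measure, threshold and record layout below are hypothetical stand-ins for the product's own matching.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude string similarity in [0, 1]; real matchers use richer models."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_subject_records(subject_name, records, threshold=0.85):
    """Return records plausibly belonging to the data subject."""
    return [r for r in records
            if similarity(subject_name, r["name"]) >= threshold]

records = [
    {"name": "Philip Howard", "table": "crm.contacts"},
    {"name": "Phillip Howard", "table": "billing.accounts"},  # misspelling
    {"name": "Daniel Howard", "table": "crm.contacts"},       # different person
]
matches = find_subject_records("Philip Howard", records)
# Both spellings of Philip are found; Daniel is excluded.
assert {m["table"] for m in matches} == {"crm.contacts", "billing.accounts"}
assert len(matches) == 2
```

In a rights-request workflow the matched record locations would then feed the response templates mentioned above.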
Finally, we should mention risk scoring. In addition to providing risk analytics and key performance indicators, including proliferation and user activity analysis on sensitive data, there is also risk simulation planning. This allows you to see the impact of using different approaches to protecting your sensitive data. Policy-driven alerting is also included, and can be used with sensitive data. Likewise for automated workflows.
Informatica is well-known as a market leader in the data management space, and has strong credentials in that regard. This certainly shows in its Data Privacy Management product, which offers a broad suite of capabilities that only get broader when you consider it as part of the company’s overall portfolio. This breadth is always an advantage, but sensitive data discovery particularly benefits because of how much it works in concert with other technologies – masking, policy management, and so on – to achieve the desired outcome of protecting your sensitive data and achieving regulatory compliance. We particularly like its support for managing identities, which makes a lot of sense within the context of GDPR, CCPA and similar regulations.
Moreover, Informatica is a proponent of the idea (and in this case we wholeheartedly agree) that data privacy can be about more than just compliance. Rather, you can derive real business value from your privacy efforts. For sensitive data discovery, you can make headway in terms of the visibility and accessibility of your data, thus generating significant quantities of actionable data intelligence: recall that Informatica does not scan for sensitive data particularly, but rather for data intelligence generally. In turn, this can be used to examine and improve data quality, enable analytics, and so on.
The Bottom Line
Informatica offers a catch-all, value-oriented proposition for data privacy in general, and for sensitive data discovery in particular, that we find very appealing. It is well worth adding to your shortlist.
Mutable Award: Gold 2022
Informatica Intelligent Data Platform
Last Updated: 3rd October 2022
Mutable Award: Gold 2022
Informatica Intelligent Data Management Cloud is a comprehensive, AI-powered data management platform that supports a broad range of use cases. It offers simple, no-code solutions that can scale up in complexity to meet the needs of the user, and allows ETL developers and data engineers, scientists, and analysts to ingest, consume, transform, and prepare data using a drag and drop interface. Its capabilities encompass data ingestion, data integration, data replication, data quality, data governance, master data management, data cataloguing, data privacy, application integration, API management, a data marketplace, and more. All of these capabilities are underpinned by unified metadata management and connectivity layers as well as CLAIRE, Informatica’s AI and machine learning intelligence engine, which drives automation and guides user experience throughout the platform.
The platform is cloud-native, supports multi-cloud, hybrid and serverless deployments, and leverages microservices and APIs as core parts of its architecture. Its connectivity layer provides a common interface for a variety of connectivity options, notably including CLAIRE-enhanced ‘rich’ metadata, and over 220 native connectors are available. It also features enterprise-grade security and performance.
Customer Quotes
“Informatica Intelligent Cloud Services allowed us to meet much quicker timelines and achieve our goals as a marketing team without significant development effort.”
Lenovo
“The power of Informatica is that we can do ETL, ELT, and mass ingestion at scale, all with a single integrated cloud platform, to deliver trusted information to our data consumers.”
KLA Corporation
Cloud Data Integration (CDI), Informatica’s data integration capability, leverages CLAIRE and the connectivity layer as you would expect. It supports ETL and ELT processes, with data transformations available via a drag and drop interface alongside CLAIRE-driven recommendations. It also provides a unified, wizard-driven interface for ingestion and replication, alongside real-time job monitoring, lifecycle management and alerting.
CDI uses elastic scaling to ensure resiliency and high availability, and serverless deployment is available to minimise infrastructure management. In addition, Informatica Advanced Pushdown Optimization supports ELT in the cloud, wherein transformation logic is converted into native commands and pushed to the source and/or target databases. This accelerates data processing without incurring any additional ingress or egress charges. Moreover, Figure 1 shows further options that Informatica provides for cost-performance optimisation.
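As a rough illustration of the pushdown idea – a sketch only, not Informatica’s actual API, and all names here are hypothetical – a declarative transformation can be rendered as native SQL and executed where the data already resides, rather than pulling rows out of the database:

```python
# Illustrative sketch of ELT pushdown: instead of extracting rows, translate a
# declarative transformation into native SQL and run it at the target database.
# The job structure and function name are hypothetical, not Informatica's API.

def push_down(transform: dict) -> str:
    """Render a simple filter-and-project transformation as native SQL."""
    cols = ", ".join(transform["columns"])
    sql = f"INSERT INTO {transform['target']} SELECT {cols} FROM {transform['source']}"
    if transform.get("filter"):
        sql += f" WHERE {transform['filter']}"
    return sql

job = {
    "source": "raw.orders",
    "target": "analytics.big_orders",
    "columns": ["order_id", "amount"],
    "filter": "amount > 1000",
}
# The generated statement runs entirely inside the warehouse, so no rows cross
# the network and no ingress/egress charges are incurred.
print(push_down(job))
```

Because the heavy lifting happens inside the cloud database itself, performance scales with the warehouse rather than with the integration engine.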
The platform provides a variety of features to further automate your data integration. This includes intelligent field mappings, automated structure discovery/schema inference, automated entity creation based on documented (and/or discovered) relationships, pipeline and expression auto-complete based on classifications and other metadata, auto-tuning and auto-scaling at runtime, auto-healing for resiliency, and more.
Capabilities for data pipeline orchestration, programmatic pipeline creation, and pipeline/workflow scheduling, tasking and monitoring are also available. You can reach out to the broader Informatica portfolio to incorporate data quality and governance into your pipelines, and validation is available alongside a degree of support for automated testing. Pipeline monitoring is provided, including resource monitoring (such as spike detection and alerting) and operational analytics. ModelServe (shown in Figure 2) and INFACore, two recent additions to the Informatica platform, also exist specifically to make it easy for data scientists and engineers to build and deploy machine learning (ML) models and compose data pipelines, respectively, in an automated fashion.
Cloud Data Integration is supported by Informatica Data Loader, a standalone, self-service data loading offering designed for easy onboarding and a straightforward, wizard-driven user experience. It uses a freemium pricing model alongside a flexible payment scheme that stretches across all of Informatica’s services, and it is available on AWS, Azure, Google Cloud, Snowflake, and Databricks. It features automatic schema drift detection and supports a variety of data sources. It is suitable for initial and incremental loading, and going from setup to starting to load data takes only a few minutes.
Informatica also offers a real-time streaming data ingestion and processing solution with support for high-volume, low-latency stream data integration. In addition, the platform provides a wizard-based data ingestion and replication service – Cloud Mass Ingestion (CMI) – that is compatible with various data sources, including databases, applications, streams, files, and so on. This includes support for Change Data Capture (CDC) and IoT. Moreover, CMI allows you to ingest in bulk or incrementally, and can be used to enable real-time analytics consumption. It can also automatically detect schema drift at the source and replicate incremental changes to the target.
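To illustrate what schema drift handling involves – a simplified sketch in which the column names, types and function are hypothetical, not CMI’s implementation – a replication service can compare the schema observed at the source with the target’s and evolve the target before replaying incremental changes:

```python
# Hypothetical sketch of schema drift detection during replication: columns
# that have appeared at the source but are missing at the target are surfaced
# as ALTER statements to apply before incremental changes are replicated.

def detect_drift(source_schema: dict, target_schema: dict) -> list:
    """Return ALTER statements for columns that appeared at the source."""
    return [
        f"ALTER TABLE target ADD COLUMN {col} {typ}"
        for col, typ in source_schema.items()
        if col not in target_schema
    ]

target = {"id": "INT", "name": "TEXT"}
source = {"id": "INT", "name": "TEXT", "loyalty_tier": "TEXT"}  # new column
# A real CDC service would apply these changes, then replay the change stream.
print(detect_drift(source, target))  # -> ["ALTER TABLE target ADD COLUMN loyalty_tier TEXT"]
```

The point is that the pipeline keeps flowing when the source evolves, rather than failing and waiting for a manual schema fix.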
Informatica’s data management platform offers a substantial breadth of capability. Moreover, the individual solutions within it are generally of a very high quality, and in fact we would consider many of them to be best of breed in their own right.
For one thing, it is highly suited to the cloud, which for data integration specifically manifests as pushdown optimisation to various cloud vendors as well as cross-ecosystem pushdown capabilities for workloads utilising multiple clouds simultaneously. It also features extensive automation that is present throughout the platform in general and its data integration capabilities in particular. What’s more, the shared capabilities underpinning the platform – the metadata management layer, the security layer, the connectivity layer, and, most of all, CLAIRE – add significant value, ensuring that each individual solution is well-integrated into a greater whole and enabling the platform itself to be more than the sum of its parts.
Finally, the platform is designed, in part, to democratise data management by enabling collaboration and self-service. This certainly applies to Cloud Data Integration, which is equipped with easy-to-use interfaces and collaboration features like version control integration. Data Loader in particular is notably straightforward to onboard with, thanks to its simple interface and freemium pricing model. It also places no limit on the amount of data users can load.
The Bottom Line
Informatica Intelligent Data Management Cloud is an exceptional solution in many ways. Via Cloud Data Integration (and assisted by Informatica Data Loader) it is more than capable of providing a robust, cloud-native, and highly automated solution for data ingestion, data replication and data integration.
Mutable Award: Gold 2022
Informatica MDM
Last Updated: 25th September 2014
Informatica, a leader in data integration, has strong offerings for master data management. Its flagship product was originally based on the Siperian customer data integration technology, but is now a full multi-domain MDM product. Informatica has strengthened its product data capability through the acquisition of Heiler, a specialist in the mastering of product data.
Informatica MDM is noted for its high performance and scalability, for high-volume customer data implementations in particular, but it has a broad range of functionality, including support for data governance. It has, for some years, had some of the leading data quality technology on the market, a key element of any successful master data implementation. The company offers a broad platform, covering data integration and data quality as well as MDM.
Informatica focuses on large enterprises and public sector bodies. Known for its strong penetration for MDM in the pharmaceutical market, it now has a wide range of master data implementations across a range of industries. Its strong US presence is now complemented by growing customer deployments in Europe and Asia.
Informatica has some very large master data implementations, with customers including Thomson Reuters, UBS and Harrods. It has a large presence in the healthcare industry, particularly in North America, with customers such as Blue Shield.
The Informatica MDM technology has three distinct editions. Its flagship product has a high-performance master data hub that is based on relational database technology. Its product data hub currently has a separate database but is able to share data with the core product, and there is another technology for its cloud offering, which focuses on Salesforce CRM. These products can co-exist and work together.
Informatica has its own, highly functional, data quality technology. This allows master data records to be validated at source and enriched where needed, and avoids data duplication between different customer source systems. The technology has support for data governance, providing workflow and reports for data stewards. The flagship MDM technology is noted for its high scalability, and has some of the largest production implementations in the market of high-volume customer master data.
Informatica has a substantial services organisation, which, for example, has an offering to assist customers with building a quantified business case for MDM. It also partners with a wide range of systems integrators, both global and local, in order to ensure that customer implementations are successful.
Informatica Stream Processing
Last Updated: 14th December 2021
Informatica offers stream processing as part of Intelligent Data Management Cloud (IDMC), a broad, cloud-ready, and well-integrated data management platform. IDMC also includes a number of other services, a unified metadata and AI layer (CLAIRE), and over 10,000 metadata-aware connectors that cover all three major public clouds (among other things). In addition, Informatica has partnered with Datumize in order to maximise its compatibility with IoT. The platform also offers comprehensive cloud capabilities and is available on a consumption-based pricing model.
IDMC provides a single architecture (and user experience) for data ingestion – whether via streaming, batch, or whatever else – that leads into solutions for stream processing and data integration (see Figure 1). Features relevant to streaming include real-time ingestion, mass ingestion, automated handling of schema and data drift, and a Kappa messaging architecture.
Customer Quotes
“Informatica Cloud Mass Ingestion allowed us to generate hundreds of mappings in a very short time. It’s a straightforward, secure bridge from source to target, which is exactly what we need. We don’t require a VPN in order to maintain data security.”
University of New Orleans
“Informatica Cloud Mass Ingestion is so easy to use that it saves us 90 percent of the ETL effort. I can just open a browser and access it anytime, anywhere.”
University of New Orleans
Informatica’s solution for stream processing consists of several different products and services that combine within the singular platform of IDMC. The backbone of this solution consists of three services: Cloud Mass Ingestion, High Performance Messaging for Distributed Systems, and Data Engineering Streaming. Respectively, these provide streaming ingestion, high-speed messaging, and stream processing. Other Informatica offerings, such as Cloud Data Integration and Enterprise Data Catalog, can then add to this core. Taken as a whole, IDMC lets you ingest streaming data and move it to wherever it needs to be, while processing, transforming, and governing it as necessary.
More specifically, Cloud Mass Ingestion provides format-agnostic data movement and mass data ingestion, including file transfer, CDC (Change Data Capture) and exactly-once database replication. It also offers mass streaming ingestion from a variety of sources, complete with real time monitoring, alerting, and lifecycle management.
High Performance Messaging for Distributed Systems is what it says on the tin: a performant messaging system boasting “ultra-low latency”, targeted at distributed systems. In addition, it provides high resiliency and guaranteed message delivery. Alternatively, you could use Kafka, or another messaging service, via the connectivity options that Informatica provides. For instance, Cloud Data Integration can gather and load in batch data from Kafka directly, with an “at least once” delivery guarantee. Enterprise Data Catalog can also scan Kafka deployments in order to extract relevant metadata (message structure, for instance).
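The practical consequence of an “at least once” guarantee is that duplicates are possible, so downstream handlers should be idempotent. The following generic sketch – not Informatica or Kafka code, all names our own – shows why: the consumer commits its offset only after processing, so a crash between the two steps replays the last message on restart:

```python
# Generic sketch of at-least-once consumption: the side effect (processing)
# happens before the offset commit, so a crash between the two replays the
# uncommitted message after restart. Names are illustrative only.

def consume(messages, committed_offset, fail_before_commit_at=None):
    """Process from committed_offset; optionally crash before committing."""
    processed = []
    offset = committed_offset
    for i in range(committed_offset, len(messages)):
        processed.append(messages[i])   # side effect happens first
        if i == fail_before_commit_at:
            return processed, offset    # crash: offset never advanced
        offset = i + 1                  # commit only after processing
    return processed, offset

msgs = ["a", "b", "c"]
first, offset = consume(msgs, 0, fail_before_commit_at=1)  # "b" processed, not committed
second, _ = consume(msgs, offset)                          # restart replays "b"
print(first + second)  # -> ['a', 'b', 'b', 'c']: "b" is delivered twice
```

Committing before processing would instead give at-most-once delivery, risking loss rather than duplication; exactly-once requires coordination between processing and commit.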
Finally, Data Engineering Streaming is a continuous event processing engine built on Spark Streaming that is designed to handle big data. It supports both batch and streaming data, and it features out of the box connectivity to various messaging sources, as well as no-code visual development (shown in Figure 2). As part of the latter, it provides hundreds of prebuilt transformation functions, connectors and parsers. You can also pipe in your own code, or build your own functions and whatnot using Informatica’s business rules builder. Essentially, it allows you to enrich your streaming data in real time. This could mean improving data quality, masking sensitive data, aggregation, or what have you.
IDMC also supports Spark Structured Streaming, which can be important if you want to aggregate streaming data based on event time (not processing time) and hence reorder data that has arrived out of order before delivering it to your data target. It also supports Confluent Schema Registry, which can be used to parse Kafka messages, retrieve message structure, and handle schemas as they change and grow.
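The value of event-time processing can be sketched in a few lines of plain Python – an illustration of the watermark concept only, not the Spark API: late events are buffered and released in event-time order once the watermark (the maximum event time seen, minus the allowed delay) passes them:

```python
# Toy illustration of watermark-based event-time reordering, the behaviour
# Spark Structured Streaming enables here. Not Spark code; pure Python.

def reorder(events, allowed_delay):
    """events: (event_time, payload) in arrival order. Emits in event-time order."""
    buffer, out, max_seen = [], [], 0
    for t, payload in events:
        max_seen = max(max_seen, t)
        buffer.append((t, payload))
        watermark = max_seen - allowed_delay   # how late an event may be
        ready = sorted(e for e in buffer if e[0] <= watermark)
        buffer = [e for e in buffer if e[0] > watermark]
        out.extend(ready)
    out.extend(sorted(buffer))                 # flush remainder at end of stream
    return out

# "b" arrives late (after "c") but is still emitted in event-time order:
arrivals = [(1, "a"), (3, "c"), (2, "b"), (5, "d")]
print(reorder(arrivals, allowed_delay=2))  # -> [(1, 'a'), (2, 'b'), (3, 'c'), (5, 'd')]
```

Events later than the allowed delay would simply be dropped once the watermark has passed, which is the trade-off any watermarking scheme makes between completeness and latency.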
Moreover, you can use CLAIRE to augment your solution with machine learning – for example, to automatically detect and generate schemas. This is particularly beneficial in that it allows you to discover and rectify data (and schema) drift. Informatica also provides a ready-made integration pipeline for data science and machine learning, which helps you to apply machine learning and AI models to your streaming flows.
In addition, Informatica is keenly aware of the need to govern streaming data, and the company’s suite of data governance products are available for this purpose. Data cataloguing, preparation, discovery, lineage, and visualization are all available, and have been designed to promote self-service and collaboration. Security features, like masking, authentication and access control, are also available, and real-time job monitoring, analytics, and visualisation are provided via the Operational Insights service.
Informatica provides a high-quality, comprehensive stream processing solution that is positioned as just one part of a much larger, and broader, integrated data platform. Moreover, it is highly compatible with the cloud, and includes native cloud ingestion; it provides a broad range of connectivity, exemplified by the sheer quantity of connectors and scanners provided; and it offers a unified user experience, regardless of whether you’re deploying in-cloud or on-prem, which data sources you’re using, whether you’re using it for streaming or batch processing, and so on. The latter is particularly important, in that it allows you to abstract out much of the underlying complexity of stream processing, thus enabling your business users to work that much more efficiently and effectively.
We are also impressed by IDMC’s ability to transform streaming data in real-time, especially regarding its substantial number of built-in transformations. This is an area where Informatica’s breadth of capability really shines, by allowing you to combine high-end data quality and masking with stream processing. Moreover, Informatica’s data governance and security solutions combine with streaming in much the same way, with similar benefits.
The Bottom Line
Informatica Intelligent Data Management Cloud offers highly capable and well-integrated stream processing as part of its overall data management functionality.
Informatica Test Data Management
Last Updated: 12th July 2021
Mutable Award: Gold 2021
Informatica Test Data Management (TDM) is Informatica’s solution for test data management, which it bundles as Secure Testing. It offers data subsetting, static data masking, and synthetic data generation, as well as easy access via a test data warehouse and a self-service portal. It fully supports the cloud and is available on all major cloud providers, and it can act on a range of data sources, including relational, NoSQL, cloud and mainframe databases as well as flat files.
Moreover, TDM is only one of a number of Informatica products that collectively provide a holistic solution for data governance, privacy, and protection. Conversely, the broader Informatica ecosystem enriches TDM: for example, it allows it to leverage CLAIRE, Informatica’s shared metadata and AI layer. TDM also exposes a range of REST APIs for integration with third-party software.
Customer Quotes
“We are shrinking clients’ development cycles by working with smaller sets of test data, and lowering IT costs through the use of smaller data sets that require less storage and fewer system resources.”
Cognizant Testing Services
The product supports a variety of methods and options for data subsetting, data masking, and rules-based synthetic data generation. Data masking in particular is policy-driven, complete with out-of-the-box masking policies that support PCI, PII and PHI compliance. Masking works with structured and unstructured data, is federated to ensure consistency across multiple datasets (including datasets in multiple locations, such as on-prem and in cloud), and can leverage encryption that can be reversible or irreversible, and format or metadata preserving, as required. For auditing purposes, you can also generate compliance reports that show exactly how much of your sensitive (test) data is masked.
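Two of the properties described above – deterministic (federated) masking, so that the same value masks identically across datasets, and format preservation – can be illustrated with a toy pseudonymisation scheme. This is emphatically not Informatica’s masking algorithm; the key and function are our own invention:

```python
import hmac
import hashlib

# Toy sketch of deterministic, format-preserving masking: each digit is
# shifted by a keyed hash of the whole value, so the same input always masks
# the same way (consistency across datasets) and separators are kept intact
# (a 9-digit SSN stays shaped ddd-dd-dddd). Illustrative only.

SECRET = b"masking-key"   # hypothetical key; in practice, centrally managed

def mask_digits(value: str) -> str:
    """Replace each digit deterministically, keeping non-digits as-is."""
    digest = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    stream = (int(c, 16) for c in digest)   # keyed pseudo-random digit stream
    return "".join(
        str((int(ch) + next(stream)) % 10) if ch.isdigit() else ch
        for ch in value
    )

ssn = "123-45-6789"
masked = mask_digits(ssn)
print(masked)                          # same shape as the input: ddd-dd-dddd
assert mask_digits(ssn) == masked      # deterministic, so joins still work
```

Note that this toy scheme is irreversible; reversible masking, as mentioned above, additionally requires an encryption scheme from which the original value can be recovered with the key.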
Furthermore, there are multiple ways to accelerate your test data provisioning within this environment. For instance, the product includes a self-service portal (see Figure 1) that allows test data admins to publish test data sets for consumption. Testers can provision any data sets that are made available to them, and can modify, subset or copy them locally as they will, allowing them to customise their test data for their specific testing requirements. Data within the portal can also be tagged to make it easier to search through. Somewhat more primitively, you can distribute data to your testers via parameterised test data plans, allowing your testers to fill in the plan’s parameters whenever they need relevant test data, and receive it accordingly.
On the other hand, you could integrate provisioning into your existing test automation workflows, perhaps by leveraging the product’s test data warehouse. This is an additional capability that can be used to store, and subsequently provision, up-to-date test data on-demand without troubling your production environment. You can also use it as a baseline to reset your personal testing environment against, allowing you to experiment as you wish without negatively impacting either other testers or your own overall testing efforts. It can even link into your DevOps environments, CI/CD pipelines, and test automation workflows, thus automating your provisioning. You can also use it to contribute to the solution’s self-service capability by publishing data sets in the warehouse directly to the portal.
In addition, the product provides a number of neat visual capabilities, including graphical test data coverage and an entity view while subsetting. The former in particular allows you to see whether you have sufficient data to achieve the level of coverage you desire. This is shown in Figure 2.
Finally, TDM can discover (and hence mask) your sensitive data using several methods. This includes domain and pattern matching, dictionaries, algorithmics, and AI/machine learning. These discovery capabilities are extended by Informatica Data Privacy Management, an additional product that provides enterprise-wide discovery and classification of sensitive data, DSAR reporting, and continuous multi-factor sensitive data risk monitoring with AI-enabled analytics. In particular, Data Privacy Management also tracks the lineage of your masked data and allows you to see where your data has been masked as it flows through your organisation. It is also integrated with TDM’s masking functionality, allowing you to, for example, automate a masking process in TDM from within Data Privacy Management to support risk remediation policies.
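Pattern matching, one of the discovery methods listed above, can be sketched as follows. The patterns here are deliberately simplified examples, not the classifiers that TDM or Data Privacy Management actually ship:

```python
import re

# Toy illustration of pattern-based sensitive data discovery: scan sample
# values per column against simplified regexes and report which columns
# appear to contain which sensitive domains.

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def discover(columns):
    """Return {domain: [columns]} hits found by scanning sample values."""
    hits = {}
    for column, values in columns.items():
        for domain, pattern in PATTERNS.items():
            if any(pattern.search(v) for v in values):
                hits.setdefault(domain, []).append(column)
    return hits

sample = {
    "contact": ["alice@example.com", "bob@example.org"],
    "notes": ["SSN on file: 123-45-6789"],
    "city": ["Lyon", "Austin"],
}
print(discover(sample))  # -> {'email': ['contact'], 'us_ssn': ['notes']}
```

Real discovery engines layer dictionaries, statistics and machine learning on top of this kind of matching precisely because regexes alone produce false positives and miss free-text context, as the dedicated products described above do.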
TDM is an outstandingly broad test data solution. It offers functionality that covers almost all aspects of test data management, which consequently enables it to address a wide variety of use cases. Its capabilities include data subsetting, static data masking, synthetic data generation, data discovery, test data provisioning, self-service, and format-preserving encryption. Moreover, it is capable in all of these areas, and you could reasonably argue that it is best-of-breed in several.
Standout features include visual test data coverage, the test data warehouse, and the self-service portal. The ease of use, reuse, enhanced collaboration, and reduced dependency on admins that the latter two provide are particularly notable as boons to your testers’ productivity.
Even then, TDM is still only one solution among many available from Informatica, and the breadth of that ecosystem – even if you only consider privacy and governance – is its own advantage, and one that TDM benefits from substantially. For example, shared metadata can be used to enable a far more comprehensive approach to data protection and transparency. There is not enough space on this page to go into the finer details, but suffice it to say that when it comes to Informatica, the whole is greater than the sum of its parts (and the parts themselves are highly capable to begin with).
The Bottom Line
TDM proves very competent in almost every area of test data management, with a particular penchant for robust self-service and expedient test data provisioning. There is little reason it shouldn’t be on your shortlist, especially in the context of building out more complete data governance and privacy operations.
Mutable Award: Gold 2021