Data Integration

Data integration is a set of capabilities for moving data from one place to another, frequently via a data pipeline (although, for clarity's sake, we should mention that – unlike, say, the Windows move operation – this does not generally remove or alter the original data). This is more complicated than it sounds. For instance, owing to different requirements in your source and target systems, the metadata surrounding your data will often need to change even as the data itself remains essentially the same. Hence you will often have to both physically copy your data to the target system and transform its metadata into a form suitable for its new location.

This is usually accomplished by applying a set of transformations to your data, and its associated metadata, during the data integration process. ETL (Extract, Transform, and Load) is the most widely known methodology of this type (in fact, ETL is probably better known – at least as a term – than data integration itself). Lesser-known alternatives include ELT (Extract, Load, and Transform) and ETLT (Extract, Transform, Load, and Transform again), with the major difference between these methodologies being – as you might suspect from their names – at what point (or points) in the process the transformations are applied. This has various architectural, performance and security implications, but suffice it to say that all three have their place. There is also Reverse ETL, a process wherein data is extracted from your data warehouse and placed back into your other (operational) systems, enriched with additional context and insights and ready for your users to act on.
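To make the distinction concrete, the following is a minimal, purely illustrative sketch of an ETL flow in Python. The "source" and "target" here are simple in-memory structures, and all field names are invented for illustration; a real pipeline would, of course, read from and write to actual systems.

```python
# Illustrative ETL sketch: in-memory stand-ins for real source/target
# systems; field names are made up for the example.

def extract(source_rows):
    """Extract: read raw records from the (simulated) source system."""
    return list(source_rows)

def transform(rows):
    """Transform: adapt metadata (field names, types) to the target schema."""
    return [
        {"customer_id": int(r["id"]), "full_name": r["name"].strip().title()}
        for r in rows
    ]

def load(rows, target):
    """Load: append the transformed records to the (simulated) target."""
    target.extend(rows)
    return target

source = [{"id": "1", "name": " ada lovelace "}, {"id": "2", "name": "ALAN TURING"}]
warehouse = []
load(transform(extract(source)), warehouse)
```

Reordering these calls so that `load` runs before `transform` (with the transformation then applied inside the target system) would give you the ELT variant instead, which is the essence of the distinction drawn above.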

Other types of data integration include data replication and CDC (Change Data Capture), along with associated techniques, which essentially just copy data without transforming it (but offer other benefits instead). For example, CDC is frequently used to monitor data sources for new or changed data, capture those changes, and forward them elsewhere for processing, thereby enabling change propagation. This allows you to keep downstream systems in sync with your data sources in (near-)real-time. Of course, it is also possible to combine CDC with ETL (or ELT, or ETLT) if transformations are required as part of this process (or vice versa).
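The toy sketch below illustrates the idea behind CDC by diffing two snapshots of a source table and propagating the resulting change events downstream. Note that real CDC tools typically read the database's transaction log rather than comparing snapshots, and all names and rows here are invented for illustration.

```python
# Toy CDC sketch: snapshot diffing stands in for the log-based capture
# that real CDC tools use; rows and keys are illustrative.

def capture_changes(previous, current):
    """Compare two snapshots keyed by id; emit insert/update/delete events."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes

def apply_changes(target, changes):
    """Propagate captured changes to keep a downstream copy in sync."""
    for op, key, row in changes:
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = row
    return target

old = {1: {"status": "new"}, 2: {"status": "open"}}
new = {1: {"status": "closed"}, 3: {"status": "new"}}
downstream = dict(old)
apply_changes(downstream, capture_changes(old, new))
```

Running `capture_changes` on a schedule (or, better, continuously against a change log) and applying the events downstream is, in miniature, how CDC keeps systems in (near-)real-time sync.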

There are multiple use cases for data integration: for instance, data migrations (particularly cloud data migrations), or moving data from an operational database to a data warehouse in support of data analytics. In the latter case, data warehouse automation tools extend this capability by understanding the relevant schemas and helping to automate the creation of said warehouses. Moreover, they typically use replication and/or CDC to ensure that the target system is kept up to date, in a good example of what we were talking about at the end of the previous paragraph.

Note that for analytics in particular, data virtualisation provides a viable alternative to data integration by allowing you to query data that exists in multiple physical locations as if it sat within a single data warehouse. This means that you do not need to physically move it, which tends to greatly simplify things. Of course, data virtualisation cannot be applied to data migration and similar use cases, where the movement of data is not an incidental difficulty but rather the entire point, making it somewhat niche despite its advantages.
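To illustrate the principle, the sketch below federates a single logical query across two in-memory "sources" without copying any data. It is purely illustrative, with made-up rows and names; real data virtualisation engines naturally involve a great deal more (query planning, pushdown, caching, and so on).

```python
# Toy data virtualisation sketch: two in-memory lists stand in for
# separate physical systems; all rows and names are illustrative.

crm_rows = [{"source": "crm", "customer": "Acme", "region": "EU"}]
erp_rows = [{"source": "erp", "customer": "Globex", "region": "US"}]

def virtual_query(sources, predicate):
    """Answer one logical query across all sources, leaving data in place."""
    return [row for rows in sources for row in rows if predicate(row)]

# Query both "systems" as if they were a single warehouse.
eu_customers = virtual_query([crm_rows, erp_rows], lambda r: r["region"] == "EU")
```

The point is that the caller sees one query interface while the data stays where it is, which is precisely why virtualisation cannot help when moving the data is the goal.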

Data Integration tools first started to appear in the early 1990s. In other words, the market is more than 30 years old. Needless to say, in terms of its basic capabilities, the market is mature. However, that does not mean that it is static. Indeed, in this report we seek to highlight the data integration vendors that we consider best-of-breed.

Data integration has seen some significant development over the past few years. There is now a significant focus on ease of use; for instance, it is very much the standard for solutions in the space to offer low- or no-code, visual, and often automated data pipeline and/or transformation building. Natural language interfaces (typically AI-driven – see below) are also fairly common, to the same ends. Self-healing pipelines and automated schema detection and management are increasingly common and play into this aspect as well, creating more resilient data pipelines that require less user effort to maintain.

It is typical for data integration products to support a wide range of connectivity, in terms of both data sources and targets, with dozens – if not hundreds – of ready-made connectors being the norm. Cloud coverage is ubiquitous, as is support for unstructured data. Some degree of API support is also practically guaranteed, although to a greater or lesser degree depending on the specific solution. Most solutions offer you a choice of ETL or ELT (although they may bias towards one or the other), and reverse ETL, CDC, and streaming capabilities are all fairly widespread. It is also increasingly common for data integration solutions to incorporate data governance, profiling, and/or quality functionality into their processes, either natively or by integrating with third-party products.

One of the more recent and novel market trends for data integration (indeed, for data in general) is the adoption of generative AI. For example, products in the space will often provide capabilities to help you feed or otherwise incorporate an LLM (Large Language Model, the basis for a generative AI solution) into your data integration processes, or to enable RAG (Retrieval-Augmented Generation, a technique for adding business context to an otherwise generic LLM). It is also increasingly common for data integration vendors to directly offer generative AI-driven capabilities within their solutions. For example, as mentioned above, several offer chatbot-style interfaces that allow you to interact with them using natural language (to create data pipelines, say).
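As a purely illustrative sketch of the retrieval step in RAG, the following uses simple word overlap to select business context to prepend to a question before it reaches an LLM. Real implementations would use vector (embedding) search rather than word overlap, and the documents and query here are invented.

```python
# Toy RAG retrieval sketch: word overlap stands in for the vector search
# a real RAG pipeline would use; documents and query are made up.

def retrieve(query, documents, top_k=1):
    """Rank documents by word overlap with the query and keep the top_k."""
    q = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query, documents):
    """Prepend retrieved business context to the question sent to the LLM."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Invoices are processed nightly by the billing pipeline.",
    "The sales team meets every Monday.",
]
prompt = build_prompt("When are invoices processed?", docs)
```

Feeding the LLM this assembled prompt, rather than the bare question, is what grounds an otherwise generic model in your own data, and data integration tools increasingly help with exactly this assembly step.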

In terms of market movement, there have been two notable acquisitions within recent memory. First, Talend – formerly a data integration vendor – was acquired by Qlik in 2023, and has since been incorporated into its portfolio. Second, Actian acquired Zeenea, a data discovery platform, in 2024, adding said data discovery capabilities to its solution.

Historically, we have divided data integration vendors into platform and point solutions, referring to their broad scope or narrow focus, respectively. This is no longer entirely appropriate: rather than a relatively clean division into two categories, what we have now is a continuum of products of different breadths. While we still have massive platforms on the one hand, there are few genuinely “pure-play” data integration solutions left: rather, almost all of the products featured in this report provide capabilities substantially beyond data integration. API and application integration, document processing, data virtualisation, data warehouse automation, data quality, and data governance are just some of the extended capabilities offered by the included data integration suppliers. Accordingly, your choice of supplier may depend just as much, if not more, on what it provides in addition to data integration as it does on its core data integration capabilities.

It may be illustrative to focus on one particular aspect: AI, and more specifically, generative AI. Like product breadth, this is a spectrum, with every vendor investing in it to some extent. That said, it is clear that some vendors have invested more than others: in no particular order, we have identified Actian, Astera, Gathr, Matillion, Striim, and Informatica as forerunners in the adoption of, and support for, generative AI. Accordingly, if this is an area that interests you, it would be wise to consider these solutions first and foremost.

On a final note, a handful of vendors that we contacted for this report either did not respond to our requests for information, or only provided us with very limited information. This includes Software AG, Qlik, Hevo Data, TIBCO, and – albeit to a lesser extent – Informatica. With the exception of Informatica, we have been forced to exclude these vendors from this report due to a lack of information.

Data integration is a highly mature space, and it shows in its vendors: well-developed data integration capabilities are commonplace, with meaningful differentiation found primarily in secondary areas, such as data quality, data governance, and generative AI. The latter, in particular, has had almost as profound an impact on data integration as it has had on data in general, with several major suppliers taking pains to adopt and/or support it at a deep level, some even going so far as to rebrand their products to more prominently feature it. This is no surprise, as data integration as a space has long leveraged machine learning and other AI technologies to its benefit, and the hype surrounding generative AI, although it may be tapering off, has certainly not died down.

In summary, if you want to invest in data integration – and there are many reasons to do so, from analytics to cloud migrations to generative AI – you can rest assured that each of the vendors featured in this report offers a highly capable data integration solution. To decide which vendor to invest in specifically, our suggestion is to examine what each candidate can do, and what you would most benefit from, outside of just data integration.

Solutions

  • Actian
  • Astera
  • Cloudera
  • CloverDX
  • Dataddo
  • Fivetran
  • Gathr
  • GenRocket
  • Hitachi
  • IBM
  • Informatica
  • InterSystems
  • IRI
  • K2view
  • Matillion
  • Oracle
  • Progress
  • Qlik
  • SAS
  • SnapLogic
  • Software AG
  • Striim
  • Talend
  • Teradata

These organisations are also known to offer solutions:

  • Ataccama
  • Axway
  • Dell
  • EntropySoft
  • Microsoft
  • SAP
  • Syncsort

Research

Best-of-breed Data Integration (Feb 2025)

In this report, we discuss the data integration market and seek to highlight vendors that we consider best-of-breed.
Astera Intelligence uses AI to drive data integration (Nov 2024)

Astera provides an easy-to-use data integration solution that is supported by a collection of (generative) AI capabilities.
Actian Data Platform provides integration and quality as a service (Nov 2024)

The Actian Data Platform offers effective integration as a service, within a much broader data ecosystem solution, alongside robust data quality functionality.
Striim (Oct 2024)

Striim offers a unified, enterprise-level combination of continuous data integration, real time CDC, stream processing and streaming analytics.
Fivetran (July 2024)

Fivetran is a fully managed, cloud-native solution for automated data movement, most notably data integration.
Dataddo

Dataddo is a fully-managed, no-code data integration platform that offers business users easy access to data integration while still accommodating developers.
SnapLogic in the Data Fabric

SnapLogic has data integration capabilities that support data fabric and data mesh.
K2view Test Data Management (April 2024)

The K2view Data Product Platform is a unified data management platform that offers numerous capabilities, including test data management.