Matillion was founded in Manchester, England in 2011. In the time since, the company has established a second headquarters in the United States as well as an office in Spain.
The company’s first data integration product, Matillion ETL, launched in 2015, while its latest, Data Productivity Cloud, launched in 2022. The latter has joined the ranks of the aforementioned Matillion ETL (cloud-native data integration within a virtual private cloud), Matillion Data Loader (data migration into cloud data warehouses and data lakes), Matillion Hub (a ‘central nervous system’ for managing Matillion services), and Matillion Exchange (a marketplace for community-created assets) in the company’s product catalogue.
Matillion has been described as one of the fastest growing privately owned (in this case, backed by venture capital) tech companies in the UK, and it has more than 1,300 customers worldwide. These include such household names as Amazon, Sony, Nintendo, Subway, and Cisco, as well as various mid-market enterprises. It also has a significant partner network.
Company Info
Headquarters: Station House, Stamford New Road, Altrincham WA14 1EP, England
Telephone: +44 (0)161 938 8038
Matillion Data Productivity Cloud is the most recent offering from Matillion. It is a platform solution designed for building and managing data pipelines in the cloud in service of a variety of use cases, including analytics and AI. Moreover, Matillion has developed a number of (generative) AI capabilities for the product that should significantly add to its appeal.
Customer Quotes
“Matillion enables our team to provide meaningful data insights quickly. And, because it’s built for modern cloud data warehouses, we can use native Snowflake functionality to transform our data.” Cisco
Fig 1 - Transformation pipeline in Data Productivity Cloud
Like Matillion ETL, Data Productivity Cloud offers a graphical, drag-and-drop, low-code development user interface for architecting your data pipelines. These pipelines can include several stripes of data integration, including ETL, ELT, and Reverse ETL. CDC (Change Data Capture) and RAG (Retrieval-Augmented Generation) pipelines are also supported, the latter of which will be particularly relevant to anyone building a generative AI solution. Pipeline templates and prebuilt transformation components are available, as are a full orchestration layer for managing your pipelines and a library of ready-made connectors. The product also offers “flex” connectors that are available from Matillion on request but can be (re)configured to accommodate additional use cases down the line, and custom connectors that can be created in minutes using a wizard-driven UI. Connectors for unstructured data sources are not currently available but are expected to arrive soon. Git integration is provided. The end result of all of these features is a valiant (and largely successful) effort to offer self-service data pipeline creation and management.
Architecturally, Data Productivity Cloud must be deployed on top of a cloud data warehouse: specifically, Snowflake (over either AWS or Azure), Databricks, or Redshift. BigQuery is not currently supported, though we are told it is on the roadmap. For use cases not covered by this selection, Matillion ETL is still very much available. Data Productivity Cloud can integrate with the ‘big 3’ cloud service providers (AWS, Azure, and GCP) as well as knowledge graphs, data catalogues, LLMs (Large Language Models) and vector stores. These latter two will, again, be particularly interesting if you want to implement generative AI. The platform can be deployed as either a fully-managed SaaS or hybrid-SaaS solution. In either case, the platform leverages stateless microservices containers as agents (one for each of your data pipelines), running either within your network or in a managed Matillion environment depending on your chosen deployment method. These agents can be spun up or down as needed, seamlessly, and on an individual basis, making for great scalability and minimal performance overheads. Other architectural features of note include data lineage (leveraging open-source lineage standards) and extensive use of pushdown (such as the recently-added Python Pushdown feature, which allows you to execute Python scripts directly within Snowflake using its Snowpark service).
As mentioned above, Data Productivity Cloud has recently incorporated several features designed to support generative AI. We have already touched on its support for RAG pipelines, as well as its connectivity with LLMs and vector stores. Pipeline components that allow you to submit a prompt to an LLM from within a data pipeline are also available, as is prompt engineering, and the product offers lineage for AI processes, which may prove very important in the coming months and years as AI-oriented governmental regulations continue to spring up across the globe. Moreover, the platform is starting to offer features that directly take advantage of generative AI. At present, this includes a natural language AI copilot (currently in preview) that can help you to create your data pipelines (among other things), and automatic documentation of data pipelines and pipeline components, including the generation of readable business summaries.
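To make the idea of a RAG pipeline more concrete, the following is a minimal sketch of the retrieval and prompt-building steps, written in plain Python. It is not Matillion's implementation: the function names (`vectorize`, `cosine`, `build_prompt`), the sample documents, and the use of simple word counts in place of real embeddings and a vector store are all assumptions made purely for illustration.

```python
import math
from collections import Counter

# Hypothetical documents standing in for content held in a vector store.
DOCS = [
    "refunds are processed within five working days",
    "our office is closed on public holidays",
]

def vectorize(text):
    """Toy 'embedding': a bag-of-words count (a real pipeline would use an embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(question, documents, top_k=1):
    """Retrieve the most similar documents and prepend them to the question as context."""
    q = vectorize(question)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}"

# The resulting prompt is what a pipeline component would submit to an LLM.
prompt = build_prompt("how long do refunds take", DOCS)
```

In a production pipeline the retrieval step would query a vector store and the prompt would be passed to an LLM; the sketch only shows how retrieved context is combined with the user's question.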
Data Productivity Cloud is an advancement over Matillion ETL in several ways, and thus carries over many of the advantages of that product while enhancing them and adding its own. Most obviously, it is purpose-built to sit on top of a cloud data warehouse. But it also makes significant advancements over its predecessor in terms of usability and functionality, architecture, and (most excitingly) AI.
For example, Matillion ETL offers a user-friendly interface that incorporates low-code, drag-and-drop techniques for building data pipelines and integration processes. Data Productivity Cloud retains this tried-and-true base while also adding such things as a greater range of available data integration processes, custom (and flex) connectors, and soon an AI-driven copilot.
Architecturally, while the product is currently missing some of the compatibility offered by Matillion ETL (BigQuery users only have access to the older product, for instance, at least for now), its heavily distributed deployment approach of using many agents, each matched to an individual data pipeline, has a lot going for it in terms of scalability and performance. It also stands in contrast to some of the older products on the market, which tend towards a single, monolithic, and therefore often inefficient agent.
Finally, in terms of AI, Data Productivity Cloud is an obvious step up, providing several options capable of supporting generative AI and even a small handful of features that leverage generative AI themselves. That said, it is still early days when it comes to generative AI (for everyone, not just Matillion), so we expect more and better things to come in the future.
The Bottom Line
Data Productivity Cloud takes what was already good about Matillion ETL and transplants it to a platform that has been built from the ground up to accommodate the cloud data warehouse. At the same time, it adds new features and improves old ones, not least by incorporating generative AI. In short, we are very impressed.
Matillion ETL is a cloud-native data integration tool that sits inside your virtual private cloud, although despite its name it operates using an ELT paradigm, which is to say that it transforms the data after, rather than before, it has been loaded into its target environment. A second tool, Matillion Data Loader, exists to facilitate migrations from existing environments into cloud data warehouses and data lakes via the creation of data pipelines. Both platforms natively support all the major cloud platforms, including cloud data warehouses such as Snowflake and Databricks. That said, Matillion Data Loader supports slightly fewer data sources than the main product. The company also offers Matillion Hub, a ‘central nervous system’ for managing your Matillion services, and Matillion Exchange, a marketplace for community-created assets.
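The difference between ELT and ETL described above can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module as a stand-in for a cloud data warehouse, which is an assumption made so that the example runs anywhere; the table names, the sample rows, and the `run_elt` function are likewise hypothetical. The point is simply that the raw data is loaded first, and the transformation is then pushed down as SQL that the target engine executes.

```python
import sqlite3

# Hypothetical raw records, standing in for rows extracted from a source system.
RAW_ORDERS = [
    ("1001", " alice ", "19.99"),
    ("1002", "BOB", "5.00"),
    ("1003", "alice", "30.01"),
]

def run_elt(conn):
    # Load: land the data untransformed in a staging table inside the target.
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)

    # Transform: cleaning and aggregation run as SQL inside the target engine,
    # after loading, rather than in a separate engine beforehand.
    conn.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT LOWER(TRIM(customer)) AS customer,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS total
        FROM raw_orders
        GROUP BY LOWER(TRIM(customer))
        """
    )
    return conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()

conn = sqlite3.connect(":memory:")  # in-memory database used as a stand-in target
totals = run_elt(conn)
```

Because the transformation is expressed as SQL against the staging table, the heavy lifting is done by the target platform's own compute, which is the main attraction of the ELT approach on a cloud data warehouse.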
Customer Quotes
“Matillion Data Loader allowed us to solve our problem without any changes required to the source and we were live within weeks.” DocuSign
“We chose Matillion due to its cloud-native architecture, ease of deployment, adaptability, and smaller ramp-up curve. We have achieved 5X improvement in our data processing speed.” Pacific Life
Matillion ETL primarily offers a graphical, low-code development user interface for architecting your ELT processes. This of course includes transformations, as well as cloud infrastructure orchestration. It is deployed – as mentioned, to your virtual private cloud – as one (or more) of a series of virtual machine images, each one tailored for a specific cloud platform. APIs are available for capturing data lineage and other metadata as part of your ELT processes. The product also provides infrastructure management capabilities within the same (low-code) environment.
Under the hood, Matillion ETL works by generating SQL code. It makes extensive use of dynamic variables in order to minimise code regeneration (needed, for example, whenever there are schema changes), the need for which is the biggest downside of any code-generating tool. Note also that, while Matillion ETL itself does not provide CDC, it does support technologies, such as Snowflake Streams, that provide equivalent functionality. In addition, Matillion Data Loader offers CDC (see below). There is also a built-in job scheduler, and you can define listeners that will trigger jobs when required.
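As a rough illustration of why variables reduce the need to regenerate code, consider the following sketch. It does not reflect how Matillion ETL generates or stores its SQL; the template, the `render` function, and the table and column names are hypothetical. The SQL is generated once with placeholders, and a schema change (here, an added column) is absorbed by changing a variable's value at run time.

```python
from string import Template

# Generated once: the pipeline's SQL, with ${...} placeholders instead of hard-coded names.
GENERATED_SQL = Template(
    "INSERT INTO ${target_table} (${columns}) "
    "SELECT ${columns} FROM ${source_table}"
)

def render(variables):
    """Resolve the variables at run time; the template itself is never regenerated."""
    return GENERATED_SQL.substitute(
        target_table=variables["target_table"],
        source_table=variables["source_table"],
        columns=", ".join(variables["columns"]),
    )

# Original schema.
before = render({
    "target_table": "sales_clean",
    "source_table": "sales_raw",
    "columns": ["id", "amount"],
})

# After a schema change, only the variable's value changes.
after = render({
    "target_table": "sales_clean",
    "source_table": "sales_raw",
    "columns": ["id", "amount", "currency"],
})
```

Note that this sketch substitutes identifiers directly into the SQL text, so it assumes the variable values come from trusted pipeline configuration rather than from user input.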
Matillion ETL offers a significant number (90+) of source connectors. These are primarily application-oriented. There are also native connectors (much to be preferred for performance reasons) to most popular relational databases, although the overall number of database connectors is relatively limited. Facilities exist to build your own connectors (using a REST API) to source applications and systems, but such functionality does not currently extend to targets.
That said, one of Matillion ETL’s primary strengths lies in the tightness of its integration with its target environments. While this is excellent for supported environments, such depth of capability also explains why Matillion ETL does not support building your own target connectors: there is a significant investment in each target, and it is not simply a question of building a connector. Matillion ETL supports cloud platforms, including the ‘big three’ of AWS, Azure and Google Cloud, as well as Snowflake, Redshift, and Databricks. Multi-cloud is supported on all of Matillion’s platforms. Support for cloud object storage, such as Amazon S3, is also provided.
Fig 2 - Matillion Data Loader
Matillion Data Loader is a no-code offering for building data pipelines. It leverages an agent-based, hybrid-SaaS architecture – it operates like a SaaS, but your data always remains within your environment – and a freemium, consumption-based pricing model. For most potential customers, it will be most appealing as a fast and easy way to get your data into the cloud. It is even simpler to use than Matillion ETL, and like its sister product it offers a broad range of connectivity options, including compatibility with batch replication and CDC.
On the other hand, it doesn’t have a transformation capability: it is strictly a migratory tool. On the third hand, Matillion ETL itself (and, frankly, most data integration offerings) will be overkill for the sorts of tasks that don’t require transformations. Notably, and unlike Matillion ETL, Matillion Data Loader also offers CDC. This is log-based, and can integrate with your other data integration processes (for example, to trigger downstream transformations).
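The general principle behind log-based CDC can be sketched in a few lines of Python. This is an illustration of the technique only, not of Matillion Data Loader's internals: the event format, the `apply_changes` function, and the `on_change` callback (which stands in for triggering a downstream transformation) are all assumptions.

```python
def apply_changes(target, change_log, on_change=None):
    """Replay an ordered change log against a target keyed by primary key."""
    for event in change_log:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            target[key] = event["row"]   # upsert the latest version of the row
        elif op == "delete":
            target.pop(key, None)        # remove the row if it is present
        else:
            raise ValueError(f"unknown operation: {op}")
        if on_change is not None:
            on_change(event)             # e.g. trigger a downstream transformation
    return target

# Hypothetical change events, as they might be read from a source database's log.
CHANGE_LOG = [
    {"op": "insert", "key": 2, "row": {"name": "bob"}},
    {"op": "update", "key": 1, "row": {"name": "alice smith"}},
    {"op": "delete", "key": 2},
]

triggered = []
replica = apply_changes({1: {"name": "alice"}}, CHANGE_LOG, on_change=triggered.append)
```

Because only the changes are read from the log, the source tables themselves are not queried, and the target can be kept up to date incrementally rather than through repeated full loads.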
Matillion ETL has been designed to take full advantage of the cloud via features such as native cloud integrations and push-down ELT to major cloud platforms. It is enterprise-ready, enabled in part by its scalability and catalogue support, and it offers a user-friendly interface that incorporates low-code, drag-and-drop techniques for building data pipelines and integration processes.
Moreover, it provides bespoke deployment images for each cloud platform as well as a range of prebuilt connectors, particularly for application-based sources. In addition, the depth of integration offered often goes further than the competition. On the other hand, database and data warehouse support is relatively limited (perhaps as a consequence of that depth of integration).
Matillion Data Loader in particular is a highly fit-for-purpose tool for enabling cloud migrations, one of the more significant data integration use cases at the moment. It is even simpler to use than Matillion ETL, is available as a freemium offering, and is both lightweight and secure (thanks to its hybrid-SaaS deployment model). In other words, it is very easy to start getting value out of it. In turn, because of Matillion Data Loader, Matillion as a whole can address both new implementations of supported data warehouses and cases where companies are migrating to these environments.
The Bottom Line
Matillion ETL offers relatively pure-play ELT, and together with Matillion Data Loader it makes for an effective and easy-to-use solution for data integration, especially in the cloud.
In this paper, we discuss and compare a selection of prominent data integration solutions, namely those provided by Fivetran, Gathr, Informatica and Matillion.