Fig 1 - Matillion ETL
Matillion ETL primarily offers a graphical, low-code user interface for designing your ELT processes. This includes transformations as well as cloud infrastructure orchestration. As mentioned, it is deployed to your virtual private cloud as one (or more) of a series of virtual machine images, each tailored to a specific cloud platform. APIs are available for capturing data lineage and other metadata as part of your ELT processes. The product also provides infrastructure management capabilities within the same low-code environment.
Under the hood, Matillion ETL works by generating SQL code. It makes extensive use of dynamic variables to minimise code regeneration (needed, for example, whenever there are schema changes), the need for which is the biggest downside of any code-generating tool. Also notable is that, while Matillion ETL itself does not provide CDC, it does support technologies, such as Snowflake Streams, that provide equivalent functionality. In addition, Matillion Data Loader offers CDC (see below). There is also a built-in job scheduler, and you can define listeners that will trigger jobs when required.
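To illustrate why dynamic variables reduce the need for regeneration, here is a minimal sketch in Python. It is not Matillion's actual generator: the template, the `render` helper and the table and column names are all hypothetical. The point is only that when the generated statement leaves table and column names as variables, a schema change means supplying new values rather than regenerating the statement.

```python
from string import Template

# Hypothetical generated statement: table and column names are left as
# variables instead of being baked into the SQL text.
GENERATED_SQL = Template(
    "INSERT INTO ${target_table} (${columns}) "
    "SELECT ${columns} FROM ${source_table}"
)


def render(sql_template, source_table, target_table, columns):
    """Resolve the variables at run time and return executable SQL text."""
    return sql_template.substitute(
        source_table=source_table,
        target_table=target_table,
        columns=", ".join(columns),
    )


# The same generated statement is reused before and after a column is added.
before = render(GENERATED_SQL, "staging.orders", "dw.orders", ["id", "amount"])
after = render(
    GENERATED_SQL, "staging.orders", "dw.orders", ["id", "amount", "currency"]
)
print(before)
print(after)
```

In a real pipeline the variable values would come from metadata rather than literals, and any identifiers would need validating before being substituted into SQL.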
Matillion ETL offers a significant number (90+) of source connectors. These are primarily application-oriented. There are also native connectors (much to be preferred for performance reasons) to most popular relational databases, although the overall number of database connectors is relatively limited. Facilities exist to build your own connectors (using a REST API) to source applications and systems, but such functionality does not currently extend to targets.
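As a rough idea of what a custom source connector built on a REST API has to do, the sketch below flattens a nested JSON response into tabular rows ready for loading. It does not use Matillion's connector framework: the payload, the field names and the `flatten` helper are invented for illustration, and the response is hard-coded rather than fetched over the network.

```python
import json

# Hypothetical response body from a source application's REST API.
SAMPLE_RESPONSE = json.loads("""
{"items": [
  {"id": 1, "customer": {"name": "Acme", "country": "UK"}, "total": 120.5},
  {"id": 2, "customer": {"name": "Globex", "country": "US"}, "total": 80.0}
]}
""")


def flatten(record, prefix=""):
    """Turn a nested JSON object into a single flat row (column -> value)."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects become prefixed columns, e.g. customer_name.
            row.update(flatten(value, prefix=f"{name}_"))
        else:
            row[name] = value
    return row


rows = [flatten(item) for item in SAMPLE_RESPONSE["items"]]
for row in rows:
    print(row)
```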
That said, one of Matillion ETL’s primary strengths lies in the tightness of its integration with its target environments. This is excellent for supported environments, but the depth of capability involved also explains why Matillion ETL does not let you build your own target connectors: each target requires a significant investment, and it is not simply a question of building a connector. Matillion ETL supports a range of cloud platforms, including the ‘big three’ of AWS, Azure and Google Cloud, as well as Snowflake, Redshift, and Databricks. Multi-cloud is supported on all of Matillion’s platforms. Support for cloud object storage, such as Amazon S3, is also provided.
Fig 2 - Matillion Data Loader
Matillion Data Loader is a no-code offering for building data pipelines. It leverages an agent-based, hybrid-SaaS architecture (it operates like a SaaS, but your data always remains within your environment) and a freemium, consumption-based pricing model. For most potential customers, it will be most appealing as a fast and easy way to get data into the cloud. It is even simpler to use than Matillion ETL, and like its sister product it offers a broad range of connectivity options, including support for both batch replication and CDC.
On the other hand, it has no transformation capability: it is strictly a tool for moving data. Then again, Matillion ETL itself (and, frankly, most data integration offerings) will be overkill for tasks that don’t require transformations. Notably, and unlike Matillion ETL, Matillion Data Loader offers CDC. This is log-based, and can integrate with your other data integration processes (for example, to trigger downstream transformations).
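The general idea behind log-based CDC with a downstream trigger can be sketched as follows. This is a simplified illustration, not Matillion Data Loader's implementation: the event format, the `apply_changes` function and the `on_batch_applied` callback are assumptions made for the example. Change events read from a source log are applied in order to a target, and a callback fires once the batch has been applied, which is where a downstream transformation could be started.

```python
def apply_changes(target, events, on_batch_applied=None):
    """Apply ordered change events to `target`, a dict keyed by primary key."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            target[key] = event["row"]
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    # Notify downstream processing only when something was actually applied.
    if on_batch_applied is not None and events:
        on_batch_applied(len(events))
    return target


triggered = []  # stands in for "start the downstream transformation job"
table = {1: {"status": "new"}}
log = [
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "new"}},
    {"op": "delete", "key": 1},
]
apply_changes(table, log, on_batch_applied=triggered.append)
print(table)
print(triggered)
```

Because the events are applied in log order, the update to key 1 followed by its deletion leaves only key 2 in the target, and the callback is invoked once for the batch of three events.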