Data integration in Fivetran has three key facets: prebuilt, fully managed connectors; normalisation of the data you are moving; and provision of analysis-ready schemas for target connectors. Between them, these make the process of data integration (and data movement) almost completely automatic. Moreover, Fivetran operations are idempotent: rerunning a sync after a failure produces the same result as a single successful run, so failed syncs do not create duplicate data. This makes its pipelines essentially self-correcting. In other words, data integrity is maintained even when syncs fail. Idempotence is achieved, in part, because Fivetran will automatically add or remove columns whenever there is a schema change. This is enabled by built-in change data capture (CDC), which can be utilised with or without log-based updates, and in the former case with or without agent-based connectors, as you prefer.
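To make the idea of idempotence concrete, here is a minimal Python sketch of a key-based upsert. It is an illustration only, not Fivetran's implementation: the function `apply_batch`, the `id` key, and the in-memory `destination` table are all hypothetical names chosen for the example.

```python
from typing import Dict, Iterable

Row = Dict[str, object]


def apply_batch(table: Dict[object, Row], batch: Iterable[Row], key: str = "id") -> None:
    """Upsert each row by its primary key, so replaying a batch is harmless."""
    for row in batch:
        # Writing by key overwrites any earlier copy instead of appending a duplicate.
        table[row[key]] = dict(row)


# Hypothetical in-memory stand-in for a destination table, keyed by primary key.
destination: Dict[object, Row] = {}
batch = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

apply_batch(destination, batch)  # first attempt (imagine the sync failed after writing)
apply_batch(destination, batch)  # retry of the same batch

print(len(destination))  # 2
```

Because rows are written by key rather than appended, the retry leaves the destination in exactly the state a single successful sync would have produced.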
Fivetran currently offers more than 500 fully managed connectors that have been purpose-built to support a wide variety of data sources, destinations, and use cases. These include SaaS applications, on-premises and cloud-hosted databases, file systems, cloud data warehouses, event services, legacy data sources, and more. In particular, the product supports various cloud platforms across the ‘big 3’ cloud service providers (AWS, Azure, and Google Cloud), including Snowflake, Databricks, Google BigQuery, Amazon Redshift, and Azure Synapse. Multi-cloud is also supported, as are data lakes hosted on Amazon S3, Azure Data Lake Storage, or OneLake. Facilities to prevent data lakes from becoming data swamps are available as well. For example, you can use Fivetran to convert your big data into an open table format (such as Apache Iceberg or Delta Lake), which makes curation and compliance much easier by providing some degree of structure (and the enhanced functionality that comes with it). For otherwise unsupported data sources, you can have Fivetran create custom “Lite” connectors via the company’s By Request program. You can also create your own connectors via the provided Custom Connector Framework, and Fivetran partners have access to an SDK to do the same.
The platform is highly extensible and can be integrated with various third-party tools. Notably, this includes data catalogue integration, which may be especially useful for providing additional visibility into your data. Data visibility is further supported by the platform’s metadata sharing and column-level lineage functionality. In addition, the product particularly targets integration with the cloud, and offers various features to support this, such as compatibility with several cloud platforms and minimised compute usage. Beyond that, the company’s robust partner network provides access to extensive data management capabilities, including data governance, data cataloguing, data masking, and so on. This is enhanced by Fivetran’s ability to automatically propagate associated metadata during data movement, which is particularly useful if you are, say, using it to feed a data catalogue.
The product provides a robust library of pre-built, SQL-based dbt data models that can transform, join, and calculate connector-loaded data to fulfil common reporting requirements. Some of these models can be downloaded and orchestrated within Fivetran directly using Quickstart transformations. You can also integrate your own dbt project into the platform to orchestrate and manage any custom data models you might have. With both methods, you can synchronise model-run orchestration with connector loads, reducing data latency and computational costs. This orchestration is visualised in a data lineage graph, providing observability. Integration with dbt offers version control, logging, alerting, and various other features. Perhaps most notably, this includes data quality functionality that can be built into your data movement pipelines.
Additional capabilities are available, such as support for stream processing (including integration with Apache Kafka), automatic data updates, and automated schema migrations, management, and drift handling. On the last point, Fivetran will also standardise your schemas for easy querying and API access (for example, by applying deduplication processes). These revamped schemas are fully documented by the product. It also features integrated scheduling for your data movement jobs, and can set transformations to run automatically whenever data is loaded into your system.
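As a rough illustration of what automated schema drift handling involves, the following Python sketch compares source and destination columns and then adds or removes columns to match. It follows the behaviour described in this report rather than Fivetran's actual implementation, and the function names `plan_schema_changes` and `apply_schema_changes` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def plan_schema_changes(
    destination_columns: Iterable[str], source_columns: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (columns to add, columns to remove) so the destination matches the source."""
    dest, src = set(destination_columns), set(source_columns)
    return sorted(src - dest), sorted(dest - src)


def apply_schema_changes(rows: List[Row], added: List[str], removed: List[str]) -> None:
    """Add new columns as nulls and drop removed columns, in place."""
    for row in rows:
        for column in added:
            row.setdefault(column, None)
        for column in removed:
            row.pop(column, None)


# Hypothetical destination rows; the source has gained "email" and lost "fax".
rows = [{"id": 1, "name": "Ada", "fax": "555-0100"}]
added, removed = plan_schema_changes(rows[0].keys(), ["id", "name", "email"])
apply_schema_changes(rows, added, removed)

print(added, removed)  # ['email'] ['fax']
print(rows)  # [{'id': 1, 'name': 'Ada', 'email': None}]
```

Separating the planning step from the application step means the planned changes can be logged or reviewed before anything is modified.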
What’s more, security and governance are clear priorities for Fivetran. The product is certified against, or compliant with, a range of industry standards and regulations, including GDPR, SOC 2, ISO 27001, PCI and HIPAA. It encrypts all data, both in transit and at rest, and data moved into the Fivetran environment is deleted as soon as its data movement workflow is verified. It also provides role-based access control and authorisation, including automated user provisioning (secured via programmatic controls accessed through a REST API), support for Azure Active Directory (Azure AD), single sign-on integration, and various other features. Moreover, the product’s Connect Card functionality lets your users access Fivetran through third-party interfaces and applications without compromising data security or requiring them to interact with the back-end system. By the same token, you can also use it to effectively white label the Fivetran platform.
Governance and regulatory compliance are further supported by the product’s automatic detection of personally identifiable information (PII) within source data. More specifically, Fivetran identifies types of sensitive data within your connector schema, then proactively protects the associated data (via column-level masking or blocking) before it lands in the target environment. Although it is currently in private preview and only supports North American PII definitions, this is still a very promising feature.
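The difference between column-level masking and blocking can be sketched in a few lines of Python. This is an assumption-laden illustration, not Fivetran's detection logic: the name-based pattern `PII_PATTERN`, the functions `classify_columns` and `protect_row`, and the use of SHA-256 hashing as the masking method are all hypothetical choices made for the example.

```python
import hashlib
import re
from typing import Dict, Iterable, Set

# Hypothetical name-based rule; a real detector would be far more sophisticated.
PII_PATTERN = re.compile(r"(email|phone|ssn)", re.IGNORECASE)


def classify_columns(columns: Iterable[str]) -> Set[str]:
    """Flag columns whose names suggest they hold sensitive data."""
    return {column for column in columns if PII_PATTERN.search(column)}


def protect_row(row: Dict[str, object], pii_columns: Set[str], mode: str = "mask") -> Dict[str, object]:
    """Return a copy of the row with PII columns hashed ("mask") or dropped ("block")."""
    if mode not in ("mask", "block"):
        raise ValueError(f"unknown mode: {mode}")
    protected: Dict[str, object] = {}
    for column, value in row.items():
        if column not in pii_columns:
            protected[column] = value
        elif mode == "mask":
            # Hashing stands in for masking here; the raw value never reaches the target.
            protected[column] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        # In "block" mode the column is simply omitted.
    return protected


row = {"id": 1, "customer_email": "ada@example.com", "plan": "pro"}
pii = classify_columns(row.keys())

masked = protect_row(row, pii, mode="mask")
blocked = protect_row(row, pii, mode="block")

print(sorted(pii))      # ['customer_email']
print(sorted(blocked))  # ['id', 'plan']
```

In both modes the protection is applied before the row is written, which mirrors the report's point that sensitive data is handled before it lands in the target environment.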