Apache Kafka enables its users to process data in-stream. It can combine stream processing with historical data, and it supports a Kappa architecture, in which all data is processed in a single environment. Several performance-oriented features are provided, including “log compaction” (a Kafka exclusive), which retains only the most recent record for each key; partitions to support parallel processing; and both offset and timestamp indexes.
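To make log compaction more concrete, the following is a minimal sketch in Python of the idea behind it, not Kafka's actual implementation. It assumes a simplified in-memory log; the `Record` type, the `compact` function, and the sample keys and values are all hypothetical and exist purely for illustration. Real Kafka compacts log segments in the background and retains delete markers ("tombstones") for a configurable period, whereas this sketch drops them immediately.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical, simplified stand-in for a record in a Kafka partition."""
    offset: int
    key: str
    value: Optional[str]  # None models a tombstone (delete marker)


def compact(log: list[Record]) -> list[Record]:
    """Keep only the most recent record for each key, in offset order.

    Assumption: tombstoned keys are dropped at once. Kafka itself keeps
    tombstones for a retention period before removing them.
    """
    latest: dict[str, Record] = {}
    for record in log:
        latest[record.key] = record  # later offsets overwrite earlier ones
    survivors = [r for r in latest.values() if r.value is not None]
    return sorted(survivors, key=lambda r: r.offset)


# Illustrative data only: two updates for user-1, a delete for user-2.
log = [
    Record(0, "user-1", "alice@old.example"),
    Record(1, "user-2", "bob@example.com"),
    Record(2, "user-1", "alice@new.example"),
    Record(3, "user-2", None),
    Record(4, "user-3", "carol@example.com"),
]

compacted = compact(log)
for record in compacted:
    print(record.offset, record.key, record.value)
```

In this sketch the surviving records keep their original offsets (2 and 4), which mirrors the fact that compaction in Kafka does not renumber records. A consumer reading the compacted log still sees the latest state for every live key without replaying every intermediate update.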
By default, Kafka supports applications written in Java and Scala. Confluent adds development support for other languages, including C, C++, Go, Python, and .NET. Confluent Connect provides a wide range of connectors – 120 are currently available, a substantial number of which are fully managed – and Confluent Control Center offers end-to-end monitoring and alerting. A range of operational capabilities is also provided, including automated load balancing and replication that supports multi-data-centre Kafka implementations.
Confluent Stream Governance, a recent addition, is a data governance suite designed specifically for governing streaming data. It includes Confluent’s Schema Registry, a longstanding capability that acts as a central repository for the format of Kafka data, as well as a streaming data catalogue. Data discovery and data lineage capabilities are also provided. This is both a notable addition and a significant differentiator, and it will likely prove particularly relevant for managing your streaming data as streaming technologies become increasingly pervasive. In previous years, we have praised Confluent Control Center for adding manageability to Kafka. Confluent Stream Governance builds substantially on this, taking the platform’s manageability considerably further.
It is also worth noting what Confluent means when it claims to be “cloud-native”. In short, the platform is fully managed by Confluent, is serverless, and takes full advantage of cloud capabilities such as elasticity, scalability, and practically unlimited storage. Operational processes such as deployment and scaling can be taken out of your hands entirely, so that the end result requires no self-management whatsoever. This provides all of the attendant benefits of extensive automation – speed, efficiency, consistency, and so on – and is only made possible by the improvements Confluent has made to Kafka, including separation of compute and storage, automated data balancing, and integration with Kubernetes.
Confluent is available internationally through the three major public clouds (Amazon Web Services, Microsoft Azure, and Google Cloud Platform). Moreover, it can operate (and extend streaming applications) across multiple clouds, running in and across cloud environments as a persistent bridge in a fully consistent and real-time manner. You can also link multiple Confluent clusters together via cluster linking, for instance to connect on-prem and cloud streaming environments. Finally, it is deployable alongside a wide range of environments and data sources, in addition to its substantial partner network and the connectors mentioned above.