
Fig 2 - Creating a Striim pipeline
Striim provides a web-based, graphical, no-code development environment (shown in Figures 2 and 3) with built-in automation and APIs. You can capture data from many sources, including databases (via change data capture, or CDC, which is both non-intrusive and compatible with past and future versions of any given data source), log files, message queues (including Kafka, among others), and events. You can then build transformation and analytics pipelines that act on that data continually and in real time, enabling use cases such as streaming analytics, data integration, and, more generally, real-time data delivery.

Fig 3 - Assembling a Striim pipeline
Pipelines are built using Striim’s proprietary SQL-like language, TQL. TQL runs in-memory and, in addition to filtering, transforming, and aggregating data, can be used to enrich streaming data with reference data stored in the platform’s distributed, in-memory data grid. Users can add extensions written in Java, and the platform supports the import and export of Java analytic models, although not models written in R or Python. Similarly, the product does not support PMML (Predictive Model Markup Language). Tumbling, sliding, and session windows are supported based on record count, time, or data attributes, with additional support for filtering and aggregation, joining streams with historical data, multi-stream correlation, pattern matching, and anomaly detection. There is a library of predictive analytics functions, and streamed data can be persisted on the fly. A wide range of connectors is provided, as is an open processor for extending your pipelines to include third-party code. This open processor can be used to leverage additional capabilities, such as data quality or governance, within a streaming context. An upcoming version of Striim will also enable partner companies to write their own custom connectors.
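To make the three window types concrete, the following is a minimal Python sketch of tumbling (by record count), sliding (by time), and session windowing. It is not TQL and does not reflect how Striim implements windows; the `Event` shape and the function names are assumptions made purely for illustration, and the input is assumed to arrive in timestamp order.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (timestamp in seconds, value).
Event = Tuple[float, float]


def tumbling_by_count(events: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Yield non-overlapping windows of `size` events each."""
    window: List[Event] = []
    for event in events:
        window.append(event)
        if len(window) == size:
            yield window
            window = []
    if window:  # emit the final, partial window
        yield window


def sliding_by_time(events: Iterable[Event], width: float) -> Iterator[List[Event]]:
    """After each event, yield every event from the last `width` seconds."""
    window: List[Event] = []
    for event in events:
        window.append(event)
        cutoff = event[0] - width
        # Drop events that have aged out of the window.
        window = [e for e in window if e[0] > cutoff]
        yield list(window)


def session_windows(events: Iterable[Event], gap: float) -> Iterator[List[Event]]:
    """Group events into sessions separated by more than `gap` seconds of inactivity."""
    session: List[Event] = []
    for event in events:
        if session and event[0] - session[-1][0] > gap:
            yield session
            session = []
        session.append(event)
    if session:
        yield session
```

A tumbling window partitions the stream, a sliding window overlaps with its predecessor, and a session window is bounded by inactivity rather than by a fixed size, which is why the three functions need different eviction rules.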
Striim exposes the real-time insights generated by its analytics in various forms, such as alerting (and other notifications) and dashboards (and other visualisations). The latter offer a number of noteworthy features, such as the ability to filter real-time data, either by time or by field, and the ability to rewind time-based queries to look at past data. Page- and chart-level searching and filtering are provided, and Striim charts can be embedded into HTML pages.
Striim is highly scalable and performant, with an elastic architecture. Its distributed execution model combines a continuous query engine, an in-memory data grid, in-memory stream processing, a high-speed, distributed messaging/queuing system, and a results cache built on Elasticsearch. Incoming data streams are shared over the cluster for horizontal scalability, with checkpointing for recovery and restart from the last known good state, providing exactly-once processing (E1P) guarantees. Its integrations with cloud providers also use techniques such as multi-threading to drive performance.
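The link between checkpointing and exactly-once results can be illustrated with a small Python sketch. This is a simplified, single-process model under stated assumptions (a replayable input stream, and a checkpoint that saves the read offset together with the operator state); the class `CheckpointedSum` and its methods are hypothetical and say nothing about Striim's internal design.

```python
from typing import Dict, List


class CheckpointedSum:
    """Running sum over a replayable stream, with periodic checkpoints."""

    def __init__(self, checkpoint_every: int) -> None:
        self.checkpoint_every = checkpoint_every
        self.offset = 0  # index of the next event to process
        self.total = 0   # operator state
        # Last known good state: offset and state are always saved together.
        self._checkpoint: Dict[str, int] = {"offset": 0, "total": 0}

    def process(self, stream: List[int], fail_at: int = -1) -> None:
        """Consume `stream` from the current offset; simulate a crash at index `fail_at`."""
        while self.offset < len(stream):
            if self.offset == fail_at:
                raise RuntimeError("simulated node failure")
            self.total += stream[self.offset]
            self.offset += 1
            if self.offset % self.checkpoint_every == 0:
                self._checkpoint = {"offset": self.offset, "total": self.total}

    def recover(self) -> None:
        """Roll back to the last checkpoint, discarding partial work done after it."""
        self.offset = self._checkpoint["offset"]
        self.total = self._checkpoint["total"]


if __name__ == "__main__":
    stream = [1, 2, 3, 4, 5, 6, 7]
    op = CheckpointedSum(checkpoint_every=3)
    try:
        op.process(stream, fail_at=5)  # crash after processing five events
    except RuntimeError:
        op.recover()                   # back to offset 3, total 6
    op.process(stream)                 # replay from the checkpoint
    print(op.total == sum(stream))
```

Because the offset and the state are restored as a pair, events processed after the last checkpoint are replayed but their earlier, partial effect is discarded, so each event contributes to the result exactly once.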
Much of this serves to reduce latency and support real-time use cases, not least of which is (generative) AI. In this context, Striim provides real-time embeddings that offer context to each user prompt, allowing your streams to be enriched with AI-derived insights in real time. Striim intends to further capitalise on this with a series of “AI Insights” features, the first of which (currently in beta) uses AI to discover sensitive/PII data in your streams, then applies existing Striim functionality to obfuscate, encrypt, or tag that data (so that, for instance, it can be picked up by a third-party tool) before it lands at its destination. Matching is based on either regular expressions or Microsoft Presidio, and leverages an OpenAI model of your choice. This dovetails with other Striim capabilities that support regulatory compliance, such as filtering and masking data, maintaining a single customer view that can be used to confirm compliance or detect data breaches, and more general data monitoring, all of which are provided in real time.
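As a rough illustration of the regular-expression route to PII discovery and masking, the Python sketch below scans each field of a record and replaces matches with a tag. The patterns, the `PII_PATTERNS` table, and the `mask_pii` function are illustrative assumptions only; they are not Striim's rules or API, and real detection would need broader, locale-aware patterns (or a tool such as Microsoft Presidio).

```python
import re
from typing import Dict

# Illustrative patterns only; not an exhaustive or production-grade rule set.
PII_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_pii(record: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `record` with every pattern match replaced by a `<label>` tag."""
    masked: Dict[str, str] = {}
    for field, value in record.items():
        for label, pattern in PII_PATTERNS.items():
            value = pattern.sub(f"<{label}>", value)
        masked[field] = value
    return masked


if __name__ == "__main__":
    event = {"note": "Contact jane@example.com, SSN 123-45-6789", "id": "42"}
    print(mask_pii(event))
```

Applying the masking step to each record before it is written downstream mirrors the idea of treating sensitive data in flight rather than after it has landed.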