Guavus SQLstream uses ANSI standard SQL to execute continuous queries over arriving data streams, with SQL queries automatically translated into executable processing and analytics pipelines. The design encourages you to analyse data as it arrives, then store it if (and only if) you have a use for it. This allows you to reduce storage and maintenance costs, since there will be less stray data accumulating in your system, while keeping your data stores cleaner and less polluted – and therefore faster and more accessible – without losing any information. It also fits neatly with recent compliance mandates, such as GDPR.
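To give a flavour of what this looks like in practice, the following is a minimal sketch of a continuous query in streaming SQL. The call_events stream, its columns and the window chosen are illustrative assumptions of ours, not taken from the product's documentation:

```sql
-- A continuous query: count calls per region over a sliding
-- one-minute window, emitting new results as events arrive.
-- The call_events stream and its columns are hypothetical.
SELECT STREAM
    ROWTIME,
    "region",
    COUNT(*) OVER (
        PARTITION BY "region"
        ORDER BY ROWTIME
        RANGE INTERVAL '1' MINUTE PRECEDING
    ) AS "calls_last_minute"
FROM call_events;
```

Unlike a conventional query, this never terminates: it runs continuously, producing a new output row for each arriving event.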
Fig 1 - Real-time emergency call analytics dashboard in SQLstream
The platform also supports Java, Python and Scala, in addition to SQL. This is notable in part because many competing vendors require the use of proprietary languages. In addition, the product provides automatic query optimisation and performance features such as lock-free scheduling of query execution, removing the need for manual tuning whilst delivering excellent throughput and latency on a small hardware footprint. The architecture supports distributed processing over server clusters, with redundancy and recovery options such as high availability and exactly-once processing, and it scales well both up to large clusters and down to IoT hardware. You can deploy the entire platform through Kubernetes, and you can even run it as an external agent within an existing server, limiting its functionality but further minimising its footprint. It also exposes a microservice API over WebSockets.
In addition, the product includes a geospatial analytics library, as well as a library of data collection and enterprise integration connectors. This includes support for Hadoop, data warehouses and messaging middleware, among other things, and there is an SDK provided for building new connectors and data processing operators. Native support for common operational data issues is also included: for example, delayed, missing or out-of-order data is handled in a way that is invisible to the user or application. The architecture itself is also compatible with Java, Python and C++ plugins.
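For instance, a connector-backed source might be declared as a foreign stream along the following lines. This is a hedged sketch: the server name, option keys and schema here are our own assumptions, so the exact syntax should be checked against the product documentation:

```sql
-- Illustrative declaration of a Kafka-backed source as a foreign stream.
-- Server name, options and schema are assumptions, not verbatim product syntax.
CREATE OR REPLACE FOREIGN STREAM call_events (
    "call_id"  VARCHAR(32),
    "region"   VARCHAR(16),
    "caller"   VARCHAR(32)
)
SERVER KAFKA10_SERVER
OPTIONS (
    "TOPIC" 'call-events',
    "SEED_BROKERS" 'localhost:9092',
    "PARSER" 'JSON'
);
```

Once declared, such a stream can be queried with ordinary SELECT STREAM statements, exactly as in the earlier example.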
s-Studio and, more notably, StreamLab provide you with the tools to interact directly with live data streams and build stream processing and streaming analytics pipelines on top of them. StreamLab, in particular, offers visual, no-code, drag-and-drop tooling for building these pipelines, as well as an intelligent recommendation engine: the Scrutiniser. The Scrutiniser parses your data as it arrives based on a predefined set of rules, then analyses it and makes suggestions accordingly. It can even be applied iteratively to generate fully wrangled data sets with little effort on your part. You can also apply the same kind of analysis to historical data, essentially analysing it as if it were just now entering your system. Live data streams can also be correlated against historical data or augmented with stored data sets to provide even richer analytical capability.
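Stream-to-table enrichment of this kind reduces, in SQL terms, to a join between a live stream and a stored table. The sketch below assumes the hypothetical call_events stream from above and an equally hypothetical region_reference table:

```sql
-- Enrich each live call event with attributes from a stored
-- reference table (both names are illustrative assumptions).
SELECT STREAM
    c.ROWTIME,
    c."call_id",
    r."region_name",
    r."priority_level"
FROM call_events AS c
JOIN region_reference AS r
    ON c."region" = r."region_code";
```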
Data output from these pipelines can then be sent to integrated, real-time SQLstream dashboards (as shown in Figure 1), web apps, and external systems or storage platforms. Notably, these dashboards are designed to display streaming data, at speed, with a comprehensible and approachable UI. Although they are not as fully featured as some dedicated dashboarding products, they are very well suited to streaming data.
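Delivering results to such a destination typically means pumping the output of a continuous query into a named stream that downstream consumers subscribe to. As a hedged sketch, using SQLstream's pump construct with names of our own invention:

```sql
-- A named output stream that dashboards or external sinks can subscribe to.
CREATE OR REPLACE STREAM region_call_rates (
    "region"            VARCHAR(16),
    "calls_last_minute" INTEGER
);

-- A pump continuously feeds the output stream from a continuous query.
-- Pumps are created stopped, then started explicitly.
CREATE OR REPLACE PUMP region_call_rates_pump STOPPED AS
INSERT INTO region_call_rates ("region", "calls_last_minute")
SELECT STREAM
    "region",
    COUNT(*) OVER (
        PARTITION BY "region"
        ORDER BY ROWTIME
        RANGE INTERVAL '1' MINUTE PRECEDING
    ) AS "calls_last_minute"
FROM call_events;

ALTER PUMP region_call_rates_pump START;
```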