IBM Streams and Streaming Analytics

Solution updated on December 6, 2018

Mutable Award: Gold 2018

IBM Streams (previously IBM InfoSphere Streams) is a high-performance, low-latency platform for analysing and scoring data in real time. It is part of IBM's Watson & Cloud Platforms. In addition to the main Streams product (currently in version 4.2.x), IBM also offers a Quick Start Edition that is available as a free download. This is a non-production version, but it is not time-limited. There is also a cloud-based offering, called IBM Streaming Analytics, that runs on IBM Cloud; an optional “lite” version allows up to 50 hours of free usage per month.

IBM Streams is both a development and a runtime environment for real-time analytics. As a runtime, the product runs on a single server or across multiple clustered servers, depending on the scale of the environment and the ingestion rates required. There is also a Java-based version designed to run on edge devices, which has been open-sourced as Apache Edgent. This requires a Java Virtual Machine (JVM) but is otherwise very lightweight. Like Streams, it supports Kafka and MQTT (Message Queue Telemetry Transport), and you can push analytic functions down from Streams into Edgent.
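To make the idea of pushing analytics down to the edge more concrete, here is a minimal sketch in plain Python. It is not the Edgent Java API: the sensor, its readings and the threshold rule are all hypothetical. The point is that filtering on the device means only the interesting readings need to travel over Kafka or MQTT to the central Streams instance.

```python
from typing import Iterable, Iterator


def edge_filter(readings: Iterable[float], threshold: float) -> Iterator[float]:
    """Hypothetical edge-side rule: forward only readings above `threshold`."""
    for value in readings:
        if value > threshold:
            yield value


# Simulated sensor readings (made-up values for illustration only).
readings = [20.1, 20.3, 35.7, 20.2, 41.0, 19.9]
forwarded = list(edge_filter(readings, threshold=30.0))
print(forwarded)  # [35.7, 41.0]
print(f"{len(forwarded)} of {len(readings)} readings sent upstream")
```

Here only two of the six readings leave the device; the rest are discarded locally, which is what saves bandwidth and central processing.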

Customer Quotes

“Once we had settled on IBM Streams, we were able to plug in the statistical models developed by our data scientists and embark on a rapid proof of concept, which went very well. From there, we were able to industrialize the solution in just a few months.”
Cerner Corporation

“IBM Streams increases accuracy of Hypoglycemic event prediction to ~ 90% accuracy with a three-hour lead time over base rate of 80%.”
Medtronic

“With our partner, IBM, we are leveraging the power of the unstructured and structured data through streaming and cognitive capabilities to position ourselves effectively to meet the needs of our customers.”
Verizon

Typical use cases for IBM Streams include looking for patterns of activity (such as fraud), spotting exceptions to expected patterns (such as data breaches) and extracting meaningful information from what might otherwise be considered noise (as in six sigma). There are also commercial applications, such as analysing how customers use their cell phones, and Internet of Things (IoT) applications such as predictive maintenance.
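The “exceptions to expected patterns” case can be illustrated with a deliberately simple rolling z-score check. This is a minimal sketch in plain Python, not IBM's algorithm or any Streams toolkit: the window length, the threshold `k` and the transaction amounts are all hypothetical. Each value is compared with the mean and standard deviation of the values that preceded it, and is flagged if it lies more than `k` standard deviations away.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def detect_anomalies(
    values: Iterable[float], window: int = 5, k: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for values more than k std devs from the recent mean."""
    history: Deque[float] = deque(maxlen=window)  # only the last `window` values
    for i, v in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip the test when the history is perfectly flat (sigma == 0).
            if sigma > 0 and abs(v - mu) > k * sigma:
                yield i, v
        history.append(v)


# Hypothetical card transaction amounts with one obvious outlier.
amounts = [10, 12, 11, 9, 10, 11, 250, 10, 12]
print(list(detect_anomalies(amounts)))  # [(6, 250)]
```

Note that the flagged spike stays in the history and inflates the spread for the next few values; a production detector might exclude flagged values or use more robust statistics.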

Figure 1 – Flow Editing in Streams Designer

As stated, the product is both a development and a deployment platform. Deployment has been discussed above. For development, the product primarily supports SPL (Streams Processing Language), a SQL-like declarative language. For most practical purposes, however, SPL stays under the covers, because the product includes an Eclipse-based drag-and-drop graphical editor (Streams Studio) for building queries. Using this, you drag and drop operators (which include functions such as record-by-record processing, sliding and tumbling windows, and so on) while the software automatically keeps the graphical view in sync with the underlying SPL source code. Debugging capabilities are provided for those who want to develop directly in SPL. In addition to SPL, Streams Studio supports development in Java, Python and Scala (via a Java API). SPL will typically outperform Python, for example, because SPL applications are compiled to C++ and then to native code, whereas much of Python is interpreted.
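The difference between tumbling and sliding windows can be shown in a few lines of plain Python. This is a conceptual sketch, not SPL and not a Streams API: both windows here are count-based, the function names are hypothetical, and stream processors such as Streams typically offer other window policies (for example time-based ones) as well.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def tumbling(stream: Iterable[int], size: int) -> Iterator[List[int]]:
    """Count-based tumbling window: emit non-overlapping batches of `size` tuples."""
    batch: List[int] = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []  # the window "tumbles": all tuples are evicted at once


def sliding(stream: Iterable[int], size: int) -> Iterator[List[int]]:
    """Count-based sliding window: once full, emit the latest `size` tuples on every arrival."""
    window: Deque[int] = deque(maxlen=size)  # the oldest tuple is evicted automatically
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield list(window)


data = [1, 2, 3, 4, 5, 6]
print([sum(w) for w in tumbling(data, 3)])  # [6, 15]
print([sum(w) for w in sliding(data, 3)])   # [6, 9, 12, 15]
```

With a window of three tuples, the tumbling version produces one sum per batch, while the sliding version produces a new sum on every arrival once the window is full, which is why sliding windows suit continuously updated aggregates such as moving averages.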

Currently in beta, an alternative called Streams Designer offers a web-based environment that is reputedly easier to use. While the current Streams Studio is usable by business analysts, we expect Streams Designer (Figure 1 illustrates flow editing in Streams Designer) to be more popular with this constituency.

Figure 2 – Functions and connectivity offered by IBM Streams

Figure 2 illustrates some of the functions of IBM Streams as well as the connectivity options that are available. There are, however, notable capabilities omitted from the figure. In terms of functions, these include integration with IBM’s rules engine and the ability to perform deep packet inspection. There is also no mention of Db2 Event Store, which can be used to persist events, and Figure 2 also fails to cover support for PMML (Predictive Model Markup Language) for portable model scoring. It is also worth mentioning integration (via an API) with Apache Beam, which provides software development kits (SDKs) for constructing streaming pipelines; this offers an alternative to using Streams Designer. Last, but by no means least, IBM Streams is delivered with some twenty pre-built machine learning algorithms. These are typically packaged into toolkits for specific verticals, such as cybersecurity.
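To show what PMML's portability means in practice, the sketch below reads a tiny hand-written PMML `RegressionModel` with Python's standard `xml.etree.ElementTree` and applies it to one record. It is not how Streams consumes PMML; the model, its field names (`amount`, `hour`) and its coefficients are hypothetical, and only numeric predictors in a single `RegressionTable` are handled. The point is that any engine that understands the standard can score a model file produced elsewhere.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical, hand-written PMML 4.3 document holding one linear regression.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header description="toy model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="hour" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="hour"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="amount" coefficient="0.01"/>
      <NumericPredictor name="hour" coefficient="-0.02"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def load_linear_model(doc: str) -> Tuple[float, Dict[str, float]]:
    """Return (intercept, {field: coefficient}) from the first RegressionTable."""
    root = ET.fromstring(doc)
    table = next(e for e in root.iter() if local_name(e.tag) == "RegressionTable")
    intercept = float(table.get("intercept", "0"))
    coefficients = {
        p.get("name", ""): float(p.get("coefficient", "0"))
        for p in table
        if local_name(p.tag) == "NumericPredictor"
    }
    return intercept, coefficients


def score(record: Dict[str, float], intercept: float, coefficients: Dict[str, float]) -> float:
    """Linear regression score: intercept plus the sum of coefficient * field value."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


intercept, coefficients = load_linear_model(PMML_DOC)
print(round(score({"amount": 200.0, "hour": 3.0}, intercept, coefficients), 2))  # 2.44
```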

The ability to ingest and analyse data in real time is fundamental to many existing and emerging environments. The most commonly cited are fraud applications on the one hand and Internet of Things applications on the other. However, while IBM Streams is clearly one of the market leaders in both performance and analytics capability for such conventional use cases, it has also been extended into areas that other vendors cannot reach. As one example, Streams leverages IBM Watson’s speech-to-text capabilities (for call centres, for instance); as another, IBM is making significant contributions in the medical arena, and not just with respect to the Medtronic example quoted above. It is also worth noting the internationalisation of Streams, which is available in both single-byte and double-byte languages.

The Bottom Line

IBM Streams was not the first product to be introduced into this market, but it is almost a decade old. While modernisation is always an ongoing requirement, the enterprise-class features you need come from the sort of maturity that IBM has in spades.

Related Company

IBM
