Cray Graph Engine
Update solution on January 23, 2019

The Cray Graph Engine (CGE) is an RDF (Resource Description Framework) triplestore database that uses SPARQL as its query language. Cray has extended SPARQL to support a variety of specific graph algorithms, and the many built-in functions include geo-spatial functions. Both pre-graph and post-graph analysis can be performed using R.
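For readers new to triplestores, the toy sketch below shows in plain Python what answering a simple SPARQL-style query over RDF triples involves. Each triple pattern may contain variables (written ?x, as in SPARQL), and the query is answered by finding every consistent set of variable bindings across all the patterns. This is a hypothetical illustration only, not CGE's implementation; the ex: names and the functions match_pattern and query are invented for the example.

```python
from __future__ import annotations

# Toy illustration only: a set of RDF-style triples plus a matcher for
# SPARQL-like basic graph patterns. This is NOT how CGE works internally.

Triple = tuple[str, str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple,
                  binding: dict[str, str]) -> dict[str, str] | None:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def query(store: set[Triple], patterns: list[Triple]) -> list[dict[str, str]]:
    """Evaluate a conjunction of triple patterns, one pattern at a time."""
    bindings: list[dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [
            new
            for b in bindings
            for triple in store
            if (new := match_pattern(pattern, triple, b)) is not None
        ]
    return bindings


store = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:worksAt", "ex:acme"),
}

# Roughly: SELECT ?x ?y WHERE { ?x ex:knows ?y . ?y ex:worksAt ex:acme }
print(query(store, [("?x", "ex:knows", "?y"), ("?y", "ex:worksAt", "ex:acme")]))
# [{'?x': 'ex:alice', '?y': 'ex:bob'}]
```

The nested loop here rescans every triple for every partial binding; production triplestores typically add indexing and join optimisation and, on systems like those described here, spread the work across many nodes, which is why scale and memory matter so much in practice.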
From a hardware perspective, the Cray Graph Engine is available on any Cray system that includes the Cray Aries interconnect: that is, the Cray Urika-GX, or a Cray XC Series system running the Urika-XC suite, which includes the Graph Engine. Urika-GX comes as a single 42U rack that uses the Cray Aries supercomputing interconnect. You can have 16, 32 or 48 two-socket Intel Xeon processor nodes, with up to 1,728 cores and up to 22 TB of memory per system, between 35 and 176 TB of SSD storage, and 192 TB of conventional disk storage. Other storage options are available. The Cray engineering expertise behind these elements is clearly considerable, but from a performance point of view perhaps the most notable feature is the amount of memory available. This is especially relevant for analytics because the analytic processing takes place in memory.
On Cray's XC line of supercomputers, the Cray Graph Engine is delivered as part of an AI and analytics suite called Urika-XC. This software-only package contains CGE plus Spark, R and a suite of deep learning packages, all highly optimised for the XC50 architecture.
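As a quick sanity check on the quoted Urika-GX figures, the short calculation below derives two numbers the text does not state directly: the cores per socket implied by 1,728 cores across 48 two-socket nodes, and the average memory per node in the largest configuration, assuming memory is spread evenly and taking 1 TB as 1,000 GB. Both are derived estimates, not values from a Cray specification.

```python
# Back-of-the-envelope check of the Urika-GX maximums quoted in the text.
# The two results are derived values, not figures from a Cray spec sheet.

MAX_NODES = 48          # largest Urika-GX configuration
SOCKETS_PER_NODE = 2    # two-socket Intel Xeon nodes
MAX_CORES = 1_728       # cores per system
MAX_MEMORY_TB = 22      # memory per system, taking 1 TB = 1,000 GB

cores_per_socket = MAX_CORES / (MAX_NODES * SOCKETS_PER_NODE)
# Average only: assumes memory is spread evenly across all nodes.
memory_per_node_gb = MAX_MEMORY_TB * 1_000 / MAX_NODES

print(f"{cores_per_socket:.0f} cores per socket")          # 18 cores per socket
print(f"~{memory_per_node_gb:.0f} GB of memory per node")  # ~458 GB of memory per node
```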
The architecture of Urika-GX is illustrated in Figure 1. As can be seen, it combines Hadoop, Spark and graph technologies in a single configuration. There are several points to make. Firstly, an advantage of including Hadoop and Spark is that you can use either to transform and load data into the graph database much more efficiently than would otherwise be the case. More generally, Hadoop and Spark are typically used for batch processing, while Spark can also be used for streaming and iterative analytics and for machine learning, with the graph engine handling graph-related analytics. You can build workflows that span these environments, so that each element of a workflow can use the technology most appropriate to it. You can also run multiple processes concurrently in each mode.
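To make the transform-and-load step more concrete, the sketch below converts flat records into N-Triples, a standard line-based RDF serialization. It assumes that the graph database can ingest N-Triples files; in a Urika-GX workflow this logic would more likely run inside a Spark job, but plain Python keeps the example self-contained. The namespace http://example.org/, the field names and the helper functions escape_literal and record_to_ntriples are hypothetical.

```python
from __future__ import annotations

from urllib.parse import quote

# Sketch of the "transform" step: flat records -> N-Triples lines.
# The namespace and field names are hypothetical; in a Urika-GX workflow
# this logic would typically run inside a Spark job instead.
BASE = "http://example.org/"  # hypothetical namespace


def escape_literal(value: str) -> str:
    """Escape the characters N-Triples forbids unescaped in a string literal."""
    return (value.replace("\\", "\\\\")  # backslash first, so later escapes survive
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_ntriples(record: dict[str, str]) -> list[str]:
    """Emit one triple per field; the 'id' field becomes the subject IRI."""
    subject = f"<{BASE}person/{quote(record['id'], safe='')}>"
    lines = []
    for field, value in record.items():
        if field == "id":
            continue
        predicate = f"<{BASE}{quote(field, safe='')}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


rows = [{"id": "1", "name": "Alice", "city": "Seattle"}]
for row in rows:
    for line in record_to_ntriples(row):
        print(line)
# <http://example.org/person/1> <http://example.org/name> "Alice" .
# <http://example.org/person/1> <http://example.org/city> "Seattle" .
```

Percent-encoding the id and field names keeps the generated IRIs valid even when the source data contains spaces; a production pipeline would also attach datatypes or language tags to literals, which this sketch omits.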

Figure 1 – The Cray Graph ecosystem
The Urika-XC software package for the Cray XC50 contains many of the software packages found on Urika-GX, with the exception of Hadoop and HDFS. However, it adds packages for deep learning, and its stack includes Apache Spark, Intel BigDL, TensorFlow, the Cray ML Scalability Plugin, Java, R, Scala, Python (including the Anaconda distribution), Dask and Dask Distributed, and Jupyter Notebooks. Support for further packages is planned.
Cray believes that supercomputing is needed to underpin graph analytics and artificial intelligence in many environments, and the Cray Urika suites (Urika-GX and Urika-XC) are just one part of its strategy in this space. The suites are specifically targeted at the most intractable analytic problems; use cases include cybersecurity, the Internet of Things, machine learning, research into new drugs and new uses of existing drugs, and uncovering risk and compliance issues within financial services environments.
As you might expect, performance is a key differentiator for Cray, and the company recently completed a trillion-triple benchmark. It is not the first company to do so (there have been at least two others), and it is difficult to compare systems running on different hardware. Nevertheless, Cray's performance for both inferencing and query response was approaching an order of magnitude better than that of its nearest competitor.
The Bottom Line
There is a perception that Cray (and, indeed, any supercomputing product) is expensive. However, if you have to process hundreds of billions of triples within a reasonable timeframe, that is not going to come cheap. In practice, Cray believes that it is competitively priced compared with other offerings with similar capabilities, if there actually are any. We are inclined to agree with the company.