Sparksee is a property graph database, meaning that it contains entities, the relationships between them, and labels for those relationships. In Sparksee in particular, bitmaps are central to how your graph data is managed and stored. More specifically, your entities (in other words, your data) are stored as normal, while the relationships between them are described in bitmap files. You can think of these bitmaps together as a matrix of 0s and 1s, where a 1 at a given row and column means that the entity represented by that row has a relationship with the entity represented by that column. Since real-world graphs are usually very sparse, meaning that most things have no relationship with most other things, these bitmaps are sparse as well: most of their values are 0. They can therefore be highly compressed by storing only the nonzero values (the 1s). This compression minimises the storage space taken up by your graph's relationships and further reduces Sparksee's footprint.
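The matrix-of-0s-and-1s picture can be made concrete with a small sketch. The class below is a hypothetical illustration, not Sparksee's storage format or API: it models an n x n adjacency matrix but only keeps the positions of the 1s (one set of column indices per row), treating every unstored cell as 0. A production engine would use a proper compressed bitmap encoding rather than Python sets; the point here is only how much less needs to be kept when the matrix is sparse.

```python
# Minimal sketch of a sparse adjacency "bitmap" that keeps only the 1s.
# Hypothetical illustration: SparseAdjacencyBitmap is not a Sparksee class,
# and a real engine would use a compressed bitmap encoding instead of sets.
from typing import Dict, List, Set


class SparseAdjacencyBitmap:
    """An n x n 0/1 adjacency matrix that stores only the positions of its 1s."""

    def __init__(self, n: int) -> None:
        self.n = n
        # Row (source entity) -> set of columns (target entities) holding a 1.
        self.rows: Dict[int, Set[int]] = {}

    def add_edge(self, src: int, dst: int) -> None:
        """Set cell (src, dst) to 1, recording a relationship src -> dst."""
        if not (0 <= src < self.n and 0 <= dst < self.n):
            raise IndexError("entity id out of range")
        self.rows.setdefault(src, set()).add(dst)

    def has_edge(self, src: int, dst: int) -> bool:
        """Any cell that is not stored is implicitly 0."""
        return dst in self.rows.get(src, set())

    def neighbours(self, src: int) -> List[int]:
        """Columns holding a 1 in row `src`, i.e. the entities src relates to."""
        return sorted(self.rows.get(src, set()))

    def stored_ones(self) -> int:
        """How many cells are actually kept (the 1s)."""
        return sum(len(cols) for cols in self.rows.values())

    def dense_cells(self) -> int:
        """How many cells a full, uncompressed n x n matrix would hold."""
        return self.n * self.n


g = SparseAdjacencyBitmap(10_000)
for src, dst in [(0, 1), (0, 42), (7, 9_999)]:
    g.add_edge(src, dst)

print(g.stored_ones(), "of", g.dense_cells())  # 3 of 100000000
print(g.has_edge(0, 42), g.has_edge(42, 0))    # True False
```

With three relationships among 10,000 entities, a dense matrix would need 100 million cells, while the sparse version keeps just three entries.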
That takes care of the graph's storage layer. Sparksee's compute layer allows for something interesting as well: it lets you push computation as close to the data as possible, ideally performing all of it within your embedded systems. This has two major advantages. First, it lets you put your entire network of embedded systems to work in parallel, which benefits performance, and that benefit only grows as your network grows. Second, because the computation happens on the embedded side, less data needs to travel over the network, which minimises network traffic and therefore bandwidth usage.
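The parallelisation advantage can be sketched as a fan-out/fan-in pattern: the same small task is sent to every node, each node runs it against the data it holds, and only the per-node results come back to be combined. This is a hypothetical illustration, not Sparksee's API: `EmbeddedNode`, `run_locally`, `local_max`, and `fan_out` are made-up names, and threads from Python's standard `concurrent.futures` module merely stand in for separate devices.

```python
# Hypothetical fan-out/fan-in sketch: threads stand in for embedded devices.
# EmbeddedNode, local_max and fan_out are illustrative names, not Sparksee APIs.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence

Task = Callable[[Sequence[float]], float]


@dataclass
class EmbeddedNode:
    name: str
    readings: List[float]  # data that lives only on this device

    def run_locally(self, task: Task) -> float:
        # The computation executes where the data is stored.
        return task(self.readings)


def local_max(values: Sequence[float]) -> float:
    return max(values)


def fan_out(nodes: List[EmbeddedNode], task: Task) -> List[float]:
    """Run `task` on every node concurrently; one small result per node."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # pool.map returns results in the same order as `nodes`.
        return list(pool.map(lambda node: node.run_locally(task), nodes))


nodes = [
    EmbeddedNode("a", [1.0, 4.5, 2.0]),
    EmbeddedNode("b", [7.25, 3.0]),
    EmbeddedNode("c", [0.5]),
]
partials = fan_out(nodes, local_max)
print(partials, max(partials))  # [4.5, 7.25, 0.5] 7.25
```

Each node does its share of the work independently, so adding nodes adds compute capacity alongside the data, and the coordinator only has to combine one value per node.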
The idea here is that Sparksee can facilitate your distributed deployment. Maybe you've got a network of self-driving cars and you want to know where they all are, or a network of atmospheric sensors and you want to know whether any of them indicate that it's going to rain. When you run a query, each node in your network receives it, computes whatever information you've asked for, and then sends back strictly that information. The alternative would be for each node to report all of its information to you and leave you to compute what you need yourself. This is closer to the traditional way of doing things, but you can easily imagine how slow, unwieldy, and unnecessarily costly it would become when dealing with, say, an entire city's worth of self-driving cars. Sparksee's more gourmet approach of consuming data selectively, rather than ravenously devouring as much data as possible, is far more cost-effective.
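The difference between the two approaches can be illustrated by counting how many items cross the network for the rain-sensor example. This is a toy sketch under stated assumptions: the sensor names, the humidity readings, the 90% threshold, and the "latest reading above threshold means rain" rule are all hypothetical, and neither function represents a real Sparksee query. The loop in `push_down` simulates each sensor evaluating the query on its own data.

```python
# Toy comparison of "ship everything" vs "ship only the answer".
# Assumptions: sensor data, the humidity threshold and the rain rule are
# hypothetical; this is not a real forecasting model or Sparksee query.
from typing import Dict, List, Tuple

RAIN_HUMIDITY = 90.0  # assumed threshold (%) for "rain likely"

# Each sensor's local history of humidity readings (%).
sensors: Dict[str, List[float]] = {
    "north": [55.0, 60.0, 58.0, 61.0],
    "harbour": [88.0, 92.0, 95.0, 97.0],
    "airport": [40.0, 42.0, 45.0, 41.0],
}


def rain_likely(readings: List[float]) -> bool:
    # Assumed rule: the most recent reading is at or above the threshold.
    return readings[-1] >= RAIN_HUMIDITY


def pull_everything() -> Tuple[List[str], int]:
    """Traditional approach: every sensor sends all readings to the client."""
    transferred = sum(len(r) for r in sensors.values())
    # The client then evaluates the rule itself on the received data.
    answer = [name for name, r in sensors.items() if rain_likely(r)]
    return answer, transferred


def push_down() -> Tuple[List[str], int]:
    """Query goes to the sensors; each replies only if its answer is 'yes'."""
    # Each iteration simulates one sensor evaluating the rule locally.
    replies = [name for name, r in sensors.items() if rain_likely(r)]
    return replies, len(replies)


print(pull_everything())  # (['harbour'], 12)
print(push_down())        # (['harbour'], 1)
```

Both approaches reach the same answer, but the pushed-down query moves one item instead of twelve; with thousands of sensors or cars reporting long histories, that gap widens accordingly.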