Fig 01 - Scaling out with Neo4j
Neo4j is a property graph database with a native engine, targeted at operational, hybrid operational/analytic (HTAP) and pure analytic use cases. It is ACID compliant and supports immediate consistency. Additional technologies and tooling are available to support the Neo4j environment. Since the release of version 4.0 (4.1 is the current version), the product has supported scale-out as well as scale-up, as shown in Figure 1, which depicts the (geographically) distributed environment that Neo4j now supports. Scale-out rests on the introduction of support for sharding, which extends the horizontal multi-cluster scaling that was introduced in version 3.4. The replicas illustrated are read replicas, which have been available within the product for some time. The most recent release also adds support for much more granular security than was previously available.
Most users (see below) employ Cypher, the declarative query language developed by Neo4j, or openCypher, its open source version. It is notable that SAP, Redis, Memgraph and others have adopted openCypher, and it is also being used within several open source projects, including Cypher for Apache Spark and Cypher for Gremlin, as well as in research projects such as InGraph for streaming queries. As with any declarative language, Cypher is best implemented alongside a database optimiser, and the company has devoted considerable resources to this, extending beyond its original rules-based optimiser so that it is now primarily cost-based and supports optimisation for writes as well as reads.
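To illustrate what "declarative" means here, the sketch below pairs a Cypher-style pattern with a hand-written traversal that answers the same question over a toy in-memory property graph. Everything in it is an assumption made for illustration: the `Person` label, the `KNOWS` relationship type, the sample data and the helper functions are hypothetical, and the Python code is not how Neo4j executes queries. The point is only that the query states the shape of the result, while the traversal has to spell out each step that an optimiser would otherwise plan.

```python
# Illustrative Cypher pattern (hypothetical schema): friends of friends.
# It describes WHAT to match and leaves the execution plan to the database.
CYPHER_QUERY = """
MATCH (p:Person {name: $name})-[:KNOWS]->(:Person)-[:KNOWS]->(fof:Person)
WHERE fof <> p
RETURN DISTINCT fof.name AS name
"""

# Toy in-memory property graph (sample data invented for this sketch).
# Nodes carry a label and properties; relationships carry a type.
NODES = {
    1: {"label": "Person", "name": "Ann"},
    2: {"label": "Person", "name": "Bob"},
    3: {"label": "Person", "name": "Cy"},
    4: {"label": "Person", "name": "Dee"},
}
RELATIONSHIPS = [
    (1, "KNOWS", 2),
    (2, "KNOWS", 3),
    (2, "KNOWS", 4),
    (3, "KNOWS", 1),
]


def outgoing(node_id, rel_type):
    """Return ids of nodes reached from node_id via relationships of rel_type."""
    return [dst for src, typ, dst in RELATIONSHIPS
            if src == node_id and typ == rel_type]


def friends_of_friends(name):
    """Imperative equivalent of the pattern above: a fixed two-hop traversal."""
    starts = [nid for nid, props in NODES.items()
              if props["label"] == "Person" and props["name"] == name]
    found = set()
    for start in starts:
        for friend in outgoing(start, "KNOWS"):
            for fof in outgoing(friend, "KNOWS"):
                if fof != start:  # mirrors WHERE fof <> p
                    found.add(NODES[fof]["name"])
    return sorted(found)


if __name__ == "__main__":
    print(friends_of_friends("Ann"))
```

In the imperative version the order of the loops and the choice of starting point are fixed by the programmer; with a declarative query those decisions are left to the optimiser, which is why the quality of the optimiser matters so much.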
Customer Quotes
“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code. At the same time, Neo4j allowed us to add functionality that was previously not possible.”
eBay Shutl
“I’d like to comment on Neo4j’s scalability and capability of looking at millions and millions of nodes. We have a “big data” problem — not only in structured data, but in unstructured data — and we are continually gathering more data. At NASA, my focus right now is on the unstructured data. And I need a product or an application that can go across and develop millions if not billions of nodes, connect that information and at fast speeds. Neo4j is that tool.”
NASA