MarkLogic Data Hub Service and MarkLogic Server

Update solution on September 11, 2020

Mutable Award: Gold 2020


MarkLogic Data Hub, whose architecture is shown in Figure 1, is a platform for data storage, integration, operationalisation and governance. It’s built on top of MarkLogic Server, a multi-model database that is capable of handling graph, relational, and document data. MarkLogic’s offerings can be deployed either on-premises or in the cloud. The latter in particular is enabled via MarkLogic Data Hub Service, the company’s fully managed, multi-cloud data hub SaaS solution. Moreover, the company supports a gradual transition – what it terms a “pathway to the cloud” – from on-premises deployment, to self-managed cloud deployment, to the fully managed Data Hub Service.

Customer Quotes

“I’ve never seen such a rich and a fast technology in 25 years career.”
Airbus

“MarkLogic’s search, semantics and security features made it optimal to serve as the foundation for the next generation of our catalog.”
Johnson & Johnson

MarkLogic Server serves as the storage layer for MarkLogic Data Hub. As a multi-model database, it can store documents (including JSON, XML and flat text documents), relational data via tables, rows and columns, and graph data via RDF triples (although technically it uses quads, which means it supports named graphs). The database is ACID-compliant with immediate consistency, and it offers enterprise-grade features such as high availability and resilience. Security capabilities are also provided, including encryption and key management, role-based security, redaction and even data masking. As far as graph is concerned, it supports inferencing via backward chaining, and there are built-in semantic search capabilities as well as (bi-)temporal and geospatial functionality.
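To make the idea of inferencing via backward chaining more concrete, here is a minimal sketch in plain Python. It is a toy, in-memory illustration, not MarkLogic's inference engine or API: the fact set, the two rules (transitive `subClassOf` and type inheritance) and the `holds` function are all assumptions made for this example. The point it shows is that nothing is materialised in advance. The engine starts from the question being asked and works backwards to stored facts.

```python
# Hypothetical stored facts as (subject, predicate, object) triples.
FACTS = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
}


def holds(s, p, o, seen=None):
    """Goal-driven check: is (s, p, o) stored, or derivable from stored facts?"""
    seen = set() if seen is None else seen
    goal = (s, p, o)
    if goal in FACTS:
        return True
    if goal in seen:  # already tried this goal; avoid looping on cycles
        return False
    seen.add(goal)

    if p == "subClassOf":
        # Rule: s subClassOf mid, and mid subClassOf o  =>  s subClassOf o
        return any(
            holds(mid, "subClassOf", o, seen)
            for (s2, p2, mid) in FACTS
            if s2 == s and p2 == "subClassOf"
        )
    if p == "type":
        # Rule: s type c, and c subClassOf o  =>  s type o
        return any(
            holds(c, "subClassOf", o, seen)
            for (s2, p2, c) in FACTS
            if s2 == s and p2 == "type"
        )
    return False


rex_is_animal = holds("rex", "type", "Animal")
rex_is_plant = holds("rex", "type", "Plant")
```

The triple `("rex", "type", "Animal")` is never stored. It is derived only when the question is asked, which is the defining trait of backward chaining as opposed to forward chaining, where inferred facts are computed and stored ahead of time.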

Appropriate query support is provided for each model: search for documents, SQL for relational data, and SPARQL for graph data. The latter also supports GraphQL and GeoSPARQL. MarkLogic also provides its Optic API for multi-model querying using a combination of the aforementioned query languages. Notably, relational and graph data in MarkLogic Server leverage the same index technology, meaning that you can query the same data as either a set of triples or as rows and columns. All queries are composable between types of model, and consistency is maintained across all models that represent the same data.
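The claim that the same data can be queried either as triples or as rows and columns can be illustrated with a small sketch. This is plain Python over an in-memory list, not MarkLogic's index technology, SQL, SPARQL or Optic API. The sample data and the helper names `as_rows` and `match` are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical sample data: one shared set of (subject, predicate, object) facts.
TRIPLES = [
    ("emp:1", "name", "Ada"),
    ("emp:1", "dept", "Engineering"),
    ("emp:2", "name", "Grace"),
    ("emp:2", "dept", "Research"),
]


def as_rows(triples, columns):
    """Relational lens: pivot triples into one row per subject."""
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o
    return [
        {"id": s, **{c: by_subject[s].get(c) for c in columns}}
        for s in sorted(by_subject)
    ]


def match(triples, s=None, p=None, o=None):
    """Graph lens: triple-pattern match where None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


rows = as_rows(TRIPLES, ["name", "dept"])

# The same question asked through each lens gives the same answer.
engineers_via_graph = [s for s, _, _ in match(TRIPLES, p="dept", o="Engineering")]
engineers_via_rows = [r["id"] for r in rows if r["dept"] == "Engineering"]
```

Because both views are derived from one underlying set of facts, they cannot drift apart, which is the consistency property the paragraph above describes.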

You can also leverage multiple models simultaneously. For example – in fact, this is a typical use case – you can use MarkLogic to build a knowledge graph where the entities are documents and the relationships are triples. Among other things, this means that the nodes in your graph can contain properties (read: metadata) in and of themselves, without requiring additional nodes in the graph. This allows you to store a wealth of property information within your graphs, and thus provide fodder for detailed searching and querying, without fear that the size of your graphs will balloon out of all reasonable proportion.
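A rough sketch of this pattern, in plain Python rather than anything MarkLogic-specific, is given below. The document URIs, property names and the `neighbours` helper are invented for illustration. Entities are stored as documents that carry their own properties, and only the relationships between them are stored as triples.

```python
# Hypothetical entity documents, keyed by URI. Properties live inside the document.
DOCUMENTS = {
    "/people/ada.json": {"name": "Ada", "title": "Engineer", "joined": 2015},
    "/people/grace.json": {"name": "Grace", "title": "Director", "joined": 2009},
    "/projects/hub.json": {"name": "Hub", "status": "active"},
}

# Relationships only: (subject URI, predicate, object URI).
EDGES = [
    ("/people/ada.json", "worksOn", "/projects/hub.json"),
    ("/people/grace.json", "manages", "/people/ada.json"),
]


def neighbours(uri, predicate):
    """Follow edges from a document and return the full target documents."""
    return [DOCUMENTS[o] for s, p, o in EDGES if s == uri and p == predicate]


def triples_if_flattened():
    """Count the triples needed if every property were its own graph edge."""
    return len(EDGES) + sum(len(doc) for doc in DOCUMENTS.values())


ada_projects = neighbours("/people/ada.json", "worksOn")
graph_size = len(EDGES)
flattened_size = triples_if_flattened()
```

In this toy data set the graph holds 2 edges, whereas a triples-only representation of the same information would need 10. Following an edge still returns the full property set of the target entity.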

Alternatively (or additionally), you can annotate your documents using triples or embed your triples inside of your documents. By partitioning your set of triples between your documents, you can leverage document search to rapidly locate which documents – and therefore which subset of your triples – are relevant to your query, then apply your query strictly to that subset. This has benefits for speed and scalability.
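The two-step approach described here, narrowing by document search and then querying only the triples in the matching documents, can be sketched as follows. This is an illustrative toy in plain Python. The documents, the naive substring `search` and the `query_triples` helper are assumptions for the example and say nothing about how MarkLogic implements search or its query optimiser.

```python
# Hypothetical documents, each carrying its own embedded triples.
DOCS = {
    "/articles/1.json": {
        "text": "Aspirin reduces fever and mild pain.",
        "triples": [("aspirin", "treats", "fever"), ("aspirin", "treats", "pain")],
    },
    "/articles/2.json": {
        "text": "Ibuprofen is used for inflammation.",
        "triples": [("ibuprofen", "treats", "inflammation")],
    },
    "/articles/3.json": {
        "text": "Quarterly sales figures for the retail division.",
        "triples": [("retail", "reported", "sales")],
    },
}


def search(term):
    """Naive stand-in for document search: case-insensitive substring match."""
    term = term.lower()
    return [uri for uri, doc in DOCS.items() if term in doc["text"].lower()]


def query_triples(uris, predicate):
    """Apply a triple pattern only to triples embedded in the given documents."""
    return [t for uri in uris for t in DOCS[uri]["triples"] if t[1] == predicate]


# Step 1: document search narrows three documents down to one.
hits = search("fever")
# Step 2: the triple query runs over that document's triples only.
results = query_triples(hits, "treats")
```

Here the triple query inspects two triples instead of four. On a realistic corpus the same narrowing step is what keeps graph queries from scanning every triple in the database.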

Fig 02 – MarkLogic Data Hub QuickStart

As far as the MarkLogic Data Hub itself is concerned, it acts as a hub for data management on top of MarkLogic Server. This means that it inherits the latter’s multi-model approach, then exposes it via a unified data integration and management platform. It also offers a number of additional features, including data lineage tracking, fast data pipelines, a “QuickStart” user interface (see Figure 2), and additional governance features. Most notably, it allows you to directly enforce policy rules on your queries at the code level, filtering the results in order to comply with the applied policy and thus embedding governance into the queries themselves. As a means of policy enforcement this is highly effective because it is present in your system at a deep level, which means it cannot be ignored or easily circumvented.
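One way to picture policy enforcement embedded in the query path is a wrapper that every query must pass through. The sketch below is plain Python and purely hypothetical: the `enforce` decorator, the `regional_policy` rule, the `pii` role and the sample records are invented for illustration and are not Data Hub's actual mechanism or API.

```python
from functools import wraps

# Hypothetical stored records.
RECORDS = [
    {"id": 1, "region": "EU", "email": "a@example.com"},
    {"id": 2, "region": "US", "email": "b@example.com"},
]


def enforce(policy):
    """Decorator: run every result through the policy before the caller sees it."""
    def decorator(query_fn):
        @wraps(query_fn)
        def wrapper(user, *args, **kwargs):
            results = query_fn(user, *args, **kwargs)
            checked = (policy(user, record) for record in results)
            # A policy returns None to drop a record entirely.
            return [record for record in checked if record is not None]
        return wrapper
    return decorator


def regional_policy(user, record):
    """Hypothetical rule: same-region records only; email needs the 'pii' role."""
    if record["region"] != user["region"]:
        return None
    if "pii" in user["roles"]:
        return dict(record)
    return {**record, "email": "[redacted]"}


@enforce(regional_policy)
def all_records(user):
    """The query itself knows nothing about the policy."""
    return RECORDS


analyst_view = all_records({"region": "EU", "roles": []})
officer_view = all_records({"region": "EU", "roles": ["pii"]})
```

Because the filter sits between the query and its caller, application code cannot obtain unfiltered results simply by forgetting to apply the rule.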

MarkLogic Server’s greatest selling point is undoubtedly the data agility that users get because it is multi-model and has schema flexibility. Being able to leverage graph, relational and document data interchangeably allows you to choose the right tool for the right job, and moreover, leveraging them in combination can create a solution that is more than the sum of its parts. Leveraging documents inside of graphs, and triples inside of documents, are particularly strong examples of this. The latter enables MarkLogic’s query optimisation, while the former allows you to store much more property information in your graph than would normally be practical. Moreover, the document format allows you to store information as needed, without regard for whether it fits into a rigid schema. This enables you to retain all relevant information, and thus provide a complete business context and provenance for each of the data assets within your (knowledge) graph.

MarkLogic Data Hub adds to this by providing a window into your multi-model data. It gives you multiple lenses with which to view your data (notably, relational and graph) and in general acts as a unified platform for interacting with and curating your graph, relational and document data. Its unique (and, as already indicated, highly effective) approach to policy management – which could more appropriately be described as policy enforcement – is worth noting as well.

The Bottom Line

MarkLogic Server excels as a multi-model database not only because it gives you an expanded range of options for storing data, but because it provides ways for those options to interoperate, and thereby produce synergies that would otherwise not be possible. MarkLogic Data Hub builds on this foundation by providing fast ingestion, curation, and data access. In short, if you want to leverage graph, relational and document data together, you should be looking at MarkLogic.
