Figure 1: The analytic architecture of Couchbase Server 6.0
Figure 1 illustrates the analytic architecture introduced in Couchbase Server 6.0. Some of this may need explanation. Specifically, the internal Database Change Protocol (DCP) is a parallel capability that streams changes (updates and so forth) to all nodes and keeps services synchronised, providing faster view consistency. More generally, there is an embedded parallel engine to support performance for data-intensive jobs. In addition, Couchbase Analytics supports parallel search, parallel joins (hash, index nested-loop, and broadcast joins), parallel group-by (both pre-sorted and hash-based) and parallel sorts. All of these, incidentally, can spill to disk if memory is insufficient.
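To make the hash join mentioned above a little more concrete, here is a minimal single-process sketch in Python. It is not Couchbase's implementation: the function name `hash_join`, the sample rows and the field names are all assumptions made for illustration, and parallelism and spill-to-disk are left out entirely. It shows only the two phases that hash joins generally share, namely building a hash table on one input and probing it with the other.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Join two lists of dicts on `key` (hypothetical helper, illustration only)."""
    # Build phase: index the (ideally smaller) input by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)

    # Probe phase: look up each probe row and emit one merged row per match.
    joined = []
    for row in probe_rows:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data.
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Lin"}]
orders = [
    {"cust_id": 1, "total": 30},
    {"cust_id": 1, "total": 12},
    {"cust_id": 3, "total": 7},
]

result = hash_join(customers, orders, "cust_id")
for row in result:
    print(row)
```

In a parallel engine the same idea is typically applied per partition, with rows routed to partitions by a hash of the join key; that routing step is omitted here.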
Allied with this is N1QL for Analytics (and an Analytics Query Editor), which is based on SQL++. SQL++ is a language developed to bring SQL-like, declarative capabilities to environments where the data is semi-structured rather than structured, and N1QL for Analytics is the first commercial implementation of the SQL++ framework. As can be seen, Couchbase Server uses log-structured merge (LSM) trees, which are good for high-speed ingestion, especially where indexed access is required (and Couchbase supports secondary indexes).
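The reason log-structured merge trees suit high-speed ingestion can be sketched with a toy example. The Python class below is an assumption-laden illustration, not Couchbase's storage engine: the name `ToyLSMTree`, the in-memory "runs" standing in for on-disk files, and the tiny flush threshold are all hypothetical. Writes go to an in-memory table, full tables are frozen as sorted immutable runs, and reads check the newest data first.

```python
import bisect


class ToyLSMTree:
    """Toy LSM tree: in-memory memtable plus sorted immutable runs (illustration only)."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []  # newest run first; each run is a sorted list of (key, value)

    def put(self, key, value):
        # Writes only touch the memtable, which is what keeps ingestion cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable as a sorted run (a real engine would write it to disk).
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, applying the oldest first so newer values overwrite.
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


tree = ToyLSMTree(memtable_limit=2)
tree.put("a", 1)
tree.put("b", 2)   # memtable is full, so it is flushed as the first run
tree.put("a", 10)  # a newer value for "a" shadows the flushed one
tree.put("c", 3)   # second flush
print(tree.get("a"), tree.get("b"), tree.get("z"))
tree.compact()
print(tree.runs)
```

The trade-off this hints at is that reads may have to consult several runs, which is why LSM-based systems generally compact runs in the background.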
There are several things that are not shown in this diagram, notably support for Bloom filters, which are used to improve search efficiency, ODBC/JDBC connectivity, and Spark integration. Jupyter Notebooks are supported through a Python SDK, but the product does not support R or PMML (Predictive Model Markup Language). The company is visualisation agnostic and, given the lack of a pure SQL interface, integration with third-party business intelligence tools is achieved through ODBC/JDBC drivers. The company has a partnership with Knowi. Finally, there is an important workload isolation facility that allows you to assign specific nodes in your cluster to complex analytics, indexing, ad hoc queries and so forth, as required, supporting the separation of operational and analytic workloads.
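For readers unfamiliar with the Bloom filters mentioned above, the following Python sketch shows how such a filter can rule out lookups cheaply. It is a generic illustration, not the structure Couchbase ships: the class name `BloomFilter`, the bit and hash counts, and the use of SHA-256 with a seed prefix are all assumptions chosen for simplicity.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (illustration only; sizes are arbitrary)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bf = BloomFilter()
for doc_key in ["order::1001", "order::1002"]:  # hypothetical document keys
    bf.add(doc_key)

print(bf.might_contain("order::1001"))  # an added key is always reported as possibly present
```

The efficiency gain comes from the negative case: when the filter reports a key as absent, the more expensive lookup in the underlying storage can be skipped, at the cost of occasional false positives.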