Fig 01 - Architecture
BigID is a microservices-based solution built on Docker containers and Kubernetes. It is available both in the cloud and on-premises. Its architecture is illustrated in Figure 1; the key point is that multiple scanners can be deployed in parallel. These scanners can be directed at different data sources or, where appropriate, at a single source, with each scanner focused on a specific schema.
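To make the parallel scanner idea concrete, here is a minimal sketch in Python. It is not BigID code and does not use any BigID API: `ScanTarget`, `run_scanner` and `scan_in_parallel` are hypothetical names, the source and schema names are invented, and a thread pool stands in for what would in practice be separately deployed containers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ScanTarget:
    """Hypothetical description of what one scanner is pointed at."""
    source: str
    schema: Optional[str] = None  # None means "scan the whole source"


def run_scanner(target: ScanTarget) -> str:
    """Stand-in for a real scan; it only reports what it would have scanned."""
    scope = f"{target.source}.{target.schema}" if target.schema else target.source
    return f"scanned {scope}"


def scan_in_parallel(targets: List[ScanTarget], max_scanners: int = 4) -> List[str]:
    """Run one scanner per target concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_scanners) as pool:
        return list(pool.map(run_scanner, targets))


# One scanner for a whole source, two more splitting a single source by schema.
targets = [
    ScanTarget("crm_db"),
    ScanTarget("warehouse", schema="sales"),
    ScanTarget("warehouse", schema="hr"),
]
results = scan_in_parallel(targets)
```

The point of the sketch is the shape of the work list: each entry is either a whole data source or one schema within a source, so a large source can be divided among several scanners.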
To discover sensitive data, the software uses different algorithms depending on the source, examining both the data itself and its metadata. For structured data it uses correlation-based machine learning and metadata enrichment; for unstructured data it employs neural network-based named entity recognition (NER) algorithms, as well as document classifiers. This allows you to apply weightings to metadata features and to calculate confidence thresholds, which are fed back into the machine learning process. A library of more than 60 classifiers, including document classifiers, is provided out of the box, and this will be augmented during Q1 2020 by a cluster analysis algorithm for files. This will bring a level of structure to otherwise unstructured environments and will, in due course, be used to support sampling. Tags can be added to discovered data, and these can be leveraged by third-party policy-based solutions.
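The idea of weighting metadata features and comparing the result with a confidence threshold can be illustrated with a short Python sketch. This is an assumption-laden illustration, not BigID's algorithm: the feature names, the weights, the weighted-average formula and the 0.7 threshold are all invented for the example, and the feedback into the machine learning process is not modelled.

```python
from typing import Dict


def confidence(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of feature scores in [0, 1]; features without a weight are ignored."""
    known = [name for name in features if name in weights]
    total_weight = sum(weights[name] for name in known)
    if total_weight == 0:
        return 0.0
    return sum(features[name] * weights[name] for name in known) / total_weight


def is_sensitive(features: Dict[str, float], weights: Dict[str, float],
                 threshold: float = 0.7) -> bool:
    """Flag a field as sensitive when its confidence reaches the threshold."""
    return confidence(features, weights) >= threshold


# Hypothetical weights: a match on the values counts for more than a column name.
weights = {"column_name_match": 2.0, "value_pattern_match": 3.0, "correlation": 1.0}
features = {"column_name_match": 1.0, "value_pattern_match": 0.5, "correlation": 0.0}

score = confidence(features, weights)         # (2.0 + 1.5 + 0.0) / 6.0
flagged = is_sensitive(features, weights)      # uses the default threshold of 0.7
```

Raising a weight makes the corresponding feature count for more in the score, while lowering the threshold makes the classification more permissive; those are the two settings the paragraph above describes.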
Fig 02 - Privacy-centric data intelligence
BigID describes itself as providing “privacy-centric data intelligence”. To that end it provides significant dashboarding and reporting of its own, illustrated in Figure 2. This (partial) screenshot shows the data flows tool, which supports Article 30 of GDPR (records of processing activities) as well as providing data lineage. Note also the “global risk” score highlighted in the panel to the left. Other options include a policy screen where you can set relevant policies, tools to investigate breaches, data subject access request (DSAR) capabilities, a metadata catalogue and general dashboarding capabilities. There are also facilities to identify both potential policy breaches and high-risk data sources (and who is using them).
In addition to the dashboards and other tools provided directly by BigID, the product also integrates with third-party tools such as Tableau and Qlik, as well as with Azure Information Protection.