BigID was founded in 2016 and is based in New York and Tel Aviv, with offices in London. The company has raised approaching $100m in venture capital to date, with notable investors including Salesforce Ventures, Bessemer Ventures, Scale Ventures, ClearSky Ventures and SAP.io. The last of these is also a partner, and BigID can be licensed directly from SAP as well as from other value-added resellers. The SAP partnership will be important as BigID increasingly targets European markets (especially the financial services sector). Other notable partners include TrustArc, OneTrust, RSA Archer, Collibra and Immuta. The company is currently developing a reference architecture to be used in conjunction with Privitar, a company that, like Immuta, provides data masking.
Company Info
Headquarters: 165 Mercer St, 4th Floor, New York, NY, 10012, United States
Telephone: +1 (917) 765 5727
BigID specialises in (sensitive) data discovery and classification. However, unlike other vendors in this space, which have incrementally added discovery capabilities to tools that were historically focused on other things, BigID has been built from the ground up to concentrate on this area. What is more, the product has been developed from the outset to exploit machine learning and artificial intelligence. Again, very few competitors have this sort of capability and, even where they do, it is an add-on rather than something the product was predicated on from the start.
Apart from the technology that underpins BigID, we should also mention that the company supports some 50 different data sources. These span popular relational data sources, big data sources (S3, Azure Blob Storage and a variety of NoSQL databases), unstructured data sources (file servers, SharePoint, Microsoft Office and so forth), applications, messaging, streams (for example, Kafka and Amazon Kinesis) and middleware. It is worth commenting that the NoSQL support is broader than that of any other supplier we have spoken to. This is important when it comes to DSARs (data subject access requests), as you would like to be able to make just one such request across your data infrastructure: this will not be possible unless all relevant sources are supported.
BigID is a microservices-based solution leveraging Docker containers and Kubernetes. It is available both in-cloud and on-premises and has the architecture illustrated in Figure 1, the key point about which is that multiple scanners can be deployed in parallel. These scanners can be directed at different data sources or, where appropriate, at a single source, with each scanner focused on a specific schema.
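To make the parallel scanner idea concrete, the following is a minimal sketch in Python. It is not BigID's API or implementation: the function names (scan_target, run_scanners), the (name, columns) shape of a scan target and the name-hint matching are all assumptions made purely for illustration. It shows only the general pattern of fanning scans out across several sources, or several schemas of one source, and collecting the results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical name hints; a real scanner would inspect data as well as metadata.
SENSITIVE_HINTS = ("email", "ssn", "phone")


def scan_target(target):
    """Hypothetical scanner: return (target name, columns whose names look sensitive)."""
    name, columns = target
    hits = [c for c in columns if any(h in c.lower() for h in SENSITIVE_HINTS)]
    return name, hits


def run_scanners(targets, max_workers=4):
    """Run one scan per target in parallel and gather the findings by target name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(scan_target, targets))


# Two schemas of the same (illustrative) source, each scanned independently.
targets = [
    ("crm.public", ["id", "Email", "created_at"]),
    ("crm.billing", ["amount", "currency"]),
]
findings = run_scanners(targets)
```

Each target is scanned independently, so adding workers (or, in a containerised deployment, scanner instances) increases throughput without changing the scanning logic.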
As far as the actual process of discovering sensitive data is concerned, the software uses different algorithms depending on the source, examining both the data itself and its metadata. For structured data it uses correlation-based machine learning and metadata enrichment; for unstructured data it employs a neural network based on name or entity-based recognition algorithms, as well as document classifiers. The approach allows you to apply weightings to metadata features and to calculate confidence thresholds, which are fed back into the machine learning process. There is a library of more than 60 classifiers provided out of the box, including document classifiers, and this will be augmented during Q1 2020 by the addition of a file cluster analysis algorithm. This will bring a level of structuring to what are otherwise unstructured environments and it will, in due course, be used to support sampling. Tags can be added to discovered data, and these can be leveraged by third-party policy-based solutions.
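The idea of weighting metadata features and comparing the result against a confidence threshold can be sketched as follows. This is an illustrative assumption about how such a score might be combined, not a description of BigID's actual algorithm: the Feature type, the feature names, the weights and the 0.7 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A hypothetical metadata signal, e.g. 'column name contains email'."""
    name: str
    weight: float
    matched: bool


def confidence(features):
    """Weighted share of matched features, in the range 0.0 to 1.0."""
    total = sum(f.weight for f in features)
    if total == 0:
        return 0.0
    return sum(f.weight for f in features if f.matched) / total


def is_sensitive(features, threshold=0.7):
    """Flag the field when its confidence reaches the (assumed) threshold."""
    return confidence(features) >= threshold


# Illustrative signals for one column; weights are invented for the example.
signals = [
    Feature("name_contains_email", weight=3.0, matched=True),
    Feature("values_match_email_pattern", weight=1.0, matched=True),
    Feature("tagged_by_data_owner", weight=1.0, matched=False),
]
score = confidence(signals)
flagged = is_sensitive(signals)
```

Raising a feature's weight makes it count for more in the score, while raising the threshold trades fewer false positives for more missed fields; in a learning system both would be adjusted from feedback rather than fixed by hand.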
Fig 02 - Privacy-centric data intelligence
BigID describes itself as providing “privacy-centric data intelligence”. To that end it provides significant dashboarding and reporting of its own, illustrated in Figure 2. This (partial) screenshot shows the data flows tool, which supports Article 30 of GDPR (records of processing activities) as well as providing data lineage. With respect to other capabilities, note the “global risk” score highlighted in the panel to the left. Other options include the policy screen, where you can set relevant policies, tools to investigate breaches, DSAR capabilities, a metadata catalogue and general dashboarding capabilities. There are also facilities to identify both potential policy breaches and high-risk data sources (and who is using them).
In addition to the dashboards and other tools provided directly by BigID, the environment also integrates with third-party environments such as Tableau and Qlik, as well as with Azure Information Protection.
There are several things about BigID that make it stand out. To begin with, there is the extent of its support for NoSQL databases, which is broader than that of any other vendor we are aware of. Secondly, there is the ability to discover sensitive data not just at rest but also in motion. This ability to monitor data pipelines is becoming increasingly important. Thirdly, there is the whole approach to discovery and classification, based on machine learning. This is advanced in BigID, whereas it is rudimentary, at best, in most other solutions, which rely on techniques that are typically prone to excessive numbers of false positives.
We should also comment on BigID’s approach to partnerships and working with third-party vendors. The company is refreshingly proactive in this respect. For example, leaving aside suppliers that offer their own data masking solutions, BigID is the only company we have found in this space that is actively working with third-party anonymisation solutions.
The Bottom Line
We understand that BigID has some big-name customers despite its relatively recent entrance to the market. It would be nice if some of these could be made public. It is also notable that BigID has won a number of awards. We are not surprised about either of these facts and we are impressed by the company and its technology.