Ixiwa is a data lake management product, based on Hadoop and Spark, that automatically ingests data, collects metadata about that data and classifies it for you. However, what is most interesting is how you explore the data, both for structured data and for image and text processing.
In the case of structured data, you can explore it visually, using a heat map, according to any of the ways in which it has been classified. For example, you could bring up a heat map based on the sensitivity of the data (the product integrates with technologies such as Apache Atlas and Ranger) or on its quality, or simply one that shows all customer-related data. For sensitive data, this might be based on PII or PHI definitions, on other standards that SynerScope supports, or on definitions that you create yourself. Other heat maps might be based on data size, the costs associated with data sets, and so on. Other types of data, such as sensor data, can also be clustered and explored in this way.
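To illustrate the idea of viewing the same catalogue through different classification "lenses", here is a minimal sketch of the aggregation that could sit behind such a heat map. It is not Ixiwa's implementation or API: the catalogue entries, the `source`, `tags` and `size_gb` fields and the `heat_map` function are all hypothetical, and the output is simply a grid (source by classification) whose cell values a UI could map to colours.

```python
from collections import defaultdict

# Hypothetical catalogue entries: each data set carries the classification
# tags a tool might assign, plus a numeric attribute to colour the map by.
datasets = [
    {"name": "crm_contacts", "source": "CRM", "tags": {"customer", "PII"}, "size_gb": 12.0},
    {"name": "claims_2019", "source": "Claims", "tags": {"customer", "PHI"}, "size_gb": 40.0},
    {"name": "web_logs", "source": "Web", "tags": {"clickstream"}, "size_gb": 250.0},
    {"name": "crm_orders", "source": "CRM", "tags": {"customer"}, "size_gb": 8.0},
]

def heat_map(datasets, classifications, metric):
    """Return {source: {classification: total of metric}} for the chosen lens."""
    grid = defaultdict(lambda: {c: 0.0 for c in classifications})
    for ds in datasets:
        row = grid[ds["source"]]  # every source gets a row, even if all zeros
        for c in classifications:
            if c in ds["tags"]:
                row[c] += ds[metric]
    return dict(grid)

# Sensitivity lens: how much PII and PHI data sits in each source system.
sensitive = heat_map(datasets, ["PII", "PHI"], "size_gb")
print(sensitive)
```

Switching the lens is just a matter of passing different classifications (for example `["customer"]`) or a different metric, which mirrors the point that the same data can be explored in any way it has been classified.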
For unstructured data, Ixiwa can classify individual elements within a single image using an object detection algorithm, and it also uses pixel sorting. This is more granular than auto-captioning, though you can use auto-captioning alongside Ixiwa if you want to. Ixiwa does something very similar with text, for which it has a “text sorter”. For audio files it first converts speech to text, and you can also use Ixiwa in conjunction with products that perform feature extraction from text, where appropriate. The pixel and text sorters identify similarities between, say, images, infer clusters from them (same colour, same shape, same make of car, facial recognition and so forth) and then tag those images accordingly. Both sorters use TensorFlow for this processing.
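The "find similarities, infer clusters, then tag" pipeline can be sketched in a few lines. This is an assumption-laden illustration, not the sorters' actual algorithm: it assumes each image has already been turned into a feature vector (the article says Ixiwa uses TensorFlow for this; here the vectors are made-up numbers), and it swaps in simple greedy "leader" clustering on cosine similarity, with a hypothetical `threshold` parameter, purely to show the shape of the process.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def cluster_and_tag(items, threshold=0.9):
    """Greedy leader clustering: each item joins the first cluster whose
    leader is at least `threshold` similar, otherwise it starts a new one.
    Returns a tag per item naming the cluster it landed in."""
    leaders = []  # (cluster_id, feature vector of the cluster's first member)
    tags = {}
    for name, vec in items.items():
        for cid, leader in leaders:
            if cosine(vec, leader) >= threshold:
                tags[name] = f"cluster-{cid}"
                break
        else:
            cid = len(leaders)
            leaders.append((cid, vec))
            tags[name] = f"cluster-{cid}"
    return tags

# Hypothetical feature vectors, standing in for the output of a trained model.
items = {
    "car_a.jpg": [0.9, 0.1, 0.0],
    "car_b.jpg": [0.85, 0.15, 0.05],
    "face_1.jpg": [0.0, 0.2, 0.95],
}
tags = cluster_and_tag(items)
print(tags)
```

The two car images end up sharing a tag while the face image gets its own. A production system would more likely use a proper clustering method over learned embeddings, but the flow from similarity to cluster to tag is the same.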
Under the covers, Ixiwa starts with the automated collection of metadata. It collects three types of metadata, the first of which is “provided metadata”: the metadata that accompanies the source system, such as descriptions, field names and defined keys. The second is what the company refers to as “inferred metadata”, which is inferred by scanning the data itself. For this, Ixiwa embeds machine and deep learning (based on TensorFlow), so that its inferences, and the tagging it generates, improve over time. Finally, the third type is what the company calls “attributed data”: external metadata that you bring in about such things as data quality, cost and usage. Full audit trails are captured at both the data and the user level to support these metrics.
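One way to picture how the three kinds of metadata sit together is a single record per data set. The sketch below is hypothetical throughout: the `DatasetMetadata` class, its field names and the sample values are illustrative, not Ixiwa's schema. It also replaces the machine learning inference with a trivial rule (tag a column as `email` if every value looks like an email address) just to show where inferred metadata would come from.

```python
import re
from dataclasses import dataclass, field

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

@dataclass
class DatasetMetadata:
    # "Provided": comes with the source system (descriptions, field names, keys).
    provided: dict
    # "Inferred": derived by scanning the data itself.
    inferred: dict = field(default_factory=dict)
    # "Attributed": brought in from outside (quality, cost, usage).
    attributed: dict = field(default_factory=dict)
    # Audit trail entries (user, action, data set) that usage metrics can draw on.
    audit: list = field(default_factory=list)

def infer_column_tags(rows, columns):
    """Rule-based stand-in for ML inference: tag a column as 'email'
    if every non-empty value in it looks like an email address."""
    tags = {}
    for i, col in enumerate(columns):
        values = [r[i] for r in rows if r[i]]
        if values and all(EMAIL.match(v) for v in values):
            tags[col] = "email"
    return tags

# Hypothetical source table and metadata values.
columns = ["id", "contact"]
rows = [("1", "ann@example.com"), ("2", "bob@example.org")]

meta = DatasetMetadata(provided={"description": "CRM contacts", "fields": columns, "keys": ["id"]})
meta.inferred["column_tags"] = infer_column_tags(rows, columns)
meta.attributed.update({"quality_score": 0.97, "monthly_cost_usd": 12.5})
meta.audit.append(("alice", "read", "crm_contacts"))
print(meta)
```

Keeping the three kinds separate makes provenance explicit: provided metadata is trusted as delivered, inferred metadata can be re-derived as the models improve, and attributed metadata can be refreshed from its external source independently.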