What’s happening in test data management?


Test Data Management (TDM) is, as ever, a changing space. Moreover, it is a space in two halves. On the one hand, TDM technology is increasingly popular and widespread among enterprises. In many cases, it is even seen as a requirement. A large part of this can be attributed to a greater recognition of the need for regulatory compliance, which has spurred demand both for test data management in general (especially synthetic data – see below) and for data masking in particular. The latter is increasingly seen as important for data security (in data breach prevention, for example) when deployed across the enterprise, and although data masking and TDM are separate technologies, the substantial overlap between them has created a knock-on effect for TDM. In some cases, TDM tools can also contribute to other data tasks, such as data provenance, versioning, and provisioning more generally, because the techniques used for curating test data are not always that different from those needed for curating production data.
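To make the masking side of this concrete, a minimal sketch of the idea might look something like the following. The field names and rules are entirely hypothetical, and commercial masking tools are far more sophisticated, but the principle is the same: sensitive values are replaced with stable, non-reversible tokens while the overall shape of each record is preserved.

```python
# Illustrative sketch only: deterministic masking of sensitive fields before
# data is handed to a test environment. Field names and rules are hypothetical.
import hashlib

def pseudonymise(value: str, salt: str = "per-project-secret") -> str:
    """Replace a sensitive value with a stable, non-reversible token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]

def mask_row(row: dict) -> dict:
    """Mask the sensitive fields of one record, leaving the rest untouched."""
    masked = dict(row)
    masked["name"] = "user_" + pseudonymise(row["name"])
    # Preserve the shape of an email address so downstream validation still passes.
    masked["email"] = pseudonymise(row["email"]) + "@example.com"
    return masked

if __name__ == "__main__":
    production_row = {"name": "Jane Doe", "email": "jane.doe@acme.test", "plan": "gold"}
    print(mask_row(production_row))
```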

On the other hand, many enterprises are, to quote one vendor we spoke to, “still in the stone age” when it comes to TDM. Production data is still widely used for testing, in spite of both compliance demands to the contrary and the sheer quantity of testing tools (for TDM and more general test automation) available. We have to wonder what, if anything, it will take to convince these enterprises to see the value in test data, and we suspect that most vendors in the space have – with good reason – effectively given up on them.

In terms of implementation, testing has long been moving away from the centralised “centre of excellence” towards a more distributed model. This trend continues, and hybrid solutions are common. It is also increasingly normal for enterprises to integrate their TDM into a common workflow alongside various *Ops processes (DevOps, DataOps, MLOps, and so on). The complexity this integration adds to test data pipelines, as well as the increased complexity of test pipelines in general, has made it increasingly necessary to be able to operate on your test data systematically after its creation but, crucially, before it enters the wider data environment.
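What “operating on your test data after its creation” means in practice will vary by pipeline, but a rough, purely illustrative sketch of such a post-creation step, with hypothetical fields and rules, might look like this: validate and normalise freshly generated rows before they are released to any test environment.

```python
# Illustrative sketch only: a post-creation step in a test data pipeline that
# checks and normalises generated rows before they are provisioned to a test
# environment. The rules and field names are hypothetical.
def post_process(rows):
    """Validate and normalise freshly created test data before release."""
    cleaned = []
    for row in rows:
        # Reject rows that would break downstream tests.
        if not row.get("email") or "@" not in row["email"]:
            continue
        # Normalise values so every consumer of the data sees the same format.
        row["country"] = row["country"].upper()
        cleaned.append(row)
    return cleaned

if __name__ == "__main__":
    generated = [
        {"email": "a@example.com", "country": "gb"},
        {"email": "not-an-email", "country": "fr"},
    ]
    print(post_process(generated))
```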

As for the three principal TDM techniques (data subsetting, synthetic data generation, and database virtualisation), synthetic data continues to rise in popularity, in part due to the ever-growing concerns around compliance, while the buzz around database virtualisation appears to have tapered off. There are a number of reasons why this could be the case – compliance concerns around distributing even masked copies of your entire production database, “hidden” additional costs when combined with cloud storage, and Delphix cornering the market and inadvertently convincing it that database virtualisation must be very expensive, to name a few – but regardless, it is clear that much of the market has shied away from the technique. Indeed, this also applies more generally, with a number of vendors losing interest in either other subsections of the TDM space or the space itself (often by implication only, of course – watch for tools that are poorly supported, or are simply treading water).
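For readers less familiar with the first of these techniques, a heavily simplified sketch of data subsetting is shown below: take a small sample of parent rows and keep only the child rows that reference them, so that the slice remains referentially consistent. Real tools do this against live databases with foreign-key discovery; the tables and columns here are hypothetical.

```python
# Illustrative sketch only: a referentially consistent subset of two related
# "tables" (here, plain lists of dicts). Table and column names are hypothetical.
def subset(customers, orders, sample_size=2):
    # 1. Pick a small sample of parent rows.
    sampled_customers = customers[:sample_size]
    sampled_ids = {c["customer_id"] for c in sampled_customers}
    # 2. Keep only child rows that reference the sampled parents,
    #    so foreign keys in the subset still resolve.
    sampled_orders = [o for o in orders if o["customer_id"] in sampled_ids]
    return sampled_customers, sampled_orders

if __name__ == "__main__":
    customers = [{"customer_id": i, "name": f"customer {i}"} for i in range(5)]
    orders = [{"order_id": 100 + i, "customer_id": i % 5} for i in range(10)]
    print(subset(customers, orders))
```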

This seems like particularly poor timing, given the recent excitement around AI (and specifically generative AI and LLMs (Large Language Models)) that has swept through every data space, most certainly including TDM. In fact, we know of at least one TDM vendor that considers AI to be the key issue for the modern enterprise, and frankly, we would be hard-pressed to disagree. Within testing, we have seen AI used as an accelerator and copilot, interfacing with an LLM to help you build and provision your test assets, and indeed your test data, more quickly and more effectively. Several vendors use AI to generate synthetic data sets that mimic the attributes of production data sets, for example. Conversely, we have also seen vendors feed LLMs their test data (and other test assets) in order to provide them with additional context and rigour. In short, it is clear that there is a lot of potential for using AI in the TDM space – and, indeed, for TDM as a whole.
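To give a flavour of what “mimicking the attributes of production data” means, here is a deliberately crude, purely statistical stand-in: learn simple per-column properties from a (hypothetical) production sample, then sample synthetic rows from them. This is not any vendor’s method, and the generative AI approaches referred to above go well beyond it, but it illustrates the basic idea.

```python
# Illustrative sketch only: generate synthetic rows whose per-column statistics
# mimic a tiny, hypothetical production sample. Commercial tools use far more
# sophisticated models, including generative AI; this shows only the basic idea.
import random
import statistics

def fit_columns(rows):
    """Learn simple per-column 'attributes' from the source data."""
    ages = [r["age"] for r in rows]
    plans = [r["plan"] for r in rows]
    return {
        "age_mean": statistics.mean(ages),
        "age_stdev": statistics.stdev(ages),
        "plan_choices": plans,  # categorical values, sampled with original frequency
    }

def generate(profile, n):
    """Sample n synthetic rows from the learned profile."""
    return [
        {
            "age": max(18, round(random.gauss(profile["age_mean"], profile["age_stdev"]))),
            "plan": random.choice(profile["plan_choices"]),
        }
        for _ in range(n)
    ]

if __name__ == "__main__":
    production_sample = [
        {"age": 34, "plan": "gold"},
        {"age": 41, "plan": "silver"},
        {"age": 29, "plan": "gold"},
        {"age": 55, "plan": "bronze"},
    ]
    print(generate(fit_columns(production_sample), 3))
```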