Data management trends in 2025
As 2025 unfolds, there are several trends sweeping the world of data management. This article explores some of the main ones.
Data Fabric and Mesh
Traditionally, enterprises have deployed separate applications for their various operational needs, and even though there has been some consolidation with enterprise resource planning (ERP) systems, each company still has many (usually hundreds of) separate applications in use. Answering broad questions about the state of the business becomes tricky in this data landscape. Even seemingly simple questions, such as which products and customers are the most profitable, can only be answered if you have a consistent view of the groupings of customers and products, along with the allocation of costs that goes with them. To get a consistent picture the data warehouse was invented, which involved copying data from operational systems into a central database and resolving inconsistencies in the data, such as multiple customer account records, at the point of loading it into the warehouse. Once a consistent set of trusted data is built, the data warehouse can be queried by analytic tools.

This all sounds good, but in practice the data warehouse structure can be brittle, with hierarchies of customers and products defined and stored in linked database tables. If these structures change due to a reorganization, acquisition or merger then the data warehouse can get out of sync with the operational systems. If enough changes happen regularly then the data warehouse structure may never catch up with the reality on the ground.

An alternative approach has emerged recently, whereby data is left in place in the source systems but a layer of active metadata is built that documents all the key data sources, rather like a data treasure map. A semantic layer using business terminology that hides the physical data structure can be presented to business users, so instead of seeing table and column names they see a data marketplace that uses business terminology, possibly presented in an attractive graphical form (a knowledge graph). This approach is known as a data fabric, with a related approach called data mesh having a similar goal but with data governance and ownership decentralized into the relevant business units. These approaches have the big advantage that data movement only occurs in response to business queries, so the source data is queried directly and is always up to date. The drawback is that the data quality and duplication issues, which were handled in the data warehouse, now have to be dealt with by the semantic layer, and a single query may result in complex database calls across multiple sources, with performance implications. Nonetheless, data fabric and data mesh approaches are gaining traction: the data fabric market was worth $2.3 billion in 2024 with 21% growth, according to Fortune Business.
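To make the semantic layer idea more concrete, the sketch below shows, in deliberately simplified Python, how business terms might be mapped onto physical sources and only translated into source-level lookups when a question is actually asked. All of the system, table and column names are invented for illustration; this is a sketch of the concept, not any particular product's implementation.

```python
# Minimal sketch of a semantic layer: business terms are mapped to
# physical locations, and a business-level question is translated into
# source-level lookups only when it is asked (no data is copied up front).
# All system, table and column names here are hypothetical.

SEMANTIC_LAYER = {
    "customer": {"system": "crm_cloud",  "table": "crm.accounts",   "key": "account_id"},
    "product":  {"system": "erp_onprem", "table": "erp.items",      "key": "item_no"},
    "revenue":  {"system": "erp_onprem", "table": "erp.invoice_ln", "column": "net_amount"},
}

def resolve(term: str) -> dict:
    """Translate a business term into its physical source description."""
    try:
        return SEMANTIC_LAYER[term]
    except KeyError:
        raise ValueError(f"'{term}' is not defined in the business glossary")

# A query tool would ask in business terms; the layer works out where to go.
for term in ("customer", "revenue"):
    src = resolve(term)
    print(f"{term!r} -> {src['system']}:{src['table']}")
```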
Cloud
Over half of all enterprise data is now in the cloud, according to various 2024 surveys. Enterprises that used to have all their data in a corporate data center now see it split between that data center and public clouds such as AWS, Azure and Google Cloud Platform, as well as privately run cloud environments. This fragmentation has worsened the issue of data silos, which were a big enough problem even when all data sat in a single data center. A Deloitte survey showed that 63% of businesses see data silos as a top challenge. The gradual migration of applications from on-premises systems to the cloud(s) creates a large data integration challenge while the migration is under way, and further integration issues persist as data continues to move around. Even if and when an enterprise completes a total migration to the cloud, which in many cases may never happen, there is still the problem of how to manage multiple cloud environments and how to ensure data security, availability and compliance with regulations that may specify the geographic location of data, the so-called data sovereignty issue.
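As a simple illustration of the data sovereignty point, the sketch below checks each dataset's storage region against an assumed residency policy. The region codes, data classes and dataset names are all hypothetical; a real implementation would draw this information from a data catalog and cloud provider APIs.

```python
# Illustrative data-sovereignty check: each dataset records where it is
# stored, and a policy lists the regions permitted for that class of data.
# Region codes, data classes and dataset names are made up for this sketch.

RESIDENCY_POLICY = {
    "customer_pii": {"eu-west-1", "eu-central-1"},   # e.g. EU personal data must stay in the EU
    "telemetry":    {"eu-west-1", "us-east-1"},
}

datasets = [
    {"name": "crm_accounts", "class": "customer_pii", "region": "eu-west-1"},
    {"name": "app_logs",     "class": "telemetry",    "region": "us-east-1"},
    {"name": "crm_backup",   "class": "customer_pii", "region": "us-east-1"},  # out of policy
]

for ds in datasets:
    allowed = RESIDENCY_POLICY.get(ds["class"], set())
    status = "OK" if ds["region"] in allowed else "VIOLATION"
    print(f"{ds['name']:<14} {ds['region']:<12} {status}")
```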
Artificial Intelligence and Automation
Since November 2022 and the emergence of ChatGPT, AI, and in particular generative AI based on large language models, has been high on the corporate agenda. A torrent of money has been poured into AI start-up companies, with around a third of all venture capital investment in late 2024 being AI-related. Enterprises have scrambled to deploy the technology in applications as wide-ranging as customer chatbots, coding assistants, medical image interpretation and more targeted marketing content. Some of these have been more successful than others, and a host of well-publicised project failures have started to appear in newspaper and television headlines. McDonald’s had to abandon a hundred-store rollout of a drive-through generative AI assistant, and lawyers have been sanctioned for submitting AI-written filings to judges, complete with entirely made-up legal precedents, a product of AI “hallucination”. In the data management world, machine learning has long been used to help with the merge/matching of records, and vendors are now experimenting with generative AI to create business glossary descriptions and to train chatbots on their product manuals to provide better customer assistance, amongst other applications.
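The sketch below illustrates the record merge/matching idea in miniature, using a simple string-similarity score to flag probable duplicate customer records. Real matching engines rely on trained models and far richer features; the record values and threshold here are purely illustrative.

```python
# Toy version of record merge/matching: candidate customer records are
# compared with a string-similarity score and flagged as probable
# duplicates above a threshold. Production matching uses trained models
# and many more signals; names and threshold here are illustrative only.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

records = [
    ("ACME Corporation Ltd", "1 High St, London"),
    ("Acme Corp Limited",    "1 High Street, London"),
    ("Apex Industries",      "42 Dock Rd, Liverpool"),
]

THRESHOLD = 0.7
for i in range(len(records)):
    for j in range(i + 1, len(records)):
        score = (similarity(records[i][0], records[j][0])
                 + similarity(records[i][1], records[j][1])) / 2
        if score >= THRESHOLD:
            print(f"Probable duplicate: {records[i][0]!r} ~ {records[j][0]!r} ({score:.2f})")
```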
Data Quality
Data quality has long been an issue for enterprises to deal with, ever since the first records were input into a database. The average large enterprise has over four hundred applications, with many of those systems holding overlapping or duplicated data about customers, products, assets and more. Much of that data is incomplete, out of date, invalid or inaccurate. The first dedicated data quality software company (Innovative Systems) was founded in 1968 and there is a whole industry of data quality tools now, yet only about a third of executives trust their own company’s data, according to various recent surveys. This causes many problems, but things have become more pressing since enterprises started to use their own datasets to supplement large language models (LLMs), in a technique called retrieval augmented generation (RAG). LLMs are only as good as the data they draw on, and if you feed a customised LLM data from company systems that is of poor quality then the LLM will produce poor-quality answers. Consequently, data quality is moving up the management agenda as enterprises seek to deploy LLMs for competitive advantage.
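The sketch below shows the shape of retrieval augmented generation in a few lines of Python: relevant company documents are retrieved and supplied as context alongside the question, so the quality of the answer depends directly on the quality of that data. The retrieval here is a toy word-overlap score and call_llm() is a placeholder, not any particular vendor's API; the documents are invented.

```python
# Minimal sketch of retrieval augmented generation (RAG): relevant company
# documents are retrieved and prepended to the prompt so the model answers
# from current, governed data rather than from memory alone. The retrieval
# is a toy word-overlap score and call_llm() is a placeholder function.

documents = {
    "returns_policy": "Customers may return goods within 30 days with a receipt.",
    "delivery_terms": "Standard delivery is 3 to 5 working days within the UK.",
}

def retrieve(question: str, top_k: int = 1) -> list:
    q_words = set(question.lower().split())
    scored = sorted(documents.values(),
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def call_llm(prompt: str) -> str:          # stand-in for a real model call
    return f"[model answer based on prompt of {len(prompt)} characters]"

question = "How long do customers have to return an item?"
context = "\n".join(retrieve(question))
answer = call_llm(f"Use only this context:\n{context}\n\nQuestion: {question}")
print(answer)
```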
Interoperability
Every enterprise these days has many different systems, whether hosted on-premises or in a public or private cloud, and usually a mix of all of these. These systems need to interact, at both a technical and a semantic level. There are many standards, such as XML and JSON, that facilitate technical interaction, but what is talked about less is the need to preserve the meaning of data across systems, whether through shared data models or a shared business vocabulary. So just as there are technical interoperability standards like XML, there are standards for metadata, some public and others that can be established within an enterprise. Applying such standards leads to better decision-making, as business users gain a more complete view of the data in their enterprise to support their decisions, quite apart from the cost reduction that comes from a reduced need for data integration.
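By way of illustration, the sketch below maps two systems' differing field names onto a shared business vocabulary so that the meaning of a record survives the exchange. The systems, field names and records are invented; the point is only that semantic agreement has to be engineered on top of technical formats like JSON.

```python
# Sketch of semantic interoperability: two systems exchange JSON using
# different field names, and each is mapped onto a shared business
# vocabulary so the meaning survives the hop. All names are hypothetical.
import json

SHARED_VOCABULARY = {
    "crm":     {"acct_nm": "customer_name", "acct_ctry": "country_code"},
    "billing": {"cust":    "customer_name", "country":   "country_code"},
}

def to_canonical(system: str, record: dict) -> dict:
    """Rename a system's fields to the shared business vocabulary."""
    mapping = SHARED_VOCABULARY[system]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

crm_msg     = json.loads('{"acct_nm": "ACME Ltd", "acct_ctry": "GB"}')
billing_msg = json.loads('{"cust": "ACME Ltd", "country": "GB"}')

# Both systems' records now mean the same thing in the same terms.
print(to_canonical("crm", crm_msg) == to_canonical("billing", billing_msg))  # True
```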
Data Governance
Enterprises have made significant efforts to manage their data in recent years, realising that data is a key business asset that should not just be left to the IT department to worry about. Data catalogs can be used to document data sources, business definitions, data models and metadata, and to help define a business glossary. Above all, data ownership is assigned to designated business personnel, typically with a central group driving things, data stewards embedded within business units, and a steering committee of senior business leaders to resolve boundary disputes and set direction. The advent of AI models introduces another class of data asset that needs managing. Each LLM can have a “model card” defined for it that documents the version being used, where it is being applied, the data it was trained on, its architecture, performance metrics and so on. These cards can themselves be stored in or referenced by an enterprise data catalog, which matters increasingly as LLMs are deployed and regulatory regimes start to demand that companies can trace their usage.
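A model card can be as simple as a structured record of the kind sketched below, which a catalog entry could reference by name and version. Every field value is a made-up example rather than a real model, dataset or metric.

```python
# Illustrative model card for an LLM deployment, of the kind that could be
# stored in or referenced from an enterprise data catalog. Every value
# below is a made-up example, not a real model, dataset or measurement.
model_card = {
    "model_name": "customer-support-assistant",
    "version": "1.3.0",
    "base_model": "example-llm-7b",
    "training_data": ["support_tickets_2023", "product_manuals_v4"],
    "intended_use": "Internal agent assistance for support queries",
    "owner": "Customer Operations data steward",
    "evaluation": {"answer_accuracy": 0.87, "hallucination_rate": 0.04},
    "last_reviewed": "2025-01-15",
}

# A data catalog entry might simply reference the card by name and version.
print(f"{model_card['model_name']} v{model_card['version']} "
      f"owned by {model_card['owner']}")
```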
Conclusions
Data management faces a busy time as it adapts to a world in which enterprises are migrating gradually to a hybrid cloud environment, deploying new architectures such as data fabric, and deploying AI models that need to be governed and managed. These models themselves depend more and more on high-quality corporate data on which they can be trained.