The Data Layer
The Data Layer is where data is managed and, usually but not always, stored. It provides specific support to higher layers that need to access the data, whether for use in operational or transactional applications, or for analysis that drives insight. In all environments, metadata is the key to transforming data into information and, ultimately, into knowledge. While some metadata is typically implicit within application programs, the explicit discovery and exploitation of metadata falls within the domain of the Data Layer. The Data Layer overlaps with Trust in a similar way, for example through master and reference data management.
Metadata: Metadata is data that describes other data. Metadata summarises basic facts about data, which can make finding and working with particular instances of data easier. Metadata is universal in IT environments. It applies not just to the data that the business uses directly (databases, files, spreadsheets, images, videos, websites and the like) but also to other parts of the environment, such as hardware system designations and user authorisation details. Metadata may be created manually, often at the same time as the data it describes, but more typically it is derived by automated information processing.
Different levels of metadata exist. At the file level, for example, details such as file size, file extension, when the file was created and by whom are all pertinent metadata. At a finer level, in a database, each column has an associated datatype that defines the format of the data the column should hold. The column name may also serve as metadata about the column if appropriate naming conventions are applied, though this is often not the case.
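As a minimal sketch of file-level metadata, the Python standard library can report a file's size, extension, timestamps and (on POSIX systems) owning user ID without reading its contents. The helper name describe_file is hypothetical. A true creation time is platform-dependent, so this sketch records the last-modified time instead.

```python
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path: str) -> dict:
    """Collect basic file-level metadata (a hypothetical helper, not a standard API)."""
    p = Path(path)
    st = p.stat()  # metadata comes from the file system, not from the file's contents
    return {
        "name": p.name,
        "extension": p.suffix,  # e.g. ".csv"; empty string if there is none
        "size_bytes": st.st_size,
        # Creation time is not portable across platforms, so record last modification.
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "owner_uid": getattr(st, "st_uid", None),  # meaningful on POSIX systems
    }
```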
Metadata may or may not be directly associated with the data being described. For example, an application program may know that the ninth column of table A in the database represents outstanding customer balances, without the database itself having any knowledge of that fact. The extent to which metadata is or is not embedded in the Data Layer will depend on individual architectural decisions.
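To illustrate metadata held outside the database, the sketch below creates an in-memory SQLite table whose columns have only generic names. The database knows each column's name and declared type. The knowledge that the ninth column holds outstanding customer balances exists only in a hypothetical application-side mapping, COLUMN_MEANINGS. The table layout and names are assumptions for illustration.

```python
import sqlite3

# Application-side metadata: the database itself has no idea what these columns mean.
COLUMN_MEANINGS = {"c9": "outstanding customer balance"}

conn = sqlite3.connect(":memory:")
cols = ", ".join(f"c{i} TEXT" for i in range(1, 9))
conn.execute(f"CREATE TABLE a ({cols}, c9 NUMERIC)")
conn.execute("INSERT INTO a VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", ("x",) * 8 + (3107.50,))

# What the database knows: column names and declared types (its own metadata).
db_metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(a)")]

# What only the application knows: the business meaning of the ninth column.
balance = conn.execute("SELECT c9 FROM a").fetchone()[0]
print(f"{COLUMN_MEANINGS['c9']}: {balance}")
```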
It should be apparent from this description that it is possible and, in the right circumstances, desirable to discover and manage metadata about metadata: meta-metadata. Indeed, advanced tools support meta-meta-metadata. Although nothing in theory prevents further levels of abstraction, in practice there is little to gain beyond meta-meta-metadata, because a model at that level can be used to describe itself.
While metadata can always be created manually, automated processes are preferable, particularly those that let users augment automatically gathered metadata with any additional information they consider relevant. At the simplest level, automated metadata creation can be quite elementary and discover only limited information. More sophisticated metadata products crawl through data environments to provide extensive cataloguing and search capabilities.
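A minimal sketch of automated cataloguing with user augmentation, assuming a local directory tree as the "data environment". crawl gathers elementary metadata automatically, augment lets a user attach tags, and search queries the result. All three function names are hypothetical. Commercial metadata products go much further, for example by profiling file contents.

```python
import os


def crawl(root: str) -> dict:
    """Walk a directory tree and build a simple catalogue keyed by file path."""
    catalogue = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            catalogue[path] = {
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": os.path.getsize(path),
                "tags": set(),  # room for user-supplied metadata
            }
    return catalogue


def augment(catalogue: dict, path: str, *tags: str) -> None:
    """Let a user add tags to an automatically gathered catalogue entry."""
    catalogue[path]["tags"].update(tags)


def search(catalogue: dict, tag: str) -> list:
    """Return the catalogued paths that carry the given tag."""
    return sorted(p for p, meta in catalogue.items() if tag in meta["tags"])
```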
Data: Data is represented by the quantities, characters, or symbols on which a computer performs operations, and which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media. Data has a specific form, though that form may be more or less prescribed. In all cases it conforms to a particular datatype (date, time, numeral, latitude and longitude, text, audio, video, or an IT format such as XML or JSON) and has a certain size. In IT systems an item of data is represented as a particular instance of a datatype, and it will often have a prescribed length in addition to the format dictated by the datatype. More generally, data represents facts that may be used as a basis for reasoning, discussion, or calculation.
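The idea of a data item as an instance of a datatype with a prescribed length can be made concrete with a small check. FieldSpec and conforms are hypothetical names. "Length" is taken here to mean the length of the value's text form, which simplifies how real database systems define lengths.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FieldSpec:
    """A hypothetical declaration of a data item's datatype and prescribed length."""
    datatype: type
    max_length: int  # maximum number of characters in the value's text form


def conforms(value, spec: FieldSpec) -> bool:
    """True if the value is an instance of the declared datatype and fits the length."""
    return isinstance(value, spec.datatype) and len(str(value)) <= spec.max_length


customer_code = FieldSpec(str, 8)
posting_date = FieldSpec(date, 10)  # ISO 8601 text form "YYYY-MM-DD" is 10 characters
```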
Data Storage: The Data Layer encompasses not just the formal storage of data but also data that is never stored. For example, sensor data may be collected and processed at source, with only selected data passed on. Similarly, event data may be processed centrally, with, again, only the results of that processing retained. Such cases are best regarded as logical data storage environments, even though the raw data is never physically stored.
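The following minimal sketch covers the two cases just described and assumes a simple stream of numeric sensor readings. One function forwards only readings above a threshold. The other keeps only a summary. Neither retains the raw stream, which is why such flows are better modelled as logical rather than physical storage. The function names and threshold are illustrative.

```python
from typing import Iterable, Iterator


def filter_at_source(readings: Iterable[float], threshold: float) -> Iterator[float]:
    """Pass on only readings above a threshold; the rest are discarded at the edge."""
    for value in readings:
        if value > threshold:
            yield value


def summarise_at_source(readings: Iterable[float]) -> dict:
    """Retain only the result of processing (a summary), not the raw readings.

    Assumes at least one reading is supplied.
    """
    values = list(readings)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }
```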
In so far as physical instantiations are concerned, data storage encompasses all forms of databases, file systems, spreadsheets and so on.
Data Management: At a physical level, the management of data within data storage is typically the preserve of database management systems. At a logical level, it involves other elements such as document and content management, information lifecycle management and archiving. At a higher level, master and reference data management are where data management interacts with Trust to ensure governance and compliance.
Applying data management to data often, though not always, results in information. For example, relational database management systems store data in a structured way that imparts context to the raw data. Some NoSQL databases do not, and third-party tools are then needed to derive metadata, and hence context, about the data they hold.
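Tools that derive metadata from schemaless stores commonly sample documents and record the fields and value types they find. The following is a minimal sketch of that idea for JSON documents. It considers top-level fields only and does not traverse nested objects. infer_schema is a hypothetical name, not any product's API.

```python
import json
from collections import defaultdict


def infer_schema(documents: list[str]) -> dict[str, set[str]]:
    """Derive field-level metadata (field name -> observed JSON value types)."""
    schema: dict[str, set[str]] = defaultdict(set)
    for raw in documents:
        for field, value in json.loads(raw).items():
            schema[field].add(type(value).__name__)
    return dict(schema)
```

For instance, consider two documents in which balance is a number in one and the string "n/a" in the other. The function records {"float", "str"} for that field, which signals that the data needs context before it can be treated as information.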
Information: Information is data in context. A piece of data may be just a number such as “3107.50”. To treat this as information rather than raw data, you need to know that it represents an amount, that the amount is in a particular currency, and that it is an outstanding account balance. Or perhaps it represents the temperature inside a blast furnace. Either way, data has to be interpreted within the context of a (business) process to become information. Data of any type becomes information only when it is associated with well-defined metadata. Ad hoc conversion of data to information is possible (the metadata is in someone’s head), but it is not recommended. In summary, information is data whose relevant structures and semantics are well understood. Ideally this understanding is captured in formal documentation or models, which may be embedded in devices or servers for automation purposes. At minimum it exists in informal practice.
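The distinction can be sketched in code. The raw string carries no meaning on its own. Wrapping it with its currency and business meaning turns it into something a program or person can interpret. The Amount type and the choice of GBP are hypothetical. Decimal is used because binary floating point cannot represent most decimal currency amounts exactly.

```python
from dataclasses import dataclass
from decimal import Decimal

raw = "3107.50"  # data: a bare value with no context


@dataclass(frozen=True)
class Amount:
    """Information: the same value plus the metadata needed to interpret it."""
    value: Decimal
    currency: str  # ISO 4217 code; "GBP" below is assumed for illustration
    meaning: str   # the business context of the value


balance = Amount(Decimal(raw), "GBP", "outstanding account balance")
print(f"{balance.meaning}: {balance.value} {balance.currency}")
```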
A single piece of information on its own is seldom of value. In practice, it acquires value through accumulation with other pieces of information. This may happen over time and/or space (a series of information points about the same thing) and/or by acquiring complementary information.
Access: Access is about retrieval, and it applies equally to data and to information. Indeed, data may be accessed precisely so that context can be applied to it. For information, a broad range of applications and tools need access for a variety of processing purposes. In fact, all processing requires the ability to access pertinent data. The key question is how that access is achieved. Options include direct methods, APIs (application programming interfaces), specialised connectors, data integration tools and technologies such as data federation.
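Data federation presents several sources through a single access point. It resolves requests against each source when they are made, rather than first copying the data into one store. The following is a deliberately minimal sketch. It assumes one SQLite table and one CSV extract that share a customer identifier. All table, column and function names are hypothetical. Production federation engines also provide query planning, pushdown and security, none of which is shown here.

```python
import csv
import io
import sqlite3

# Source 1: a relational database (in-memory here for the sake of the example).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2: a CSV extract held elsewhere (simulated with an in-memory string).
BALANCES_CSV = "customer_id,balance\n1,3107.50\n2,0.00\n"


def federated_balances() -> list[tuple[str, str]]:
    """Join both sources at request time without copying either into the other."""
    names = dict(db.execute("SELECT id, name FROM customers"))
    rows = csv.DictReader(io.StringIO(BALANCES_CSV))
    return [(names[int(r["customer_id"])], r["balance"]) for r in rows]
```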
Knowledge: Knowledge provides meaning to information. What does a series of temperature readings in a blast furnace tell you about the state of the furnace? Is an outstanding account balance of a particular amount unusual or out of line, or is it run of the mill? Knowledge allows you to establish the significance of information. It is the understanding and comprehension that result from applying procedural (both algorithmic and heuristic) and judgemental (both constraints and planned goals) tools and processes to factual information.
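The following sketch is one simple example of applying a procedural, in this case heuristic, tool to factual information. It judges whether a new balance is out of line with a customer's accumulated history by measuring how many sample standard deviations the balance lies from the historical mean. The function name and the default of three standard deviations are assumptions, not recommendations. A real judgement would also weigh business constraints and goals.

```python
from statistics import mean, stdev


def is_unusual(history: list[float], latest: float, k: float = 3.0) -> bool:
    """Heuristic: flag `latest` if it lies more than k sample standard deviations from the mean."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")  # stdev is undefined otherwise
    centre, spread = mean(history), stdev(history)
    return abs(latest - centre) > k * spread
```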
Actionable Insight: Actionable insight is a term that effectively conflates two distinct trends within IT. The first is what we might call ‘insight automation’. This is the ability to analyse (large quantities of) historic data to infer the future behaviour of both people and things. Those inferences in turn allow the business processes that use them to be automated, replacing what were historically manual activities. This automation represents the actioning of the insight derived from analysis. The second trend is towards ‘self-service insight visualisation’. It provides the same capabilities in environments where automating decision processes is not practical. Here, human input to the process is enhanced by high-quality visualisation of data that users can derive for themselves (self-service), without reliance on IT, and insight is actioned by people rather than by automated processes. Hybrid models are also possible, in which the analytics software recommends actions directly but appropriate users must authorise those actions.
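The hybrid model can be sketched as a two-step flow. The analytics component first proposes an action along with its reasoning, and nothing is actioned until a person approves it. The rule, the credit-hold action and the approver callback are all hypothetical. In practice the approver would be a workflow or user-interface step rather than a function argument.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Recommendation:
    """An action proposed by analytics software, pending human authorisation."""
    action: str
    reason: str
    approved: bool = False


def recommend(balance: float, limit: float) -> Optional[Recommendation]:
    """Hypothetical analytic rule: suggest a credit hold when a balance exceeds a limit."""
    if balance > limit:
        return Recommendation("place credit hold", f"balance {balance} exceeds limit {limit}")
    return None


def authorise(rec: Recommendation, approver: Callable[[Recommendation], bool]) -> bool:
    """The recommended action proceeds only if an appropriate user approves it."""
    rec.approved = approver(rec)
    return rec.approved
```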