
Big Data


Big Data refers to the ability to analyse any type of data and not just the relational data that is usually analysed in data warehouses. This typically means instrumented or sensor-based data (sometimes called machine-generated data) on the one hand and text, video, audio and similar media types on the other. Both types have the potential to dwarf relational (transactional) data in the quantities that are generated and available for analysis: hence the term “big”.

Big data technologies are also noteworthy because they are often inexpensive. Many (not all) can be implemented across low-cost commodity servers, which makes the storage of large amounts of data a much more realistic proposition, from a financial point of view, than it was previously.

Big data technologies are essentially extensions to a data warehousing environment, although there are some exceptions, notably where there are also operational (usually real-time) requirements. As such, big data provides exactly the same sorts of business intelligence and analytic functionality as a data warehouse.

These extensions are usually implemented at the back end of the data processing environment, alongside the data warehouse or mart. Where very high volumes of data need to be processed in a very short time, however, the big data solution may be implemented before the data is stored in a data warehouse. Such solutions may use Complex Event Processing, also known as (event) stream processing, or a big data store (for example, one based on Cassandra). The former tends to be better when the model being processed is static and the latter when it is fluid.
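To illustrate what a static model means in the stream processing case, the following is a minimal sketch in Python of a fixed rule applied to events as they arrive, before anything is stored. It is not taken from any particular product: the function name `detect_bursts`, the ten-second window and the threshold of three readings are all assumptions made for the example.

```python
from collections import deque


def detect_bursts(events, window_seconds=10.0, threshold=3):
    """Yield (timestamp, sensor_id) whenever a sensor has reported
    `threshold` or more readings within the last `window_seconds`.

    `events` is an iterable of (timestamp, sensor_id) pairs in time order.
    """
    recent = {}  # sensor_id -> deque of timestamps still inside the window
    for ts, sensor in events:
        window = recent.setdefault(sensor, deque())
        window.append(ts)
        # Drop readings that have fallen out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            yield ts, sensor


# Hypothetical sensor stream: (seconds, sensor id).
stream = [(0.0, "a"), (1.0, "b"), (2.0, "a"), (4.0, "a"), (30.0, "a"), (31.0, "b")]
alerts = list(detect_bursts(stream))
```

The rule is fixed in advance and only a small window of recent events is held in memory, which is why this style suits very high volumes. When the questions being asked change frequently, keeping the raw events in a store and querying them afterwards is likely to be the more practical option.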

Because of the low cost of many big data platforms, these may also be used for purposes other than business intelligence and analytics. For example, a number of companies are using Hadoop as a platform for ETL (extract, transform and load), while graph databases may be used for data quality matching and deduplication as well as for exploring relationships.
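The idea behind graph-based matching and deduplication can be sketched as follows: records become nodes, a shared attribute value becomes a link, and each connected group of records is a candidate duplicate cluster. This is an in-memory Python illustration only, not how any particular graph database implements matching; the function `cluster_duplicates`, the `email` and `phone` fields and the sample records are all hypothetical.

```python
def cluster_duplicates(records):
    """Group record ids that are linked, directly or transitively,
    by a shared email or phone value (connected components)."""
    parent = {rid: rid for rid in records}

    def find(rid):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    seen = {}  # (field, value) -> first record id seen with that value
    for rid, fields in records.items():
        for key in ("email", "phone"):
            value = fields.get(key)
            if value is None:
                continue
            if (key, value) in seen:
                # Shared value: merge the two records' clusters.
                parent[find(rid)] = find(seen[(key, value)])
            else:
                seen[(key, value)] = rid

    clusters = {}
    for rid in records:
        clusters.setdefault(find(rid), set()).add(rid)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical customer records keyed by id.
records = {
    1: {"email": "a@example.com", "phone": "555-0100"},
    2: {"email": "a@example.com", "phone": None},
    3: {"email": "b@example.com", "phone": "555-0100"},
    4: {"email": "c@example.com", "phone": "555-0199"},
}
clusters = cluster_duplicates(records)
```

Records 2 and 3 share nothing with each other directly, but both are linked to record 1, so all three fall into the same cluster. Following indirect links of this kind is the sort of relationship exploration that a graph model makes straightforward.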

In general, the sorts of users who should care about big data are the same as those who care about data warehousing; that is, relevant managers and C-level executives who care about such things as:

  • Customer acquisition and retention
  • Customer up-sell and cross-sell
  • Supply chain optimisation
  • Fraud detection and prevention
  • Telco network analysis
  • Marketing optimisation

However, there are additional potential users in areas such as preventative maintenance, smart metering and other sensor-related activities. There is also a significant use of big data within web-based organisations such as online gaming, mobile applications and so on.

Hadoop, and its associated tools, is currently the ‘big beast’ of the big data world and the Hadoop environment is undergoing rapid development, especially in areas such as its robustness, manageability and SQL access (though there is not generally a database optimiser present), all of which are currently limited.

Gathering momentum are graph databases (essentially triple stores with an inference engine) and we expect these to grow in popularity as their ability to identify and parse relationships out to 6 or 7 degrees of separation is recognised (a typical relational database can manage about 3 degrees before performance dies). Graph databases, however, do not run on the low-cost clustered platforms that are otherwise typical of big data solutions, so these are not inexpensive in the same way that, say, Hadoop is.
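Degrees of separation can be made concrete with a small sketch. The Python below walks an in-memory adjacency list with a breadth-first search and records how many hops each node is from a starting point, stopping at a chosen limit. It is an illustration only and says nothing about how any graph database works internally; the function `within_degrees` and the sample graph are assumptions made for the example.

```python
from collections import deque


def within_degrees(graph, start, max_degrees):
    """Return {node: degree} for every node reachable from `start`
    in at most `max_degrees` hops, using breadth-first search."""
    degrees = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if degrees[node] == max_degrees:
            continue  # do not expand beyond the requested depth
        for neighbour in graph.get(node, ()):
            if neighbour not in degrees:
                degrees[neighbour] = degrees[node] + 1
                queue.append(neighbour)
    return degrees


# Hypothetical "knows" relationships, stored as an adjacency list.
graph = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat", "eve"],
    "eve": ["dan"],
}
reachable = within_degrees(graph, "ann", 3)
```

In a relational schema, each additional degree typically means another self-join on the relationship table, which is one reason deeper traversals become expensive there. In the traversal above, going one degree deeper is simply one more round of following links.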

Longer term, we expect relational database vendors (we know of two already) to implement HDFS (the file system used in Hadoop) as a storage engine within their databases. This will combine the low-cost storage advantages of Hadoop with a single management layer that integrates the data warehouse and big data environments.

New vendors continue to enter the market and it is too early for any consolidation. Many, but not all, suppliers offer open source solutions and may have significant venture capital backing but little in the way of revenues. We do not believe that this can continue indefinitely: there are too many vendors and too many products, and the situation is reminiscent of the dot-com bubble. We would advise companies looking at investing in this market to be sure of their due diligence before licensing any particular product, especially if the solution to be adopted will be mission critical (which is often the case with sensor-based environments).

Notable recent announcements have been IBM’s new PureData platform based around GPFS (its version of HDFS) and the announcement by InterSystems that you can now use Globals (the Caché database without the development environment that normally comes with it) as a replacement for HDFS under Hadoop. Given how many alternatives there are to HDFS (Cassandra and RainStor to name just two more), there is going to be a major guessing game as to whether HDFS will survive and, if not, what will replace it.

Solutions

  • Actian
  • AWS
  • Ataccama
  • Cambridge Semantics
  • Cazena
  • Cloudera
  • Crate.io
  • DataStax
  • Exasol
  • Fauna
  • Greenplum
  • Hitachi
  • IBM
  • InfluxData
  • Interana
  • KX
  • McObject
  • Microsoft
  • N5
  • Neo4j
  • Objectivity
  • Oracle
  • Progress
  • Qlik
  • QuasarDB
  • Redis Labs
  • Scylla
  • Software AG
  • Solix
  • Sparsity
  • Starburst
  • Talend
  • TIBCO
  • TigerGraph
  • Timescale
  • Trendalyze
  • Unifi
  • VictoriaMetrics
  • Yellowbrick

These organisations are also known to offer solutions:

  • Databricks
  • Esgyn
  • Franz Inc
  • HortonWorks
  • Informatica
  • Kognitio
  • Memgraph
  • Ontotext
  • Pivotal
  • Precisely
  • SAP
  • SingleStore
  • Snowflake
  • Stardog
  • Teradata
  • Vaticle
  • Vertica

Research


Financial Trading Technology and RUMI from N5

IT is critical for financial services companies and often the prime or only source of competitive advantage.

Master Data Management (2021)

This report summarises the current state of the master data management (MDM) market at a high level, assessing the leading vendors in the space.

(Cloud) Data Management Platforms

This (Cloud) Data Management report compares platform-based approaches that support data integration to/from cloud-based deployments.

Talend Data Fabric

The basic concept behind the Talend Data Fabric is to allow you to collect, govern, transform, and share your data.

Solix Cloud Management

SOLIXCloud is a cloud data management platform from Solix Technologies that provides four primary solutions.

Qlik Data Integration Platform

The Qlik Data Integration Platform is essentially a melding of the capabilities provided through the acquisitions of Podium Data and Attunity.

Oracle Unified Information Management Platform

Oracle’s data management capabilities span the establishment of (cloud-based) data warehouses and data lakes.

Informatica Intelligent Data Platform (2021)

The Intelligent Data Platform encompasses data integration, quality, governance, MDM, cataloguing, privacy, application integration, and more.