Approximate Query Processing
Date:
By: Philip Howard
Classification: White Paper
Data warehouses and databases, alongside business intelligence tools,
are designed to produce exact answers to whatever questions you care
to ask. This even applies to big data environments such as Hadoop.
However, do you always need to know the exact answer to any particular
question? For example, if you sell thousands of a particular product
every day, do you really need to know exactly how many you sold last
month? Surely, the answer is no: you can quite happily round to the
nearest hundred, thousand or ten thousand.
That being the case, why do you have a data warehousing or analytic
environment that insists on calculating exact answers for you all the
time? The short answer is that this is what vendors offer you. What if (and it
is an important if) an approximate answer could be provided in significantly
less time, with a lower resource requirement, and still give you the
answers you need? Moreover, what if the resources saved on
these types of queries could be freed up for those analytic processes
that do require precise answers, thereby improving performance for
those as well?
This paper examines approximate query processing (AQP) as an
approach to particular types of query where approximations are appropriate.
AQP has been the subject of considerable research for about the
last 15 years, but has largely been confined to academia with relatively
little appearing in commercial products to date. This paper argues in
favour of more AQP capabilities, at least as an option. We will discuss
AQP both as a generic set of capabilities and in terms of the use cases
where it may be especially useful.