Putting Data Quality On the Map

Let’s face it: data quality is not the sexiest of subjects. Business people certainly are aware that they rely increasingly on data within their computer systems, and know that incomplete or duplicated data can cause problems, but it can be tough convincing people of the “value add” for a data quality project.

An example of where data quality solutions can be more exciting is in the “enrichment” of data. If you are an insurance company, you certainly want to know whether you have the correct address for someone to whom you are selling a policy, or whose claim you are processing; this is necessary, but hardly gripping. But what if your claims processing application could check insurance claims for (say) Hurricane Katrina, to see whether the address of the person making the claim was actually in the path of the hurricane (or, more interestingly, not)? If you are going to issue a home insurance policy, would it not be useful to be able to tell, just from the address and in real time, whether the home in question is on a flood plain, or within five miles of a coastline?

A module within the latest version (5.5) of Group 1’s software lets you do precisely this, by enriching data through geo-coding. This is especially relevant in the United States, where postal codes are quite broad and can cover a wide area. However, even in countries where postal codes are much tighter in scope, such as the UK, it may still be very useful to know whether a specific address is within a certain boundary, or how close it is to a point of interest. For example, is an address within the congestion charge zone in London, or where is the nearest doctor’s surgery to a particular address?
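
To make the idea concrete, here is a minimal sketch in Python of the two checks described above, assuming an address has already been geo-coded to a latitude and longitude. It is not Group 1’s API: the function names, the zone polygon and the coastline sample points are all hypothetical, and a real product would use proper boundary data rather than a handful of points. The boundary test uses the standard ray-casting method and the distance test uses the haversine formula.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; polygon is a list of (lat, lon) vertices.

    Treats coordinates as planar, which is a reasonable simplification
    for small zones such as a city charging area.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        lat_a, lon_a = polygon[i]
        lat_b, lon_b = polygon[(i + 1) % n]
        # Does this edge straddle the point's latitude?
        if (lat_a > lat) != (lat_b > lat):
            # Longitude at which the edge crosses that latitude.
            lon_cross = lon_a + (lat - lat_a) / (lat_b - lat_a) * (lon_b - lon_a)
            if lon < lon_cross:
                inside = not inside
    return inside


def enrich(lat, lon, zone, coastline_points, max_miles=5.0):
    """Attach hypothetical enrichment flags to a geo-coded address."""
    miles_to_coast = min(
        haversine_miles(lat, lon, c_lat, c_lon)
        for c_lat, c_lon in coastline_points
    )
    return {
        "in_zone": point_in_polygon(lat, lon, zone),
        "near_coast": miles_to_coast <= max_miles,
        "miles_to_coast": miles_to_coast,
    }
```

An application issuing a policy could call something like `enrich(lat, lon, flood_plain, coastline)` at the point of data entry and act on the flags it gets back, which is the kind of real-time check the insurance example has in mind.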

Version 5.5 also has substantially improved facilities for data stewards. Setting data matching thresholds is a bit of a black art. Set the criteria too tightly and you will get false negatives, which could be very bad if, say, you miss someone who is on a bad credit watch list. Relax the rules too much and you may get false positives, for example incorrectly matching a father and son (who may have the same name and live at the same address). The latest software can not only show whether records match, but also explain to a data steward why a match was made, and which business rules caused it to be flagged. Additionally, templates are provided which make it easier for business users to set up commonly occurring scenarios. The software is SOA-enabled, so it can be called from within other applications.
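
The threshold trade-off, and the value of explaining a match, can be illustrated with a small Python sketch. This is an assumption-laden toy, not how Group 1’s matching engine works: the rule names, weights and thresholds are invented for illustration, and the fuzzy comparison simply uses `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher


def fuzzy(a, b):
    """Similarity between 0.0 and 1.0, tolerant of small typos."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def exact(a, b):
    """1.0 only when the two values are identical."""
    return 1.0 if a == b else 0.0


# Hypothetical business rules: (field, comparison function, weight).
RULES = [
    ("name", fuzzy, 0.4),
    ("address", fuzzy, 0.3),
    ("birth_date", exact, 0.3),
]


def explain_match(rec_a, rec_b, threshold=0.85):
    """Score two records and report which rules contributed to the result."""
    score = 0.0
    fired = []
    for field, compare, weight in RULES:
        similarity = compare(rec_a[field], rec_b[field])
        score += weight * similarity
        if similarity >= 0.9:  # rule counts as having "fired"
            fired.append(field)
    return {"match": score >= threshold, "score": score, "fired": fired}


father = {"name": "John Smith", "address": "12 High Street",
          "birth_date": "1950-03-14"}
son = {"name": "John Smith", "address": "12 High Street",
       "birth_date": "1978-11-02"}

strict = explain_match(father, son, threshold=0.85)
loose = explain_match(father, son, threshold=0.60)
```

With the stricter threshold the father and son are kept apart, because the birth date rule does not fire; with the looser one they are merged on name and address alone. The `fired` list is the part a data steward would care about, since it shows which rules were responsible for the decision.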

Enrichment of data via functionality such as geo-coding is a way in which data quality vendors can genuinely add value to an enterprise. In this way they can move away from geeky conversations about the merits of different matching algorithms, and on to subjects that business users care about and will be prepared to pay money for.