SAS Data Quality Update
Content Copyright © 2023 Bloor. All Rights Reserved.
Also posted on: Bloor blogs
SAS Institute was founded in 1976 and has long been a major player in analytics. It now has over $3 billion in annual revenue, with customers in 150 countries. Unusually for a company of this size, it is privately held, owned by founder Jim Goodnight along with his co-founder John Sall. Mr Goodnight is a sprightly 80-year-old who takes a very hands-on approach to running SAS. He still sets aside half a day a week to get his hands dirty coding.
In the data quality space, SAS initially built their offering on the 2000 acquisition of DataFlux, a leading DQ vendor. More recently, the SAS data quality offering has been incorporated into their broader data management platform, Viya. Broadly, Viya covers data access and integration, analytics workflow, compliance, risk mitigation and trust.
SAS see adaptive AI, cloud, and integration with data governance as some of the broad trends in the market at present. The SAS data quality offering covers the full spectrum of data analysis, cleansing, matching, enriching, monitoring and remediation, with the product aimed primarily at business users rather than IT. Data is accessed through a wide range of connectors, covering many kinds of file structures as well as databases, ODBC, Spark and many more. SAS have automatic crawlers for data discovery in their SAS Information Catalog tool. These crawlers analyse metadata to detect new columns, tables and so on, keeping the tool fully up to date. The product can assess data completeness, check rules needed for approvals (for example, whether personal data needs masking), and carry out general profiling (highlighting distinct values, averages, uniqueness and so on). To save time, the product ships with many rules in its quality knowledge base, for example to detect the format of a phone number, social security number or credit card number.
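To make the profiling and format-detection checks described above more concrete, here is a minimal sketch in plain Python. It is not SAS code and does not use any SAS API: the function names (`profile_column`, `detect_format`) and the three regular expressions are hypothetical and deliberately simplified, intended only to illustrate what completeness, uniqueness and format rules measure.

```python
import re
from statistics import mean

# Hypothetical, simplified format rules for illustration only.
# These are NOT the rules shipped in the SAS quality knowledge base.
FORMAT_RULES = {
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "us_phone": re.compile(r"^\(?\d{3}\)?[ -]?\d{3}-\d{4}$"),
    "credit_card": re.compile(r"^\d{4}([ -]?\d{4}){3}$"),
}


def _present(values):
    """Drop missing entries (None or empty string)."""
    return [v for v in values if v is not None and v != ""]


def profile_column(values):
    """Return basic profiling metrics for one column of values."""
    present = _present(values)
    distinct = set(present)
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return {
        # Share of rows that actually hold a value.
        "completeness": len(present) / len(values) if values else 0.0,
        # Number of different values seen.
        "distinct": len(distinct),
        # Share of non-missing values that are distinct (1.0 = all unique).
        "uniqueness": len(distinct) / len(present) if present else 0.0,
        # Mean of the numeric values, if there are any.
        "average": mean(numeric) if numeric else None,
    }


def detect_format(values):
    """Return the first rule name matched by every non-missing value, else None."""
    present = [str(v) for v in _present(values)]
    if not present:
        return None
    for name, pattern in FORMAT_RULES.items():
        if all(pattern.match(v) for v in present):
            return name
    return None
```

For example, `profile_column([10, 20, None, 30])` reports a completeness of 0.75, and `detect_format(["123-45-6789", "987-65-4321"])` returns `"us_ssn"`. A production tool would use far richer rules (checksums, locale handling, fuzzy matching) than these patterns.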
In the coming year a new business glossary will be released, with machine learning to analyse datasets and suggest business rules. SAS already use AI in their matching, but there will be a much wider deployment of this technology. For data observability, they have dashboards showing how data quality is improving (or not) over time. SAS Studio is a visual design tool for building workflows, scheduling batch jobs and so on, which supports data stewards.
A key differentiator for SAS is their SAS Quality Knowledge Base. This holds knowledge of common data formats, such as postcodes, in different countries, covering 47 countries and 36 languages. This set of built-in rules allows very quick parsing and similar operations without having to write code. It includes support for scripts such as kanji and Arabic. SAS helps with data preparation: data from a source such as a database or a sensor may need to be parsed, matched, enriched and so on. The SAS Data Preparation tool supports transformations via a visual interface.
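The idea of a knowledge base of country-specific formats can be illustrated with a small lookup of rules keyed by country. The sketch below is plain Python, not the SAS Quality Knowledge Base: the `POSTCODE_RULES` table, the `is_valid_postcode` function and the four simplified patterns are all assumptions made for illustration, and real postcode rules are considerably more detailed.

```python
import re

# Hypothetical per-country postcode rules, heavily simplified.
# A real knowledge base would cover many more countries and edge cases.
POSTCODE_RULES = {
    "US": re.compile(r"^\d{5}(-\d{4})?$"),                      # ZIP or ZIP+4
    "GB": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$"),  # e.g. SW1A 1AA
    "DE": re.compile(r"^\d{5}$"),                               # five digits
    "JP": re.compile(r"^\d{3}-\d{4}$"),                         # e.g. 100-0001
}


def is_valid_postcode(value, country):
    """Check a postcode against the rule registered for the given country code."""
    rule = POSTCODE_RULES.get(country)
    if rule is None:
        raise KeyError(f"no postcode rule for country {country!r}")
    # Normalise surrounding whitespace and letter case before matching.
    return bool(rule.match(value.strip().upper()))
```

With this table, `is_valid_postcode("sw1a 1aa", "GB")` is true, while `is_valid_postcode("1234", "DE")` is false. Keeping the rules in data rather than in code is what lets a user apply them without writing any parsing logic themselves.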
SAS compete in the data quality arena with big vendors like Informatica, SAP and IBM as well as specialist companies like Talend, Precisely, Ataccama and Experian. SAS continue to invest heavily in their data quality offering as part of their broader data management platform.