Monte Carlo positions itself as a data observability platform that monitors enterprise data quality. The company uses the term “data downtime”, by analogy with website downtime, to explain why customers need to monitor the quality of their enterprise data carefully, often in real time. In one survey of 200 data professionals, companies reported an average of 67 data quality incidents per year, and well over half of respondents estimated that poor data quality impacted 25% or more of their company’s revenue.
Monte Carlo’s software makes extensive use of machine learning to automate the creation and monitoring of business data quality rules, which in some older tools is a manual task. The software has connectors to popular data sources such as SAP, Oracle, MySQL, Snowflake and Databricks, among others, and reads their metadata and database logs to build a picture of the data flows in an enterprise. Additional connectors exist for data orchestration and transformation tools such as Airflow, dbt and Prefect, though not as yet for the larger data integration vendors such as Informatica or Talend. In this way the Monte Carlo software can generate, at least to a degree, a view of the lineage of data from source systems through to, say, a data warehouse or data lake.
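To illustrate the general technique, the minimal Python sketch below infers table-level lineage edges by pattern-matching INSERT INTO … SELECT and CREATE TABLE … AS statements in a query log. The log entries and table names are invented for illustration, and this is not Monte Carlo’s implementation, which spans many source types and handles far more SQL variation.

```python
import re
from collections import defaultdict

# Hypothetical query-log entries; in practice these would come from the
# warehouse itself (e.g. a query-history view). All names are illustrative.
QUERY_LOG = [
    "INSERT INTO analytics.daily_revenue SELECT * FROM raw.orders",
    "CREATE TABLE analytics.churn AS SELECT id FROM raw.customers",
    "INSERT INTO mart.kpi_summary SELECT * FROM analytics.daily_revenue",
]

def extract_edges(sql: str) -> list[tuple[str, str]]:
    """Derive (source, target) lineage edges from a single SQL statement."""
    target = re.search(r"(?:INSERT INTO|CREATE TABLE)\s+([\w.]+)", sql, re.I)
    sources = re.findall(r"FROM\s+([\w.]+)", sql, re.I)
    return [(src, target.group(1)) for src in sources] if target else []

# Build a source -> downstream-tables graph from the log.
lineage: dict[str, set[str]] = defaultdict(set)
for stmt in QUERY_LOG:
    for src, dst in extract_edges(stmt):
        lineage[src].add(dst)

for src, dsts in sorted(lineage.items()):
    print(f"{src} -> {', '.join(sorted(dsts))}")
```

Chaining such edges is what allows a tool to trace a bad record in a data warehouse table back to the upstream feed that produced it.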
Once installed, the software profiles the data extensively and generates data quality rules, including value thresholds, based on the characteristics and volumes of the data it is tasked to check. It then builds extensive exception reporting and alerting to highlight when those thresholds are breached, whether due to a faulty data feed, invalid data, or something else, such as a systems upgrade that has caused a data load to go missing.
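The sketch below conveys the underlying idea of a learned volume threshold using a simple band of the mean plus or minus three standard deviations over recent row counts; Monte Carlo’s actual models are machine-learned and considerably more sophisticated, and all figures here are hypothetical.

```python
import statistics

def volume_alert(history: list[int], todays_rows: int, k: float = 3.0) -> bool:
    """Return True if today's row count breaches the threshold learned
    from recent load history (mean +/- k standard deviations)."""
    mean = statistics.mean(history)
    std = statistics.stdev(history)
    lower, upper = mean - k * std, mean + k * std
    return not (lower <= todays_rows <= upper)

# A missing or truncated feed shows up as an abnormally low row count.
recent_loads = [10_120, 9_980, 10_340, 10_050, 10_210, 9_890, 10_150]
print(volume_alert(recent_loads, todays_rows=10_080))  # False: within band
print(volume_alert(recent_loads, todays_rows=0))       # True: feed likely failed
```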
Customer Quotes
“Monte Carlo alerts are high quality. We don’t get many false alarms, which really helps build a culture of urgency to event management and response.”
Adam Woods, Chief Technology Officer, Choozle
“Giving the power back to the domain owners and experts is one of the most important steps in achieving improved data observability.”
Martynas Matimaitis, Senior Data Engineer, Checkout.com