Session abstract:
Companies rely on data every day to guide business processes and decisions. Missing or incorrect information seriously compromises the customer experience and every downstream decision process. A crucial but tedious task for every team involved in data processing is therefore to verify the quality of its data. In this talk, we will show how to continuously verify data quality by defining metrics and constraints, resulting in better testing for data pipelines and machine learning applications.
Link to open-source repo: https://github.com/awslabs/deequ
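
To make "defining metrics and constraints" concrete, here is a minimal sketch in plain Python. It is not the Deequ API (Deequ runs on Apache Spark); the function names `completeness`, `distinct_ratio`, and `verify`, the sample rows, and the thresholds are all hypothetical and chosen only for illustration. A metric is a number computed from a column, and a constraint is a predicate that the metric must satisfy.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Illustrative only: these helpers are assumptions, not part of Deequ.
Row = Dict[str, Optional[object]]
Metric = Callable[[List[Row], str], float]
Constraint = Tuple[str, Metric, str, Callable[[float], bool]]


def completeness(rows: List[Row], column: str) -> float:
    """Fraction of rows in which the column is present and not None."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)


def distinct_ratio(rows: List[Row], column: str) -> float:
    """Number of distinct non-null values divided by the number of non-null values."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


def verify(rows: List[Row], constraints: List[Constraint]) -> Dict[str, Tuple[float, bool]]:
    """Compute each metric and record whether its constraint holds."""
    results = {}
    for name, metric, column, predicate in constraints:
        value = metric(rows, column)
        results[name] = (value, predicate(value))
    return results


# Hypothetical sample data: one missing email and one duplicated id.
rows: List[Row] = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 3, "email": "d@example.com"},
]

constraints: List[Constraint] = [
    ("id is complete", completeness, "id", lambda v: v == 1.0),
    ("id is unique", distinct_ratio, "id", lambda v: v == 1.0),
    ("email is mostly complete", completeness, "email", lambda v: v >= 0.7),
]

for name, (value, passed) in verify(rows, constraints).items():
    print(f"{name}: metric={value:.2f} passed={passed}")
```

Running a check like this on every new batch of data, and failing the pipeline when a constraint does not hold, is one way to read "continuously verify" in the abstract.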