Large-scale Data Quality Verification - How to Unit-test Your Data with deequ

Stream
06/18/2019 - 11:50 to 12:10
Frannz Salon
short talk (20 min)
Beginner

Session abstract: 

Every day, companies rely on data to guide every single business process and decision. Missing or incorrect information seriously compromises the customer experience and any downstream decision process. A crucial but tedious task for every team involved in data processing is therefore to verify the quality of their data. In this talk, we will show how to continuously verify data quality by defining metrics and constraints, resulting in better testing for data pipelines and machine learning applications.
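
To make "defining metrics and constraints" concrete, here is a minimal sketch in plain Python. It is not deequ's API (deequ runs on Apache Spark; see the repo for its actual interface). Every name here (`completeness`, `distinct_ratio`, `non_negative_ratio`, `Constraint`, `verify`) and the sample rows are hypothetical, chosen only to illustrate the idea: a metric computes a number from the data, and a constraint asserts something about that number, much like a unit test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical row type for this sketch: a dict of column name -> value (None = missing).
Row = Dict[str, Optional[object]]
Metric = Callable[[List[Row]], float]


def completeness(column: str) -> Metric:
    """Metric: fraction of rows where `column` is not None."""
    def metric(rows: List[Row]) -> float:
        if not rows:
            return 1.0
        return sum(1 for row in rows if row.get(column) is not None) / len(rows)
    return metric


def distinct_ratio(column: str) -> Metric:
    """Metric: number of distinct non-null values divided by number of non-null values."""
    def metric(rows: List[Row]) -> float:
        values = [row[column] for row in rows if row.get(column) is not None]
        if not values:
            return 1.0
        return len(set(values)) / len(values)
    return metric


def non_negative_ratio(column: str) -> Metric:
    """Metric: fraction of non-null values in `column` that are >= 0."""
    def metric(rows: List[Row]) -> float:
        values = [row[column] for row in rows if row.get(column) is not None]
        if not values:
            return 1.0
        return sum(1 for value in values if value >= 0) / len(values)
    return metric


@dataclass
class Constraint:
    """A named assertion over the value of one metric."""
    name: str
    metric: Metric
    assertion: Callable[[float], bool]


def verify(rows: List[Row], constraints: List[Constraint]) -> Dict[str, Tuple[float, bool]]:
    """Compute each metric and record (metric value, whether the assertion held)."""
    results = {}
    for constraint in constraints:
        value = constraint.metric(rows)
        results[constraint.name] = (value, constraint.assertion(value))
    return results


# Made-up sample data: one missing name and one negative price.
rows: List[Row] = [
    {"id": 1, "name": "Thingy A", "price": 9.99},
    {"id": 2, "name": None, "price": 4.50},
    {"id": 3, "name": "Thingy C", "price": -1.0},
]

checks = [
    Constraint("id is complete", completeness("id"), lambda v: v == 1.0),
    Constraint("id is unique", distinct_ratio("id"), lambda v: v == 1.0),
    Constraint("name is mostly complete", completeness("name"), lambda v: v >= 0.9),
    Constraint("price is non-negative", non_negative_ratio("price"), lambda v: v == 1.0),
]

results = verify(rows, checks)
for name, (value, passed) in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}  (metric = {value:.2f})")
```

Running such checks on every new batch of data, rather than once, is what makes the verification continuous: a failed constraint can stop the pipeline before bad data reaches downstream consumers.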

Link to open-source repo: https://github.com/awslabs/deequ
