Implementing Data Quality with Amazon Deequ & Apache Spark
9 min readNov 26, 2019
Data quality is an important aspect whenever we ingest data. In a big data scenario this becomes very challenging considering the high volume, velocity & variety of data. Incomplete or wrong data can lead more false predictions by a machine learning algorithm, we may lose opportunities to monetize our data because of the data issues and business can lose their confidence on the data.