Implementing Data Quality with Amazon Deequ & Apache Spark

Prosenjit Chakraborty
9 min read · Nov 26, 2019

Data quality is an important consideration whenever we ingest data. In a big data scenario it becomes especially challenging given the high volume, velocity and variety of the data. Incomplete or incorrect data can lead a machine learning algorithm to make more false predictions, can cost us opportunities to monetize our data, and can cause the business to lose confidence in the data.
