Azure Purview — Cataloging Delta Lake Assets using Apache Atlas API

Prosenjit Chakraborty
Feb 8, 2021

Azure Purview, one of the latest tools from Microsoft, helps govern a customer’s data lake and integrates well with various Azure services. Its support for the Apache Atlas API makes it easy to extend the data governance service to non-Azure components as well. In my earlier blog, we saw how to leverage the API to catalog Apache Hive assets and capture their lineage. In this blog, we’ll see how to register Delta Lake assets in Purview.

Scanning Azure Data Lake identifies the Delta Lake table schema. A few screenshots follow.

The scan has discovered the ADLS folder and files containing records in Delta format.
The scan correctly identifies the Delta table schema.
The other Delta tables/files have also been discovered and listed under the ‘Related’ tab.
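
For context, a Delta table records its schema in its transaction log: each commit file under `_delta_log` is a sequence of JSON actions, and the `metaData` action carries a `schemaString`. The snippet below is only a minimal sketch of reading that action; it says nothing about how the Purview scan itself works. The sample line, its values, and the helper name `columns_from_commit_line` are made up for illustration.

```python
import json

# One line from a hypothetical _delta_log commit file. The values are invented
# for illustration; the field names follow the Delta transaction log protocol.
sample_commit_line = json.dumps({
    "metaData": {
        "id": "example-table-id",
        "format": {"provider": "parquet", "options": {}},
        "schemaString": json.dumps({
            "type": "struct",
            "fields": [
                {"name": "id", "type": "long", "nullable": True, "metadata": {}},
                {"name": "name", "type": "string", "nullable": True, "metadata": {}},
            ],
        }),
        "partitionColumns": [],
        "configuration": {},
    }
})


def columns_from_commit_line(line):
    """Return (name, type) pairs if the line is a metaData action, else None."""
    action = json.loads(line)
    meta = action.get("metaData")
    if meta is None:
        # Other actions (add, remove, commitInfo, ...) carry no schema.
        return None
    # schemaString is itself JSON-encoded, so it needs a second decode.
    schema = json.loads(meta["schemaString"])
    return [(field["name"], field["type"]) for field in schema["fields"]]


columns = columns_from_commit_line(sample_commit_line)
print(columns)
```

For nested columns the `type` value is a JSON object rather than a plain string, so a fuller version would need to walk it recursively.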

Though this should be fine for most cases, there may be specific use cases where we need to take advantage of Delta Lake metadata to catalog Delta assets specifically and store their lineage information. To achieve this, we need to create a new type…
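
To make the idea of a new type concrete, here is a minimal sketch of an Apache Atlas type-definition payload. It is not the type created in this article: the type names `custom_delta_table` and `custom_delta_process`, the attribute names, and the helper functions are all hypothetical. It assumes the built-in Atlas supertypes `DataSet` and `Process`, and only builds the JSON body; in Apache Atlas such a body is sent with a POST to the v2 types endpoint (`/api/atlas/v2/types/typedefs`), with authentication left out here.

```python
import json


def attribute(name, type_name="string", optional=True):
    """Build one Atlas attribute definition with common defaults."""
    return {
        "name": name,
        "typeName": type_name,
        "isOptional": optional,
        "cardinality": "SINGLE",
        "valuesMinCount": 0 if optional else 1,
        "valuesMaxCount": 1,
        "isUnique": False,
        "isIndexable": False,
    }


def build_delta_typedefs(table_type="custom_delta_table",
                         process_type="custom_delta_process"):
    """Return a typedefs payload with a table type and a lineage process type.

    All names are hypothetical and chosen only for this illustration.
    """
    return {
        "entityDefs": [
            {
                "category": "ENTITY",
                "name": table_type,
                "description": "Hypothetical type for a Delta Lake table",
                "superTypes": ["DataSet"],  # inherits name and qualifiedName
                "typeVersion": "1.0",
                "attributeDefs": [
                    attribute("location"),
                    attribute("deltaTableId"),
                ],
            },
            {
                "category": "ENTITY",
                "name": process_type,
                "description": "Hypothetical type for a job that writes a Delta table",
                "superTypes": ["Process"],  # inherits inputs and outputs for lineage
                "typeVersion": "1.0",
                "attributeDefs": [attribute("operation")],
            },
        ]
    }


payload = build_delta_typedefs()
print(json.dumps(payload, indent=2))
```

Deriving the table type from `DataSet` and the job type from `Process` is what lets Atlas draw lineage: a process entity lists its source entities in `inputs` and its targets in `outputs`.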
