Improving Resiliency with Databricks Delta Lake & Azure

Resiliency is one of the most important aspects we should consider while creating a data lake. Azure Storage provides some great features to improve resiliency. On top of these, Databricks Delta Lake adds a cool feature called time travelling that makes the lake more resilient and easier to recover.

In this blog, we'll discuss a few features that help protect our data from corruption or deletion and make it easy to restore in case of any issues.

Right Access Permission

The first thing to consider is providing the right access. Only the resource administrator should have owner access, developers should have read access, and applications can have contributor access. This way, data can only be deleted by the resource administrator or by a controlled process, e.g. by Databricks or by Azure Data Factory pipelines.

Accidental Delete Protection

To avoid any accidental deletion, we should always add a delete lock on our data lake.

[Image: Adding a ‘Delete’ lock on the Storage Account.]

If someone tries to delete the storage account by mistake, they'll get a prompt to remove the lock first!

[Image: Accidental deletion is prevented.]

Delta Lake Time Travelling

Delta Lake time travelling is a great feature and should be used in case of any data corruption in the Delta Lake (e.g. a wrong data ingestion or a faulty update procedure). Find below a short example:

// adding records for the first time
val studentDF = Seq(
  (1, "Prosenjit"),
  (2, "Abhijit"),
  (3, "Aadrika")
).toDF("id", "name")
studentDF.write.format("delta").mode("overwrite").save("/mnt/mydeltalake/Student")

// updating with a new record
val studentDF2 = Seq(
  (4, "Ananya")
).toDF("id", "name")
studentDF2.write.format("delta").mode("append").save("/mnt/mydeltalake/Student")

// creating an external table of type Delta for easy access
spark.sql("CREATE TABLE Student USING DELTA LOCATION '/mnt/mydeltalake/Student'")
[Image: ‘display’ing the Student Delta table after insertions.]

Now, let's delete a record.

spark.sql ("DELETE FROM Student WHERE id = 1")
val studentDF3 = spark.sql("SELECT * FROM Student")
display (studentDF3)
[Image: ‘display’ing the Student Delta table after the deletion.]

[Image: The Delta table history tracks all of the changes.]
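
For reference, the history shown above can be listed with the DESCRIBE HISTORY command; a minimal sketch against the Student table created earlier:

// listing every commit (writes, deletes, etc.) recorded in the Delta transaction log
display(spark.sql("DESCRIBE HISTORY Student"))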

We can retrieve the deleted records by simply travelling back in time and loading the right snapshot.

// loading the snapshot taken just before the deletion
val historical_studentDF = spark.read.format("delta")
  .option("timestampAsOf", "2020-04-15 18:12:26")
  .load("/mnt/mydeltalake/Student")
display(historical_studentDF)

// re-inserting only the deleted record from that snapshot
spark.sql("INSERT INTO Student SELECT * FROM Student TIMESTAMP AS OF '2020-04-15 18:12:26' WHERE id = 1")
val studentDF4 = spark.sql("SELECT * FROM Student")
display(studentDF4)
[Image: ‘display’ing the table after recovering the deleted record.]

Restoring records by time travelling can help when the data were deleted or updated by a Spark application.
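
If the exact timestamp isn't handy, the same snapshot can also be loaded by its version number; a minimal sketch (the version value below is illustrative and should be read from the table history):

// loading a snapshot by version number instead of timestamp
val versionedStudentDF = spark.read.format("delta")
  .option("versionAsOf", 2) // hypothetical version; check DESCRIBE HISTORY for the right one
  .load("/mnt/mydeltalake/Student")
display(versionedStudentDF)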

But what will happen if someone, or some application, removes the underlying data files by mistake?!

[Image: Someone/some application can accidentally delete any data file!]

Delta Lake will not be able to track such changes, so it will not be able to recover the records! We can run FSCK REPAIR TABLE, but that will only repair the transaction log (removing the entries that point to the missing files), not bring the data back.
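
For reference, a minimal sketch of running that repair from a notebook:

// FSCK removes transaction-log entries that point to files no longer present in storage;
// it does NOT recover the data held in those files
spark.sql("FSCK REPAIR TABLE Student")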

Azure Storage Blob Soft Delete Feature

Azure Storage supports the soft delete feature for blobs. Deleted blobs are retained for a configurable number of days. If our Delta Lake is created on Azure Blob Storage, we can take advantage of this feature.

[Image: Enable the ‘Blob soft delete’ feature & set the retention period.]

Any deleted blob can be undeleted very easily.

[Image: We can undelete as soon as we detect the issue.]

Once restored, we can query the Delta Lake table and it'll return the records without any further repair.
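
A quick sanity check after the undelete; a minimal sketch:

// the table reads normally again once the underlying files are back in place;
// no FSCK or other repair is needed because the transaction log never changed
display(spark.sql("SELECT * FROM Student"))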

Azure Data Factory Periodic Backup

As the soft delete feature is not yet supported for Azure Data Lake Storage Gen2 at the time of this writing (refer here for the list of supported features), we can implement an Azure Data Factory pipeline to copy the Delta Lake directories to another location, either in the same region or in a separate region.

[Image: Periodic Backups & on-demand restore by ADF pipelines.]

Find below a simple ADF Copy activity definition. We should preserve the source hierarchy and the source attributes.

{
    "name": "Delta_Lake_Backup",
    "properties": {
        "activities": [
            {
                "name": "Delta Lake Backup",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "BinarySource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true
                        }
                    },
                    "sink": {
                        "type": "BinarySink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings",
                            "copyBehavior": "PreserveHierarchy"
                        }
                    },
                    "enableStaging": false,
                    "preserve": [
                        "Attributes"
                    ]
                },
                "inputs": [
                    {
                        "referenceName": "mydeltalake",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "mydeltalakebackups",
                        "type": "DatasetReference"
                    }
                ]
            }
        ],
        "annotations": []
    }
}
[Image: Once the Delta Lake has been copied.]

We can then connect to the copied snapshots, read the data and, if required, track the changes using the transaction logs.
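
A minimal sketch of reading one of the copied snapshots, assuming the backup container is mounted at /mnt/mydeltalakebackups (the mount point and path here are illustrative):

// reading the backed-up Delta directory directly from the backup location
val backupStudentDF = spark.read.format("delta").load("/mnt/mydeltalakebackups/Student")
display(backupStudentDF)

// the copied _delta_log still carries the full history of changes
display(spark.sql("DESCRIBE HISTORY delta.`/mnt/mydeltalakebackups/Student`"))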

For any on-demand backup, we can try the cloning feature of Azure Storage Explorer.

Points to note:

  • Retaining Delta Lake data by taking periodic snapshots will consume extra space. The amount of storage and its cost will depend on our backup frequency, the size of our Delta Lake and whether we're transferring the data to another region.
  • If we back up our data into Azure Blob Storage, we can use Lifecycle Management to delete the data after the retention period.
  • Lifecycle Management for Azure Data Lake Storage Gen2 is yet to be fully supported. Until then, we can use the ADF Delete activity to clear the old snapshots.

Azure Disaster Recovery Feature

In case of a severe disaster, the whole region containing our Delta Lake may go down. If we set our replication to Geo-redundant storage (GRS) or Read-access geo-redundant storage (RA-GRS) and the primary region suffers an outage, the secondary region will serve as a redundant source of our Delta Lake, possibly with some data loss (refer here for more about Last Sync Time, which helps estimate the amount of data loss).

Unless we use RA-GRS, the Delta Lake in the secondary region will not be accessible until Microsoft declares a disaster and fails over to the secondary. So, we may want to create our own backup solution if the Azure-provided redundancy doesn't suit our purpose.

Points to note:

  • In case we want to implement our own backup/redundancy solution by copying the Delta Lake data into another region, compare the solution cost (e.g. two LRS locations + ADF pipeline run time + approximate data transfer-out cost from the primary region) with the Azure Storage GRS/RA-GRS cost w.r.t. the benefits.
  • In case of an outage, we may need to access the Delta Lake from our secondary region. Azure Databricks needs to be pre-configured there as part of our disaster recovery readiness process; refer here for the steps to follow (a small mounting sketch follows below).
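
To illustrate that readiness step, here is a minimal sketch of mounting the storage from the secondary-region Databricks workspace; the container, storage account, secret scope and key names below are placeholders, not values from this setup:

// mounting the Delta Lake storage in the secondary-region Databricks workspace
// (container, account and secret names are placeholders)
dbutils.fs.mount(
  source = "wasbs://mycontainer@mysecondaryaccount.blob.core.windows.net/",
  mountPoint = "/mnt/mydeltalake",
  extraConfigs = Map(
    "fs.azure.account.key.mysecondaryaccount.blob.core.windows.net" ->
      dbutils.secrets.get(scope = "my-scope", key = "storage-account-key")
  )
)

// once mounted, the Delta tables can be read exactly as before
display(spark.read.format("delta").load("/mnt/mydeltalake/Student"))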

Conclusion

We have seen a few ways to make our data lake more resilient with Databricks Delta Lake and some Azure features. We should select the options based on our application's criticality and budget.

Thanks for reading. To see similar posts, follow me on Medium & LinkedIn. If you have enjoyed this, don’t forget to Clap & Share!!
