Improving Resiliency with Databricks Delta Lake & Azure

Resiliency is one of the most important aspects to consider when building a data lake. Azure Storage provides some great features to improve resiliency, and on top of these, Databricks Delta Lake adds a powerful feature called time travelling that makes the lake more resilient and easier to recover.
In this blog, we’ll discuss a few features that help protect our data from corruption or deletion and make it easy to restore in case of any issues.
Right Access Permission
The first thing to consider is granting the right access. Only the resource administrator should have owner access, developers should have read access, and applications can have contributor access. This way, data can only be deleted by the resource administrator or by an authorized process, e.g. Databricks or Azure Data Factory pipelines.
Accidental Delete Protection
To avoid any accidental deletion, we should always add a delete lock on our data lake.

If someone tries to delete it by mistake, they’ll get a prompt to remove the lock first!

Delta Lake Time Travelling
Delta Lake time travelling is a great feature to use when data in the Delta Lake gets corrupted (e.g. by a wrong data ingestion or a faulty update procedure). Find below a short example:
import org.apache.spark.sql.SaveMode

// adding records for the first time
val studentDF = Seq(
  (1, "Prosenjit"),
  (2, "Abhijit"),
  (3, "Aadrika")
).toDF("id", "name")

studentDF.write.format("delta").mode("overwrite").save("/mnt/mydeltalake/Student")

// updating with a new record
val studentDF2 = Seq(
  (4, "Ananya")
).toDF("id", "name")

studentDF2.write.format("delta").mode("append").save("/mnt/mydeltalake/Student")
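To illustrate how time travelling helps us recover, here is a minimal sketch. It assumes the same /mnt/mydeltalake/Student path as above and a Databricks runtime where the Delta Lake io.delta.tables API is available. It inspects the table history, reads the snapshot as of version 0, and overwrites the table with that older snapshot to undo the unwanted append:

import io.delta.tables.DeltaTable

// inspect the commit history of the Delta table
val studentTable = DeltaTable.forPath(spark, "/mnt/mydeltalake/Student")
studentTable.history().select("version", "timestamp", "operation").show()

// read the table as it was at version 0, before the append
val studentV0 = spark.read
  .format("delta")
  .option("versionAsOf", 0)
  .load("/mnt/mydeltalake/Student")

// recover by overwriting the current table with the older snapshot
// (newer Delta Lake versions also offer a RESTORE command for this)
studentV0.write.format("delta").mode("overwrite").save("/mnt/mydeltalake/Student")

Keep in mind that how far back we can travel depends on the table’s retention settings; running VACUUM with a short retention period removes the older data files and with them the older versions.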