Generating BI Reports on Encrypted Data using Azure Databricks

Image for post
Image for post

Data encryption in cloud is an important factor for the enterprises specially banking and financial companies.

Though different security features (specially in Azure Databricks) have been implemented at different levels e.g. encryption at rest, encryption in flight, role based access controls, network level security etc., enterprises may want to store sensitive information in cloud with their chosen encryption technology.

In this blog, we’ll see how we can achieve this using Azure Databricks, Azure Key Vault and Power BI.

To start with, let’s take an example of customer information where Email and Aadhar are sensitive information.

Email, Aadhar are sensitive information and shouldn’t be stored as plain text in cloud

Encryption Method

In this example we have chosen Java Advanced Encryption Standard (AES) APIs to encrypt and decrypt. For more details on Java AES, follow this nice article.

We have used the sample encrypt and decrypt methods written by the same author.

We have considered a secret key = abcdefgh123456789 and used the following Java code to encrypt all of the records inside the above RED block.

public static void main(String[] args) {
final String secretKey = "abcdefgh123456789";
String originalString = "<input your String>";
String encryptedString = AES.encrypt(originalString, secretKey) ;
String decryptedString = AES.decrypt(encryptedString, secretKey) ;

After encrypting at on-premise, we have the records like below:

Image for post
Image for post
AES encrypted with the secret key

Upload Encrypted Data into Cloud

After encrypting the required columns we have imported the records into an Azure Databricks table (we could store into Azure SQL Database or SQL Data Warehouse as well).

Store Secret at Azure Key Vault

As we’ll be needing the secret key to decrypt the records, we have stored it into Azure Key Vault.

Image for post
Image for post
Azure Key Vault — Create a secret

There are few simple steps required to configure a secret scope at Azure Databricks. Follow my previous blog.

Once configured, check it by executing:

Image for post
Image for post
Execute in a Databricks notebook

Create a Library for Decryption

Again referring the same blog on Java AES, we have created a utility jar (e.g. AES.jar) which contains the decryption method. The method expects two strings as arguments — the column value we want to decrypt and the secret key.

Create a Custom UDF

We need to create a permanent Hive user defined function (UDF) which uses the AES.jar to decrypt input encrypted string.

We can find a sample UDF, here.

To write the UDF we also need hive-exec*.jar to be included. Get the latest jar from:

package org.apache.hadoop.hive.ql.exec.UDF; // hive-exec*.jar
import; // AES.jar
class DecryptString extends UDF {
def evaluate (strToDecrypt: String, secretKey: String): String = {
AES.decrypt(strToDecrypt, secretKey)

The UDF is expecting two parameters: (i) the field to be decrypted, (ii) the secret key which we will fetch from the Azure Key Vault we have configured already.

Once the coding is done, compile and export as a JAR say, DecryptStringv2.jar (ensure, not to include the AES.jar & hive-exec*.jar inside).

Upload the JARs

Let’s go to the Azure Databricks, select or create a cluster and install the JARs — AES.jar (contains the decrypt utility method) and DecryptStringv2.jar (contains the UDF).

Image for post
Image for post

Create a Function

We’ll create a permanent function on the UDF we have created. Follow this for more information on the syntax.

CREATE FUNCTION fnDecryptString AS '';
Image for post
Image for post
The function needs to be permanent

Create a View

We’ll create a view which will use the function fnDecryptString to decrypt the encrypted fields.

%scalaval query: String = 
s"""CREATE OR REPLACE VIEW vw_ClientMasterEncrypted
AS SELECT Name, City,
scope = "my-key-vault-secrets",
key = "secretKey")}') as Email,
scope = "my-key-vault-secrets",
key = "secretKey")}') as Aadhar
from ClientMasterEncrypted"""
spark.sql (query)

Now, if we select from the view, we can get the encrypted fields decrypted!

Image for post
Image for post

Connect from Power BI

We can connect to the view from Power BI. We should restrict to the table/view access to the appropriate BI user only. For further details on access control from BI tool, please check out my blog.

Image for post
Image for post

Points to note

  • The decryption code shouldn’t contain any debug statements/prints/display of the sensitive records. Otherwise, these will be stored in Spark/Databricks logs.
  • Azure Key Vault access should be restricted to specific groups/users. As, the stored secrets can be retrieved from there.
  • Views are executed in memory, however in case of huge data returned by the underneath Spark SQL, the data can spill over to the worker node disks.
  • We should avoid using UDFs unless absolutely necessary as Spark doesn’t optimize the code inside.
  • Calling the decryption functions for huge records may take time so, we should try to generate reports using plain texts columns only (may be few reports having limited record sets need column-decryption).
  • Azure supports encryption at rest for different services; even column level encryption for Azure SQL Database is supported. We can check if those are sufficient for our use cases.


With the above discussed approach, without storing sensitive information as plain texts, we can use views and dynamically decrypt on the fly to serve BI users. However, in case we have some complex decryption steps (e.g. calling some URL with encrypted values/objects and receive decrypted values), this may not be the suitable way to follow.

Thanks for reading. Connect with me on LinkedIn.

Tech enthusiast, Azure Big Data Architect.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store