Databricks — Table Access Restriction For BI Tools

Image for post
Image for post

Azure Databricks can be connected as a data source for Power BI or any other BI reporting tool and different reports can be developed based on the underneath Hive tables/Databricks Delta tables. However, reporting users may not need to know about other tables/views required for other purposes e.g. ETL loads, staging tables etc. We may also want to restrict the access of specific tables/schema to a specific set of users.

If that’s your purpose, Databricks — Table Access Control is the solution you should look for.

If you want to connect to the Databricks and want to access the Hive tables, you need the connection string and one user token. The following few steps guide how to create a connection string and use Power BI to connect to Databricks.

Construct The Databricks Connection String

Image for post
Image for post
Go to your cluster > Advanced Options > JDBC/ODBC > note down the rectangle selection of the JDBC URL and Construct the Server connection string

The PowerBI server connection string will be something like the below:

https://<Databricks_Region>.azuredatabricks.net:443/sql/protocolv1/o/<WorkSpace_Id>/<Cluster_Id>

Generate A User Token

Image for post
Image for post
Go to User Settings and generate a new token
Image for post
Image for post
The selected user shouldn’t have Admin / cluster creation access

Power BI Connectivity To Azure Databricks

Image for post
Image for post
Open Power BI > Get Data
Image for post
Image for post
Look for the ‘Spark’ connector
Image for post
Image for post
Input the Server connection string. select Protocol = HTTP & Data Connectivity mode = DirectQuery
Image for post
Image for post
All the tables/views will be listed!

In the above image, you can find Power BI is listing all the tables/views accessible by the user (who has issued the user token).

In this example, default.transaction_details
is a Databricks Delta table, assume that it’s being updated by an ETL load and analytics.transactions is a view which should only be displayed to the BI users.

Now, to restrict the business users only to access the relevant tables or views, you can follow the following steps.

Databricks Workspace Level Configuration

To start with, ‘Cluster and Jobs Access Control’ and ‘Table Access Control’ — both of these have to be enabled. First one is enabled and second one is disabled respectively by default. So, we just need to enable the Table Access Control.

Image for post
Image for post
Select Admin Console
Image for post
Image for post
By default, it should be enabled, if not enable it.
Image for post
Image for post
Enable this setting as well.
Image for post
Image for post
Press Confirm to proceed.

Databricks Cluster Level Configuration

Image for post
Image for post
Add ‘spark.databricks.acl.sqlOnly true’ at the cluster > Advanced Options > Spark Config and restart the cluster.

Granting Schema Or Table Level Access

Once we have completed the above steps, we can use the following statements to GRANT a user (we should use a dummy user rather than a ‘real’ Databricks user in case of access by any tool/application).

For the commands with all options, follow this.

%sql-- Provide SELECT access to the Database
GRANT SELECT ON DATABASE analytics to `dummy_user@mydomain.com`;
-- Read Metadata access is required so PowerBI can fetch table/view -- details while listing
GRANT READ_METADATA ON DATABASE analytics to `dummy_user@mydomain.com`;
-- Alternative - Provide SELECT access to the View for selective -- access
GRANT SELECT ON VIEW analytics.transactions to `dummy_user@mydomain.com`;
GRANT READ_METADATA ON VIEW analytics.transactions to `dummy_user@mydomain.com`;
-- If required - Exclusively deny access to the table
DENY SELECT ON DATABASE default to `dummy_user@mydomain.com`;

We can check the reporting user (or dummy user) access types by executing SHOW GRANT on the relevant views/tables etc.

Image for post
Image for post
Check who all have different access
Image for post
Image for post
Reporting user should not have access on the ETL tables

Power BI Connectivity After The Changes

Image for post
Image for post
Now only the permitted view can be accessed!

Once we have the right access set for the reporting user, Power BI connection to Databricks with the user token will fetch only the required tables or views.

Points To Note

  • Once we enable spark.databricks.acl.sqlOnly for a cluster (let’s call it ReportingCluster), we can’t run any Scala, Python code — “This version of table access control restricts users on the cluster to SQL commands only.”
  • In case we want to access the tables/views with Scala/Python code, use a separate cluster without the ‘sqlOnly’ configuration (let’s call it ETLCluster).
  • Reporting users/group shouldn’t have access of the ETLCluster. They should have access to the ReportingCluster only.
Image for post
Image for post
Instead of giving access to individual users, better to add them in appropriate groups and group should be given proper access to a cluster
Image for post
Image for post
Edit Cluster > Advanced Options > Permissions
  • If the table ACL is disabled on a cluster (which is the default), owners are not registered if you create a table/view/database. On that case, the admin should assign an owner after the table ACL is enabled.
%sql
ALTER database analytics OWNER TO `new_owner@mydomain.com`;
  • In case, the admin or owner is moving out/leaving the team, do not forget to assign a new owner to the databases/tables/views! Assign a new owner will remove the previous owner.

I hope these will help to setup right access levels to a reporting user for any BI tool connectivity with Databricks. For further details, please refer this.

Written by

Tech enthusiast, Azure Big Data Architect.

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store