Azure Databricks can be connected as a data source for Power BI or any other BI reporting tool, and different reports can be developed on top of the underlying Hive tables or Databricks Delta tables. However, reporting users usually don't need to see the tables/views that exist for other purposes, e.g. ETL loads, staging tables etc. We may also want to restrict access to specific tables/schemas to a specific set of users.
If that's your goal, Databricks Table Access Control is the feature to look at.
To connect to Databricks and access its Hive tables, you need a connection string and a user token. The next few steps show how to construct the connection string, generate a token, and use Power BI to connect to Databricks.
Construct The Databricks Connection String
The Power BI server connection string is built from the Azure Databricks workspace URL and the cluster's HTTP path, both of which are shown on the cluster's JDBC/ODBC tab.
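As an illustration, here is a minimal Python sketch that assembles such a server URL for the Power BI Spark connector. The region, workspace ID and cluster ID below are hypothetical placeholders; substitute the real values from your cluster's JDBC/ODBC tab.

```python
# Sketch: assembling the server URL Power BI's Spark connector uses to
# reach an Azure Databricks cluster. All three inputs are placeholders.

def databricks_server_url(region: str, workspace_id: str, cluster_id: str) -> str:
    """Build the HTTPS endpoint for a Databricks cluster's Thrift server."""
    return (
        f"https://{region}.azuredatabricks.net:443"
        f"/sql/protocolv1/o/{workspace_id}/{cluster_id}"
    )

# Hypothetical example values:
url = databricks_server_url("westeurope", "1234567890123456", "0412-090831-abcd123")
print(url)
```

In Power BI, this URL goes into the Server field of the Spark connector, with HTTP as the protocol.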
Generate A User Token
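Power BI authenticates to Databricks with a personal access token, which you can generate in the workspace UI (User Settings > Access Tokens > Generate New Token) or via the REST Token API (`POST /api/2.0/token/create`). The sketch below prepares such an API call; the host and the bootstrap token are placeholders, and the actual network call is left commented out.

```python
# Sketch: requesting a new Databricks personal access token via the Token API.
# The host and the existing (bootstrap) token below are placeholders.
import json
import urllib.request

def build_token_request(host: str, existing_token: str,
                        comment: str, lifetime_seconds: int) -> urllib.request.Request:
    """Prepare the HTTP request that asks Databricks for a new user token."""
    payload = json.dumps(
        {"comment": comment, "lifetime_seconds": lifetime_seconds}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.0/token/create",
        data=payload,
        headers={
            "Authorization": f"Bearer {existing_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_token_request(
    "https://westeurope.azuredatabricks.net",  # placeholder workspace URL
    "dapi-xxxx",                               # placeholder existing token
    comment="power-bi-reporting",
    lifetime_seconds=90 * 24 * 3600,
)
# urllib.request.urlopen(req) would return JSON containing "token_value",
# which is the token Power BI should use.
```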
Power BI Connectivity To Azure Databricks
In the above image, you can see that Power BI lists all the tables/views accessible to the user who issued the user token.
In this example, one of the objects is a Databricks Delta table, assumed to be updated by an ETL load, and
analytics.transactions is a view which should be the only object displayed to the BI users.
Now, to restrict the business users to only the relevant tables or views, follow these steps.
Databricks Workspace Level Configuration
To start with, both 'Cluster and Jobs Access Control' and 'Table Access Control' have to be enabled. The first is enabled by default and the second is disabled, so we just need to enable Table Access Control from the Admin Console.
Databricks Cluster Level Configuration
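At the cluster level, table access control is enforced on the reporting cluster with the following line in the cluster's Spark Config (this is the `spark.databricks.acl.sqlOnly` setting also discussed in the notes at the end):

```
spark.databricks.acl.sqlOnly true
```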
Granting Schema Or Table Level Access
Once we have completed the above steps, we can use the following statements to GRANT access to a user (use a dedicated service user rather than a 'real' personal Databricks user when the access is for a tool/application).
For the commands with all their options, refer to the Databricks table access control documentation.
```sql
%sql
-- Provide SELECT access to the database
GRANT SELECT ON DATABASE analytics TO `email@example.com`;

-- READ_METADATA access is required so Power BI can fetch table/view
-- details while listing them
GRANT READ_METADATA ON DATABASE analytics TO `firstname.lastname@example.org`;

-- Alternative - provide SELECT access to the view only, for selective access
GRANT SELECT ON VIEW analytics.transactions TO `email@example.com`;
GRANT READ_METADATA ON VIEW analytics.transactions TO `firstname.lastname@example.org`;

-- If required - explicitly deny access to a database
DENY SELECT ON DATABASE default TO `email@example.com`;
```
We can check the access the reporting (or service) user holds by executing
SHOW GRANT on the relevant views/tables, e.g. `SHOW GRANT `email@example.com` ON DATABASE analytics;`.
Power BI Connectivity After The Changes
Once the right access is set for the reporting user, a Power BI connection to Databricks with that user's token will fetch only the permitted tables or views.
Points To Note
- Once we enable spark.databricks.acl.sqlOnly for a cluster (let's call it ReportingCluster), we can't run any Scala or Python code on it: "This version of table access control restricts users on the cluster to SQL commands only."
- If we want to access the tables/views from Scala/Python code, use a separate cluster without the 'sqlOnly' configuration (let's call it ETLCluster).
- Reporting users/groups shouldn't have access to the ETLCluster; they should have access to the ReportingCluster only.
- If the table ACL is disabled on a cluster (which is the default), no owner is registered when you create a table/view/database. In that case, an admin should assign an owner after the table ACL is enabled:
ALTER DATABASE analytics OWNER TO `firstname.lastname@example.org`;
- If the admin or owner is leaving the team, do not forget to assign a new owner to the databases/tables/views! Assigning a new owner removes the previous owner.
I hope this helps you set up the right access levels for a reporting user for any BI tool's connectivity with Databricks. For further details, please refer to the Databricks documentation.