Sharing Databricks Hive Metastore

Prosenjit Chakraborty
5 min read · Jul 21, 2021

A Databricks workspace instance contains an internal Hive metastore, accessible by all of its clusters, that persists table metadata. Instead of using its own metastore, however, Databricks can also connect to an external Hive metastore.

An external Hive metastore can be reached in two ways: through the metastore's Thrift service, or by connecting directly to the metastore database.

Databricks cluster advanced property to connect via the Thrift service:

spark.hadoop.hive.metastore.uris thrift://<hive-thrift-server-connection-url>:<thrift-server-port>
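Before pasting the URI into the cluster config, it can help to sanity-check it. The following is a minimal sketch, not part of the original setup: the function names and the host `metastore.example.internal` are hypothetical, and 9083 is used only because it is the conventional Hive metastore Thrift port.

```python
import socket
from urllib.parse import urlsplit


def parse_thrift_uri(uri: str) -> tuple[str, int]:
    """Split a thrift://<host>:<port> URI into (host, port), rejecting malformed input."""
    parts = urlsplit(uri)
    if parts.scheme != "thrift":
        raise ValueError(f"expected a thrift:// URI, got: {uri!r}")
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"URI must include both host and port: {uri!r}")
    return parts.hostname, parts.port


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical endpoint, used only for illustration.
    host, port = parse_thrift_uri("thrift://metastore.example.internal:9083")
    print(host, port, is_reachable(host, port))
```

A TCP check like this only confirms that the port is open from wherever the script runs; it does not prove that the Databricks cluster's network can reach the metastore, so it is best run from a notebook on the cluster itself.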

Databricks cluster advanced properties to connect directly to the metastore database:

Add the following Hive metastore connection entries to the Databricks cluster under Configuration > Advanced Options > Spark > Spark Config.

spark.hadoop.javax.jdo.option.ConnectionURL <hive-metastore-db-jdbc-connection-string>
spark.hadoop.javax.jdo.option.ConnectionDriverName <hive-metastore-db-jdbc-driver-class>
spark.hadoop.javax.jdo.option.ConnectionUserName {{secrets/<my-secret-scope>/<hive-conn-userid-key-name>}}
spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/<my-secret-scope>/<hive-conn-pass-key-name>}}
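When several clusters share the same metastore, generating these entries from one place avoids copy-paste drift. Below is a minimal sketch that renders the four `javax.jdo.option.*` entries as Spark Config lines. The helper names, the secret scope and key names, and the JDBC values are all hypothetical. The `prefix` argument is an assumption too: it defaults to `spark.hadoop.`, matching the Thrift property shown earlier, since Hive and Hadoop properties passed through Spark config generally carry that prefix.

```python
def secret_ref(scope: str, key: str) -> str:
    """Return a secret reference in the {{secrets/<scope>/<key>}} form used in the entries above."""
    return "{{secrets/" + scope + "/" + key + "}}"


def metastore_spark_config(
    jdbc_url: str,
    driver_class: str,
    scope: str,
    user_key: str,
    password_key: str,
    prefix: str = "spark.hadoop.",  # assumption: see the lead-in
) -> str:
    """Render the metastore connection entries, one 'key value' pair per line."""
    props = {
        "javax.jdo.option.ConnectionURL": jdbc_url,
        "javax.jdo.option.ConnectionDriverName": driver_class,
        # Credentials are emitted as secret references, never as literal values.
        "javax.jdo.option.ConnectionUserName": secret_ref(scope, user_key),
        "javax.jdo.option.ConnectionPassword": secret_ref(scope, password_key),
    }
    return "\n".join(f"{prefix}{key} {value}" for key, value in props.items())


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    print(
        metastore_spark_config(
            jdbc_url="jdbc:sqlserver://<host>:1433;database=<metastore-db>",
            driver_class="com.microsoft.sqlserver.jdbc.SQLServerDriver",
            scope="my-scope",
            user_key="hive-user",
            password_key="hive-pass",
        )
    )
```

The output can be pasted into the Spark Config box as-is. Keeping the user name and password as secret references means the rendered text is safe to store in version control.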

To read data from ADLS Gen2, append the following to the Spark config:

fs.azure.account.auth.type OAuth
fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<directory-id>/oauth2/token
fs.azure.account.oauth2.client.id {{secrets/<my-secret-scope>/<application-id>}}
fs.azure.account.oauth2.client.secret {{secrets/<my-secret-scope>/<service-credential-key-name>}}
fs.azure.createRemoteFileSystemDuringInitialization true
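With these credentials in place, data in ADLS Gen2 is addressed through `abfss://` URIs. As a small illustration, the sketch below builds such a URI. The helper name `abfss_path` and the container, account, and path values are hypothetical, and the commented read call assumes a Delta table at that location.

```python
def abfss_path(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    if not container or not account:
        raise ValueError("container and account are required")
    # Strip any leading slash so the result never contains a double slash after the host.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical container, storage account, and folder.
    location = abfss_path("raw", "mydatalake", "/sales/2021")
    print(location)
    # On a cluster configured as above, the location could then be read with, for example:
    # df = spark.read.format("delta").load(location)
```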

To read data from ADLS Gen1, append the following instead:

fs.adl.oauth2.access.token.provider.type ClientCredential
fs.adl.oauth2.refresh.url https://login.microsoftonline.com/<directory-id>/oauth2/token
fs.adl.oauth2.client.id {{secrets/<my-secret-scope>/<application-id>}}
fs.adl.oauth2.credential {{secrets/<my-secret-scope>/<service-credential-key-name>}}

The above approach requires effort to create and maintain a centralized Hive metastore…

Written by Prosenjit Chakraborty

Tech enthusiast, Principal Architect — Data & AI.
