A Databricks workspace instance contains an internal Hive metastore, accessible by all of its clusters, that persists table metadata. However, instead of using this built-in metastore, Databricks can also connect to an external Hive metastore.

An external Hive metastore can be reached either through its thrift service or by connecting directly to the metastore database.

Databricks cluster advanced property to connect via the thrift service:
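
As a minimal sketch of what such a property can look like: Databricks clusters accept Spark config entries as one space-separated key/value pair per line, and the Hive client reads the metastore address from `hive.metastore.uris`. The helper names below (`thrift_metastore_conf`, `render_spark_conf`) are hypothetical, the host is a placeholder, and port 9083 is assumed because it is the usual Hive metastore default. A real setup typically also needs the metastore version and client jars configured to match.

```python
def thrift_metastore_conf(host: str, port: int = 9083) -> dict:
    """Build the Spark property that points a cluster at a remote Hive metastore.

    Hypothetical helper for illustration; host and port are placeholders.
    """
    if not host:
        raise ValueError("host must not be empty")
    # The spark.hadoop. prefix passes the setting through to the Hive/Hadoop client.
    return {"spark.hadoop.hive.metastore.uris": f"thrift://{host}:{port}"}


def render_spark_conf(conf: dict) -> str:
    """Render properties as 'key value' lines, one property per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    # Placeholder host name; replace with the address of your own metastore service.
    print(render_spark_conf(thrift_metastore_conf("metastore.example.internal")))
```

The rendered line can be pasted into the cluster's Spark config box under the advanced options.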



Feature stores have been used for a few years now to manage machine learning data and features. Google’s Feast, an open source feature store, and Uber’s Michelangelo, its own machine learning platform with a feature data management layer, often inspire other companies to either implement or buy a centralized feature storage…

In our previous blog, we talked about the different MLflow components and concentrated on tracking, managing models and deploying them into the model registry. In this blog, we’ll talk about the Databricks AutoML feature and MLflow model serving.


Databricks AutoML helps you automatically apply machine learning to a dataset. It prepares the dataset for…

Databricks is one of the top choices among data scientists for running their ML code. To help them manage their code and models, MLflow has been integrated with Databricks.

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. Azure Databricks provides a fully managed and hosted…

How about a classifier to sort human characters into Good, Bad or Ugly?

Spark MLlib is a distributed machine learning framework comprising a set of popular machine learning libraries and utilities. Because it uses Spark Core for parallel computing, it is well suited to applying these algorithms to big data sets.

In this blog, we’ll use nine well-known classifiers to classify the Banknote…

“Sensitive data is a part of every large organization’s normal business practice. Allowing sensitive data from production applications to be copied and used for development and testing environments increases the potential for theft, loss or exposure — thus increasing the organization’s risk. Data masking is emerging as a best practice…

Prosenjit Chakraborty

Tech enthusiast, Azure Big Data Architect.
