Databricks — AutoML & Model Serving

Prosenjit Chakraborty
Jun 24, 2021

In our previous blog, we discussed the different MLflow components, focusing on tracking, managing models, and registering them in the Model Registry. In this blog, we'll cover the Databricks AutoML feature and MLflow Model Serving.

AutoML

Databricks AutoML helps you automatically apply machine learning to a dataset. It prepares the dataset for model training and then performs and records a set of trials, creating, tuning, and evaluating multiple models. It displays the results and provides a Python notebook with the source code for each trial run so you can review, reproduce, and modify the code. AutoML also calculates summary statistics on your dataset and saves this information in a notebook that you can review later.

To start, let's prepare the training dataset (I've taken the California housing sample dataset from sklearn) and save it as a Delta table.

%python
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset as a pandas DataFrame
# (fetch_california_housing returns a Bunch; .frame holds the DataFrame)
input_pdf = fetch_california_housing(as_frame=True).frame
chDf = spark.createDataFrame(input_pdf)

# Save as a Delta table and register it in the metastore
chDf.write \
    .format("delta") \
    .save("/mnt/delta/california_housing")
spark.sql("CREATE TABLE default.california_housing USING DELTA LOCATION '/mnt/delta/california_housing'")
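Before launching AutoML, it's worth confirming the table was registered correctly. A minimal sanity check (assuming the table name used above):

%python
# Read the registered Delta table back and verify the schema;
# the target column MedHouseVal should be present
ch = spark.table("default.california_housing")
ch.printSchema()
display(ch.limit(5))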

Once the training dataset is ready, we can use the Databricks AutoML experience to train multiple models and present the results to us.

Create AutoML Experiment

We'll select a cluster (I'm using Databricks Runtime 8.3 ML (Beta) on Azure). As this is a regression problem we're trying to solve, we'll select the ML problem type as Regression (the other available option is Classification).

Next, we'll browse and select the training dataset, i.e. the Delta table we've created, and the prediction target column (MedHouseVal for this dataset).

AutoML Experiment Configuration screen.

Once the right configuration is set, AutoML will start training. By default, the run is allotted 60 minutes (the timeout is configurable), and we can stop the process earlier if required.
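The same experiment can also be launched programmatically instead of through the UI. A minimal sketch using the AutoML Python API (available in Databricks Runtime 8.3 ML and above; exact parameters may vary by runtime version):

%python
from databricks import automl

# Launch an AutoML regression run against the Delta table;
# target_col matches the prediction target selected in the UI,
# and timeout_minutes mirrors the default 60-minute limit
summary = automl.regress(
    dataset=spark.table("default.california_housing"),
    target_col="MedHouseVal",
    timeout_minutes=60
)

# Inspect the best trial produced by the run
print(summary.best_trial.model_path)

The returned summary links back to the generated trial notebooks, so the programmatic route yields the same reviewable artifacts as the UI.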
