Databricks — AutoML & Model Serving
In our previous blog, we discussed the different MLflow components, focusing on tracking, managing models, and registering them in the Model Registry. In this blog, we’ll cover the Databricks AutoML feature and MLflow Model Serving.
AutoML
Databricks AutoML helps you automatically apply machine learning to a dataset. It prepares the dataset for model training and then performs and records a set of trials, creating, tuning, and evaluating multiple models. It displays the results and provides a Python notebook with the source code for each trial run so you can review, reproduce, and modify the code. AutoML also calculates summary statistics on your dataset and saves this information in a notebook that you can review later.
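As an illustration of the kind of summary statistics AutoML records in its data-exploration notebook, here is a minimal pandas sketch. The dataset below is made up for the example (the column names merely echo the California housing schema); AutoML produces a richer, automatically generated profile:

```python
import pandas as pd

# A small, made-up dataset standing in for the training data
df = pd.DataFrame({
    "MedInc": [8.3, 7.2, 5.6, 3.8, 4.0],
    "HouseAge": [41, 21, 52, 52, 52],
    "MedHouseVal": [4.5, 3.6, 3.5, 3.4, 3.4],
})

# describe() yields count, mean, std, min, quartiles, and max per column,
# similar in spirit to the dataset profile AutoML saves for review
stats = df.describe()
print(stats)
```

Reviewing such a profile before training helps catch skewed columns, outliers, or missing values early.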
To start, let’s first prepare the dataset to train the model (I have taken a sample dataset from sklearn) and save it as a Delta table.
%python
from sklearn.datasets import fetch_california_housing

# Load the sample dataset as a pandas DataFrame
input_pdf = fetch_california_housing(as_frame=True).frame

# Convert to a Spark DataFrame
chDf = spark.createDataFrame(input_pdf)

# Write out as a Delta table and register it in the metastore
chDf.write \
    .format("delta") \
    .save("/mnt/delta/california_housing")

spark.sql("CREATE TABLE default.california_housing USING DELTA LOCATION '/mnt/delta/california_housing'")
Once the training dataset is prepared, we can use the Databricks AutoML experience to train…
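Besides the UI, AutoML can also be driven programmatically. A minimal sketch of a regression experiment against the table registered above (this only runs inside a Databricks ML runtime where the `databricks.automl` module is available; `MedHouseVal` is the target column in the sklearn California housing schema, and the parameter values are illustrative):

```python
from databricks import automl

# Load the Delta table registered earlier
df = spark.table("default.california_housing")

# Launch an AutoML regression experiment; AutoML creates, tunes, and
# evaluates multiple models within the given time budget
summary = automl.regress(
    dataset=df,
    target_col="MedHouseVal",   # label column to predict
    timeout_minutes=30,         # illustrative budget
)

# best_trial refers to the top-scoring run; its generated notebook and
# model artifacts can be inspected and reproduced from the experiment
print(summary.best_trial.model_path)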