8 Clustering Methods From Scikit Learn We Should Know
Scikit Learn is a very popular open-source machine learning library for Python. It supports a range of supervised (regression and classification) and unsupervised learning models.
In this blog, we’ll apply 8 clustering methods, or unsupervised machine learning models, to the Iris Plants database (download from here and for details, refer here). Instead of going deep into the algorithms or mathematical details, we limit our discussion to how the Scikit Learn clustering methods are used.
Each record in the database contains the following attributes:
1. Sepal Length in cm
2. Sepal Width in cm
3. Petal Length in cm
4. Petal Width in cm
5. Class: Iris Setosa or Iris Versicolour or Iris Virginica
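For readers who want to inspect these attributes without setting anything up, a copy of the Iris data also ships with Scikit Learn. The sketch below is only an illustration, assuming scikit-learn 0.23 or later and pandas are installed. The bundled copy uses its own column names and encodes the class as an integer `target` column, so it may differ slightly from the file used in this post.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris data as pandas objects (as_frame needs pandas installed).
iris = load_iris(as_frame=True)

# `frame` holds the four measurements plus an integer `target` column.
frame = iris.frame

print(frame.columns.tolist())  # measurement columns followed by 'target'
print(iris.target_names)       # species names that the integer targets map to
print(frame.shape)             # (number of records, number of columns)
```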
Let’s load the data and use it as a Spark table…
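# `spark` is assumed to be an existing SparkSession (for example, in a notebook)
df = spark.table('iris_data_set')
print(f"There are {df.count()} records in the dataset.")
labelCol = "Class"  # column that holds the species label
df.show(5)  # preview the first five rows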
…and convert the Spark DataFrame into a pandas DataFrame:
import pandas as pd
dataset = df.toPandas()
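Since clustering is unsupervised, a common next step is to separate the four measurements from the class label before fitting any model. The following is a minimal sketch of that split, not necessarily how the rest of this post does it. It builds a small stand-in for `dataset` so that it runs on its own; the feature column names (`SepalLength` and so on) are assumptions, and only `Class` comes from `labelCol` above.

```python
import pandas as pd

# Hypothetical stand-in for `dataset`; the feature column names are assumed
# and the three rows are for illustration only.
dataset = pd.DataFrame({
    "SepalLength": [5.1, 7.0, 6.3],
    "SepalWidth": [3.5, 3.2, 3.3],
    "PetalLength": [1.4, 4.7, 6.0],
    "PetalWidth": [0.2, 1.4, 2.5],
    "Class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})
labelCol = "Class"

# Only the measurements are passed to a clustering model.
X = dataset.drop(columns=[labelCol])

# The species is kept aside so cluster assignments can be compared with it later.
y = dataset[labelCol]

print(X.shape)
print(y.unique())
```

With the real data, `dataset` would come from `df.toPandas()` and the stand-in DataFrame would be dropped.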