10 Classification Methods From Scikit Learn We Should Know

We will use the Pima Indians Diabetes dataset, which contains the following columns:

1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
df = spark.table('pima_indians_diabetes')
print(f"""There are {df.count()} records in the dataset.""")
df.show(5)
import pandas as pd
dataset = df.toPandas()

Data Analysis

Mean & Standard Deviation

dataset.describe().transpose()

Scatter Matrix

sampled_data = df.drop("Class").sample(False, 0.8).toPandas()
axs = pd.plotting.scatter_matrix(sampled_data, figsize=(10, 10))
num_cols = len(sampled_data.columns)
for cur_col in range(num_cols):
    # rotate the y-axis labels of the first column and hide their ticks
    ax = axs[cur_col, 0]
    ax.yaxis.label.set_rotation(0)
    ax.yaxis.label.set_ha('right')
    ax.set_yticks(())
    # rotate the x-axis labels of the bottom row and hide their ticks
    h = axs[num_cols - 1, cur_col]
    h.xaxis.label.set_rotation(90)
    h.set_xticks(())
[Figure: scatter matrix of the input features]
The scatter matrix suggests that the input features are not strongly correlated with one another.
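To back up this visual impression, we can also compute the pairwise correlations numerically. A minimal sketch using pandas, reusing the sampled_data frame from above:

# Pearson correlation between every pair of input features;
# values close to 0 indicate little linear correlation
corr = sampled_data.corr()
print(corr.round(2))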

Data Preprocessing

Train & Test Datasets

X = dataset.iloc[:,:-1].values
y = dataset.iloc[:, -1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

Standard Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
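The scaler is fitted on the training set only and then reused on the test set, so no information from the test set leaks into the preprocessing. As a quick sanity check (a small sketch, not part of the original walkthrough), the fitted scaler exposes the per-feature statistics it learned:

# Per-feature mean and standard deviation learned from the training set
print(sc.mean_)
print(sc.scale_)
# After scaling, the training features have roughly zero mean and unit variance
print(X_train.mean(axis=0).round(2))
print(X_train.std(axis=0).round(2))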

1. Logistic Regression

[Figure: logistic regression curve showing the probability of passing an exam versus hours studied (source: Wikipedia)]
[Figure: the logistic regression model (reference: https://www.saedsayad.com/logistic_regression.htm)]

Implementation

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state = 0)
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm_lr = confusion_matrix(y_test, y_pred_lr)
print(cm_lr)
acc_lr = accuracy_score(y_test, y_pred_lr)
print (acc_lr)

Confusion Matrix

[Figure: confusion matrix layout (source: Wikipedia)]

Accuracy Score
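Accuracy is simply the fraction of correct predictions, i.e. the sum of the diagonal of the confusion matrix divided by the total number of test samples. A small sketch verifying this for the logistic regression results above:

# Diagonal of the confusion matrix = correctly classified samples
manual_acc = (cm_lr[0, 0] + cm_lr[1, 1]) / cm_lr.sum()
print(manual_acc)  # matches accuracy_score(y_test, y_pred_lr)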

2. K-Nearest Neighbours (K-NN)

[Figure: K-nearest neighbours classification example (source: Wikipedia)]

Implementation

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm_knn = confusion_matrix(y_test, y_pred_knn)
print (cm_knn)
acc_knn = accuracy_score(y_test, y_pred_knn)
print (acc_knn)
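The choice of k matters: small values tend to overfit, large values oversmooth. A quick sketch (an addition, not in the original code) that compares test accuracy for a few values of k:

# Compare test accuracy for several neighbourhood sizes
for k in (3, 5, 7, 9, 11):
    model = KNeighborsClassifier(n_neighbors=k, metric='minkowski', p=2)
    model.fit(X_train, y_train)
    print(k, accuracy_score(y_test, model.predict(X_test)))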

3. SVC (Support Vector Classifier) with Linear Kernel

[Figure: SVC with a linear kernel (source: scikit-learn)]

Implementation

from sklearn.svm import SVC
svc = SVC(kernel = 'linear', random_state = 0)
svc.fit(X_train, y_train)
y_pred_svc = svc.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm_svc = confusion_matrix(y_test, y_pred_svc)
print (cm_svc)
acc_svc = accuracy_score(y_test, y_pred_svc)
print (acc_svc)
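With a linear kernel the fitted model exposes the separating hyperplane directly; a short inspection sketch (an addition to the original code):

# Weight vector and intercept of the separating hyperplane
print(svc.coef_)
print(svc.intercept_)
# Number of support vectors per class
print(svc.n_support_)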

4. Kernel SVM (Support Vector Machine)

[Figure: kernel SVM decision boundary (source: scikit-learn)]
[Figure: SVM with linear, polynomial and RBF kernels respectively (source: scikit-learn)]

Implementation

from sklearn.svm import SVC
svc_rbf = SVC(kernel = 'rbf', random_state = 0)
svc_rbf.fit(X_train, y_train)
y_pred_svc_rbf = svc_rbf.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred_svc_rbf)
print (cm)
acc_svc_rbf = accuracy_score(y_test, y_pred_svc_rbf)
print (acc_svc_rbf)
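The RBF kernel has two important hyperparameters, C and gamma, which are usually worth tuning instead of relying on the defaults. A minimal grid-search sketch (the grid values are chosen arbitrarily for illustration):

from sklearn.model_selection import GridSearchCV

# Small illustrative grid over C and gamma
param_grid = {'C': [0.1, 1, 10], 'gamma': ['scale', 0.1, 0.01]}
grid = GridSearchCV(SVC(kernel='rbf', random_state=0), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)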

5. Naïve Bayes

[Figure: Bayes' Theorem, P(A|B) = P(B|A) · P(A) / P(B) (source: Wikipedia)]
Refer to the probabilistic model for further reading.

Implementation

from sklearn.naive_bayes import GaussianNB
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
acc_nb = accuracy_score(y_test, y_pred)
print(acc_nb)
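GaussianNB models each feature per class with a Gaussian distribution; the fitted estimator exposes the class priors and per-class feature means it learned. A short inspection sketch (an addition to the original code):

# Prior probability of each class, estimated from the training labels
print(nb.class_prior_)
# Per-class mean of each (standardised) feature
print(nb.theta_)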

6. Decision Tree

[Figure: decision tree example (source: Learning Spark, 2nd Edition, figure 10-9)]

Implementation

from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print (cm)
acc_dt = accuracy_score(y_test, y_pred)
print (acc_dt)
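A fitted decision tree can be inspected directly, which is one of its main attractions. A brief sketch (not from the original article, and assuming the feature columns are all but the last column of the pandas DataFrame, as used when building X) printing its depth, feature importances and a text rendering of the learned rules:

from sklearn.tree import export_text

print(dt.get_depth())           # depth of the learned tree
print(dt.feature_importances_)  # importance of each input feature
print(export_text(dt, feature_names=list(dataset.columns[:-1])))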

7. Random Forest

[Figure: random forest illustration (source: Wikipedia)]

Implementation

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(max_depth=2, random_state=0)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred_rf)
print (cm)
acc_rf = accuracy_score(y_test, y_pred_rf)
print (acc_rf)
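Random forests also provide feature importances, averaged over all trees, which help explain which inputs drive the prediction. A small sketch (an addition) that ranks them by column name:

import pandas as pd

# Rank features by their mean decrease in impurity across the forest
importances = pd.Series(rf.feature_importances_, index=dataset.columns[:-1])
print(importances.sort_values(ascending=False))

Note that max_depth=2 is quite shallow; increasing it, or tuning n_estimators, may improve accuracy on this dataset.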

8. AdaBoost Classifier

Implementation

from sklearn.ensemble import AdaBoostClassifier
abc = AdaBoostClassifier()
abc.fit(X_train, y_train)
y_pred_abc = abc.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred_abc)
print (cm)
acc_abc = accuracy_score(y_test, y_pred_abc)
print (acc_abc)
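AdaBoost builds its ensemble one weak learner at a time, so it is instructive to watch the test accuracy as boosting rounds are added. A sketch using staged_predict (an addition, not in the original code):

# Accuracy after every 10th boosting iteration
for i, y_stage in enumerate(abc.staged_predict(X_test), start=1):
    if i % 10 == 0:
        print(i, accuracy_score(y_test, y_stage))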

9. Quadratic Discriminant Analysis

Implementation

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train, y_train)
y_pred_qda = qda.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred_qda)
print (cm)
acc_qda = accuracy_score(y_test, y_pred_qda)
print (acc_qda)
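QDA fits one Gaussian per class with its own covariance matrix; its linear counterpart, LDA, shares a single covariance across classes. As a quick optional comparison (an addition to the original code):

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
print(accuracy_score(y_test, lda.predict(X_test)))  # compare with acc_qda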

10. MLP Classifier

[Figure: a multi-layer perceptron (source: scikit-learn)]

Implementation

from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(alpha=1, max_iter=1000)
mlp.fit(X_train, y_train)
y_pred_mlp = mlp.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred_mlp)
print (cm)
acc_mlp = accuracy_score(y_test, y_pred_mlp)
print (acc_mlp)
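MLPClassifier trains iteratively, so it is worth checking that it actually converged within max_iter. A short diagnostic sketch (an addition; loss_curve_ is available for the default 'adam' solver):

print(mlp.n_iter_)          # iterations actually run
print(mlp.loss_curve_[-1])  # final training loss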

Conclusion
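All ten classifiers were trained on the same standardised train/test split, so their accuracy scores can be compared directly. A small wrap-up sketch (an addition) that collects the scores computed above and ranks them:

import pandas as pd

scores = pd.Series({
    'Logistic Regression': acc_lr,
    'K-NN': acc_knn,
    'SVC (linear)': acc_svc,
    'SVC (RBF)': acc_svc_rbf,
    'Naive Bayes': acc_nb,
    'Decision Tree': acc_dt,
    'Random Forest': acc_rf,
    'AdaBoost': acc_abc,
    'QDA': acc_qda,
    'MLP': acc_mlp,
})
print(scores.sort_values(ascending=False))

In practice, the best choice depends on cross-validated performance and on requirements such as interpretability and training cost, not on a single train/test split.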

