Machine Learning: Multiclass Classification Template for any Classification Dataset
We do often use classification algorithms to predict a particular class based on provided input features. In this tutorial, we will implement a python class file that includes most of the classification methods.
Utilities
After creating an object of the class, we will be able to do the following:
- Split train-test data
- Generate Classification Report
- Plot Classification Report
- Generate Confusion Matrix
- Save the model for future use
Included Algorithms
- Logistic Regression Classifier
- K-Neighbor Classifier
- SVM (Linear) Classifier
- SVM (RBF) Classifier
- Naive Bayes Classifier
- Decision Tree Classifier
- Random Forest Classifier
Necessary Modules
We will use the following modules in our code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix
import joblib
import os
pandas
andnumpy
$\rightarrow$ for data (in dataframe) processingmatplotlib
andseaborn
$\rightarrow$ for plottingsklearn.metrics
$\rightarrow$ to generate the classification reports and confusion matricesjoblib
andos
$\rightarrow$ to store the models
Apart from these modules, we will also use sklearn.model_selection
for splitting the whole dataset to train and test-data. Also, corresponding sklearn
submodules for importing the classifiers. We will include those while creating methods for each classifier.
Train-test Split Function
The following function will help us to split the dataset into train-test data. We can keep it outside the class or inside the class as a static method (using the @static
decorator beforehand).
def preprocess(dataset, x_iloc_list, y_iloc, testSize):
# dataset = pd.read_csv(csv_file)
X = dataset.iloc[:, x_iloc_list].values
y = dataset.iloc[:, y_iloc].values
# split into training and testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = testSize, random_state = 0)
# standardization of values
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
return X_train, X_test, y_train, y_test
Classification Class
Now let’s create our class. here, I name the class as Classification
.
Constructor Method
We will need to provide four input arguments while creating an object of the class.
- X_Train, X_test $\rightarrow$ train and test data for target features
- y_train, y_test $\rightarrow$ train and test data for target labels
class Classification: def __init__(self, X_train, X_test, y_train, y_test): self.X_train = X_train self.X_test = X_test self.y_train = y_train self.y_test = y_test
Accuracy
We can use the following method to obtain accuracy from the confusion matrix.
def accuracy(self, confusion_matrix):
sum, total = 0,0
for i in range(len(confusion_matrix)):
for j in range(len(confusion_matrix[0])):
if i == j:
sum += confusion_matrix[i,j]
total += confusion_matrix[i,j]
return sum/total
Classification Report plot
The following method will be used to generate plots of the data we achieve in our classification report. We will save the plots in our local sub-directory named clf_plots
.
def classification_report_plot(self, clf_report, filename):
folder = "clf_plots"
if not os.path.isdir(folder):
os.mkdir(folder)
out_file_name = folder + "/" + filename + ".png"
fig=plt.figure(figsize=(16,10))
sns.set(font_scale=4)
sns.heatmap(pd.DataFrame(clf_report).iloc[:-1, :].T, annot=True, cmap="Greens")
fig.savefig(out_file_name, bbox_inches="tight")
Classifiers
Now, let’s add individual method for each classification algorithm. Here in each method, we will
- import the classifier
- create an object of the classifier class
- fit the train data to the model
- store our model in
model
sub-directory - predict with test data
- generate classification report
- generate confusion matrix
- cenerate classification report plot
Linear Regression
def LR(self): from sklearn.linear_model import LogisticRegression lr_classifier = LogisticRegression() lr_classifier.fit(self.X_train, self.y_train) joblib.dump(lr_classifier, "model/lr.sav") y_pred = lr_classifier.predict(self.X_test) print("### Logistic Regression Classifier ###") print('Classification Report: ') print(classification_report(self.y_test, y_pred),'\n') print('Confusion Matrix: ') print(confusion_matrix(self.y_test, y_pred),'\n') print('Precision: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%') self.classification_report_plot(classification_report(self.y_test, y_pred, \ output_dict=True), "LR")
KNN
def KNN(self):
from sklearn.neighbors import KNeighborsClassifier
knn_classifier = KNeighborsClassifier()
knn_classifier.fit(self.X_train, self.y_train)
joblib.dump(knn_classifier, "model/knn.sav")
y_pred = knn_classifier.predict(self.X_test)
print("### K-Neighbors Classifier ###")
print('Classification Report: ')
print(classification_report(self.y_test, y_pred),'\n')
print('Confusion Matrix: ')
print(confusion_matrix(self.y_test, y_pred),'\n')
print('Precision: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%')
self.classification_report_plot(classification_report(self.y_test, y_pred, \
output_dict=True), "KNN")
SVM (Linear and RBF)
# kernel type could be 'linear' or 'rbf' (Gaussian)
def SVM(self, kernel_type):
from sklearn.svm import SVC
svm_classifier = SVC(kernel = kernel_type)
svm_classifier.fit(self.X_train, self.y_train)
joblib.dump(svm_classifier, "model/svm.sav")
y_pred = svm_classifier.predict(self.X_test)
print("### Support Vector Classifier (" + kernel_type + ") ###")
print('Classification Report: ')
print(classification_report(self.y_test, y_pred),'\n')
print('Confusion Matrix: ')
print(confusion_matrix(self.y_test, y_pred),'\n')
print('Precision: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%')
self.classification_report_plot(classification_report(self.y_test, y_pred, \
output_dict=True), "SVC"+kernel_type)
Naive-Bayes
def NB(self):
from sklearn.naive_bayes import GaussianNB
nb_classifier = GaussianNB()
nb_classifier.fit(self.X_train, self.y_train)
joblib.dump(nb_classifier, "model/nb.sav")
y_pred = nb_classifier.predict(self.X_test)
print("### Naive Bayes Classifier ###")
print('Classification Report: ')
print(classification_report(self.y_test, y_pred),'\n')
print('Confusion Matrix: ')
print(confusion_matrix(self.y_test, y_pred),'\n')
print('Precision: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%')
self.classification_report_plot(classification_report(self.y_test, y_pred, \
output_dict=True), "NB")
Decision Tree
def DT(self):
from sklearn.tree import DecisionTreeClassifier
tree_classifier = DecisionTreeClassifier()
tree_classifier.fit(self.X_train, self.y_train)
joblib.dump(tree_classifier, "model/tree.sav")
y_pred = tree_classifier.predict(self.X_test)
print("### Decision Tree Classifier ###")
print('Classification Report: ')
print(classification_report(self.y_test, y_pred),'\n')
print('Confusion Matrix: ')
print(confusion_matrix(self.y_test, y_pred),'\n')
print('Precision: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%')
self.classification_report_plot(classification_report(self.y_test, y_pred, \
output_dict=True), "DT")
Random Forest
def RF(self):
from sklearn.ensemble import RandomForestClassifier
rf_classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy')
rf_classifier.fit(self.X_train, self.y_train)
joblib.dump(rf_classifier, "model/rf.sav")
y_pred = rf_classifier.predict(self.X_test)
print("### Random Forest Classifier ###")
print('Classification Report: ')
print(classification_report(self.y_test, y_pred),'\n')
print('Confusion Matrix: ')
print(confusion_matrix(self.y_test, y_pred),'\n')
print('Precision: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%')
self.classification_report_plot(classification_report(self.y_test, y_pred, \
output_dict=True), "RF")
The entire code is available in Github.
Thanks everyone! Have a nice day!!!
Leave a comment