Machine Learning: Multiclass Classification Template for any Classification Dataset


We often use classification algorithms to predict a particular class from a set of input features. In this tutorial, we will implement a Python class that wraps most of the common classification methods.

Utilities

After creating an object of the class, we will be able to do the following:

  • Split train-test data
  • Generate Classification Report
  • Plot Classification Report
  • Generate Confusion Matrix
  • Save the model for future use

Included Algorithms

  • Logistic Regression Classifier
  • K-Neighbors Classifier
  • SVM (Linear) Classifier
  • SVM (RBF) Classifier
  • Naive Bayes Classifier
  • Decision Tree Classifier
  • Random Forest Classifier

Necessary Modules

We will use the following modules in our code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix
import joblib
import os

  • pandas and numpy $\rightarrow$ for data processing (as DataFrames and arrays)
  • matplotlib and seaborn $\rightarrow$ for plotting
  • sklearn.metrics $\rightarrow$ to generate the classification reports and confusion matrices
  • joblib and os $\rightarrow$ to store the models

Apart from these modules, we will also use sklearn.model_selection for splitting the whole dataset into training and test data, along with the corresponding sklearn submodules for importing the classifiers. We will import those inside the method for each classifier.

Train-test Split Function

The following function will help us split the dataset into training and test data and standardize the feature values. We can keep it outside the class, or inside the class as a static method (by adding the @staticmethod decorator above it), as sketched below.
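
A minimal sketch of the static-method placement (the body would be identical to the standalone function that follows):

    class Classification:

        @staticmethod
        def preprocess(dataset, x_iloc_list, y_iloc, testSize):
            ...  # same body as the standalone version below

Here is the standalone version: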

def preprocess(dataset, x_iloc_list, y_iloc, testSize):
    # select the feature columns and the label column by index
    X = dataset.iloc[:, x_iloc_list].values
    y = dataset.iloc[:, y_iloc].values

    # split into training and testing set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = testSize, random_state = 0)

    # standardization of values
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    return X_train, X_test, y_train, y_test
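
For example, assuming a hypothetical data.csv whose first four columns are features and whose fifth column (index 4) holds the class label, we could hold out 25% of the rows as test data:

import pandas as pd

dataset = pd.read_csv("data.csv")  # hypothetical dataset file
X_train, X_test, y_train, y_test = preprocess(dataset, [0, 1, 2, 3], 4, 0.25)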

Classification Class

Now let’s create our class. Here, I name the class Classification.

Constructor Method

We will need to provide four input arguments while creating an object of the class.

  • X_train, X_test $\rightarrow$ train and test data for input features
  • y_train, y_test $\rightarrow$ train and test data for target labels

    class Classification:

        def __init__(self, X_train, X_test, y_train, y_test):
            self.X_train = X_train
            self.X_test = X_test
            self.y_train = y_train
            self.y_test = y_test
            # create the output directory for saved models up front;
            # joblib.dump() fails if the "model" folder does not exist
            os.makedirs("model", exist_ok=True)

Accuracy

We can use the following method to obtain accuracy from the confusion matrix.

    def accuracy(self, cm):
        # correct predictions lie on the diagonal of the confusion matrix;
        # accuracy is their sum divided by the total number of samples
        return np.trace(cm) / np.sum(cm)
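
As a quick check, for an illustrative 3-class confusion matrix the accuracy is the diagonal sum divided by the total:

import numpy as np

# illustrative 3-class confusion matrix
cm = np.array([[50, 2, 3],
               [4, 45, 1],
               [2, 3, 40]])
print(np.trace(cm) / np.sum(cm))  # (50 + 45 + 40) / 150 = 0.9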

Classification Report Plot

The following method generates a heatmap plot of the classification report and saves it in a local sub-directory named clf_plots.

    def classification_report_plot(self, clf_report, filename):
        folder = "clf_plots"
        if not os.path.isdir(folder):
            os.mkdir(folder)

        out_file_name = os.path.join(folder, filename + ".png")

        fig = plt.figure(figsize=(16, 10))
        sns.set(font_scale=4)
        # drop the 'support' row and transpose so that classes become rows
        sns.heatmap(pd.DataFrame(clf_report).iloc[:-1, :].T, annot=True, cmap="Greens")
        fig.savefig(out_file_name, bbox_inches="tight")
        plt.close(fig)  # close the figure so plots do not accumulate in memory
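
The clf_report argument is the dictionary form of the report, produced by passing output_dict=True to classification_report. A minimal sketch with illustrative labels:

from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 1, 0]  # illustrative true labels
y_pred = [0, 2, 2, 2, 1, 0]  # illustrative predictions
report = classification_report(y_true, y_pred, output_dict=True)
# top-level keys: one per class, plus 'accuracy', 'macro avg', 'weighted avg'
print(report["macro avg"]["f1-score"])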

Classifiers

Now, let’s add an individual method for each classification algorithm. In each method, we will

  • import the classifier
  • create an object of the classifier class
  • fit the training data to the model
  • store our model in the model sub-directory
  • predict with the test data
  • generate the classification report
  • generate the confusion matrix
  • generate the classification report plot

Logistic Regression

    def LR(self):
        from sklearn.linear_model import LogisticRegression
        lr_classifier = LogisticRegression()
        lr_classifier.fit(self.X_train, self.y_train)
        joblib.dump(lr_classifier, "model/lr.sav")
        y_pred = lr_classifier.predict(self.X_test)

        print("### Logistic Regression Classifier ###")
        print('Classification Report: ')
        print(classification_report(self.y_test, y_pred),'\n')
        print('Confusion Matrix: ')
        print(confusion_matrix(self.y_test, y_pred),'\n')
        print('Accuracy: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%')

        self.classification_report_plot(classification_report(self.y_test, y_pred, \
                                                                    output_dict=True), "LR")
    

KNN

    def KNN(self):
        from sklearn.neighbors import KNeighborsClassifier
        knn_classifier = KNeighborsClassifier()
        knn_classifier.fit(self.X_train, self.y_train)
        joblib.dump(knn_classifier, "model/knn.sav")
        y_pred = knn_classifier.predict(self.X_test)
        
        print("### K-Neighbors Classifier ###")
        print('Classification Report: ')
        print(classification_report(self.y_test, y_pred),'\n')
        print('Confusion Matrix: ')
        print(confusion_matrix(self.y_test, y_pred),'\n')
        print('Accuracy: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%')

        self.classification_report_plot(classification_report(self.y_test, y_pred, \
                                                                    output_dict=True), "KNN")

SVM (Linear and RBF)

    # kernel type could be 'linear' or 'rbf' (Gaussian)
    def SVM(self, kernel_type):
        from sklearn.svm import SVC
        svm_classifier = SVC(kernel = kernel_type)
        svm_classifier.fit(self.X_train, self.y_train)
        # include the kernel in the filename so the 'linear' and 'rbf'
        # models do not overwrite each other
        joblib.dump(svm_classifier, "model/svm_" + kernel_type + ".sav")
        y_pred = svm_classifier.predict(self.X_test)
        
        print("### Support Vector Classifier (" + kernel_type + ") ###")
        print('Classification Report: ')
        print(classification_report(self.y_test, y_pred),'\n')
        print('Confusion Matrix: ')
        print(confusion_matrix(self.y_test, y_pred),'\n')
        print('Accuracy: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%')

        self.classification_report_plot(classification_report(self.y_test, y_pred, \
                                                                    output_dict=True), "SVC_" + kernel_type)

Naive-Bayes

    def NB(self):
        from sklearn.naive_bayes import GaussianNB
        nb_classifier = GaussianNB()
        nb_classifier.fit(self.X_train, self.y_train)
        joblib.dump(nb_classifier, "model/nb.sav")
        y_pred = nb_classifier.predict(self.X_test)
        
        print("### Naive Bayes Classifier ###")
        print('Classification Report: ')
        print(classification_report(self.y_test, y_pred),'\n')
        print('Confusion Matrix: ')
        print(confusion_matrix(self.y_test, y_pred),'\n')
        print('Accuracy: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%')

        self.classification_report_plot(classification_report(self.y_test, y_pred, \
                                                                    output_dict=True), "NB")

Decision Tree

    def DT(self):
        from sklearn.tree import DecisionTreeClassifier
        tree_classifier = DecisionTreeClassifier()
        tree_classifier.fit(self.X_train, self.y_train)
        joblib.dump(tree_classifier, "model/tree.sav")
        y_pred = tree_classifier.predict(self.X_test)
        
        print("### Decision Tree Classifier ###")
        print('Classification Report: ')
        print(classification_report(self.y_test, y_pred),'\n')
        print('Confusion Matrix: ')
        print(confusion_matrix(self.y_test, y_pred),'\n')
        print('Accuracy: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%')

        self.classification_report_plot(classification_report(self.y_test, y_pred, \
                                                                    output_dict=True), "DT")

Random Forest

    def RF(self):
        from sklearn.ensemble import RandomForestClassifier
        rf_classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy')
        rf_classifier.fit(self.X_train, self.y_train)
        joblib.dump(rf_classifier, "model/rf.sav")
        y_pred = rf_classifier.predict(self.X_test)
        
        print("### Random Forest Classifier ###")
        print('Classification Report: ')
        print(classification_report(self.y_test, y_pred),'\n')
        print('Confusion Matrix: ')
        print(confusion_matrix(self.y_test, y_pred),'\n')
        print('Accuracy: ', self.accuracy(confusion_matrix(self.y_test, y_pred))*100,'%')

        self.classification_report_plot(classification_report(self.y_test, y_pred, \
                                                                    output_dict=True), "RF")
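
Usage

Putting it all together, here is a minimal usage sketch, again assuming a hypothetical data.csv with four feature columns and the class label in column 4:

import pandas as pd
import joblib

dataset = pd.read_csv("data.csv")  # hypothetical dataset file
X_train, X_test, y_train, y_test = preprocess(dataset, [0, 1, 2, 3], 4, 0.25)

clf = Classification(X_train, X_test, y_train, y_test)
clf.LR()
clf.KNN()
clf.SVM('linear')
clf.SVM('rbf')
clf.NB()
clf.DT()
clf.RF()

# reload a saved model later ("Save the model for future use")
rf_model = joblib.load("model/rf.sav")
print(rf_model.predict(X_test[:5]))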

The entire code is available on GitHub.

Thanks, everyone! Have a nice day!
