Introduction to Deep Learning in Python (Keras and TensorFlow) using the MNIST Dataset

In this post we will see how to implement basic deep learning models in Python with Keras and TensorFlow. To keep the post short, I skip the theory.

We will implement a basic deep neural network (DNN), a convolutional neural network (CNN), and a recurrent neural network (RNN), and train each of them on the MNIST dataset.

Prerequisites

We will use the following modules and classes in our implementation.

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout, LSTM
from keras.utils import normalize
import numpy as np
import matplotlib.pyplot as plt

Modules

tensorflow $\rightarrow$ to download the dataset
keras $\rightarrow$ to build the artificial neural networks (ANNs)
numpy and matplotlib $\rightarrow$ to check the prediction and plot the image

We will use the following classes for the different types of neural networks:

Common

DNN $\rightarrow$ Dense, Flatten, Dropout

Additional

CNN $\rightarrow$ Conv2D, MaxPooling2D
RNN $\rightarrow$ LSTM

Dense, Flatten, and Dropout are general-purpose layers that can appear in any of these networks. Here is a brief look at what each class does:

Dense $\rightarrow$ fully connected layer: every output of the previous layer feeds every neuron in this layer
Flatten $\rightarrow$ flattens a multi-dimensional tensor into a vector (e.g., a 10x10 2D image becomes 100 values)
Dropout $\rightarrow$ reduces overfitting by randomly zeroing a given fraction of its inputs during training
Conv2D $\rightarrow$ slides small filters (e.g., 3x3 pixels) over the image, preserving the spatial relations between pixels (CNNs are mainly used for image data)
MaxPooling2D $\rightarrow$ reduces the spatial size by keeping only the maximum value within each 2x2 pixel window
LSTM $\rightarrow$ recurrent layer that reads a sequence step by step while carrying a memory state between steps
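
To make the layer descriptions above concrete, here is a minimal NumPy sketch of what Flatten, 2x2 max pooling, and Dropout do to their inputs. It only imitates the behavior on toy data and is not how Keras implements these layers; the array values, the dropout rate, and the random seed are arbitrary choices for illustration.

```python
import numpy as np

# A toy 4x4 single-channel "image" (values chosen for illustration only).
image = np.array([[1, 2, 5, 6],
                  [3, 4, 7, 8],
                  [9, 10, 13, 14],
                  [11, 12, 15, 16]])

# Flatten: the 4x4 grid becomes a vector of 16 values, row by row.
flat = image.reshape(-1)

# 2x2 max pooling with stride 2: split into 2x2 blocks, keep each block's max.
# Axes after the reshape: (block_row, row_in_block, block_col, col_in_block).
blocks = image.reshape(2, 2, 2, 2)
pooled = blocks.max(axis=(1, 3))

# Dropout at training time: zero a random subset of the inputs and rescale the
# survivors by 1 / (1 - rate) so the expected sum stays the same.
rng = np.random.default_rng(0)
rate = 0.5
x = np.ones(8)
keep_mask = rng.random(x.shape) >= rate
dropped = x * keep_mask / (1.0 - rate)

print(flat.shape)
print(pooled)
print(dropped)
```

Pooling halves each spatial dimension here, which is why the CNN later in this post produces much smaller feature maps after its MaxPooling2D layer.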

Load Data

Let’s first download the MNIST data. We load the dataset through the tensorflow keras API and then normalize it with keras.utils.normalize. For the CNN, we additionally reshape the train and test data into 4-dimensional numpy arrays (samples, height, width, channels).

Load data for DNN and RNN

def load_data_NN():
    # load mnist dataset
    mnist = tf.keras.datasets.mnist  # 28 x 28 images of 0-9
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    # normalize data
    x_train = normalize(x_train, axis = 1)
    x_test = normalize(x_test, axis = 1)
    return x_train, y_train, x_test, y_test
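
As far as I understand it, keras.utils.normalize does not divide by 255; it applies an L2 normalization along the given axis. The sketch below reproduces that behavior in plain NumPy on a tiny array so you can see what axis = 1 means. The helper name l2_normalize is my own and is not part of Keras.

```python
import numpy as np

def l2_normalize(x, axis=1):
    """NumPy sketch of L2 normalization along one axis (hypothetical helper)."""
    norms = np.linalg.norm(x, ord=2, axis=axis, keepdims=True)
    norms[norms == 0] = 1  # leave all-zero slices unchanged instead of dividing by zero
    return x / norms

# One 2x2 "image": the first column is (3, 4), the second column is all zeros.
batch = np.array([[[3.0, 0.0],
                   [4.0, 0.0]]])  # shape (1, 2, 2)

out = l2_normalize(batch, axis=1)
print(out)
```

With axis = 1 on data of shape (samples, 28, 28), each pixel column of each image is scaled to unit length. Dividing by 255.0 is a common alternative if you prefer values in the range 0 to 1.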

Load data for CNN

def load_data_CNN():
    # load mnist dataset
    mnist = tf.keras.datasets.mnist  # 28 x 28 images of 0-9
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    # reshape data
    x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
    x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
    # convert from integers to floats
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    # normalize data
    x_train = normalize(x_train, axis = 1)
    x_test = normalize(x_test, axis = 1)
    return x_train, y_train, x_test, y_test
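
The reshape step adds a trailing channel axis, because Conv2D expects images shaped as (height, width, channels). Here is a small sketch on stand-in data; the array of zeros replaces the real MNIST images purely for illustration.

```python
import numpy as np

# Stand-in for five grayscale MNIST images (the real data is uint8, 28 x 28).
images = np.zeros((5, 28, 28), dtype=np.uint8)

# Same reshape and cast as in load_data_CNN: add one channel, switch to floats.
images_4d = images.reshape(images.shape[0], 28, 28, 1).astype('float32')

# np.expand_dims is an equivalent way to add the channel axis.
same_shape = np.expand_dims(images, axis=-1)

print(images_4d.shape, images_4d.dtype)
print(same_shape.shape)
```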

Neural Network Models

Hidden layers typically use the relu activation. In the final layer we use softmax, which converts the raw scores into a normalized probability distribution over the ten digit classes.
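
For readers who want to see the two activations spelled out, here is a minimal NumPy sketch. The function names and the example scores are mine; Keras ships its own implementations.

```python
import numpy as np

def relu(z):
    """Keep positive values, replace negative values with zero."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])  # example scores for three classes
probs = softmax(scores)

print(relu(np.array([-1.0, 2.0])))
print(probs, probs.sum())
```

The largest score always receives the largest probability, so taking the argmax of the softmax output gives the predicted class.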

When compiling the models, we use the adam optimizer and the sparse_categorical_crossentropy loss function, which accepts the integer labels (0-9) directly, without one-hot encoding. The Keras documentation lists all available optimizers.
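
To show what the loss measures, here is a rough NumPy sketch of sparse categorical crossentropy: for each sample it takes the predicted probability of the true class and averages the negative logarithms. The function name, the clipping constant, and the example numbers are assumptions for illustration, not the Keras implementation.

```python
import numpy as np

def sparse_cce(probs, labels, eps=1e-7):
    """Mean of -log(probability assigned to the true class); labels are integers."""
    picked = probs[np.arange(len(labels)), labels]  # probability of the true class per sample
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))

# Two samples, two classes: both predictions favor the correct class.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
labels = np.array([0, 1])

loss = sparse_cce(probs, labels)
print(loss)
```

A confident correct prediction contributes a value near zero, while a confident wrong prediction contributes a large value.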

A simple DNN

def simple_DNN():
    model = Sequential()
    model.add(Flatten())  # input layer
    model.add(Dense(128, activation = 'relu'))
    model.add(Dense(128, activation = 'relu'))
    model.add(Dense(10, activation = 'softmax'))

    model.compile(optimizer= "adam", 
                  loss= "sparse_categorical_crossentropy", 
                  metrics=["accuracy"])
    return model
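
As a sanity check on the size of this network, the number of trainable parameters can be worked out by hand: each Dense layer has one weight per input-output pair plus one bias per output. The helper below is my own small sketch of that arithmetic, assuming the 28 x 28 input is flattened to 784 values.

```python
def dense_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output neuron."""
    return n_inputs * n_outputs + n_outputs

# Layer sizes of simple_DNN: Flatten gives 784 values, then 128, 128, 10 neurons.
layer_sizes = [28 * 28, 128, 128, 10]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)
print(total)
```

Once the model has been trained or built, model.summary() should report the same counts.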

Recurrent Neural Network (LSTM)

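def RNN(input_shape):
    model = Sequential()
    # each 28 x 28 image is read as 28 time steps (rows) of 28 features each
    model.add(LSTM(128, input_shape=input_shape, activation = 'relu', return_sequences=True))
    model.add(Dropout(0.2))
    # the second LSTM infers its input shape from the previous layer
    model.add(LSTM(128, activation = 'relu'))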
    model.add(Dropout(0.2))
    model.add(Dense(32, activation = 'relu'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation = 'softmax'))

    model.compile(optimizer= "adam", 
                  loss= "sparse_categorical_crossentropy", 
                  metrics=["accuracy"])
    return model
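
The LSTM layers are considerably larger than the Dense layers of the DNN. The sketch below estimates their parameter counts, assuming the standard LSTM formulation with four gates, each with an input kernel, a recurrent kernel, and a bias. The helper name is hypothetical, and the result is worth comparing against model.summary().

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

# First LSTM: each time step is one image row with 28 features.
first_lstm = lstm_params(input_dim=28, units=128)

# Second LSTM: its input is the 128-dimensional output of the first LSTM.
second_lstm = lstm_params(input_dim=128, units=128)

print(first_lstm, second_lstm)
```

The first layer needs return_sequences=True so that it passes all 28 time steps on to the second LSTM instead of only its final output.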

Convolutional Neural Network

def conv_NN(input_shape):
    model = Sequential()

    model.add(Conv2D(32, (3,3),  input_shape = input_shape))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Flatten())  # converts 3D feature maps to 1D feature vectors
    model.add(Dense(100, activation='relu'))
    model.add(Dense(10, activation='softmax'))

    model.compile(loss="sparse_categorical_crossentropy",
                 optimizer="adam",
                 metrics=["accuracy"])
    return model
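
It can help to trace how the image shape changes through this network. The sketch below does the arithmetic by hand, assuming the Keras defaults as I understand them: 'valid' padding and stride 1 for Conv2D, and a stride equal to the pool size for MaxPooling2D. The helper names are my own.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length of a 'valid' convolution along one spatial dimension."""
    return (size - kernel) // stride + 1

def pool_output_size(size, pool):
    """Output length of non-overlapping pooling along one spatial dimension."""
    return size // pool

filters = 32
after_conv = conv_output_size(28, kernel=3)       # 28 -> 26
after_pool = pool_output_size(after_conv, pool=2)  # 26 -> 13
flat_features = after_pool * after_pool * filters  # values fed into the first Dense layer

print((after_conv, after_conv, filters))
print((after_pool, after_pool, filters))
print(flat_features)
```

Most of the parameters of this CNN therefore sit in the first Dense layer, which connects all flattened features to 100 neurons.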

Additional Utilities

The following function will help us to check the prediction output of a particular index of the image dataset.

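def sample_prediction(index):
    # uses the global `model` and `x_test` created in the main block
    plt.imshow(x_test[index].reshape(28, 28), cmap='Greys')
    plt.show()
    # slicing keeps the batch axis and the input shape each model was trained on
    pred = model.predict(x_test[index:index + 1])
    print(np.argmax(pred))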
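
If you prefer not to rely on global variables, a variant that takes the model and the data as arguments could look like the sketch below. The names predict_digit and FakeModel are hypothetical; FakeModel only mimics a Keras-like predict() method so the example runs without training anything.

```python
import numpy as np

def predict_digit(model, images, index):
    """Return (predicted digit, its probability) for images[index]."""
    batch = images[index:index + 1]  # slicing keeps the batch axis
    probs = model.predict(batch)[0]
    digit = int(np.argmax(probs))
    return digit, float(probs[digit])

class FakeModel:
    """Hypothetical stand-in for a trained model; always favors class 7."""
    def predict(self, batch):
        probs = np.full((len(batch), 10), 0.01)
        probs[:, 7] = 0.91
        return probs

digit, confidence = predict_digit(FakeModel(), np.zeros((3, 28, 28)), 0)
print(digit, confidence)
```

With a real trained model you would pass model and x_test instead of the stand-ins.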

Main Function

Now we can train and evaluate each of the networks on the dataset.

DNN

if __name__ == "__main__":
    # load data
    x_train, y_train, x_test, y_test = load_data_NN()
    # load the model
    model = simple_DNN()
    
    print("\n\nModel Training\n")
    model.fit(x_train, y_train, epochs = 5)
    
    print("\n\nModel Evaluation\n")
    model.evaluate(x_test, y_test)
    
    print("\n\nSample Prediction")
    sample_prediction(0)
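
Because the models are compiled with metrics=["accuracy"], model.evaluate returns the loss followed by the accuracy. The accuracy itself is simply the fraction of test images whose most probable class matches the label, as this small NumPy sketch shows. The function name and the example probabilities are made up for illustration.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    predicted = np.argmax(probs, axis=1)
    return float(np.mean(predicted == labels))

# Three samples, three classes: the first two are classified correctly.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.5, 0.3, 0.2]])
labels = np.array([0, 1, 2])

acc = accuracy(probs, labels)
print(acc)
```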

RNN

if __name__ == "__main__":
    # load data
    x_train, y_train, x_test, y_test = load_data_NN()
    
    # load model
    model = RNN(x_train.shape[1:])
    
    print("\n\nModel Training\n")
    model.fit(x_train, y_train, epochs = 5)
    
    print("\n\nModel Evaluation\n")
    model.evaluate(x_test, y_test)
    
    print("\n\nSample Prediction")
    sample_prediction(0)

CNN

if __name__ == "__main__":
    # load data
    x_train, y_train, x_test, y_test = load_data_CNN()
    
    # load model
    input_shape = (28,28,1)
    model = conv_NN(input_shape)
    
    print("\n\nModel Training\n")
    model.fit(x_train, y_train, epochs = 5)
    
    print("\n\nModel Evaluation\n")
    model.evaluate(x_test, y_test)
    
    print("\n\nSample Prediction")
    sample_prediction(0)

For now, the post is very short. I will keep adding content and further explanation over time. Cheers!

Shanto Roy