Week 1 - Feedforward network with MNIST dataset

Week 1 of the plan was to train a feedforward network on the MNIST dataset. This is probably the easiest and most straightforward example for understanding how the training cycle works.

Before starting, I should mention that the code for this week (and the other weeks too) was not written from scratch by me. That's the main difference between school and work: I treated this homework the same way I treat work. I can google what I need, then understand and refactor it. It can also mean reading about a specific network to learn the recommended range of values for a certain parameter or the recommended layer structure.

This time, the example was simple enough that I just copied the code and started playing with it to understand how it works. As the weeks move forward, I have to write more of it myself. Consider this a Hello World week. Now let's see the code!

You can find the whole Python notebook on GitHub here.

import time
import numpy as np
from matplotlib import pyplot as plt
from keras.utils import np_utils
import keras.callbacks as cb
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import RMSprop
from keras.datasets import mnist

def load_and_prepare_data():
    print('Loading data...')

    # Keras provides MNIST dataset as two tuples for training and testing sets.
    # https://keras.io/datasets/#mnist-database-of-handwritten-digits
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    print('Done.')
    print('Preparing data...')

    # The input datasets should have values between 0-1 instead of having a range of 0-255.
    # This requires typecasting them from int to float at first.
    X_train = X_train.astype('float32')
    X_test = X_test.astype('float32')
    X_train /= 255
    X_test /= 255

    # The output datasets should be converted from a single value between 0-9 to
    # a one-hot encoded categorical vector. Example:
    # 2 -> [0 0 1 0 0 0 0 0 0 0]
    # 9 -> [0 0 0 0 0 0 0 0 0 1]
    y_train = np_utils.to_categorical(y_train, 10)
    y_test = np_utils.to_categorical(y_test, 10)

    # The MNIST inputs are 28x28 grayscale images. Let's flatten each one into a single row vector of 784 values.
    X_train = np.reshape(X_train, (60000, 784))
    X_test = np.reshape(X_test, (10000, 784))

    print('Done.')
    return [X_train, X_test, y_train, y_test]
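As a quick sanity check after loading, the shapes and value ranges can be printed; the expected values in the comments follow directly from the code above:

data = load_and_prepare_data()
X_train, X_test, y_train, y_test = data

print(X_train.shape)  # (60000, 784): 60000 flattened 28x28 images
print(y_train.shape)  # (60000, 10): one-hot encoded labels
print(X_train.min(), X_train.max())  # 0.0 1.0 after normalization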

def compile_model():
    print('Compiling model...')

    start_time = time.time()

    model = Sequential()
    model.add(Dense(500, input_dim=784))
    model.add(Activation('relu'))
    model.add(Dropout(0.4))
    model.add(Dense(300))
    model.add(Activation('relu'))
    model.add(Dropout(0.4))
    model.add(Dense(10))
    model.add(Activation('softmax'))

    rms = RMSprop()
    model.compile(loss='categorical_crossentropy', optimizer=rms, metrics=['accuracy'])

    print('Model compiled in {0} seconds.'.format(time.time() - start_time))
    return model

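Before training, it is worth double checking the architecture with model.summary(). For this 784-500-300-10 network, the parameter counts work out as follows:

model = compile_model()
model.summary()
# Dense(500): 784*500 + 500 = 392,500 parameters
# Dense(300): 500*300 + 300 = 150,300 parameters
# Dense(10):  300*10  + 10  =   3,010 parameters
# Total:                     ~545,810 trainable parameters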

class loss_history_callback(cb.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        batch_loss = logs.get('loss')
        self.losses.append(batch_loss)
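The same callback pattern works per epoch too. As a small sketch (it assumes the same Keras callbacks API and the val_acc metric name that shows up in the training log below), validation accuracy could be collected like this:

class val_acc_history_callback(cb.Callback):
    def on_train_begin(self, logs={}):
        self.val_accs = []

    def on_epoch_end(self, epoch, logs={}):
        # 'val_acc' is only available because validation_data is passed to fit()
        self.val_accs.append(logs.get('val_acc'))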

def train_model(model=None, data=None, epochs=20, batch=256):
    try:
        start_time = time.time()

        X_train, X_test, y_train, y_test = data

        print('Training model...')

        loss_history = loss_history_callback()
        model.fit(X_train, y_train, epochs=epochs, batch_size=batch,
                  callbacks=[loss_history],
                  validation_data=(X_test, y_test), verbose=2)

        print("Training duration : {0} seconds.".format(time.time() - start_time))

        score = model.evaluate(X_test, y_test, batch_size=16)

        print("Model's test score [loss, accuracy]: {0}".format(score))

        return model, loss_history.losses
    except KeyboardInterrupt:
        # This way we can interrupt the model training at any time without losing data collected so far.
        print('>>> KeyboardInterrupt')
        return model, loss_history.losses
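After training, the model can be persisted with Keras's save API (the file name here is just an example). This stores the architecture, weights, and optimizer state in a single HDF5 file and requires h5py to be installed:

model.save('week1_mnist_feedforward.h5')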

# Use the losses collected through the history callback method to plot results.
def plot_losses(losses):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(losses)
    ax.set_title('Loss per batch')
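Inside the notebook (with inline plotting enabled) the figure shows up on its own; when running the code as a plain script, an explicit call to plt.show() is needed:

plot_losses(losses)
plt.show()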

Output

Here is one example of running the above code (assuming data has already been loaded with load_and_prepare_data()) and the resulting output:

model = compile_model()
model, losses = train_model(model=model, data=data)
plot_losses(losses)
Compiling model...
Model compiled in 0.14888596534729004 seconds.
Training model...
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
 - 7s - loss: 0.3536 - acc: 0.8914 - val_loss: 0.1571 - val_acc: 0.9510
Epoch 2/20
 - 6s - loss: 0.1558 - acc: 0.9532 - val_loss: 0.1012 - val_acc: 0.9669
Epoch 3/20
 - 6s - loss: 0.1151 - acc: 0.9652 - val_loss: 0.0877 - val_acc: 0.9720
Epoch 4/20
 - 6s - loss: 0.0953 - acc: 0.9710 - val_loss: 0.0814 - val_acc: 0.9758
Epoch 5/20
 - 6s - loss: 0.0810 - acc: 0.9755 - val_loss: 0.0733 - val_acc: 0.9789
Epoch 6/20
 - 6s - loss: 0.0716 - acc: 0.9780 - val_loss: 0.0789 - val_acc: 0.9781
Epoch 7/20
 - 6s - loss: 0.0644 - acc: 0.9808 - val_loss: 0.0652 - val_acc: 0.9831
Epoch 8/20
 - 6s - loss: 0.0575 - acc: 0.9823 - val_loss: 0.0658 - val_acc: 0.9828
Epoch 9/20
 - 6s - loss: 0.0540 - acc: 0.9830 - val_loss: 0.0677 - val_acc: 0.9824
Epoch 10/20
 - 9s - loss: 0.0490 - acc: 0.9847 - val_loss: 0.0709 - val_acc: 0.9824
Epoch 11/20
 - 7s - loss: 0.0465 - acc: 0.9861 - val_loss: 0.0711 - val_acc: 0.9827
Epoch 12/20
 - 7s - loss: 0.0474 - acc: 0.9860 - val_loss: 0.0696 - val_acc: 0.9828
Epoch 13/20
 - 7s - loss: 0.0433 - acc: 0.9872 - val_loss: 0.0701 - val_acc: 0.9834
Epoch 14/20
 - 6s - loss: 0.0394 - acc: 0.9882 - val_loss: 0.0743 - val_acc: 0.9827
Epoch 15/20
 - 7s - loss: 0.0396 - acc: 0.9882 - val_loss: 0.0706 - val_acc: 0.9834
Epoch 16/20
 - 7s - loss: 0.0352 - acc: 0.9893 - val_loss: 0.0757 - val_acc: 0.9846
Epoch 17/20
 - 6s - loss: 0.0348 - acc: 0.9893 - val_loss: 0.0747 - val_acc: 0.9832
Epoch 18/20
 - 6s - loss: 0.0357 - acc: 0.9896 - val_loss: 0.0762 - val_acc: 0.9839
Epoch 19/20
 - 6s - loss: 0.0345 - acc: 0.9897 - val_loss: 0.0709 - val_acc: 0.9853
Epoch 20/20
 - 6s - loss: 0.0342 - acc: 0.9898 - val_loss: 0.0705 - val_acc: 0.9848
Training duration : 131.75203776359558 seconds.
10000/10000 [==============================] - 2s 167us/step
Model's test score [loss, accuracy]: [0.07048261645096685, 0.9848]

[Plot: loss per batch]
