Now it’s time to build the neural network architecture. Let’s start with the input layer (input1). This layer receives the image samples in our X data, so it must accept exactly the same shape as a single image. Note that we only define (height, width, channels) here, not (samples, height, width, channels); Keras handles the samples axis itself.
Next, input1 is connected to two blocks of convolution and pooling layers, whose output is then flattened and passed to dense layers. All hidden layers use the ReLU activation function because it is cheaper to compute than sigmoid, which shortens the training time. The last layer is output1, which consists of 3 neurons with softmax activation. Softmax is used here because we want the outputs to be the probability of each class.
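# The input shape is (height, width, channels); the samples axis (index 0) is omitted.
input1 = Input(shape=(X_train.shape[1], X_train.shape[2], 1))

# First convolution-pooling block.
# Assumption: padding='same', with each layer applied to the previous layer's output.
cnn = Conv2D(16, (3, 3), activation='relu', strides=(1, 1), padding='same')(input1)
cnn = Conv2D(32, (3, 3), activation='relu', strides=(1, 1), padding='same')(cnn)
cnn = MaxPool2D((2, 2))(cnn)

# Second convolution-pooling block (same assumption as above).
cnn = Conv2D(16, (2, 2), activation='relu', strides=(1, 1), padding='same')(cnn)
cnn = Conv2D(32, (2, 2), activation='relu', strides=(1, 1), padding='same')(cnn)
cnn = MaxPool2D((2, 2))(cnn)

# Flatten the feature maps and classify them with dense layers.
cnn = Flatten()(cnn)
cnn = Dense(100, activation='relu')(cnn)
cnn = Dense(50, activation='relu')(cnn)
output1 = Dense(3, activation='softmax')(cnn)

model = Model(inputs=input1, outputs=output1)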
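To make the role of softmax in output1 more concrete, here is a minimal sketch in plain Python of what the activation does to the three raw scores of the last layer. The scores are made-up values for illustration only, and this is not how Keras implements it internally; it only shows that the outputs become probabilities that sum to 1.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() numerically stable without changing the result.
    highest = max(logits)
    exps = [math.exp(z - highest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores of the 3 output neurons for a single image.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# The predicted class is the index with the highest probability.
predicted_class = probs.index(max(probs))
print([round(p, 3) for p in probs], predicted_class)
```

The largest raw score gets the largest probability, so taking the index of the maximum gives the predicted class.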
After constructing the neural network, we can display a summary of the model by calling summary() on the model object. Below is what our CNN model looks like in detail. We can see that it has about 8 million parameters in total, which is a lot. Well, that’s why I ran this code in a Kaggle notebook.
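To see where a parameter count of that size can come from, the sketch below counts the weights layer by layer in plain Python. It assumes a 200x200 grayscale input and padding='same' in every convolution; neither is stated in this section, so treat the exact numbers as illustrative rather than as the real summary() output.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel cell per input channel, plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(in_units, out_units):
    # One weight per input-output pair, plus one bias per output unit.
    return in_units * out_units + out_units

# Assumptions (not from the article): 200x200 input, 1 channel, padding='same'.
height, width, channels = 200, 200, 1

layers = {}
layers["conv_1"] = conv2d_params(3, 3, channels, 16)
layers["conv_2"] = conv2d_params(3, 3, 16, 32)
height, width = height // 2, width // 2    # first 2x2 max pooling halves each side
layers["conv_3"] = conv2d_params(2, 2, 32, 16)
layers["conv_4"] = conv2d_params(2, 2, 16, 32)
height, width = height // 2, width // 2    # second 2x2 max pooling
flat_units = height * width * 32           # size of the Flatten output
layers["dense_100"] = dense_params(flat_units, 100)
layers["dense_50"] = dense_params(100, 50)
layers["output"] = dense_params(50, 3)

total_params = sum(layers.values())
for name, count in layers.items():
    print(f"{name:>10}: {count:>9,}")
print(f"{'total':>10}: {total_params:>9,}")
```

Under these assumptions, nearly all of the parameters sit in the first dense layer, because it is connected to every value of the flattened feature maps.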
Once the model is constructed, we need to compile it using the categorical cross-entropy loss function and the Adam optimizer. This loss function is the standard choice for multiclass classification with one-hot encoded labels. Meanwhile, I chose Adam as the optimizer because it is a widely used default that tends to work well on many neural network tasks.
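As a rough illustration of what categorical cross-entropy measures, here is a small plain-Python sketch for a single sample. The predicted probabilities are invented for the example, and the function is a simplified stand-in for the Keras loss (selected with loss='categorical_crossentropy' in model.compile()), not its actual implementation.

```python
import math

def categorical_cross_entropy(y_true_one_hot, y_pred_probs, eps=1e-12):
    """Loss for one sample: -sum(y_true * log(y_pred))."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(y_true_one_hot, y_pred_probs))

y_true = [0, 1, 0]                      # one-hot label: the true class is index 1

# Hypothetical softmax outputs for three different predictions.
confident_right = [0.05, 0.90, 0.05]
unsure = [0.30, 0.40, 0.30]
confident_wrong = [0.90, 0.05, 0.05]

losses = [categorical_cross_entropy(y_true, p)
          for p in (confident_right, unsure, confident_wrong)]
print([round(loss, 3) for loss in losses])
```

The loss is small when the model puts high probability on the true class and grows quickly when it is confidently wrong, which is the quantity the optimizer tries to push down.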
Now it’s time to train the model! Here we use fit_generator() instead of fit() because the training data comes from the train_gen object. If you paid attention to the data augmentation part, you’ll have noticed that train_gen was created from both X_train and y_train_one_hot, so we don’t need to pass the X-y pairs explicitly to fit_generator(). Note that in recent versions of TensorFlow, fit_generator() is deprecated and fit() accepts generators directly.
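# Assumption: the validation set is passed as (X_test, y_test_one_hot); both names are placeholders.
history = model.fit_generator(train_gen, epochs=30,
    validation_data=(X_test, y_test_one_hot))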
What’s special about train_gen is that training is done on samples with some randomness. The training data in X_train is not fed directly into the neural network. Instead, those samples serve as the basis from which the generator creates new images with random transformations. Moreover, the generator produces different images in each epoch, which helps our classifier generalize better to samples in the test set. Below is how the training process goes.
163/163 [==============================] - 19s 114ms/step - loss: 5.7014 - acc: 0.6133 - val_loss: 0.7971 - val_acc: 0.7228
163/163 [==============================] - 18s 111ms/step - loss: 0.5575 - acc: 0.7650 - val_loss: 0.8788 - val_acc: 0.7308
163/163 [==============================] - 17s 102ms/step - loss: 0.5267 - acc: 0.7784 - val_loss: 0.6668 - val_acc: 0.7917
163/163 [==============================] - 17s 104ms/step - loss: 0.4915 - acc: 0.7922 - val_loss: 0.7079 - val_acc: 0.8045
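The idea behind train_gen described above can be sketched with a toy generator in plain Python. This is only a stand-in to show the mechanism: the function names, the transformations (random flip and one-pixel shift), and the tiny nested-list "images" are all hypothetical, and the real generator from the data augmentation part works on arrays and may apply different transformations.

```python
import random

def random_transform(image, rng):
    """Return a randomly flipped and shifted copy of a 2D image (list of rows)."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]         # horizontal flip
    shift = rng.choice([-1, 0, 1])
    if shift == 1:
        image = [[0] + row[:-1] for row in image]    # shift right, pad with 0
    elif shift == -1:
        image = [row[1:] + [0] for row in image]     # shift left, pad with 0
    return image

def augmenting_generator(images, labels, batch_size, rng):
    """Yield endless shuffled batches of transformed images with their labels."""
    indices = list(range(len(images)))
    while True:
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            yield ([random_transform(images[i], rng) for i in batch],
                   [labels[i] for i in batch])

rng = random.Random(0)
toy_images = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [1, 0, 1]]]   # two 2x3 "images"
toy_labels = [[1, 0, 0], [0, 0, 1]]                             # one-hot labels

gen = augmenting_generator(toy_images, toy_labels, batch_size=2, rng=rng)
epoch_1_images, epoch_1_labels = next(gen)   # one batch covers the whole toy set
epoch_2_images, epoch_2_labels = next(gen)
print(epoch_1_images)
print(epoch_2_images)
```

The original images are never modified; each pass draws fresh random variants of them, while the labels stay attached to the right sample.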
The entire training took around 10 minutes in my Kaggle notebook, so be patient! Once training is done, we can plot the improvement in accuracy and the decrease in loss like this:
According to the two figures above, the performance of the model keeps improving, even though both the test accuracy and the test loss fluctuate over these 30 epochs. Another important thing to notice is that the model does not suffer from overfitting, thanks to the data augmentation we applied earlier in this project. The accuracy on the train and test data is 79% and 80% respectively at the final epoch.
Fun fact: before implementing data augmentation, I got 100% accuracy on the train data and 64% on the test data, which is severe overfitting. So we can clearly see that augmenting the train data is very effective at improving test accuracy while also reducing overfitting.
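One simple way to put a number on the overfitting comparison above is the gap between train and test accuracy. The helper below is a hypothetical convenience function, not part of the original notebook; it only uses the accuracies reported in the text.

```python
def generalization_gap(train_acc, test_acc):
    """Train accuracy minus test accuracy, in percentage points."""
    return round((train_acc - test_acc) * 100, 1)

# Accuracies reported in the text.
gap_without_aug = generalization_gap(1.00, 0.64)   # before data augmentation
gap_with_aug = generalization_gap(0.79, 0.80)      # after data augmentation

print(gap_without_aug, gap_with_aug)
```

A large positive gap means the model has memorized the train data, while a gap close to zero suggests that what it learned carries over to unseen samples.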