Next up, let’s talk about how the number of features affects classification performance. Unlike the previous experiment, here I’m going to use the potato leaf disease dataset (it’s here). What I basically want to do is train two CNN models (again), where the first one uses max-pooling while the second one doesn’t. Below is what the first model looks like.
Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_10 (Conv2D) (None, 254, 254, 16) 448
_________________________________________________________________
conv2d_11 (Conv2D) (None, 252, 252, 32) 4640
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 63, 63, 32) 0
_________________________________________________________________
flatten_6 (Flatten) (None, 127008) 0
_________________________________________________________________
dense_18 (Dense) (None, 100) 12700900
_________________________________________________________________
dense_19 (Dense) (None, 50) 5050
_________________________________________________________________
dense_20 (Dense) (None, 3) 153
=================================================================
Total params: 12,711,191
Trainable params: 12,711,191
Non-trainable params: 0
_________________________________________________________________
As you can see in the model summary above, the max-pooling layer is placed right after the last convolution layer. The size of this pooling layer is 4×4, although that’s not shown in the summary. Also, note that the number of neurons in the flatten layer is only 127,008.
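For reference, here is a sketch of how this first model could be defined in Keras. The 256×256×3 input shape is an assumption inferred from the 254×254 output of the first 3×3 convolution, and the ReLU/softmax activations are assumptions too, since neither appears in the summary:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the first CNN, reconstructed from the summary above.
# Input shape and activations are assumptions, not from the article.
model = keras.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu',
                  input_shape=(256, 256, 3)),   # -> (254, 254, 16)
    layers.Conv2D(32, (3, 3), activation='relu'),  # -> (252, 252, 32)
    layers.MaxPooling2D(pool_size=(4, 4)),      # -> (63, 63, 32)
    layers.Flatten(),                           # 63 * 63 * 32 = 127,008
    layers.Dense(100, activation='relu'),
    layers.Dense(50, activation='relu'),
    layers.Dense(3, activation='softmax'),      # 3 leaf-disease classes
])
model.summary()
```

With these shapes, the parameter count works out to the same 12,711,191 total shown in the summary.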
The output showing the training progress looks like below. Here you can see that the model performs excellently, since the accuracy on the validation data reaches almost 94%.
Epoch 18/20
23/23 [==============================] - 1s 31ms/step - loss: 1.0573e-05 - acc: 1.0000 - val_loss: 0.3191 - val_acc: 0.9389
Epoch 19/20
23/23 [==============================] - 1s 31ms/step - loss: 9.6892e-06 - acc: 1.0000 - val_loss: 0.3215 - val_acc: 0.9389
Epoch 20/20
23/23 [==============================] - 1s 31ms/step - loss: 8.9155e-06 - acc: 1.0000 - val_loss: 0.3233 - val_acc: 0.9389
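A training call that could produce logs like these might look as follows. The optimizer, loss function, and the random stand-in data below are all assumptions; the article itself trains on the potato leaf disease dataset for 20 epochs:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Same pooled architecture as above (input shape is an assumption).
model = keras.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu',
                  input_shape=(256, 256, 3)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(4, 4)),
    layers.Flatten(),
    layers.Dense(100, activation='relu'),
    layers.Dense(50, activation='relu'),
    layers.Dense(3, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['acc'])

# Hypothetical placeholder images and labels, just to make this runnable;
# substitute the actual potato leaf images and their 3 class labels.
x_train = np.random.rand(8, 256, 256, 3).astype('float32')
y_train = np.random.randint(0, 3, size=(8,))

history = model.fit(x_train, y_train, epochs=1,
                    validation_split=0.25, verbose=0)
```

In the article’s run, `epochs=20` and the data comes from the dataset rather than random arrays.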
But remember that we want the model to be AS OVERFIT AS POSSIBLE. So my approach here is to modify the CNN such that it no longer contains a max-pooling layer. Below are the details of the modified CNN model.
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_6 (Conv2D) (None, 254, 254, 16) 448
_________________________________________________________________
conv2d_7 (Conv2D) (None, 252, 252, 32) 4640
_________________________________________________________________
flatten_4 (Flatten) (None, 2032128) 0
_________________________________________________________________
dense_12 (Dense) (None, 100) 203212900
_________________________________________________________________
dense_13 (Dense) (None, 50) 5050
_________________________________________________________________
dense_14 (Dense) (None, 3) 153
=================================================================
Total params: 203,223,191
Trainable params: 203,223,191
Non-trainable params: 0
_________________________________________________________________
As you can see in the summary above, the number of neurons in the flatten layer is now 2,032,128, which is far more than in the CNN that contains the pooling layer. Remember, in a CNN the actual classification task is done by the dense layers, which basically means that the number of neurons in this flatten layer acts like the number of input features.
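The modified model is the same sketch as before with the pooling layer removed. Again, the input shape and activations are assumptions inferred from the summary:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the modified CNN: identical to the first one, but with
# the MaxPooling2D layer removed. Input shape is an assumption.
model = keras.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu',
                  input_shape=(256, 256, 3)),   # -> (254, 254, 16)
    layers.Conv2D(32, (3, 3), activation='relu'),  # -> (252, 252, 32)
    layers.Flatten(),                           # 252 * 252 * 32 = 2,032,128
    layers.Dense(100, activation='relu'),
    layers.Dense(50, activation='relu'),
    layers.Dense(3, activation='softmax'),
])
model.summary()
```

Almost all of the resulting 203,223,191 parameters sit in the first dense layer, which now has to connect to every one of the 2,032,128 flattened features.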
Theoretically speaking, the absence of the pooling layer should cause the model to overfit more, since the number of features is a lot higher than in the previous CNN model. To prove it, let’s just fit the model and see the result below.
Epoch 15/20
23/23 [==============================] - 2s 65ms/step - loss: 0.0016 - acc: 1.0000 - val_loss: 2.4898 - val_acc: 0.6667
Epoch 16/20
23/23 [==============================] - 2s 68ms/step - loss: 0.0014 - acc: 1.0000 - val_loss: 2.5055 - val_acc: 0.6667
Epoch 17/20
23/23 [==============================] - 2s 66ms/step - loss: 0.0013 - acc: 1.0000 - val_loss: 2.5193 - val_acc: 0.6667
Epoch 18/20
23/23 [==============================] - 1s 65ms/step - loss: 0.0011 - acc: 1.0000 - val_loss: 2.5339 - val_acc: 0.6611
Epoch 19/20
23/23 [==============================] - 2s 65ms/step - loss: 0.0010 - acc: 1.0000 - val_loss: 2.5476 - val_acc: 0.6611
Epoch 20/20
23/23 [==============================] - 2s 65ms/step - loss: 9.2123e-04 - acc: 1.0000 - val_loss: 2.5602 - val_acc: 0.6611
And yes, we’ve reached our goal. We can clearly see here that the validation accuracy is only around 66%, while at the same time the training accuracy is exactly 100%.
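The gap in feature counts between the two models can be checked with a bit of arithmetic, using the feature-map sizes from the two summaries above:

```python
# Spatial size and depth of the feature map after the second convolution,
# taken from the model summaries above.
h = w = 252
channels = 32

pooled = (h // 4) * (w // 4) * channels   # with 4x4 max-pooling
unpooled = h * w * channels               # without pooling

print(pooled)              # 127008
print(unpooled)            # 2032128
print(unpooled // pooled)  # 16
```

A 4×4 pool shrinks each spatial dimension by 4, so the flattened feature count drops by a factor of 16, which is exactly the difference between 2,032,128 and 127,008 inputs feeding the dense classifier.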
Credit: BecomingHuman By: Muhammad Ardi