There are several things to do in the model training stage. First, we need to split the data into a train set and a test set. This is important because it lets us find out whether our model is overfitting during the training process. Luckily, Scikit-Learn provides a train_test_split() function for exactly this purpose. In this case I decided to use 20% of the samples in the dataset as test data, while the rest is used for training. Additionally, it's good to know that train_test_split() shuffles the data automatically, so we no longer need to do it ourselves.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(all_encoded_texts, all_labels, test_size=0.2, random_state=11)
Now, let’s begin constructing the neural network by defining a Sequential() model, which is then followed by adding 3 layers.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(input_dim=35362, output_dim=32, input_length=500))
The first layer that we put in the neural network model is an embedding layer. Its arguments are the vocabulary size (input_dim), the vector size (output_dim), and the input length, respectively. In this case, we can take the vocabulary size from the length of tokenizer.word_index, which shows that we have 35362 unique words in our dictionary. The vector size itself is free to choose; here I decided to represent each word in 32 dimensions, which is essentially the main purpose of using an embedding layer. The last argument is pretty straightforward: it’s the number of words in each text sample. Here’s an article if you want to read more about embedding layers.
The second layer to add consists of 100 LSTM cells. This type of neuron is commonly used to perform classification on sequential data, because an LSTM cell does not treat every data point (in this case, a word) as an uncorrelated sample. Instead, the inputs at previous time steps are also taken into account when updating the cell state and computing the next output value. Well, that’s just an LSTM explanation in a nutshell. If you are interested in the underlying math, I suggest reading this article.
Lastly, we will connect this LSTM layer to a fully-connected layer containing 4 neurons, each of which represents a single class.
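Putting those two layers into code, a sketch along these lines should do the job (the softmax activation is an assumption on my part, since it is the usual choice when paired with categorical cross entropy):

# Assumed continuation of the model definition described above
model.add(LSTM(100))                       # 100 LSTM cells processing the embedded word sequence
model.add(Dense(4, activation='softmax'))  # one neuron per class; softmax gives class probabilities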
Now, before the training process begins, we need to compile the model like this:
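A compile call along these lines should work (I’m assuming metrics=['accuracy'] here, which is why accuracy shows up in the training log later on):

# Compile with the loss function and optimizer explained below
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])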
In this case, I use the categorical cross entropy loss function, whose value is going to be minimized using the Adam optimizer. This loss function is chosen because the classification task has more than 2 classes. As for the optimizer, I chose Adam since in many cases it simply works better than other optimizers. Below is what the summary of our model looks like:
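That summary can be printed by calling the built-in summary method on the model:

model.summary()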
Now that the neural network has been compiled, we can start the training process. Notice that I store the learning history in the history variable.
history = model.fit(X_train, y_train, epochs=12, batch_size=64, validation_data=(X_test, y_test))
Here is how my training progress goes:
Train on 2985 samples, validate on 747 samples
2985/2985 [==============================] - 75s 25ms/step - loss: 1.3544 - accuracy: 0.3652 - val_loss: 1.2647 - val_accuracy: 0.6466
2985/2985 [==============================] - 48s 16ms/step - loss: 0.4278 - accuracy: 0.9196 - val_loss: 0.4589 - val_accuracy: 0.8768
2985/2985 [==============================] - 49s 16ms/step - loss: 0.1058 - accuracy: 0.9759 - val_loss: 0.1859 - val_accuracy: 0.9438
2985/2985 [==============================] - 49s 16ms/step - loss: 0.0253 - accuracy: 0.9966 - val_loss: 0.1499 - val_accuracy: 0.9625
For the sake of simplicity, I deleted several epochs from the output above. But don’t worry, you can still see the entire training history (both accuracy and loss values) using the following code.
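A minimal sketch using matplotlib (the plotting library is my assumption) that reads both metrics from the history object returned by model.fit() could look like this:

import matplotlib.pyplot as plt

# Accuracy on train and test data over the epochs
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='test accuracy')
plt.legend()
plt.show()

# Loss value on train and test data over the epochs
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='test loss')
plt.legend()
plt.show()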
According to the 2 graphs above, the performance of our neural network classifier is pretty good. The model achieves 99.7% accuracy on the training data and 96.3% on the test data. In fact, I tried increasing the number of epochs to see whether the performance could still be improved. However, the accuracy on both sets just fluctuated between roughly 90% and 97%, so I decided to restart the training and stop at epoch 12.