In this tutorial, we will code an AI project with TensorFlow and Keras that can recognize handwritten digits. If this is your first time with machine learning and you have no idea what TensorFlow is, do not worry; I will do my best to keep things simple. You don’t have to understand everything in full detail just yet. I will post many more tutorials and projects on machine learning and artificial intelligence that will make things clearer, one tutorial at a time.
For now, you should know that TensorFlow is a very popular machine learning tool. By leveraging TensorFlow, we can get up and running with machine learning within 15 minutes.
In this tutorial, we will do everything in Google Colab, which comes with all the major machine learning tools preinstalled. This means you can go to the Google Colab website and start coding your first neural network within minutes, without installing anything on your computer. To get started, open Google Colab (colab.research.google.com) and click on “New notebook”.
First things first, let’s import the necessary libraries. We will use two libraries in this tutorial. The first one is TensorFlow, the main library we will work with. We import tensorflow as tf, which means that instead of writing ‘tensorflow’ each time, we can simply use the alias ‘tf’. This is the conventional way to import TensorFlow. We will also import matplotlib, a very popular data visualization library.
# import the necessary libraries
import tensorflow as tf
import matplotlib.pyplot as plt
The next step is to get the data we are going to work with, split into training and test datasets. The training set is the part of the data we will train our neural network on. Once training is complete, we will test its performance on the test dataset. Both datasets contain two things: the images of the handwritten digits and the corresponding label for each image. The MNIST dataset that ships with Keras is already split for us, so load_data() returns the training set and the test set separately.
# load the data, already split into a training set and a test set
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
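The next step is to scale the pixel values of the images down from the 0–255 range to the 0–1 range. Neural networks tend to train faster and more reliably when their inputs are small numbers close to zero, and 0–1 is one of the most common target ranges. Since every pixel value lies between 0 and 255, dividing by 255 does exactly this.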
# scale down the value of the image pixels from 0-255 to 0-1
train_images = train_images / 255.0
test_images = test_images / 255.0
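To see what this division does to the data, here is a small NumPy sketch on a hypothetical stand-in for MNIST: two 2 x 2 “images” stored as uint8, the same integer type that load_data() returns. The array values are made up for illustration.

```python
import numpy as np

# Hypothetical tiny batch: 2 images of 2 x 2 pixels, stored as uint8 (0..255).
images = np.array([[[0, 128], [255, 64]],
                   [[255, 255], [0, 0]]], dtype=np.uint8)

# Dividing by the float 255.0 produces a float array with values in 0..1.
scaled = images / 255.0

print(scaled.dtype)                 # float64
print(scaled.min(), scaled.max())   # 0.0 1.0
print(scaled[0])                    # first "image" after scaling
```

Dividing by a Python float gives float64 arrays. If memory is a concern, casting first with images.astype("float32") / 255.0 produces float32 values, which take half the memory.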
Before going ahead and creating our neural network, let’s see what our data looks like. We will print the shapes of the training and test images and the training labels, and also display the first image in the training dataset.
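# visualize the data
print(train_images.shape)
print(test_images.shape)
print(train_labels)

# display the first training image in grayscale
plt.imshow(train_images[0], cmap='gray')
plt.show()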
Running this code cell should give you something like this:
(60000, 28, 28)
(10000, 28, 28)
[5 0 4 … 5 6 8]
This output tells us that we have 60,000 training images and 10,000 test images, each a grayscale image of 28 x 28 pixels. The last line shows the labels of the training images: the first image is a 5, the second a 0, and so on. If we were working with color images, the shape of our training images would look like this: (60000, 28, 28, 3). The extra 3 would come from the 3 color channels (typically red, green, and blue) needed to represent each pixel.
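The shape layout can be checked with a couple of hypothetical NumPy arrays. This sketch uses small batches of 4 zero-filled images instead of the real data, just to show how the axes are organized for grayscale and color images.

```python
import numpy as np

# Hypothetical grayscale batch laid out like MNIST: (num_images, height, width).
gray_batch = np.zeros((4, 28, 28), dtype=np.uint8)
# A color batch adds a trailing channel axis: (num_images, height, width, channels).
color_batch = np.zeros((4, 28, 28, 3), dtype=np.uint8)

first_gray = gray_batch[0]    # one image: a 28 x 28 grid of intensities
first_color = color_batch[0]  # one image: 28 x 28 pixels with 3 values each

print(gray_batch.shape, first_gray.shape)    # (4, 28, 28) (28, 28)
print(color_batch.shape, first_color.shape)  # (4, 28, 28, 3) (28, 28, 3)
# Number of values needed to describe a single image.
print(first_gray.size, first_color.size)     # 784 2352
```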
Creating a machine learning model consists of two steps. First, we will define the model, and then, we will compile the model. So, let’s start by defining the model.
# define the model
my_model = tf.keras.models.Sequential()
my_model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
my_model.add(tf.keras.layers.Dense(128, activation='relu'))
my_model.add(tf.keras.layers.Dense(10, activation='softmax'))
Now let’s examine what the code above does. Keras offers three ways to build a model (the Sequential model, the functional API, and model subclassing), and today we are using the most beginner-friendly one, the “Sequential” model. Once we have created a Sequential model, we build our neural network by stacking layers on it one after another, as you can see above.
The first layer is the “Flatten” layer, which takes each 28 x 28 square image and flattens it into a single row of 784 (28 × 28) pixel values. This step is needed because the “Dense” layer we use next expects each image as a flat list of numbers rather than a square grid. In other tutorials, we will see other types of neural network layers that can work directly with square images.
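Flattening is just a reshape that keeps the batch axis. The NumPy sketch below mimics it on a hypothetical batch of two 28 x 28 images; it is meant to approximate what the Flatten layer does with the default channels-last layout, not to reproduce Keras internals.

```python
import numpy as np

# Hypothetical batch of 2 images, 28 x 28 each, standing in for train_images.
batch = np.arange(2 * 28 * 28, dtype=np.float32).reshape(2, 28, 28)

# Keep the batch axis and unroll each 28 x 28 grid row by row into 784 values.
flat = batch.reshape(batch.shape[0], -1)

print(flat.shape)  # (2, 784)
# Pixel (row r, column c) of an image ends up at position r * 28 + c.
assert flat[1, 3 * 28 + 5] == batch[1, 3, 5]
```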
In the next layer, we create a “Dense” layer with 128 neurons. The name “Dense” comes from its connection type: every neuron in this layer is connected to every neuron in the previous layer (here, to every one of the 784 pixel values), which makes the network densely connected. This is the simplest kind of neural network layer. We also give it the “relu” activation function, short for “rectified linear unit”, which keeps positive values unchanged and turns negative values into zero. It is one of the most popular activation functions because it is cheap to compute and works well in practice.
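A Dense layer computes activation(inputs · weights + bias). Here is a minimal NumPy sketch of that forward pass with ReLU, assuming randomly initialized hypothetical weights; the function names relu and dense_forward are illustrative, not part of Keras.

```python
import numpy as np

def relu(x):
    # Rectified linear unit: keep positive values, replace negatives with 0.
    return np.maximum(x, 0.0)

def dense_forward(x, weights, bias, activation=relu):
    # x: (batch, inputs); weights: (inputs, units); bias: (units,)
    # Every output unit is a weighted sum of *all* inputs, hence "dense".
    return activation(x @ weights + bias)

rng = np.random.default_rng(0)
x = rng.random((2, 784))                    # 2 hypothetical flattened images
w = rng.normal(0.0, 0.05, size=(784, 128))  # hypothetical weights
b = np.zeros(128)                           # one bias per neuron

hidden = dense_forward(x, w, b)
print(hidden.shape)         # (2, 128)
print((hidden >= 0).all())  # True: ReLU never outputs negatives
```

Counting the weights shows how densely connected this layer is: 784 × 128 weights plus 128 biases gives 100,480 trainable parameters.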
Then we create the last layer. When creating the last layer, it is crucial that the number of neurons matches the number of classes you have. In this case, we use 10 neurons because we have 10 classes, the digits 0 through 9. We also give it the “softmax” activation function, which turns the layer’s outputs into probabilities between 0 and 1 that add up to 1. As a result, if we show the model a picture of an 8 and the model is 91 percent sure that it is an 8, we would get 0.91 from the neuron that represents the number 8.
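Softmax exponentiates each raw score and divides by the sum, so the outputs are positive and add up to 1. The sketch below is a plain NumPy version, assuming a hypothetical set of 10 raw scores in which the neuron for digit 8 has the highest value.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability; the result is unchanged.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical raw scores from the last layer for one image (digits 0-9).
logits = np.array([[1.0, 0.5, 0.2, 0.1, 0.3, 0.0, 0.4, 0.2, 4.0, 0.6]])
probs = softmax(logits)

print(np.isclose(probs.sum(), 1.0))  # True: the probabilities add up to 1
print(int(probs.argmax()))           # 8: the most likely digit
print(probs[0, 8])                   # the model's confidence that it is an 8
```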
The next step is to compile the model. When compiling the model, we will use one of the most popular optimizers and loss functions, which are popular largely because they are computationally efficient and work well in practice. As the optimizer, we will use the Adam optimizer, which can be specified simply as “adam”. We also have to define a loss function. Because we are doing multiclass classification (more than 2 classes) and our labels are plain integers from 0 to 9 rather than one-hot vectors, we will use the “sparse categorical crossentropy” loss. Finally, to track how often the model predicts the correct digit, we pass “accuracy” in the metrics list.
# compile the model
my_model.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
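Sparse categorical crossentropy looks up the probability the model gave to the correct class and takes its negative log. This NumPy sketch assumes hypothetical softmax outputs for 2 images over 3 classes, and uses an assumed small constant eps to avoid log(0). It also checks that the “sparse” form matches regular categorical crossentropy on one-hot labels.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    # labels: integer class ids, shape (batch,); probs: shape (batch, num_classes).
    # Per-example loss = -log(probability assigned to the correct class).
    picked = probs[np.arange(len(labels)), labels]
    # Clip so that log(0) can never occur.
    return -np.log(np.clip(picked, eps, 1.0))

# Hypothetical softmax outputs for 2 images over 3 classes.
probs = np.array([[0.05, 0.90, 0.05],   # true label 1: confident and correct
                  [0.60, 0.30, 0.10]])  # true label 2: mostly wrong
labels = np.array([1, 2])

losses = sparse_categorical_crossentropy(labels, probs)
print(losses)         # small loss for the first image, large for the second
print(losses.mean())  # the loss reported for a batch is the mean over its examples

# The "sparse" version equals categorical crossentropy on one-hot labels.
one_hot = np.eye(3)[labels]
dense_losses = -(one_hot * np.log(probs)).sum(axis=1)
print(np.allclose(losses, dense_losses))  # True
```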
The next step is to train the model. Until now, our model has never seen the data. In the training step, we will train the neural network with the images of the handwritten digits, as well as the corresponding labels for those images. We have 60,000 training images, and we will show all of them to our model 3 times. To do that, we set the “epochs” value to 3; one epoch is one full pass over the training set. Training for more epochs tends to increase the accuracy up to a certain point, but training for too long can lead to overfitting, so it is a good idea to start with a small number of epochs. We can always train the model for more epochs later.
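During each epoch the images are not fed in one at a time but in batches, and the progress counter that fit() prints counts batches (steps), not images. Since our fit() call does not set batch_size, Keras uses its default of 32. The arithmetic below assumes that default; evaluate() also defaults to batches of 32.

```python
import math

batch_size = 32  # Keras' default when fit()/evaluate() get no batch_size
train_steps = math.ceil(60000 / batch_size)  # batches per training epoch
test_steps = math.ceil(10000 / batch_size)   # batches during evaluation

print(train_steps)      # 1875
print(test_steps)       # 313 (the last batch holds only 16 images)
print(3 * train_steps)  # 5625 weight updates over 3 epochs
```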
# train the model
my_model.fit(train_images, train_labels, epochs=3)
Running this code should give a result like this:
Epoch 1/3
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4357 - accuracy: 0.8756
Epoch 2/3
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1278 - accuracy: 0.9621
Epoch 3/3
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0823 - accuracy: 0.9761
This means that, by the end of the first pass over the dataset, the model achieves 87.56% accuracy at determining the right label for a given image (Keras reports the average over all batches in that epoch). By the end of the third epoch, the model’s accuracy increases to 97.61%, which is pretty good, given that we used a very small neural network and trained it for only 3 epochs. However, this is the accuracy on data the model has already seen during training.
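Accuracy is simply the fraction of images whose predicted digit (the neuron with the highest softmax output) matches the true label. A minimal NumPy sketch, assuming hypothetical softmax outputs for 4 images over 3 classes:

```python
import numpy as np

def accuracy(probs, labels):
    # Predicted class = index of the largest probability in each row.
    predictions = probs.argmax(axis=1)
    # Fraction of predictions that match the true labels.
    return float((predictions == labels).mean())

# Hypothetical softmax outputs for 4 images over 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.4, 0.1]])
labels = np.array([0, 1, 2, 1])

print(accuracy(probs, labels))  # 0.75 (3 of 4 correct)
```

With our trained model, probs could come from my_model.predict(test_images), and the result should closely match the accuracy Keras reports.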
To get a more realistic view of the model’s performance, we should measure its accuracy on data it hasn’t seen before. To do that, we will use the test dataset: we call my_model.evaluate() with the test images and their corresponding labels, and then print the test accuracy on a separate line.
# check the model for accuracy on the test data
val_loss, val_acc = my_model.evaluate(test_images, test_labels)
print("Test accuracy: ", val_acc)
Running this code should give us something like this:
313/313 [==============================] - 0s 994us/step - loss: 0.0817 - accuracy: 0.9742
Test accuracy: 0.9742000102996826
This means that our model classifies handwritten digits it has never seen with 97.42% accuracy. This is very close to the final training accuracy of 97.61%, and that is exactly what we want: the whole point of training is to get good results on data the model hasn’t seen before, so we want the test accuracy to be close to the training accuracy. This is also why the training data should be representative of the test data. If the model’s accuracy on the test dataset were much lower than its training accuracy, say around 85% or 75%, it would probably mean that our model is overfitting. Overfitting happens when the model starts to memorize the training examples rather than learning useful, generalizable patterns. In that case, we would go back and modify our model, for example by training it for fewer epochs.
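One way to decide how many epochs is “fewer” is a simplified form of patience-based early stopping: track accuracy on held-out data after every epoch and stop once it has not improved for a few epochs in a row. The sketch below is plain Python; best_epoch_count and the accuracy values are hypothetical, for illustration only.

```python
def best_epoch_count(val_accuracies, patience=2):
    """Return the epoch with the best held-out accuracy, stopping early
    once accuracy has not improved for `patience` consecutive epochs."""
    best_acc = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Hypothetical held-out accuracies that peak at epoch 3 and then decline.
history = [0.955, 0.968, 0.974, 0.972, 0.970, 0.969]
print(best_epoch_count(history))  # 3
```

Keras automates this with tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=2, restore_best_weights=True), passed to fit() through callbacks=[...] together with held-out data such as validation_split=0.1, so that the test set stays unseen until the final evaluation.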
Now that we have a model that can do something useful, let’s save it so that we can use it elsewhere as well. If we do not save the model, we will need to recreate and retrain it every time we want to use it. To save our model, we call my_model.save() and give it a file path. Recent versions of TensorFlow (which use Keras 3) require the file name to end in “.keras”, the native Keras format, so we will use that extension.
# save the model for later use (.keras is the native Keras file format)
my_model.save('my_mnist_model.keras')
Let’s also load the model back from the file system to make sure that it works. To load the model, we pass its file path to tf.keras.models.load_model(). Since we are in the same notebook, we can just copy the file path from the previous code cell, where we saved the model. In this example, the file is named “my_mnist_model.keras”, and everything about our machine learning model (its architecture, weights, and training configuration) is saved in that file. Therefore, as long as we have access to that file, we can retrieve the exact same machine learning model.
# load the model from the file system
my_new_model = tf.keras.models.load_model('my_mnist_model.keras')
Let’s also test our new model with the test data to make sure that we are getting the same exact accuracy with the new model as well.
# check the new model for accuracy on the test data
new_val_loss, new_val_acc = my_new_model.evaluate(test_images, test_labels)
print("New Test accuracy: ", new_val_acc)
Running this code should give you a result that looks like this:
313/313 [==============================] - 0s 1ms/step - loss: 0.0817 - accuracy: 0.9742
New Test accuracy: 0.9742000102996826
As we can see, the original model and the new model we loaded from the file system give exactly the same results, which is what we expected when we saved the model.
If you have come this far, CONGRATULATIONS! You have just built, trained, evaluated, saved, retrieved and re-evaluated your first machine learning model. If you didn’t understand some parts, that’s completely okay too. Take another look at the code and maybe watch the video below to see everything we did in a video format.
If you still have questions about why we did something, leave them as a comment and I will get back to you as fast as I can, with a solution. If your question requires a more extensive explanation, I can explain that in another article or a video, addressing your question.