We are starting a new deep learning series: Image Classification Step by Step. This is the first part of the series.
Given a 2D image like this one, we flatten it into a 1D vector and feed it into a model that classifies it as one of the single digits from 0 to 9.
Let’s just call this model a network for now. This simple network has only two layers: an input layer and an output layer. The input size is 6 times 6, which is 36 in total. The output size is 10, which is the number of classes we have.
Each connection between the input layer and the output layer is a model parameter. With 36 inputs and 10 outputs, that gives 36 times 10, or 360 parameters.
Now let’s look at how to do image classification with this model, assuming the model parameters are already optimized.
The first step is to compute a weighted sum of the input with the model weights. The output of this step is z, a 1 by 10 vector that stores one score for each output class.
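The weighted sum can be sketched in plain Python for the 6 by 6 example. This is only an illustration: the pixel values and weights are random placeholders rather than optimized parameters, and the helper name weighted_sum is made up for this sketch.

```python
import random

NUM_INPUTS = 6 * 6   # a flattened 6x6 image has 36 values
NUM_CLASSES = 10     # digits 0 through 9


def weighted_sum(x, weights):
    """Return one score per class: z[j] = sum over i of x[i] * weights[i][j]."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x)))
        for j in range(len(weights[0]))
    ]


random.seed(0)
# Placeholder data (assumption): random pixels and random weights.
x = [random.random() for _ in range(NUM_INPUTS)]
weights = [
    [random.uniform(-1, 1) for _ in range(NUM_CLASSES)]
    for _ in range(NUM_INPUTS)
]

z = weighted_sum(x, weights)
num_parameters = NUM_INPUTS * NUM_CLASSES  # one weight per connection

print(len(z), num_parameters)  # 10 360
```

The scores in z can be any real numbers at this point. They only become probabilities after the softmax step.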
Then we feed these scores to a softmax function. Softmax normalizes the scores so that each value lies between 0 and 1 and all values sum to 1, which lets us interpret them as a probability for each class. In this case, the highest probability is for digit 7, so the model predicts that the input image is a 7.
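Here is a minimal softmax sketch using only the standard library. The scores in z are made-up values chosen so that digit 7 gets the highest score, and subtracting the maximum before exponentiating is a common numerical-stability trick added here, not something the walkthrough above requires.

```python
import math


def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score does not change the result,
    # but it keeps math.exp from overflowing on big inputs.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for digits 0..9 (index 7 is the largest).
z = [0.1, -1.2, 0.3, 0.0, -0.5, 0.8, -0.3, 2.5, 0.4, -0.9]

probs = softmax(z)
predicted_digit = max(range(len(probs)), key=lambda k: probs[k])

print(predicted_digit)  # 7
```

Because softmax preserves the ordering of the scores, the predicted class is the same whether we take the largest score or the largest probability.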
This is almost all the math and concepts you need to know. In this part, I will walk you through the code that imports the dataset and defines the model.
Let’s first have a look at the libraries we are using: PyTorch, NumPy, and scikit-learn. All three are well-known libraries commonly used in both research and industry.
The first step is to import the dataset. Torchvision from PyTorch lets us download and load the MNIST dataset directly through its API. MNIST is one of the most common datasets used for image classification. It is a dataset of handwritten digits containing 60 thousand training images and 10 thousand testing images.
Then we use DataLoader to wrap the data in an iterator. During this process, we randomly shuffle the dataset and split it into batches. Shuffling reduces variance and helps the model stay general and overfit less. For example, you need to shuffle your data if it is sorted by class.
Here, each batch contains 4 images. This line of code gets one batch of images. Let’s have a closer look at one of them. The image has 28 times 28 pixels, which is 784 in total. Each pixel holds a number from 0 to 1 that represents its greyscale value.
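To see batching and shuffling without downloading anything, here is a sketch that uses random stand-in images in place of MNIST. The tensor shapes mimic MNIST (1 channel, 28 by 28 pixels), but the data itself is an assumption made purely for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data (assumption): 20 random images with values in [0, 1)
# and 20 random labels between 0 and 9.
images = torch.rand(20, 1, 28, 28)
labels = torch.randint(0, 10, (20,))
dataset = TensorDataset(images, labels)

# shuffle=True reorders the samples every epoch; batch_size=4 groups them in fours.
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# Get one batch from the iterator.
batch_images, batch_labels = next(iter(loader))
print(batch_images.shape)  # torch.Size([4, 1, 28, 28])
print(batch_labels.shape)  # torch.Size([4])

# Look at a single image: 28 * 28 = 784 pixel values.
one_image = batch_images[0]
print(one_image.numel())  # 784
```

With the real MNIST training set in place of the stand-in dataset, the same DataLoader call applies unchanged.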
Now it is time to define the model. This line of code defines the model structure. It is a simple two-layer network. The input dimension is 28 times 28, which is 784 in total, and the output dimension is 10. There is a connection between each pair of input and output, and each connection represents a model parameter.
In the forward function, we define how the model runs from input to output. First, we reshape the 2D image into 1D, so the total number of inputs is 784, which matches the network input. The next step is to calculate the weighted sum of the input x with the model parameters w. Finally, we return the output.
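The model definition and forward function described here could be sketched as below. The class name LinearClassifier is made up for this example, and using nn.Linear is an assumption about how the weighted sum is implemented. Note that nn.Linear also adds a bias term by default, so it holds 10 extra parameters on top of the 784 times 10 connection weights.

```python
import torch
from torch import nn


class LinearClassifier(nn.Module):
    """Input layer of 784 pixels connected directly to 10 output scores."""

    def __init__(self, num_inputs=28 * 28, num_classes=10):
        super().__init__()
        # One weight per input-output connection (plus a bias per class by default).
        self.linear = nn.Linear(num_inputs, num_classes)

    def forward(self, x):
        # Reshape each 2D image into a 1D vector: (batch, 1, 28, 28) -> (batch, 784).
        x = x.reshape(x.size(0), -1)
        # Weighted sum of the input with the model weights.
        z = self.linear(x)
        return z


model = LinearClassifier()
batch = torch.rand(4, 1, 28, 28)  # stand-in batch of 4 images (assumption)
scores = model(batch)

print(scores.shape)                 # torch.Size([4, 10])
print(model.linear.weight.numel())  # 7840
```

The forward function returns raw scores rather than probabilities. Softmax is applied afterwards, which the next part of the series covers together with cross-entropy.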
So that’s the introduction, along with how to import the data and define the model for our image classification task. I will also provide the source code as a Colab notebook for this part. In the next part, I’m going to talk about softmax and cross-entropy, which are important concepts for model training.
The full video is also provided. Don’t forget to subscribe to us and ring the notification bell if you like the video!