The course deals with the problem of classifying pet images into 37 categories (cat and dog breeds). Image classification where the classes are visually very similar, such as breeds of the same species, is called fine-grained classification. A practical difference among image classification datasets is how they store the category labels: in a CSV file, in the file name itself, or as a separate list. fastai has plenty of functions to deal with each of these layouts.
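To make the "label in the file name" case concrete, here is a minimal, pure-Python sketch of extracting a breed label from an Oxford-IIIT Pet style filename. The regex and the helper name are illustrative, not fastai API; fastai's `ImageDataBunch.from_name_re` accepts this same kind of pattern (and `from_csv` / `from_folder` cover the other layouts).

```python
import re

# Pet images encode the label in the filename itself,
# e.g. "great_pyrenees_173.jpg" -> class "great_pyrenees".
pat = re.compile(r'([a-zA-Z_]+)_\d+\.jpg$')

def label_from_filename(fname: str) -> str:
    """Extract the class label from a pet image filename (illustrative helper)."""
    m = pat.search(fname)
    if m is None:
        raise ValueError(f"no label found in {fname!r}")
    return m.group(1)

print(label_from_filename("images/great_pyrenees_173.jpg"))  # great_pyrenees
```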
This course is taught using Jupyter notebooks, but you are free to use any editor of your choice, such as PyCharm or Spyder. There are various options for setting up a GPU; choose and set up yours accordingly. I used Google Colab, which is a free GPU service from Google. Also, fastai sits on top of PyTorch (a popular deep learning library alongside TensorFlow), but for this course we do not need hands-on PyTorch experience. To get the code used in this course, check out the fastai GitHub repo. I will mostly be explaining the bits and pieces of the fastai library and the theory behind the scenes.
These days, GPUs are the standard hardware for deep learning, especially when it comes to images. But to process a batch of images efficiently, a GPU needs them all to have the same shape, so current deep learning pipelines require us to explicitly choose a size and resize every image to it.
A square image size of 224*224 (obtained by cropping and resizing) is extremely common and accepted by most pretrained models. Later in the series, we'll see how to use rectangular images. In fastai, everything you're going to model is wrapped in an ImageDataBunch object. The DataBunch object holds the training, validation, and (optionally) test datasets. These datasets need to be normalized (using the `normalize` function) so that all images are on the same scale. For images, normalization means giving every image the same mean and standard deviation (std), that is, the pixel values of the three channels (red, green, blue) are shifted and scaled, typically to mean 0 and std 1 per channel. If the data is not normalized, it becomes difficult for the model to train well. So if you're having trouble training a model, one thing to check is whether you've normalized the data. The models in fastai are designed to end up with a 7*7 feature map, and that's why the optimal input size is 224 (7 * 32). We'll learn about this later. Once the data is loaded into the DataBunch object, the data.show_batch() command can be used to have a look at the images. The number of unique classes can be read from the attribute `c` of the DataBunch object.
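The per-channel normalization described above can be sketched in a few lines of plain Python (the helper name is illustrative; in fastai you would simply call `data.normalize(imagenet_stats)` to use the ImageNet statistics the pretrained model expects):

```python
# A minimal sketch of normalization: subtract the mean and divide by
# the std, so the values end up with mean 0 and std 1. fastai applies
# this per channel (red, green, blue) across the batch.
def normalize_channel(pixels):
    """Normalize a flat list of pixel values to mean 0, std 1."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return [(p - mean) / var ** 0.5 for p in pixels]

channel = [0.1, 0.5, 0.9]
out = normalize_channel(channel)
print(round(sum(out) / len(out), 6))  # mean is now 0.0
```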
A Learner is the general fastai concept for a model that learns from data. Just as a DataBunch is the general concept for data, a Learner is the general concept for models. There are subclasses of Learner for particular applications, and the one used for image classification is the ConvLearner, which creates a CNN for us.
model = ConvLearner(data, models.resnet34, metrics=error_rate)
- data — the DataBunch object
- models.resnet34 — ResNet-34 (a pretrained model)
- error_rate — the metric reporting the model's error on the validation set
Note that the batch size (bs) is specified when the DataBunch is created, not when the learner is built.
The first time we run this command, it downloads ResNet-34's pretrained weights. ResNet-34 has already been trained on roughly 1.3 million images from the ImageNet dataset and knows how to classify images into a thousand categories. Reusing a model this way is called transfer learning. One advantage of transfer learning is that even if we don't have much data, the model can still train really well, because it has already been trained on broadly similar data.
This lets the model train in a small fraction of the time it would take from scratch. Transfer learning also requires far fewer training images while still classifying unseen images correctly.
To make sure the model doesn't overfit, we use a validation set. Remember that the ImageDataBunch object already contains a validation set, and the model evaluates the error_rate metric on its predictions for that set.
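fastai's factory methods hold out this validation set automatically (by default `valid_pct=0.2` in `ImageDataBunch` constructors). A minimal sketch of what such a random hold-out split does, with an illustrative helper name:

```python
import random

# Shuffle the items and hold out a fraction for validation. fastai
# does this internally when building an ImageDataBunch.
def split(items, valid_pct=0.2, seed=42):
    rng = random.Random(seed)   # fixed seed: same split every run
    items = items[:]
    rng.shuffle(items)
    cut = int(len(items) * valid_pct)
    return items[cut:], items[:cut]   # (train, valid)

train, valid = split(list(range(10)))
print(len(train), len(valid))  # 8 2
```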
If you try training for more epochs, you’ll notice that we start to overfit, which means that our model is learning to recognize the specific images in the training set, rather than generalizing. One way to fix this is to effectively create more data, through data augmentation. This refers to randomly changing the images in ways that shouldn’t impact their interpretation, such as horizontal flipping, zooming, and rotating.
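As a toy illustration of one such augmentation, here is a horizontal flip of a tiny "image" stored as nested lists (one value per pixel). Real pipelines, such as fastai's `get_transforms`, apply flips, rotations, and zooms to tensors, but the idea is the same: the pixels move while the label stays unchanged.

```python
# Mirror each row left-to-right: a horizontal flip.
def hflip(image):
    return [list(reversed(row)) for row in image]

img = [[1, 2, 3],
       [4, 5, 6]]
print(hflip(img))  # [[3, 2, 1], [6, 5, 4]]
```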
After the model is created, fastai uses the fit_one_cycle(n) method instead of the generic fit method. fit is the "normal" way of training a neural net with a constant learning rate, whereas fit_one_cycle uses the 1cycle policy, which varies the learning rate over time to achieve better results. Here n denotes the number of epochs (complete passes over the data). Once the model is fitted on the training set, its weights (and other state) can be saved using the following command and retrieved later.
model.save('stage-1') - to save the model
model.load('stage-1') - to load the model
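The shape of the 1cycle learning-rate schedule can be sketched in plain Python: ramp the learning rate up to a peak, then anneal it back down. This is a simplified linear version with illustrative names; fastai's fit_one_cycle uses a more refined schedule (including momentum scheduling), but the overall shape is similar.

```python
# A minimal linear sketch of the 1cycle idea: warm up to lr_max,
# then decay back towards zero.
def one_cycle_lr(step, total_steps, lr_max=1e-2, pct_warmup=0.3):
    warmup = int(total_steps * pct_warmup)
    if step < warmup:
        return lr_max * step / warmup                        # ramp up
    return lr_max * (total_steps - step) / (total_steps - warmup)  # decay

schedule = [one_cycle_lr(s, 100) for s in range(100)]
print(max(schedule))  # 0.01 — the peak equals lr_max
```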
After the model is trained, we can interpret its results using the ClassificationInterpretation object.
interp = ClassificationInterpretation.from_learner(model)
interp.plot_top_losses(9, figsize=(15,11))
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
interp.most_confused(min_val=2)
For more info about the functions, we can always use doc(function_name).