As I progress through my journey in Deep Learning (DL), I notice that it becomes more challenging to find answers by searching websites and fora. That makes it all the more important to blog about my experience: it facilitates learning for myself and for others.
For those taking their first steps into DL using Fast.ai, I suggest referring to these previous blogs as well:
1. Testing the waters of Deep Learning contains:
a. Information on the preparations necessary for DL: Colab set up, installations and imports;
b. Instructions on downloading and gathering images from the internet;
c. Steps that can be taken to process, train, visualize results, and launch a simple application.
2. Developing a taste for Deep Learning contains:
a. Simple definitions to introduce DL jargon;
b. Good resources where you can find datasets to play with;
c. Examples on how images can be transformed;
d. Illustrations on ways to improve a model by playing with learning rates, number of epochs and model depth.
The first two blogs dealt with classifying an image based on a single predominant object in the photo. In this blog, we will train a Multilabel or Multicategory Classification model: one that can predict one or more objects, or no predominant target object, in an image.
Here we go:
1. Identify and gather data.
There are various ways of gathering data. The fastai library provides toy datasets with friendly formatting. I chose to use one from an external site, so that I could get acquainted with the challenges of using real-world data.
Check that there are annotations (that is, labels) associated with the images. Sometimes the annotations come in a different file or folder, sometimes they are derived from the filename or source. Or you can annotate it yourself if you have time.
For this mini-project, I had to find one that contained multiple labels on some of the images. It deals with living animals that can be found in an aquarium; you can check out the source here.
2. Make data accessible to your coding platform.
I use Google Colab for coding. Here are the steps to enable Colab to find your data:
a. Upload your data into Google Drive. The ‘path’ will reflect where your data sits within the Google Drive hierarchy.
b. Open a Colab Notebook and do the installations and imports:
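The notebook cell itself isn’t reproduced here; a minimal Colab setup cell (assuming fastai v2) might look like this:

```python
# Colab setup cell: fastai comes preinstalled on Colab, but upgrading is safer.
# The '!' line is notebook shell syntax; run it inside Colab.
# !pip install -q --upgrade fastai

from fastai.vision.all import *   # DataBlock, ImageBlock, cnn_learner, etc.
import pandas as pd               # for exploring the annotation file
```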
c. An alternative/complementary step is to mount Drive in Colab:
i. click on Command Palette at the bottom left of the screen, or from the Tools at the toolbar.
ii. find Mount Drive
iii. Give access to Google Drive by clicking Connect.
3. Give instructions on where to find the data (that is, path).
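As an example, assuming the dataset sits in a folder called aquarium in My Drive (the folder and file names below are hypothetical):

```python
from pathlib import Path

# After mounting, Google Drive appears under /content/drive in Colab.
# 'aquarium' is a hypothetical folder name; adjust it to your own Drive layout.
path = Path('/content/drive/MyDrive/aquarium')

# Files are then addressed relative to this path, e.g. a hypothetical annotation CSV:
csv_file = path/'train'/'_annotations.csv'
```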
4. Initial exploration of the data.
The filenames are long, and apart from the IMG number their contents don’t look useful. However, from my experience playing with this set, there was no need to clean this text. We will leave df.filename as is.
The data is presented in an encoded manner. There are 7 possible labels. 1 encodes presence, and 0 encodes absence of the specified object label. We see an early example here of multi-labelling: Index 1 is labelled both as ‘fish’ and ‘shark’.
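To make the encoding concrete, here is a toy frame in the same shape (the rows and values are illustrative, not the actual annotations):

```python
import pandas as pd

# One-hot style annotations: 1 encodes presence, 0 encodes absence.
df = pd.DataFrame({
    'filename': ['IMG_2274.jpg', 'IMG_2275.jpg'],
    'fish':     [1, 0],
    'shark':    [1, 0],
    'starfish': [0, 1],
})

# The first row carries two labels at once: an example of multi-labelling.
print(df.loc[0, ['fish', 'shark']].tolist())   # [1, 1]
```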
This mini-project will follow the Fast.ai lesson 06 on Multicategory Classification. We will format our data similar to the Pascal dataset which was used in the lesson. In the Pascal dataset, the labels were found in a single column, so we will reverse the encoding.
5. Preparing the Labels.
a. Indicate the object name, where it is labelled to be present.
b. Combine the labels into one column.
c. Account for the possibility of No identifiable objects in an image.
d. Clean the Labels column.
Remove the 0’s and the white space.
We now have collected all the positive labels in one df.animals column.
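Steps a through d can be sketched with plain pandas (the frame below is a toy stand-in, but the column names match this dataset):

```python
import pandas as pd

labels = ['fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', 'stingray']

df = pd.DataFrame({
    'filename':  ['a.jpg', 'b.jpg', 'c.jpg'],
    'fish':      [1, 0, 0],
    'jellyfish': [0, 0, 0],
    'penguin':   [0, 0, 0],
    'puffin':    [0, 0, 0],
    'shark':     [1, 0, 0],
    'starfish':  [0, 1, 0],
    'stingray':  [0, 0, 0],
})

# a. Where an object is present, write its name instead of 1.
named = df[labels].apply(lambda col: col.map({1: col.name, 0: ''}))

# b. Combine the per-object columns into one space-separated column.
df['animals'] = named.apply(' '.join, axis=1)

# c. Images with no identifiable object get the label 'none'.
df.loc[df[labels].sum(axis=1) == 0, 'animals'] = 'none'

# d. Clean the column: collapse the leftover whitespace.
df['animals'] = df['animals'].str.split().str.join(' ')

print(df['animals'].tolist())   # ['fish shark', 'starfish', 'none']
```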
6. Prepare the DataBlock.
DataBlocks are instructions that enable you to build Datasets and DataLoaders. These are based on PyTorch structuring. The idea is to keep the data code separate from the model code. A Datasets object stores the data and labels; each item is a tuple, so it cannot be changed in place. The data in the Datasets then undergoes processing in the DataLoaders, and model training is applied on the DataLoaders.
In the first two blogs, we played with a fully constructed DataBlock and checked the output on DataLoaders.
In this blog, we will construct the elements of the DataBlock one at a time, and check the output on Datasets first, before proceeding to the DataLoaders.
a. Baseline DataBlock.
Starting with an empty DataBlock (no explicit instructions given yet), notice that it already partitions the data into training and validation sets. There are options to control the split yourself, but for this mini-project, we will keep the default split.
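In code, the bare block is just this (a sketch: df is the annotations frame, and fastai is assumed to be imported):

```python
from fastai.vision.all import *

# No blocks or getters yet; the DataBlock still applies its default
# random 80/20 train/validation split when building the Datasets.
dblock = DataBlock()
dsets = dblock.datasets(df)
len(dsets.train), len(dsets.valid)   # roughly 80% / 20% of the rows
```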
b. Set the dependent (y) and independent variables (x).
- For get_x: returning just the filename will not generate the image data. It has to be told where to find the data by specifying the path.
Note: This is the only interaction with the df.filename. Thus, spending time on cleaning the text of the df.filename is not necessary, and might even lead to misdirections.
- For get_y: ‘animals’ pertains to the column name of my target labels, which we prepared in the previous steps. The split function is necessary to give the appropriate text/character formation in our results, as we will see later.
Note: The get_y should theoretically also be able to process data, however, my attempt to include the cleaning process in this step did not give me the expected result. Therefore, I moved the cleaning step before assigning the data to the DataBlock.
- See from the output that we still have blank entries that we can clean. But for now, we will move forward.
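The two getters are ordinary Python functions, along these lines (the Drive path and folder layout are assumptions):

```python
from pathlib import Path

path = Path('/content/drive/MyDrive/aquarium')   # hypothetical Drive location

def get_x(row):
    # Prepend the path so the DataBlock can open the actual image file,
    # rather than receiving a bare filename.
    return path/'train'/row['filename']

def get_y(row):
    # Split the space-separated labels into a list of words; without this,
    # the vocab would later be built from individual characters.
    return row['animals'].split(' ')

# In the notebook these are passed as DataBlock(get_x=get_x, get_y=get_y, ...).
print(get_y({'animals': 'fish shark'}))   # ['fish', 'shark']
```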
c. Check a sample visually to see if we are progressing well with the DataBlock instructions.
The dsets output indicated labels of fish and shark, and we can confirm the veracity of this labelling by checking with our own eyes. I see objects that could represent a fish and/or a shark.
- Note: At this time, we will not debate whether the annotations were good or bad.
d. Indicate the types of data that we will handle.
- ImageBlock enables us to work on images (technically, as PILImage objects from the Python Imaging Library).
- MultiCategoryBlock enables us to work on data that has multi-label categorical targets.
- Note that the TensorMultiCategory output is an encoding of the labels.
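What MultiCategoryBlock does to the labels is easy to mimic in plain Python (an illustration of the one-hot encoding, not the fastai class itself):

```python
vocab = ['fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', 'stingray']

def one_hot(labels, vocab):
    # 1.0 where the vocab entry appears in the image's label list, 0.0 otherwise;
    # this is the shape a TensorMultiCategory stores.
    return [1.0 if v in labels else 0.0 for v in vocab]

print(one_hot(['fish', 'shark'], vocab))
# [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```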
e. Check if it is able to recognize the labels as expected.
We see from the vocab output that all the 7 labels are recognized (plus the white space that we mentioned earlier).
Using the ‘where’ function enables us to locate the indices of the active labels and map them to the vocab.
Note: If we did not place the split function in the DataBlock, the vocab will give us individual characters instead of words.
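Decoding goes the other way: ‘where’ returns the indices of the 1s, which map back into the vocab (shown with numpy here; on a TensorMultiCategory you would use torch.where):

```python
import numpy as np

vocab = ['fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', 'stingray']
encoded = np.array([1., 0., 0., 0., 1., 0., 0.])

idxs = np.where(encoded == 1.)[0]        # indices of the active labels
print([vocab[i] for i in idxs])          # ['fish', 'shark']
```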
We are now reasonably happy with the data and can proceed.
f. Complete the DataBlock.
The splitter, item_tfms, and batch_tfms were discussed in the previous blog.
Once we are happy with the data stored in the Datasets, we can step over to the DataLoaders and start thinking about the model.
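Putting the pieces together, the finished DataBlock looks roughly like this (the transform sizes are illustrative defaults, not values from the original notebook):

```python
from fastai.vision.all import *

dblock = DataBlock(
    blocks=(ImageBlock, MultiCategoryBlock),   # image in, multi-label out
    get_x=get_x,                               # filename -> full image path
    get_y=get_y,                               # label string -> list of labels
    splitter=RandomSplitter(),                 # default random 80/20 split
    item_tfms=Resize(460),                     # per-item resize on the CPU
    batch_tfms=aug_transforms(size=224),       # per-batch augmentation on the GPU
)
```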
7. Visualize the contents of the DataLoaders.
DataLoaders process data in batches; for this mini-project, we will keep the default batch size.
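A sketch of this step, assuming dblock and df from the previous steps (show_batch renders a grid of images with their labels):

```python
# Build the DataLoaders from the completed DataBlock, then eyeball one batch.
dls = dblock.dataloaders(df)
dls.show_batch(max_n=9)
```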
8. Start Learning!
We employed the CNN Learner.
Since we have multiple possible labels, we expect a binary (0 or 1) prediction for each label of each image. The accuracy_multi metric applies a sigmoid to each prediction and uses a default threshold of 0.5: if a label’s predicted probability exceeds 0.5, that label is assigned; otherwise it is not.
Python’s ‘partial’ (from functools) wraps a function with some of its arguments pre-filled; here it lets us hand accuracy_multi to the learner with its threshold already set.
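functools.partial is easy to demonstrate with a stand-in metric (accuracy_stub below only mimics accuracy_multi’s thresholding; it is not the fastai function):

```python
from functools import partial

def accuracy_stub(preds, targs, thresh=0.5):
    # Count the label slots where (prediction > thresh) agrees with the target,
    # mimicking accuracy_multi on already-sigmoided predictions.
    hits = [(p > thresh) == bool(t) for p, t in zip(preds, targs)]
    return sum(hits) / len(hits)

# partial pre-fills thresh; the notebook passes accuracy_multi through
# partial in the same way to set its threshold as the metric for the learner.
metric = partial(accuracy_stub, thresh=0.2)

print(metric([0.9, 0.3, 0.1], [1, 1, 0]))          # 1.0
print(accuracy_stub([0.9, 0.3, 0.1], [1, 1, 0]))   # 0.666... at the 0.5 default
```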
9. Compare various Learning models and hyperparameters.
We utilized various approaches to see the effects of individual components.
From the summary table above, we see that increasing the depth from resnet18 to resnet34 did not result in a good model. However, going deeper to resnet50 provided better losses and accuracies.
At resnet50, increasing the number of epochs where the middle layers were frozen showed better results. Increasing the epochs from 4 to 8 showed an even bigger improvement.
Choosing the resnet50 gave us accuracies from 81–89%.
10. Visualize results.
The function does not show which are the actual and predicted labels. Checking the code in the docs will give us a clue that the top label is the actual label.
While the red labelling looks alarming, it does not indicate a bad model. The accuracy_multi metric scores the prediction for each object, not for each image, and it penalizes both under- and over-prediction. Further explanation may be read in Tackling the accuracy_multi metric.
Considering the difficult underwater image environment, correctly labelling 4 out of 5 objects is reasonably good!
11. Evaluate for possible model improvement.
This suggests that further cleaning of the df.animals column (removing blank entries and whitespace) may lead to a more accurate model.
Based on the top losses results, it seems that the model is generous in predicting stingrays and starfish. A review on the distribution of each object and their annotations might show some imbalance.
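Checking that distribution is one pandas line (a toy frame here; in the notebook this would run on the real df.animals column built in step 5):

```python
import pandas as pd

df = pd.DataFrame({'animals': ['fish shark', 'fish', 'starfish', 'none']})

# Explode the space-separated labels and count each object's occurrences
# to reveal any class imbalance, including the 'none' label.
counts = df['animals'].str.split().explode().value_counts()
print(counts.to_dict())   # 'fish' appears twice, every other label once
```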
The None target label is under-represented.
These issues may be resolved by balancing the distribution among the objects, including None.
We were able to use external data as input to a pre-trained Deep Learning model. We discussed the inputs and instructions the model needs, compared various model depths and hyperparameters, and built a model that can accurately label multiple objects in a challenging aquarium image.
To the founders of Fast.ai for creating a wonderful platform for learning.
To the contributors in the Fast.ai, StackOverflow and SharpestMinds fora for helping a beginner out.
To Medium for facilitating information dissemination and acquisition.
To @ai_fast_track for the mentorship.