As part of my 2019 resolutions, I have decided to study Deep Learning through Fastai’s MOOC. Lesson 1 focuses on image classification and creating a State of the Art (SOTA) classifier. As homework, below is my implementation of Fastai’s standard methodology where I created a model that recognizes 40+ characters from The Simpsons.
1. Creating a basic image classifier
The data has been compiled by Alexander Attia and uploaded on Kaggle. Therefore, many thanks to him for making his dataset public.
I only used the training set, which consists of ~20,000 images distributed between 42 characters from ‘The Simpsons’. The images come in various sizes and scenes, may be cropped or contain other characters, and were extracted from episodes (seasons 4 to 24).
I started by setting up my framework in Google Colab and installed the necessary packages by running:
!curl https://course.fast.ai/setup/colab | bash
from fastai.vision import *
from fastai.metrics import error_rate
Retrieving the data
To retrieve the dataset from Kaggle, I installed its API and pasted a script to enable interactions between Colab and Kaggle. I could then download the dataset, move it to the desired directory called data, and unzip the file. Finally, I created a variable to store the path to the dataset.
nb: To use the Kaggle API, you need a Kaggle account and a ‘New API Token’. Clicking the ‘Create New API Token’ button downloads a file called ‘kaggle.json’ containing your credentials. Finally, in Google Drive, store the json file in a subfolder called ‘Kaggle’ inside the ‘Colab notebook’ folder.
For more details on this topic, check out this detailed tutorial by Timm Derrickson on Medium or this step-by-step approach.
!pip install kaggle
# if it's your first time uploading a dataset with Kaggle's API into Colab, uncomment the 2 lines below
# !mkdir -p ~/.kaggle
# !cp kaggle.json ~/.kaggle/
# enable interactions between Colab and Kaggle's API
from googleapiclient.discovery import build
import io, os
from googleapiclient.http import MediaIoBaseDownload
from google.colab import auth
auth.authenticate_user()
drive_service = build('drive', 'v3')
results = drive_service.files().list(
q="name = 'kaggle.json'", fields="files(id)").execute()
kaggle_api_key = results.get('files', [])[0]
os.makedirs('/root/.kaggle', exist_ok=True)
filename = "/root/.kaggle/kaggle.json"
request = drive_service.files().get_media(fileId=kaggle_api_key['id'])
fh = io.FileIO(filename, 'wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
# download the dataset, then unzip it into the data directory
!kaggle datasets download -d alexattia/the-simpsons-characters-dataset
!mkdir -p data && unzip -q the-simpsons-characters-dataset.zip -d data
path = Path('/content/data/simpsons_dataset'); path
Next, it’s important to see what the dataset looks like. This means understanding how the directories are structured, the labelling system used and looking at a sample of the images.
fnames = get_image_files(path/'homer_simpson')
In this particular dataset, labels are stored in the folder names:
- 42 folders, each representing one of the characters.
- The most popular characters (i.e. the Simpson family) each have ~1,000 images, while lesser-known characters have only ~20–30 images.
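To verify this distribution, a short plain-Python sketch can count the images per character folder. The helper name class_counts is mine, and the dataset path in the usage comment is the one assumed earlier:

```python
from pathlib import Path

def class_counts(root):
    """Return {character_folder: number_of_jpg_images} for each class folder."""
    root = Path(root)
    return {d.name: len(list(d.glob('*.jpg')))
            for d in sorted(root.iterdir()) if d.is_dir()}

# e.g. counts = class_counts('/content/data/simpsons_dataset')
# sorted(counts.items(), key=lambda kv: -kv[1])[:5]  # most represented characters
```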
To classify the images into the correct categories, I needed to first extract the labels from the folder names. To do so, I used the factory method
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(do_flip=False), size=224, bs=64, train='simpsons_dataset', valid_pct=0.2).normalize(imagenet_stats)
I was then able to inspect a sample of the images and check that the labels were correctly extracted.
['abraham_grampa_simpson', 'agnes_skinner', 'apu_nahasapeemapetilon', 'barney_gumble', 'bart_simpson', 'carl_carlson', 'charles_montgomery_burns', 'chief_wiggum', 'cletus_spuckler', 'comic_book_guy', 'disco_stu', 'edna_krabappel', 'fat_tony', 'gil', 'groundskeeper_willie', 'homer_simpson', 'kent_brockman', 'krusty_the_clown', 'lenny_leonard', 'lionel_hutz', 'lisa_simpson', 'maggie_simpson', 'marge_simpson', 'martin_prince', 'mayor_quimby', 'milhouse_van_houten', 'miss_hoover', 'moe_szyslak', 'ned_flanders', 'nelson_muntz', 'otto_mann', 'patty_bouvier', 'principal_skinner', 'professor_john_frink', 'rainier_wolfcastle', 'ralph_wiggum', 'selma_bouvier', 'sideshow_bob', 'sideshow_mel', 'snake_jailbird', 'troy_mcclure', 'waylon_smithers']
Creating a model
Following Fastai’s standard methodology, I then built a model based on resnet34, which takes images as input and outputs a predicted probability for each category (in this case, it has 42 outputs).
The model was trained for 4 epochs (4 cycles through all the data).
learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.model
learn.fit_one_cycle(4)  # train for 4 epochs
After 4 cycles, this model had an error rate of 0.08, implying an accuracy of 92%, which is pretty amazing considering I had mostly used default parameters.
2. Fine-tuning the model
Interpreting the results
Though the model had an accuracy of 92%, I made sure that the classifier worked properly by looking at the categories (characters) the model confused with one another, to see if the mistakes were reasonable. To do this, I used methods from a Fastai class called ClassificationInterpretation, which let me visualize the images the model was most confident about but got wrong and plot a confusion matrix.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9, figsize=(15,11))
The text above each image denotes: the prediction / the actual / the loss / the probability.
Doh! Some images were mis-labeled
At first, I was startled to see among the most incorrect images those of Kent Brockman and Lisa Simpson, as they were among the characters with the most images. Looking closer at the path of these images, it turned out that some of them were placed in the wrong folder, and were therefore mis-labeled when loaded into the DataBunch.
The confusion matrix traced a diagonal, meaning the model correctly recognized most of the characters, even the least popular ones (the classes are unbalanced), which have only 20–40 images. This was made possible thanks to a technique called transfer learning.
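To make the diagonal remark concrete: a confusion matrix just counts (actual, predicted) pairs, so correct predictions pile up on the diagonal. A minimal plain-Python illustration (not fastai’s implementation):

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are actual classes, columns are predicted classes."""
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(y_true, y_pred):
        m[idx[actual]][idx[predicted]] += 1
    return m

classes = ['homer_simpson', 'lisa_simpson']
m = confusion_matrix(
    ['homer_simpson', 'homer_simpson', 'lisa_simpson'],
    ['homer_simpson', 'lisa_simpson', 'lisa_simpson'],
    classes)
# m → [[1, 1], [0, 1]]: one Homer image predicted as Lisa sits off the diagonal
```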
Indeed, I fitted a model that was pre-trained, meaning it already knew how to recognize images.
- When I called create_cnn(data, models.resnet34, metrics=error_rate), it loaded a pre-trained resnet34 model which had already been trained to classify 1.5 million pictures across 1,000 different categories of objects (e.g. plants, animals, people…).
Before fine-tuning, I cleaned up the noise in the dataset which helped improve the model’s accuracy to 93% from 92% and then saved the model.
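The cleanup itself can be as simple as moving each mis-labeled file into its correct folder. relabel below is a hypothetical stdlib helper (not a fastai function), and the file name in the usage comment is made up:

```python
import shutil
from pathlib import Path

def relabel(img_path, correct_class, root):
    """Move a mis-labeled image into the folder of its correct class."""
    dest = Path(root) / correct_class
    dest.mkdir(parents=True, exist_ok=True)
    return shutil.move(str(img_path), str(dest / Path(img_path).name))

# e.g. relabel('/content/data/simpsons_dataset/lisa_simpson/some_image.jpg',
#              'kent_brockman', '/content/data/simpsons_dataset')
```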
Unfreezing the model
A convolutional neural network (CNN) is made of several layers where the first layers can find lines/gradients and simple shapes; while the deeper layers can identify complex combinations of lines and shapes (from objects to animal eyeballs…).
When I first trained the model, I added a few layers on top of the pre-existing ones. But to improve the predictions, I had to unfreeze all the layers, meaning train the model as a whole.
nb: It was unlikely that I could have improved layers 1 and 2, as the definition of a diagonal line probably wouldn’t change for any of the Simpsons characters. But maybe I could have improved later layers (i.e. from 3 through 5) that would recognize cartoon characters better. That is why I unfroze the model.
For additional explanations about NN layers, check out the notes from Fastai’s course, written by Poonam Ligade.
Finding the optimal learning rate
As each layer represents different levels of complexity, all the layers cannot be trained at the same speed. Therefore, finding the optimal learning rates (LR) is key. To do this, I used the method .lr_find() and plotted the results.
learn.lr_find()
learn.recorder.plot()
plt.title('Loss vs. Learning rate')
The LR indicates how fast the model’s parameters are updated. The plot above shows how the LR affects the training loss. Based on this plot and Jeremy Howard’s (the lecturer) advice, I picked a LR well before the point where the loss got worse:
Furthermore, there’s no point in training all the layers at the same speed, as the last layers were first trained at the default LR of 3e-3. Therefore, when fitting the model I passed a range of LRs where the first layers are trained at 1e-6 and the last layers at 1e-4 (slower than the default because I was fine-tuning), with the remaining layers distributed across that range.
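As I understand it, passing slice(1e-6, 1e-4) as max_lr to fit_one_cycle gives each layer group its own LR, spaced geometrically between the two endpoints. lr_range below is my own sketch of that spacing, not fastai’s code:

```python
def lr_range(lr_min, lr_max, n_groups):
    """One learning rate per layer group, geometrically spaced
    from lr_min (first layers) to lr_max (last layers)."""
    if n_groups == 1:
        return [lr_max]
    ratio = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * ratio ** i for i in range(n_groups)]

lrs = lr_range(1e-6, 1e-4, 3)  # ≈ [1e-06, 1e-05, 1e-04]

# in fastai, this corresponds to:
# learn.unfreeze()
# learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
```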
For additional explanations about LR, check out my post where I experimented with different LR.
Results after fine-tuning
After removing the mis-labeled images from the dataset and fine-tuning the LR, I was able to reach an error rate of 5% on the validation set, down from 8%, implying an accuracy of 95% (vs. 92% previously).
3. Testing the model
Finally, I tested the model on a random picture taken from Google Images and was really happy to see the model correctly identified Homer.
from google.colab import files
uploaded = files.upload()
test_img = open_image('/content/test.PNG')
pred_class, pred_idx, outputs = learn.predict(test_img); pred_class
Many thanks to:
- Fastai for its great MOOC and library, which you can read more about here
- Google Colab for providing a free cloud service based on Jupyter Notebooks that supports free GPU
- Alexander Attia for the dataset which you can download from Kaggle