In this article we will look at another application of computer vision known as image regression. In image regression, we have a dataset that’s annotated in a certain way. For example, for every image in our dataset, we would have the co-ordinates of the eyes of that person. We then train a model to predict these co-ordinates for new images.
But why do we do this? If you do a quick Google search, you might find interesting applications such as finding the height of a person from an image or predicting the steering angle for a self-driving car. However, the application that instantly pops up in my mind is Snapchat filters.
Snapchat predicts the co-ordinates of your nose before it can turn you into a cat. And I was so frustrated by this that I decided to turn cats into humans.
I found a dataset on Kaggle and decided to go ahead with it. One reason I chose this dataset was that not much work had been done on it.
The dataset had images of a bunch of cats, with their corresponding annotations stored in a .cat file in the format shown below.
I decided to simplify it by starting with only the co-ordinates for the mouth and then eventually going for everything. Hence, I did a little bit of pre-processing that would give me the co-ordinates of the mouths for all the cats as PyTorch tensors. I visualized the first cat and things looked good. (Notice the red dot.)
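A minimal sketch of that pre-processing step. It assumes the .cat format commonly described for this Kaggle dataset (the first number is the point count, followed by x y pairs) and that the mouth is the third annotated point; both the format and the index are assumptions, and the function names are mine:

```python
def parse_cat_annotation(text):
    """Parse the contents of a .cat file into a list of (x, y) points.

    Assumed format: the first number is the point count,
    followed by x y pairs for each annotated point.
    """
    nums = [int(t) for t in text.split()]
    n = nums[0]
    return [(nums[1 + 2 * i], nums[2 + 2 * i]) for i in range(n)]

def mouth_point(text, mouth_idx=2):
    """The mouth is assumed to be the third annotated point.

    Wrap the result in torch.tensor(...) to get the PyTorch
    tensor used for training.
    """
    return parse_cat_annotation(text)[mouth_idx]
```

For an annotation string like "3 10 20 30 40 50 60", `mouth_point` would return the third pair, (50, 60).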
However, as I went ahead, I noticed that the annotations for a lot of images were incorrect. In fact, for some images, the annotations weren’t even inside the image but outside it.
I made sure that the scale parameter was set to True, which means that if we increase or decrease the size of the image, the point moves accordingly. And it was. So the problem was with the data itself (guess that’s why there was not much work done on it after all!)
And this is an important part of deep learning. There are not many quality, ready-to-use datasets available out there, so in most cases companies have to create their own datasets before they can work with them. In this case, the company would need to annotate the data themselves.
Back to level 0
Full Jupyter notebook
Since we couldn’t turn cats into humans, we’d have to stick to humans. In the dataset I eventually used, we would be predicting the centre of a person’s face as shown.
In this case, the co-ordinates are not directly given to us. We are given some calibration numbers in a file.
Then they’ve provided formulae to get the actual co-ordinates for every image. These steps are part of the README and we don’t need to worry about them too much.
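For the curious, the README's recipe boils down to a standard pinhole-camera projection: the calibration numbers form an intrinsic matrix that maps the 3D head position onto the 2D image plane. The function name and the exact matrix layout below are my assumptions, not the dataset's own code:

```python
def project_to_image(x, y, z, cal):
    """Project a 3D point onto the image plane with a 3x3 intrinsic
    calibration matrix `cal` (pinhole-camera model).

    cal[0][0] and cal[1][1] are the focal lengths, cal[0][2] and
    cal[1][2] the principal point (image centre).
    """
    u = x * cal[0][0] / z + cal[0][2]
    v = y * cal[1][1] / z + cal[1][2]
    return (u, v)
```

For example, with focal length 500 and principal point (320, 240), a point straight ahead of the camera projects onto the image centre.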
Now that we have the actual co-ordinates, we can work with them. Note that these co-ordinates have nothing to do with pixel values or anything else related to the image itself. They are plain XY values.
Hence we need a model that predicts two floats (an x and a y value) for every input in our dataset. And how do we create this model? Same as always.
Importing the library
We start by importing the fastai library for vision.
We’ve already derived the co-ordinates for all the points, so we can go ahead and create a DataBunch — this time one that holds image points rather than class labels. The images are stored in a lot of folders, and we are just going to randomly select a folder (folder number 13 in this case) to be our validation set. We label using the function we created before, apply our transforms and create a DataBunch. Data augmentation on this dataset looks as follows:
For our model, as usual, we create a Learner (using ResNet34 as the pretrained model), find the learning rate, and train our model.
After just 3 epochs, our model is doing pretty well. We can plot the predicted vs actuals to confirm this.
After training deep learning models, we save them as .pth files so that we do not have to train them again. Usually we don’t just save one model; we save the weights at various stages of training. This lets us branch off in a different direction from a certain stage if we want to.
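In fastai this checkpointing is a one-liner; a small sketch (the stage names are arbitrary):

```python
def save_stage(learn, name):
    """learn.save(name) writes the current weights to
    models/<name>.pth next to the data."""
    learn.save(name)

def restore_stage(learn, name):
    """learn.load(name) restores those weights, so we can branch
    off from that stage of training later."""
    return learn.load(name)
```

A typical flow is `save_stage(learn, 'stage-1')`, train some more, and `restore_stage(learn, 'stage-1')` if the later training goes badly.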
That would be it for this article.
If you enjoyed the article, give it at least 50 claps :p