The full notebook can be found here.
The what and the why
Image segmentation is an application of computer vision wherein we color-code every pixel in an image. Each pixel then represents a particular object in that image. If you look at the images above, every street is coded in violet, every building is orange, every tree is green, and so on. Why do we do this, and how is it different from object detection?
Image segmentation is usually used when we care about edges and regions, when we want to separate important objects from the background. We want to know the specifics of an object and conduct further analysis on it from there. Think about it in terms of a self-driving car: a self-driving car will not only want to identify a street, but also know its edges and curves in order to make the correct turn.
Image segmentation has a lot of significance in the field of medicine. Parts that need to be studied are color coded and viewed in scans taken from different angles. They are then used for things like automatic measurement of organs, cell counting, or simulations based on the extracted boundary information.
We treat image segmentation as a classification problem, where for every pixel in the image, we try to predict what it is. Is it a bicycle, road line, sidewalk, or a building? In this way we produce a color coded image where every object has the same color. One point to note here is that these colors are in the form of integers and not floating point numbers.
As usual we start by importing the fastai libraries.
Let’s first take a look at one of the images. We do so by using open_image to open the image and show to display it.
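As a sketch, assuming fastai v1’s vision API and the CamVid dataset from the fastai course (the path and variable names here are assumptions), the setup and image display might look like:

```python
# Sketch assuming fastai v1 and the CamVid sample dataset from the
# fastai course; path and variable names are assumptions.
from fastai.vision import *

path = untar_data(URLs.CAMVID)   # download and extract the dataset
path_img = path/'images'         # raw images
path_lbl = path/'labels'         # labelled (segmented) images

fnames = get_image_files(path_img)
img = open_image(fnames[0])      # open a normal image
img.show(figsize=(5, 5))         # display it
```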
Next, we take a look at what the image looks like after segmentation. Since the values in the labelled image are integers, we cannot use the same function to open it. Instead, we use open_mask to open it and show to display it. In every segmentation problem, we are given two sets of images: normal ones and labelled ones. We need to match the labelled images with the normal ones, which we do using the filenames. Let’s take a look at the filenames of some of the images.
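A sketch of opening a mask and listing some filenames, again assuming fastai v1 and CamVid-style folder names (path_img and path_lbl are assumptions):

```python
# Sketch assuming fastai v1 and CamVid-style folders; names are assumptions.
from fastai.vision import *

path = untar_data(URLs.CAMVID)
path_img, path_lbl = path/'images', path/'labels'

lbl_names = get_image_files(path_lbl)
mask = open_mask(lbl_names[0])        # integer-valued mask, not floats
mask.show(figsize=(5, 5), alpha=1)    # display the color-coded labels

# Compare a few raw filenames with labelled ones
print(get_image_files(path_img)[:3])
print(lbl_names[:3])
```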
We see that the filenames of the normal images and the labelled images are the same, except that the corresponding labelled image has an _P suffix. Hence we write a function which, for every image, identifies its corresponding labelled counterpart.
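A minimal version of that mapping function, assuming the labelled images live in a folder called labels and differ from the raw filenames only by the _P suffix:

```python
from pathlib import Path

path_lbl = Path('labels')  # assumed folder holding the labelled images

def get_y_fn(x):
    """Map an image path to its labelled counterpart (same name + '_P')."""
    x = Path(x)
    return path_lbl / f'{x.stem}_P{x.suffix}'
```

For example, `get_y_fn('images/0016E5_08155.png')` returns `labels/0016E5_08155_P.png`.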
We also have a file called codes.txt that tells us what object each integer in our labelled image corresponds to. Let’s open the data for our labelled image.
Now let’s check the codes file for the meaning of these integers.
The labelled data had a lot of 26s in it. Counting from index 0 in our codes file, we see that the object referred to by the integer 26 is a tree.
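A self-contained sketch of how codes.txt maps integers to class names. The file written here is illustrative, not the full CamVid list (the real file has one class name per line, with Tree at index 26), but the same np.loadtxt call works on the real file:

```python
import numpy as np

# Write a tiny illustrative codes file; the real CamVid codes.txt has
# many more classes, one name per line.
with open('codes_demo.txt', 'w') as f:
    f.write('Building\nCar\nTree\n')

# Load the class names as an array of strings
codes = np.loadtxt('codes_demo.txt', dtype=str)
print(codes[2])  # the class name at index 2 in this demo file: Tree
```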
Now that we’ve understood our data, we can move on to creating a data bunch and training our model.
We will not use the whole dataset, and we will also keep the batch size relatively small, since classifying every pixel in every image is a resource-intensive task.
As usual we create our data bunch. Reading the above code:
- Create the data bunch from a folder
- Split the data into training and validation sets based on filenames listed in a separate file
- Find the labelled images using the function get_y_fn, and use the codes as the classes to be predicted
- Apply transforms on the images (note the tfm_y = True here: whatever transform we apply to an input image is also applied to its target image; for example, if we flip an image horizontally, we should also flip the corresponding labelled image)
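The steps above can be sketched with fastai v1’s data block API, following the standard CamVid example; the valid.txt filename, image size, and batch size here are assumptions:

```python
# Sketch following the standard fastai v1 CamVid example; 'valid.txt',
# size, and bs are assumptions.
size = (360, 480)  # assumed: half the original CamVid resolution
bs = 4             # small batch size: segmentation is memory-hungry

src = (SegmentationItemList.from_folder(path_img)
       .split_by_fname_file('../valid.txt')         # validation filenames
       .label_from_func(get_y_fn, classes=codes))   # match masks, set classes

data = (src.transform(get_transforms(), size=size, tfm_y=True)  # transform masks too
        .databunch(bs=bs)
        .normalize(imagenet_stats))
```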
For training, we will use an architecture called U-Net instead of a convolutional neural network.
Before explaining what U-Net is, notice the metric used in the above code: a custom accuracy function. The accuracy in an image segmentation problem is the same as that in any classification problem.
Accuracy = number of correctly classified pixels / total number of pixels
However, in this case some pixels are labelled as Void (this label also exists in codes.txt) and shouldn’t be considered when calculating the accuracy. Hence we write a new accuracy function that ignores those pixels.
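The same logic can be sketched in plain NumPy; void_code is an assumption standing in for the index of the Void class in codes.txt:

```python
import numpy as np

void_code = 30  # assumed index of the 'Void' class in codes.txt

def acc_camvid(pred, target):
    """Pixel accuracy that ignores pixels labelled as Void.

    pred:   (H, W) array of predicted class ids
    target: (H, W) array of ground-truth class ids
    """
    mask = target != void_code                  # keep only non-void pixels
    return (pred[mask] == target[mask]).mean()
```

For example, if one of three non-void pixels is misclassified, the function returns 2/3 regardless of how the void pixels were predicted.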
The way a CNN works is that it breaks an image down into smaller and smaller representations until it has just one thing to predict (the left, downsampling half of the U-Net architecture shown below). A U-Net then takes this and makes it bigger and bigger again, and it does this at every stage of the CNN.
As always we find the learning rate and train our model. Even with half the data set, we get a pretty good accuracy of 92%.
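In fastai v1, creating the U-Net learner and training might be sketched as follows, assuming the data bunch and acc_camvid metric defined earlier; the backbone and hyperparameter values are assumptions:

```python
# Sketch assuming fastai v1's unet_learner plus the 'data' bunch and
# 'acc_camvid' metric from earlier; hyperparameters are assumptions.
learn = unet_learner(data, models.resnet34, metrics=acc_camvid, wd=1e-2)

learn.lr_find()                     # sweep learning rates
learn.recorder.plot()               # plot loss vs. learning rate
lr = 3e-3                           # pick a value from the plot
learn.fit_one_cycle(10, slice(lr))  # train with the one-cycle policy
```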
Checking some of the results.
The ground truths are the actual targets and the predictions are what our model labelled. Looks good enough to me.
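With fastai v1, the learner can display inputs, ground truths, and predictions side by side (the figsize value is an assumption):

```python
# Show a grid of validation images with ground-truth and predicted masks
learn.show_results(rows=3, figsize=(8, 9))
```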
If you liked this article, please give it at least 50 claps.