Pneumonia is lung inflammation caused by infection with virus, bacteria, fungi or other pathogens. According to National Institutes of Health (NIH), chest x ray is the best test for pneumonia diagnosis. However, reading x ray images can be tricky and requires domain expertise and experience. It would be nice if we can just ask a computer to read the images and tell us the results. In this story, we will use deep learning to train an AI algorithm that analyzes chest x ray images and detects pneumonia.
Convolutional Neural Network (CNN)
Convolutional neural network (CNN) is a class of deep neural networks that specializes in analyzing images and thus is widely used in computer vision applications such as image classification and clustering, object detection, and neural style transfer. To understand CNN, let’s first look at what convolution is. Say we have a 5×5 grayscale image and a 3×3 matrix called filter or kernel, and we want to “convolve” the image by performing element-wise matrix multiplication for each 3×3 block of the image with the filter and then taking the sum of the products. The sums will become the elements of the resulting 3×3 image. In the example below, we calculate the first element of the new image, and the rest can be calculated using the same strategy.
Trending AI Articles:
1. How ethical is Artificial Intelligence?
2. Predicting buying behavior using Machine Learning
3. Understanding and building Generative Adversarial Networks(GANs)
4. Building a Django POST face-detection API using OpenCV and Haar Cascades
You can see how this works on real images by reading Victor Powell’s nice blog post.
In contrast to grayscale images, color images have multiple channels such as red, blue, and green. Thus, we need to use filters that have the same number of channels and perform convolution for each channel and then take the sum. This is essentially a single convolutional layer.
Another building block of CNN is pooling layer. For instance, max pooling just takes the maximum of each sub-block, whereas average pooling takes the average. In practice, max pooling more commonly used. As we can see, pooling layer can reduce the number of dimensions and thus accelerate computation.
The final component of CNN is fully-connected layer. The output of the final convolutional and pooling layer will be flattened into a 1 dimensional vector and fed into one or more fully-connected layers. Taken together, an example CNN looks like something below:
One important idea of CNN is that the earlier layers may only be able to detect simple features like edges, but the later layers can gradually learn to detect complex objects like human faces.
Of course, this is just a brief introduction to CNN, to learn more about CNN and its applications I would recommend Andrew Ng’s course available on Coursera.
Residual Networks (ResNets)
Residual Networks (ResNets) are built on residual blocks as shown below.
The main idea of ResNets is to add the activation of layer l to the output of layer l+2 before applying non-linear activation. Thus, the activation of layer l can follow a shortcut to later layers in addition to going through layer l+1. This would allow us to train much deeper neural networks without suffering from gradient vanishing and exploding problems. In this project, we will utilize the the 34-layer ResNet (ResNet34) pre-trained on the ImageNet classification dataset, which is a technique called transfer learning.
Building a CNN classifier for pneumonia detection
I downloaded 5,863 chest x ray images from Kaggle which are labeled as either normal or pneumonia and are divided into train, validation, and test sets by the contributor. Let’s take a look at some example images.
It looks like the normal lungs are more clear than the ones with pneumonia, but it is difficult to make a diagnosis without any experience. Instead of learning how to read x ray images ourselves, let’s train an image classifier using CNN to detect pneumonia for us. To this end, we will take advantage of fast.ai, which is a high-level deep learning library running on top of PyTorch.
First, we need to define a method to load the images and perform data augmentation. We will use the ResNet34 architecture.
We can then load the data and build a CNN based on ResNet34, again this is fairly easy using fast.ai. I also use a trick taught by Jeremy Howard to first train the model with smaller 64 images and then gradually increase the image size.
Let’s train the model for 2 epochs by calling the fit method.
Noticed that we have only trained the last fully connected layer added on top of ResNet34. Next, we will fine-tune the entire model by using differential learning rates. The logic behind it is that the earlier convolutional layers may recognize basic image features and thus need less training than later layers. Thus, we will use smaller learning rates for earlier layers and larger ones for later layers.
We will repeat this learning strategy using 128×128 and 256 ×256 images.
After training, a separate test set was used to evaluate the performance of the CNN classifier. The test set is a little unbalanced as 63% of the test chest x ray images are pneumonia. I also use this number as baseline accuracy. The CNN classifier achieved an accuracy of 91%, which is substantially better than the baseline accuracy.
By analyzing the confusion matrix, we can see that the CNN classifier is particularly good at identifying pneumonia images as 387 out of 390 pneumonia images were correctly classified. Nevertheless, the CNN classifier has a modest tendency to label normal images as pneumonia images, resulting in a 23% false positive rate.
Based upon the confusion matrix, one obvious direction is to improve the CNN classifier’s performance on normal images. Adding more normal chest x ray images to the dataset is likely to help. Additionally, it would be a good idea to train the algorithm to first detect areas that are dramatically different between normal and pneumonia images, and then focus on analyzing those areas instead of the whole image.
I’m also curious about how well the CNN classifier performs by comparison with doctors? Can the algorithm do better than doctors?
This project’s source code can be found at: https://github.com/naity/pneumonia