A convolutional neural network (CNN) is a particular implementation of a neural network used in machine learning that is specialized for processing array data such as images, and is thus frequently used in machine learning applications targeted at medical images.
A convolutional neural network typically consists of the following three components, although the architectural implementation varies considerably 5–7:
- input
- feature extraction
- classification and output
The most common input is an image, although considerable work has also been performed on so-called 3D convolutional neural networks that can process either volumetric data (3 spatial dimensions) or video (2 spatial dimensions + 1 temporal dimension).
In most implementations, the input needs to be processed to match the particulars of the CNN being used. This may include cropping, reducing the size of the image, identifying a particular region of interest, and normalizing pixel values to a particular range.
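A minimal sketch of two of these preprocessing steps, cropping a region of interest and min-max normalizing its pixel values to [0, 1], is shown below. The function names and the toy 4×4 image are hypothetical; real pipelines may instead use other schemes such as z-score standardization, and typically rely on an image-processing library rather than plain lists.

```python
def crop(image, top, left, height, width):
    """Return the height x width region of interest starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def min_max_normalize(image, new_min=0.0, new_max=1.0):
    """Linearly rescale all pixel values into [new_min, new_max]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no range to stretch; map everything to new_min.
        return [[new_min for _ in row] for row in image]
    return [[new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in row] for row in image]


# Hypothetical 4 x 4 grayscale image.
image = [
    [0, 50, 100, 150],
    [50, 100, 150, 200],
    [100, 150, 200, 250],
    [150, 200, 250, 255],
]
roi = crop(image, top=1, left=1, height=2, width=2)
print(roi)                     # [[100, 150], [150, 200]]
print(min_max_normalize(roi))  # [[0.0, 0.5], [0.5, 1.0]]
```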
The feature extraction component of a convolutional neural network is what distinguishes CNNs from other multilayered neural networks. It typically comprises repeating sets of these sequential steps:
- Input (image) is convolved with numerous kernels
- Each kernel results in a distinct feature map
- Each feature map is downsized to a smaller matrix by pooling the values in adjacent pixels
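The pooling step can be sketched as follows. The text does not say how adjacent values are pooled; this example assumes max pooling over non-overlapping 2×2 blocks, a common choice (average pooling is another). The function name and the feature map values are hypothetical.

```python
def max_pool2d(feature_map, size=2):
    """Downsample by taking the maximum of each non-overlapping size x size block.

    Trailing rows/columns that do not fill a complete block are dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            block = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4 x 4 feature map; pooling halves each dimension to 2 x 2.
fm = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 3, 8],
]
print(max_pool2d(fm))  # [[6, 2], [2, 8]]
```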
Non-linear activation unit
- The activation of each neuron is then computed by applying a non-linear function to the weighted sum of its inputs plus an additional bias term. This non-linearity is what allows the network to approximate a very wide range of functions; without it, a stack of layers would reduce to a single linear transformation.
- A popular activation unit is the rectified linear unit (ReLU).
- The weighted sums computed during convolution can be negative, so after convolution and pooling some values in the feature maps are negative.
- The rectified linear unit sets all negative values to zero and leaves positive values unchanged, i.e. ReLU(x) = max(0, x).
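The computation of a single neuron's activation described above can be sketched in a few lines: a weighted sum of inputs, plus a bias, passed through ReLU. The inputs, weights and bias are hypothetical values chosen to show one case where the weighted sum is negative (and is clipped to zero) and one where it is positive (and passes through).

```python
def relu(x):
    """Rectified linear unit: negative inputs become 0, positive inputs pass through."""
    return max(0.0, x)


def neuron_activation(inputs, weights, bias, activation=relu):
    """Apply a non-linear activation to the weighted sum of the inputs plus a bias."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


print(neuron_activation([1.0, 2.0], [0.5, -1.0], bias=0.25))  # 0.5 - 2.0 + 0.25 = -1.25, so 0.0
print(neuron_activation([1.0, 2.0], [0.5, 1.0], bias=0.25))   # 0.5 + 2.0 + 0.25 = 2.75
print([relu(v) for v in [-3.0, -0.5, 0.0, 2.0]])               # [0.0, 0.0, 0.0, 2.0]
```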
These three steps (convolution, pooling and non-linear activation) are then repeated many times, each convolution layer acting upon the pooled and rectified feature maps from the preceding layer. The result is progressively smaller feature maps whose activations depend on increasingly complex features, built up from the cumulative interaction of numerous prior convolutions.
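How the feature maps shrink as these steps repeat can be tracked with simple arithmetic. This sketch assumes, hypothetically, a 256×256 input, 3×3 kernels with one pixel of zero padding and a stride of 1 (so convolution itself preserves the size), and 2×2 pooling that halves each side. ReLU is applied element-wise and does not change the size. The general size formula used is floor((n + 2·padding − kernel) / stride) + 1.

```python
def conv_output_size(n, kernel=3, padding=1, stride=1):
    """Side length of the feature map produced by convolving an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def pool_output_size(n, size=2):
    """Side length after non-overlapping size x size pooling."""
    return n // size


def feature_map_sizes(input_size, n_stages):
    """Side length after each convolution + activation + pooling stage."""
    sizes = [input_size]
    n = input_size
    for _ in range(n_stages):
        n = pool_output_size(conv_output_size(n))
        sizes.append(n)
    return sizes


print(feature_map_sizes(256, 4))  # [256, 128, 64, 32, 16]
```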
Classification and output
The final pooled and rectified feature maps are then used as the input of fully connected layers, which work just as in any fully connected neural network and are therefore discussed separately.
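As a rough sketch of this hand-off, the final feature maps can be flattened into one vector and passed through a fully connected layer (one weighted sum plus bias per output class). Converting the resulting scores to probabilities with softmax is an assumption here, a common choice for classification outputs that the text does not specify. The two feature maps, the weights, the biases and the two class labels are all hypothetical.

```python
import math


def flatten(feature_maps):
    """Concatenate a list of 2-D feature maps into a single 1-D vector."""
    return [v for fm in feature_maps for row in fm for v in row]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(x * w for x, w in zip(inputs, w_row)) + b for w_row, b in zip(weights, biases)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical 2 x 2 feature maps from the last pooling layer.
feature_maps = [[[1.0, 0.0], [0.0, 2.0]], [[0.5, 0.5], [0.0, 1.0]]]
x = flatten(feature_maps)  # 8 values
weights = [
    [0.2, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.3],  # hypothetical class "normal"
    [0.0, 0.0, 0.0, 0.5, 0.4, 0.4, 0.0, 0.2],  # hypothetical class "abnormal"
]
biases = [0.0, -0.5]
probs = softmax(dense(x, weights, biases))
print(probs)  # two probabilities summing to 1; the second class scores higher here
```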
A kernel can be understood as a small 2-D matrix of weights that establishes a relationship between a center pixel and its neighbouring pixels. Kernels are usually organised as matrices with odd dimensions, e.g. 3×3 or 5×5, so that they have a well-defined center pixel. The 3×3 kernel used in the worked example below has rows (−1, 0, +1), (−2, 0, +2) and (−1, 0, +1).
Sliding the kernel over the larger image or matrix and, at each position, computing a new value for the pixel under the kernel's center is known as convolution. It is the basis of a CNN. A simple example of convolution is shown in the figure above. The change in the center pixel is worked through below:
Suppose the kernel lies over a 3×3 patch of the larger matrix with rows (93, 139, 101), (26, 252, 196) and (135, 230, 18). On convolution, the middle value, 252, becomes:
93*(-1) + 139*(0) + 101*(+1) + 26*(-2) + 252*(0) + 196*(+2) + 135*(-1) + 230*(0) + 18*(+1) = 231
So convolving the kernel with this patch changes the value 252 to 231 in the output.
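The arithmetic of this worked example can be checked with a short sketch. The kernel and patch are reconstructed from the products in the calculation above; this particular kernel is the horizontal Sobel kernel, which responds to vertical edges. As in the example, the kernel is not flipped; strictly speaking this is cross-correlation, which CNN libraries conventionally call convolution. The function name `convolve_at` is hypothetical.

```python
def convolve_at(patch, kernel):
    """Multiply a patch element-wise by a same-sized kernel and sum the products.

    The kernel is not flipped, matching the worked example.
    """
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]
patch = [
    [93, 139, 101],
    [26, 252, 196],
    [135, 230, 18],
]
print(convolve_at(patch, kernel))  # 231
```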
The stride is the number of pixels by which the kernel shifts between successive applications of the convolution. A stride of 1 means the kernel moves one pixel at a time, to the right along a row and then one pixel down to start the next row; larger strides skip positions and so produce a smaller output.
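The sliding described above can be sketched as a full 2-D convolution with a configurable stride. This version applies the kernel only at positions where it fits entirely inside the image (no padding). The 5×5 test image is a hypothetical ramp whose values rise by 1 per column, so the horizontal Sobel kernel from the worked example gives the same response, 8, everywhere; the stride changes only how many positions are visited, giving a 3×3 output for stride 1 and 2×2 for stride 2.

```python
def convolve2d(image, kernel, stride=1):
    """Slide a k x k kernel over the image, visiting only positions where it fits.

    The kernel moves `stride` pixels to the right along each row of positions,
    then `stride` pixels down to start the next row.
    """
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    output = []
    for r in range(0, rows - k + 1, stride):
        out_row = []
        for c in range(0, cols - k + 1, stride):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


image = [[5 * r + c for c in range(5)] for r in range(5)]  # hypothetical 5 x 5 ramp, 0..24
kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

print(convolve2d(image, kernel, stride=1))  # [[8, 8, 8], [8, 8, 8], [8, 8, 8]]
print(convolve2d(image, kernel, stride=2))  # [[8, 8], [8, 8]]
```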
So that the kernel can be centered on every pixel, including those along the border, layers of zeros are added around the edges of the larger matrix or image. This addition of zeros on each side of the matrix is known as padding. For an N×N kernel (N odd), floor(N/2) layers of zeros are added to each edge; with a stride of 1, the output then has the same size as the input.
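A minimal sketch of zero padding follows: a 3×3 image is padded with floor(3/2) = 1 layer of zeros, after which a 3×3 kernel can be centered on every original pixel and the output keeps the 3×3 size of the input. The helper names and the 3×3 image are hypothetical; the kernel is the one from the worked example. A compact convolution helper is repeated so the block runs on its own.

```python
def zero_pad(image, pad):
    """Surround an image with `pad` layers of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def convolve2d(image, kernel):
    """Stride-1 convolution over every position where the kernel fits."""
    k = len(kernel)
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(len(image[0]) - k + 1)
        ]
        for r in range(len(image) - k + 1)
    ]


kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
pad = len(kernel) // 2          # floor(N/2) = 1 for a 3 x 3 kernel
padded = zero_pad(image, pad)   # 5 x 5
out = convolve2d(padded, kernel)
print(len(out), len(out[0]))    # 3 3, the same size as the input
```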
Convolutional neural networks in radiology are most frequently trained by supervised learning. During training, both the weights of the fully connected classification layers and the values in the convolutional kernels are modified by backpropagation.
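The idea that kernel values are learned rather than hand-designed can be illustrated with a deliberately tiny, hypothetical setup: a single 1-D convolutional layer, no fully connected layers, and labels generated by a known "ground-truth" kernel. Gradient descent on the mean squared error, with the gradient for this one layer derived by hand (which is what backpropagation computes in this case), recovers that kernel. The signal, learning rate and epoch count are assumptions chosen so the example converges.

```python
def conv1d(signal, kernel):
    """Stride-1 1-D convolution (no kernel flip) over positions where the kernel fits."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(len(signal) - k + 1)]


def train_kernel(signal, target, kernel_size=3, lr=0.01, epochs=1000):
    """Fit kernel weights by gradient descent on the mean squared error."""
    kernel = [0.0] * kernel_size
    n = len(target)
    for _ in range(epochs):
        prediction = conv1d(signal, kernel)
        errors = [p - t for p, t in zip(prediction, target)]
        # Gradient of the loss for this single layer:
        # dLoss/dkernel[j] = (2 / n) * sum_i errors[i] * signal[i + j]
        grads = [2.0 / n * sum(errors[i] * signal[i + j] for i in range(n)) for j in range(kernel_size)]
        kernel = [w - lr * g for w, g in zip(kernel, grads)]
    return kernel


true_kernel = [-1.0, 0.0, 1.0]        # hypothetical "ground-truth" edge detector
signal = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 1.0, 0.0, 2.0, 3.0]
target = conv1d(signal, true_kernel)  # labels for supervised learning
learned = train_kernel(signal, target)
print(learned)  # very close to [-1.0, 0.0, 1.0]
```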