Learn to create your own Deep ConvNet. Here I will explain each term and concept of a typical ConvNet architecture.
Hello World! In the previous article, I introduced an abstract view of ConvNets. In this article, I will dive into details of ConvNets.
A ConvNet is made up of Layers. Every Layer has a simple API: It transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters.
Building blocks of a ConvNet
The Conv layer is the core building block of a CNN. All the heavy computation is performed in this layer: filters (also called kernels) are convolved with the input volume. Let’s visualize this concept with a simple example.
Consider an input volume of size [32 x 32 x 3] (an RGB image). If the filter size is 5×5, then each neuron in the Conv layer will have weights to a [5x5x3] region in the input volume, for a total of 5 x 5 x 3 = 75 weights (plus 1 bias parameter). The number of filters used here determines the depth of the output volume.
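A quick sanity check of the parameter count above, as a minimal sketch (the helper name `conv_params` is mine, not a library function):

```python
def conv_params(filter_size, in_depth, num_filters):
    # Each filter has filter_size * filter_size * in_depth weights plus one bias.
    per_filter = filter_size * filter_size * in_depth + 1
    return per_filter * num_filters

# One 5x5 filter over an RGB (depth-3) input: 5*5*3 = 75 weights + 1 bias = 76 parameters.
print(conv_params(5, 3, 1))   # 76
# With 10 filters, the output depth is 10 and the layer has 760 parameters in total.
print(conv_params(5, 3, 10))  # 760
```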
Hyperparameters of Conv layer— Depth, Stride, Padding.
The number of filters used is called Depth of the Conv layer. It determines the depth of output volume.
Until now we have used a stride of 1, i.e. we have moved the filter 1 pixel at a time. Stride is the step size with which the filter slides over the input. When the stride is 2, the filter jumps 2 pixels at a time as it slides.
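A minimal sketch of this sliding operation for a single-channel input (note that, as in most deep learning libraries, this is really cross-correlation — the kernel is not flipped):

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Naive 'valid' convolution of a square image with a square kernel."""
    f = kernel.shape[0]
    n = image.shape[0]
    out = (n - f) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            # Take the patch under the filter at this position and sum the products.
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            result[i, j] = np.sum(patch * kernel)
    return result

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
print(convolve2d(image, kernel, stride=1).shape)  # (3, 3)
print(convolve2d(image, kernel, stride=2).shape)  # (2, 2) — stride 2 halves the output
```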
Sometimes it is convenient to pad the input volume with zeros around the border. The size of this zero-padding is a hyperparameter. Padding allows us to control the spatial size of the output volumes. Mostly we use it to exactly preserve the spatial size of the input volume so the input and output width and height are the same.
Let ‘p’ be the padding amount; then the output matrix will be of size [n + 2p − f + 1], where ’n’ is the input size and ‘f’ is the filter size.
With a stride of size ‘s’, the output size becomes [(n + 2p − f) / s + 1].
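The formula can be wrapped in a small helper (a sketch; the function name is mine) to check the numbers from the [32 x 32 x 3] example:

```python
def conv_output_size(n, f, p=0, s=1):
    # (n + 2p - f) / s + 1, using integer division for whole-pixel outputs
    return (n + 2 * p - f) // s + 1

# 32x32 input, 5x5 filter, no padding, stride 1 -> 28x28 output
print(conv_output_size(32, 5))        # 28
# "Same" padding for a 5x5 filter: p = (f - 1) // 2 = 2 preserves the input size
print(conv_output_size(32, 5, p=2))   # 32
```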
The pooling layer is used to reduce the number of parameters and the amount of computation in the network. Max pooling is the most commonly used: as the name suggests, the maximum element of each window is passed into the new volume. Average pooling is another type of pooling.
A max pooling of size [2×2] with stride 2 applied to a volume of size [4×4] results in a volume of size [2×2]. In general, a volume is downsampled to [(n − f) / s + 1], where the symbols have their usual meaning.
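The [4×4] → [2×2] example can be sketched directly (pooling is applied per channel, so a single 2D slice suffices here):

```python
import numpy as np

def max_pool(volume, f=2, s=2):
    """Naive max pooling of a square 2D slice with window f and stride s."""
    n = volume.shape[0]
    out = (n - f) // s + 1
    pooled = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            # Keep only the largest value in each f x f window.
            pooled[i, j] = volume[i*s:i*s+f, j*s:j*s+f].max()
    return pooled

x = np.array([[1., 3., 2., 4.],
              [5., 6., 7., 8.],
              [3., 2., 1., 0.],
              [1., 2., 3., 4.]])
print(max_pool(x))
# [[6. 8.]
#  [3. 4.]]
```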
A fully-connected (FC) layer is just an ordinary neural network layer: each of its neurons is connected to all the activations in the previous layer.
So far we have seen that the ConvNets are made up of three types of layers: Conv layer, Pool layer and FC (fully connected).
An activation function is applied to compute the activations of the Conv layer. ReLU is the most commonly used activation function; Sigmoid and Tanh are other common choices.
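These three activation functions are simple elementwise operations, sketched here with numpy:

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])

relu = np.maximum(0, x)         # ReLU: zeroes out negatives, keeps positives
sigmoid = 1 / (1 + np.exp(-x))  # Sigmoid: squashes values into (0, 1)
tanh = np.tanh(x)               # Tanh: squashes values into (-1, 1)

print(relu)  # negatives become 0: the array is [0, 0, 0, 1, 3]
```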
Layer pattern: Conv(ReLU) -> POOL -> FC
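Putting the formulas together, we can trace the volume shapes through this pattern for the [32 x 32 x 3] input. This is a sketch; the filter count (10) and the "same" padding choice are illustrative assumptions, not fixed by the article:

```python
# Trace shapes through Conv(ReLU) -> POOL -> FC for a 32x32x3 input.
n, depth = 32, 3

# Conv: 10 filters of size 5x5, padding 2 ("same"), stride 1
f, p, s, num_filters = 5, 2, 1, 10
n = (n + 2 * p - f) // s + 1   # spatial size: 32 -> 32 (padding preserves it)
depth = num_filters            # depth: 3 -> 10 (one slice per filter)
# ReLU is elementwise and changes no shapes.

# Pool: 2x2 window, stride 2
f, s = 2, 2
n = (n - f) // s + 1           # spatial size: 32 -> 16

# FC: flatten the volume and connect every unit
fc_inputs = n * n * depth      # 16 * 16 * 10 = 2560 inputs to the FC layer
print(n, depth, fc_inputs)     # 16 10 2560
```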