
Understanding the basics of CNN with image classification.

October 8, 2019, in Neural Networks

Finally, you compute the sum of all the elements in Z to get a scalar number, i.e. 3+4+0+6+0+0+0+45+2 = 60.

The convolution (Conv) operation, using an appropriate filter, detects certain features in images, such as horizontal or vertical edges. For example, in the image given below, the convolution output obtained with the first filter has only the middle two columns nonzero, while the two extreme columns (1 and 4) are zero. This is an example of vertical edge detection.
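
To make this concrete, here is a minimal NumPy sketch of the same operation: a small image containing a vertical edge is convolved with a hand-written 1/0/-1 filter. The exact filter values in the original figure are not shown, so the image and kernel here are illustrative assumptions.

```python
import numpy as np

# A tiny 4x4 image: bright left half, dark right half (a vertical edge in the middle).
image = np.array([
    [10, 10, 0, 0],
    [10, 10, 0, 0],
    [10, 10, 0, 0],
    [10, 10, 0, 0],
])

# A 3x3 vertical-edge filter: 1s in the first column, 0s in the middle, -1s in the last.
vertical_filter = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
])

# "Valid" convolution (no padding, stride 1): slide the filter over the image,
# multiply element-wise and sum, exactly as in the Z example above.
out_h = image.shape[0] - 3 + 1
out_w = image.shape[1] - 3 + 1
output = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        output[i, j] = np.sum(image[i:i+3, j:j+3] * vertical_filter)

print(output)
# [[30. 30.]
#  [30. 30.]]  -> large responses mark the location of the vertical edge
```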


Similarly, the above filter with the 1s placed horizontally and 0s in the middle row can be used for horizontal edge detection.

Image convolution with filter.

During convolution, the image (224×224×3) is convolved with a 3×3 filter and a stride of 1 to produce a 224×224 array, as shown below.

Convolved image

The output (224×224) is passed through the ReLU activation function to introduce non-linearity, producing feature maps (224×224) of the image.
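
In Keras, this Conv + ReLU step can be written as follows. This is a minimal sketch assuming a 224×224×3 input and 'same' padding so the spatial size is preserved; the 32 filters are chosen purely for illustration.

```python
from tensorflow.keras import Input, Model, layers

# Minimal sketch: one 3x3 convolution with stride 1 and 'same' padding keeps the
# 224x224 spatial size; ReLU then introduces non-linearity into the feature maps.
inputs = Input(shape=(224, 224, 3))
x = layers.Conv2D(filters=32, kernel_size=(3, 3), strides=1,
                  padding="same", activation="relu")(inputs)
model = Model(inputs, x)
model.summary()  # output shape: (None, 224, 224, 32)
```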

2. Pooling+Relu

Pooling + Relu layer

The pooling layer looks at larger regions (containing multiple patches) of the image and captures an aggregate statistic (max, average, etc.) of each region to make the network invariant to local transformations.

The two most popular aggregate functions used in pooling are ‘max’ and ‘average’.

  • Max pooling: if any one of the patches says something strongly about the presence of a certain feature, the pooling layer counts that feature as ‘detected’.
  • Average pooling: if one patch says something very firmly but the others disagree, the pooling layer averages the responses to decide (see the sketch after this list).
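
As a rough Keras sketch of the two variants (the 2×2 pool size is an assumption; the article does not specify one):

```python
from tensorflow.keras import layers

# Max pooling keeps the strongest response in each 2x2 patch;
# average pooling takes the mean of the responses in each 2x2 patch.
max_pool = layers.MaxPooling2D(pool_size=(2, 2))
avg_pool = layers.AveragePooling2D(pool_size=(2, 2))
# Applied to 224x224x32 feature maps, either layer halves the spatial size
# to 112x112 while leaving the number of feature maps unchanged.
```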

3. Fully Connected(FC) layer

The output of the pooling layer is flattened out into a large vector and fed to the fully connected layers. The final FC layer uses a softmax activation function, which outputs a probability between 0 and 1 for each of the classification labels the model is trying to predict.

Summing up the above points, the final convolutional neural network looks like this:

CNN network

For more details on the above, please refer here.
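
Putting the pieces together, a minimal Keras sketch of such a pipeline (Conv + ReLU, pooling, flatten, FC with softmax) might look like the following; the filter count, pooling size and 10-class output are illustrative assumptions rather than the article's exact network.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                  input_shape=(224, 224, 3)),   # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                # pooling
    layers.Flatten(),                           # flatten to a large vector
    layers.Dense(10, activation="softmax"),     # one probability per class label
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```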

There are various techniques used for training a CNN model to improve accuracy and avoid overfitting.

  1. Regularization.

For better generalizability of the model, a very common regularization technique is to add a regularization term to the objective function. This term ensures that the model doesn’t capture the ‘noise’ in the dataset, i.e. does not overfit the training data.

Objective function = Loss Function (Error term) + Regularization term

Hence the objective function can be written as:

Objective function = L(F(xi),θ) + λf(θ)

where L(F(xi),θ) is the loss function expressed in terms of the model output F(xi) and the model parameters θ. The second term λf(θ) has two components — the regularization parameter λ and the parameter norm f(θ).

There are broadly two types of regularization techniques (very similar to those used in linear regression) followed in CNNs:

  • L1 norm: f(θ) = ||θ||₁, the sum of the absolute values of all the model parameters
  • L2 norm: f(θ) = ||θ||₂², the sum of the squares of all the model parameters (see the Keras sketch below)
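
In Keras, the penalty λf(θ) is attached to a layer through a kernel regularizer; the sketch below uses an illustrative λ = 0.01, not a value from the article.

```python
from tensorflow.keras import layers, regularizers

# L2 penalty: adds λ * sum(θ²) over this layer's weights to the loss.
dense_l2 = layers.Dense(64, activation="relu",
                        kernel_regularizer=regularizers.l2(0.01))
# The L1 variant would use regularizers.l1(0.01), i.e. λ * sum(|θ|).
```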

2. Dropout.

A dropout operation is performed by multiplying the weight matrix W(l) with a mask vector α, as shown below.

Then the shape of the mask vector α will be (3, 1). Now, if the value of q (the probability of a 1) is 0.66, the α vector will have two 1s and one 0. Hence, the α vector can be any of the following three: [1 1 0], [1 0 1] or [0 1 1].

One of these vectors is then chosen randomly for each mini-batch. Let’s say that, in some mini-batch, the mask α = [1 1 0] is chosen. Hence, the new (generalized) weight matrix will be:

Dropouts.

All elements in the last column become zero. Thus, a few neurons (shown in the image below) that were of less importance are discarded, making the network learn more robust features and reducing the training time for each epoch.

Dropout in a neural network.
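
A small NumPy sketch of the mask idea follows. Note that standard (Bernoulli) dropout samples each mask entry independently with probability q, so it does not guarantee exactly two 1s and one 0; the matrix size and q value here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                  # a weight matrix with 3 columns
q = 0.66                                     # probability that a mask entry is 1 (unit kept)
alpha = (rng.random(3) < q).astype(float)    # random 0/1 mask vector
W_dropped = W * alpha                        # columns with a 0 in the mask are zeroed out

# In Keras the same idea is applied with a Dropout layer, where `rate` is the
# probability of *dropping* a unit, i.e. rate = 1 - q:
# from tensorflow.keras import layers
# x = layers.Dropout(rate=1 - q)(x)
```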

3. Batch Normalization.

This technique allows each layer of a neural network to learn a little more independently of the previous layers. For example, in a feed-forward neural network:

h₄ = σ(W₄·h₃ + b₄) = σ(W₄·σ(W₃·σ(W₂·σ(W₁·x + b₁) + b₂) + b₃) + b₄)

h₄ is a composite function of the outputs of all previous layers (h₁, h₂, h₃). Hence, when we update the weights, say W₄, it affects the output h₄, which in turn affects the gradient ∂L/∂W₅. Thus, the updates made to W₅ should not be affected by the updates made to W₄.

Thus, batch normalization is performed on the output H(l) of each layer for every batch. The layer output is normalized using the mean vector μ and the standard deviation vector σ̂ computed across the batch.
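
In Keras this is a BatchNormalization layer placed after a layer's output; a minimal sketch (with illustrative input shape and filter count):

```python
from tensorflow.keras import layers, models

# The Conv output H(l) is normalized per batch using the batch mean μ and standard
# deviation σ̂ (plus learned scale/shift parameters), then passed through ReLU.
block = models.Sequential([
    layers.Conv2D(32, (3, 3), padding="same", input_shape=(32, 32, 3)),
    layers.BatchNormalization(),
    layers.Activation("relu"),
])
```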

Having understood the above techniques, we will now train our CNN on the CIFAR-10 dataset.

The CIFAR-10 dataset has 60,000 RGB images, each of size (32, 32, 3), spread across 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. This dataset can be downloaded directly through the Keras API.
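
Loading and preparing the data through the Keras API looks like this:

```python
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# CIFAR-10 ships with 50,000 training and 10,000 test images of shape (32, 32, 3).
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]
y_train = to_categorical(y_train, 10)                # one-hot encode the 10 classes
y_test = to_categorical(y_test, 10)
print(x_train.shape)  # (50000, 32, 32, 3)
```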

The goal is to experiment with the hyperparameters and architectural choices mentioned above for better accuracy on the CIFAR-10 dataset, and to draw insights from the results:

  • Adding and removing dropouts in convolutional layers
  • Batch Normalization (BN)
  • L2 regularisation
  • Increasing the number of convolution layers
  • Increasing the number of filters in certain layers

Approach:

To start with, we have a simple model, with the dataset split into train and test sets, expected to run for 100 epochs with the number of classes set to 10. A simple sequential network is built with 2 convolution layers having 32 feature maps each, followed by an activation layer and a pooling layer.
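
A sketch of this baseline is shown below; the kernel sizes, dense layer width and optimizer are assumptions, since the article does not list them.

```python
from tensorflow.keras import layers, models

baseline = models.Sequential([
    layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                  input_shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
baseline.compile(optimizer="adam", loss="categorical_crossentropy",
                 metrics=["accuracy"])
# baseline.fit(x_train, y_train, epochs=100, validation_data=(x_test, y_test))
```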

  1. Dropouts After Conv and FC layers

A dropout of 0.25 and 0.5 is applied after the convolution and FC layers, respectively. A training accuracy of 84% and a validation accuracy of 79% are achieved.
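
As a rough sketch of this configuration (layer sizes are assumptions; only the dropout rates come from the article):

```python
from tensorflow.keras import layers, models

model_dropout = models.Sequential([
    layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                  input_shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),                     # dropout after the convolution block
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                      # dropout after the FC layer
    layers.Dense(10, activation="softmax"),
])
```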

2. Remove the dropouts after the convolutional layers (but retain them in the FC layer) and use batch normalization (BN) after every convolutional layer.

Training accuracy is ~98% and validation accuracy ~79%. This is now a case of overfitting, since we have removed the dropouts; the high training accuracy shows that the model has effectively memorized the training data.

3. Use dropouts after Conv and FC layers, use BN:

  • Training accuracy ~89%, validation accuracy ~82%

There is a significant improvement in validation accuracy, with a reduced gap between training and validation. We can say that our model is able to generalize well.

4. Remove dropouts from Conv layers, use L2 + dropouts in FC, use BN:

  • Training accuracy ~94%, validation accuracy ~76%.

A significant gap between training and validation accuracy is found. L2 regularization only tries to keep the redundant weights down, but it is not as effective as using the dropouts alone.

5. Dropouts after Conv layer, L2 in FC, use BN after convolutional layer

Train accuracy ~86%, validation accuracy ~83%

The gap has reduced and the model is not overfitting, but the model needs to be more complex to classify images correctly. Hence, we shall add more layers as we go forward.

6. Add a new convolutional layer to the network.

Along with regularization and dropout, a new convolution layer is added to the network.

Train accuracy ~89%, validation accuracy ~84%

Though training and validation accuracy have both increased, adding an extra layer increases the computation time and resources required.

7. Adding feature maps.

Add more feature maps to the Conv layers: from 32 to 64 and 64 to 128.

Instead of adding an extra layer, here we add more feature maps to the existing convolutional layers. The choice between the two is situational:

  1. Add an extra layer when you feel your network needs more abstraction.
  2. Add more feature maps when the existing network is not able to grasp features of an image (like color and texture) well.

Train accuracy ~92%, validation accuracy ~84%

Though the accuracy is improved, the gap between train and test still reflects overfitting.

On adding more feature maps, the model tends to overfit (compared to adding a new convolutional layer). This shows that the task requires learning to extract more (new) abstract features by adding a more complex, deeper network, rather than trying to extract more of the same features.

Conclusion:

The performance of CNNs depends heavily on multiple hyperparameters: the number of layers, the number of feature maps in each layer, the use of dropouts, batch normalization, etc. Thus, it is advisable to first fine-tune your model hyperparameters by conducting lots of experiments. Once the right set of hyperparameters is found, the model should be trained for a larger number of epochs.

The source code that created this post can be found here. I would be pleased to receive feedback or questions on any of the above.

Credit: BecomingHuman By: Sneha Bhatt
