Initializing neural networks – Becoming Human: Artificial Intelligence Magazine

September 13, 2019

Setup

Let’s start by grabbing the MNIST dataset. Since we do this a lot, we will define a function to do so.
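A minimal sketch of such a function, using torchvision to fetch MNIST (the exact loader is an assumption; here the test split stands in as the validation set):

```python
import torch
from torchvision import datasets

def get_data():
    # Fetch MNIST and flatten each 28x28 image into a 784-long float vector.
    train = datasets.MNIST(root="data", train=True, download=True)
    valid = datasets.MNIST(root="data", train=False, download=True)  # test split as validation
    x_train = train.data.float().view(-1, 28 * 28)
    x_valid = valid.data.float().view(-1, 28 * 28)
    return x_train, train.targets, x_valid, valid.targets

x_train, y_train, x_valid, y_valid = get_data()
```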

Let’s now calculate the mean and standard deviation of our data.
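Something along these lines:

```python
train_mean, train_std = x_train.mean(), x_train.std()
print(train_mean, train_std)  # raw 0-255 pixel values: roughly 33 and 78, not 0 and 1
```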

We want our mean to be 0 and standard deviation to be 1 (more on this later). Hence we normalize our data by subtracting the mean and dividing by the standard deviation.
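A sketch of that step:

```python
def normalize(x, mean, std):
    return (x - mean) / std

x_train = normalize(x_train, train_mean, train_std)
x_valid = normalize(x_valid, train_mean, train_std)  # train statistics, deliberately
```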

Notice that we normalize the validation set with train_mean and not valid_mean, to keep the training and validation sets on the same scale.

Since the mean will never be exactly 0 (nor the standard deviation exactly 1), we also define a function to test that they are close enough, within some tolerance.
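For example (the tolerance of 1e-3 is an arbitrary choice):

```python
def test_near_zero(a, tol=1e-3):
    assert a.abs() < tol, f"Near zero: {a}"

test_near_zero(x_train.mean())     # mean should be close to 0
test_near_zero(1 - x_train.std())  # std should be close to 1
```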

Next, let’s initialize our neural network.
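A sketch of a naive, unscaled initialization for a one-hidden-layer network (the hidden size of 50 is an assumption for this sketch):

```python
n, m = x_train.shape     # number of examples, inputs per example (784)
nh = 50                  # hidden layer size (an assumption)

w1 = torch.randn(m, nh)  # naive init: standard normal, no scaling
b1 = torch.zeros(nh)
w2 = torch.randn(nh, 1)
b2 = torch.zeros(1)

def lin(x, w, b):
    return x @ w + b

t = lin(x_valid, w1, b1)
print(t.mean(), t.std())  # std comes out around 28 (roughly the square root of 784), far from 1
```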

Problems with initialization

Initializing neural networks is an important part of deep learning. It is at the heart of why we can make our neural networks as deep as they are today. Initialization determines whether we converge well and converge fast.

We want to initialize our weights in such a way that the mean and variance are preserved as we pass through various layers. This does not happen with our current initialization.

We can see that after just one layer, the values of our activations (the outputs of a layer) are already far off. If we repeat this process for a number of layers, it leads to exploding activations and gradients, as shown below.
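A sketch of that runaway process (the layer width of 512 is an arbitrary choice):

```python
import torch

x = torch.randn(512)
for i in range(100):
    x = torch.randn(512, 512) @ x  # one unscaled "layer" per iteration
    if torch.isinf(x).any():
        print(f"Activations reached infinity at layer {i}")
        break
```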

The activations of our models grow so far beyond reasonable values that they reach infinity. And it doesn’t even take 100 multiplications for this to happen.

So how do we deal with this? Maybe we can scale them down by a factor to keep them from exploding.
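For instance, with an arbitrary small factor like 0.01:

```python
import torch

x = torch.randn(512)
for i in range(100):
    x = (torch.randn(512, 512) * 0.01) @ x  # 0.01 is a guess, and a bad one
print(x.mean(), x.std())  # both end up at exactly 0: the values vanished
```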

And that doesn’t work either. While the idea was right, choosing the wrong factor leads to vanishing gradients instead (values collapsing to 0).

Choosing the right scaling factor — Xavier init

What should the value of the scaling factor be?

The answer is 1/√fan_in, where fan_in is the number of inputs to the layer. This initialization technique is known as Xavier initialization. If you want to learn the math behind it, you can read the original paper or one of the reference articles listed at the end of this article. One good tip when it comes to reading research papers is to search for articles that summarize them.
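Repeating the 100-layer experiment with this scaling, as a sketch:

```python
import math
import torch

x = torch.randn(512)
for i in range(100):
    x = (torch.randn(512, 512) / math.sqrt(512)) @ x  # Xavier-style scaling
print(x.mean(), x.std())  # stays in a reasonable range: no explosion, no collapse
```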

And dividing by √fan_in does work. Note that if we want to preserve the gradients in the backward pass instead, we would divide by √fan_out (the number of outputs).

The Xavier paper also provides a number of good visualizations of these activation and gradient distributions.

Problem with Xavier init

The Xavier paper assumes that our activation functions are going to be linear (which they are not). Hence it ignores the effect of our activation functions on the mean and variance. Let’s think about ReLU.

A ReLU takes all the negative values in our distribution and turns them into 0s. That certainly does not preserve the mean and variance of our data; roughly speaking, it halves the variance. And since this happens at every layer, those halvings compound.
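A quick sketch of the effect on a standard normal sample:

```python
import torch

x = torch.randn(1000)
print(x.mean(), x.std())  # roughly 0 and 1
r = x.clamp_min(0.)       # ReLU zeroes the negative half
print(r.mean(), r.std())  # mean shifts to ~0.4 and std drops to ~0.58: neither is preserved
```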

The solution to this problem was suggested in a paper called Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

The simple idea: since each ReLU roughly halves the variance, we add an extra factor of 2 in the numerator to cancel it out, scaling the weights by √(2/fan_in). This technique is known as Kaiming initialization (or He initialization).
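A sketch of the deep-network experiment with ReLU between layers and the extra factor of 2:

```python
import math
import torch

def relu(x):
    return x.clamp_min(0.)

x = torch.randn(512)
for i in range(100):
    x = relu((torch.randn(512, 512) * math.sqrt(2 / 512)) @ x)  # Kaiming scaling
print(x.mean(), x.std())  # magnitudes stay usable across all 100 layers
```

PyTorch ships this scheme as torch.nn.init.kaiming_normal_.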

Even though our mean is not so good, it certainly helps our standard deviation. It is amazing what good initialization can do. There is a paper called Fixup initialization in which the authors trained a 10,000-layer neural network without any normalization, just by careful initialization. That should be enough to convince you that initializing neural networks well is important.

If you liked this article, give it at least 50 claps :p

If you want to learn more about deep learning, check out my series of articles on the same.

References:

  1. Deep learning from the foundations, fast.ai
  2. Understanding Xavier initialization in deep neural networks.
  3. How to initialize deep neural networks?
  4. Notes on weight initialization for deep neural networks
  5. Variance of product of multiple random variables

Credit: BecomingHuman By: Dipam Vasani
