The Jupyter notebook related to this article.
Suppose you are trying to fit a line to some linear data, as shown.
You start with a random line and, with the help of gradient descent, gradually try to minimize the error. However, there is another character in this story that plays a crucial role in helping you train your model well and train it fast. This character is known as the learning rate.
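To make this concrete, here is a minimal sketch of fitting a line with plain gradient descent. The toy data, the starting values, and the learning rate of 0.1 are all assumptions made for illustration, not values from the original notebook.

```python
import numpy as np

# Made-up linear data: y = 2x + 1 plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + 1 + rng.normal(0, 0.05, size=x.shape)

# Start with a random line: y_hat = m*x + b
m, b = rng.normal(), rng.normal()
learning_rate = 0.1  # the character this story is about

for step in range(1000):
    error = (m * x + b) - y
    # Gradients of the mean squared error with respect to m and b
    grad_m = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # The learning rate scales how far each step moves against the gradient
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print(f"fitted line: y = {m:.2f}x + {b:.2f}")  # close to y = 2x + 1
```

Every update nudges the line a little, and how big "a little" is, is exactly what the learning rate controls.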
To fit a model well and fit it quickly, we want to start with a higher learning rate and gradually decrease it as we train more and more. This is much like playing golf: at first, since the hole is far away, you swing hard and try to get as close to it as possible. Once you are near the hole, you hit softly. It doesn't matter if you overshoot and end up on the other side, as long as the distance between the ball and the hole keeps decreasing and you eventually putt the ball in. This idea is sketched as a schedule just below; after that, let's take a look at the loss function (and forgive my awful drawings) to understand what I am talking about.
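One simple way to encode this swing-hard-then-putt idea is a decaying schedule. This is only a sketch; the starting rate and decay factor below are arbitrary placeholder values.

```python
# Exponential decay: big steps early, small careful steps later
initial_lr = 0.5  # the hard swing
decay = 0.9       # shrink the step a little every epoch

for epoch in range(10):
    lr = initial_lr * decay ** epoch
    print(f"epoch {epoch}: learning rate = {lr:.4f}")
```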
We start somewhere near the top, and we want to move towards the bottom-most point, which is known as the global minimum.
First of all, we don't want our learning rate to be too low; otherwise, we will only crawl towards our result.
Take a look at the decrease in error in the output below to understand this better.
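The original screenshot is not reproduced here, so here is a sketch of the same effect using the toy line fit from earlier: with a deliberately tiny learning rate, the error shrinks by only a sliver each step.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + 1 + rng.normal(0, 0.05, size=x.shape)

m, b = 0.0, 0.0
learning_rate = 0.001  # deliberately far too low

for step in range(5):
    error = (m * x + b) - y
    print(f"step {step}: mse = {np.mean(error ** 2):.4f}")
    m -= learning_rate * 2 * np.mean(error * x)
    b -= learning_rate * 2 * np.mean(error)
# The error barely moves from step to step: we are crawling.
```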
We also don't want our learning rate to be too high; otherwise, we will cross over to the other side, and our error will keep increasing until it eventually goes out of bounds. Our model will get worse instead of better.
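Again as a sketch with the same toy setup: crank the learning rate up and watch the error explode. The exact point at which this starts to diverge depends on the data; 5.0 is simply a value that is comfortably too high here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + 1 + rng.normal(0, 0.05, size=x.shape)

m, b = 0.0, 0.0
learning_rate = 5.0  # far too high for this problem

for step in range(180):
    error = (m * x + b) - y
    if step % 30 == 0:
        print(f"step {step:3d}: mse = {np.mean(error ** 2):.3e}")
    m -= learning_rate * 2 * np.mean(error * x)
    b -= learning_rate * 2 * np.mean(error)
# Each step overshoots the minimum by more than the last,
# so the error grows until it overflows to infinity.
```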
As seen above, the error just spirals out and reaches infinity.
What we really want is to make a big initial leap and then crawl our way towards the answer. This is why we should periodically plot the error we get at different learning rates and try to choose the best one.
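One way to do this comparison is a simple sweep: train briefly at several candidate rates and look at the error each one ends up with. The candidate values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + 1 + rng.normal(0, 0.05, size=x.shape)

def final_mse(learning_rate, steps=100):
    """Briefly train the toy line fit and report the final error."""
    m, b = 0.0, 0.0
    for _ in range(steps):
        error = (m * x + b) - y
        m -= learning_rate * 2 * np.mean(error * x)
        b -= learning_rate * 2 * np.mean(error)
    return np.mean(((m * x + b) - y) ** 2)

for lr in [0.001, 0.01, 0.1, 0.5, 1.0]:
    print(f"lr = {lr}: final mse = {final_mse(lr):.3e}")
# Too low crawls, too high blows up; something in between wins.
```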
Another issue is that our loss function is rarely as smooth as shown. In reality, it tends to have bumps all over it. Because of these bumps, we can end up stuck in what is known as a local minimum.
At a local minimum, the model performs well relative to the points around it, but not overall. The earlier points about very high and very low learning rates still apply here.
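To see how the starting point interacts with a bumpy loss, here is a made-up one-dimensional loss with one shallow local minimum and one deeper global minimum; everything about it is invented for illustration.

```python
# A bumpy made-up loss: a shallow local minimum near w = +1.35,
# a deeper global minimum near w = -1.48
def loss(w):
    return w**4 - 4 * w**2 + w

def grad(w):
    return 4 * w**3 - 8 * w + 1

def descend(w, learning_rate=0.01, steps=500):
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

for start in (2.0, -2.0):
    w = descend(start)
    print(f"start {start:+.1f}: ends at w = {w:+.2f}, loss = {loss(w):+.2f}")
# From +2 gradient descent settles in the shallow local minimum;
# from -2 it finds the deeper one. Which bump we land in depends on
# where we start, and the step size decides whether we can hop out.
```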
Hence, when training a model, we should really experiment with a bunch of learning rates. Ideally, the learning rate should increase and then decrease over the course of training, as shown; a simple schedule with that shape is sketched below. This approach will help us train our model well.
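As a sketch of that increase-then-decrease shape, here is a simple triangular schedule (similar in spirit to cyclical or one-cycle learning rate policies); the peak rate and training length are placeholder values.

```python
def triangular_lr(step, total_steps=100, max_lr=0.5):
    """Ramp the learning rate up over the first half of training,
    then back down over the second half."""
    half = total_steps / 2
    if step < half:
        return max_lr * step / half
    return max_lr * (total_steps - step) / half

rates = [triangular_lr(s) for s in range(100)]
print(f"start: {rates[0]:.3f}  peak: {max(rates):.3f}  end: {rates[-1]:.3f}")
```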
If you found this article useful, give it at least 50 claps.