Understanding LSTMs can be a bit challenging, so I recommend taking it one step at a time and moving to the next part only when you are comfortable with the previous one. By the end of this article you will have a good understanding of LSTMs and when they should be preferred over plain RNNs.
Long short-term memory (LSTM) is a special type of recurrent neural network (RNN). An RNN is a neural network in which the output from the previous step is fed as input to the current step; RNNs are mainly used for sequence problems such as time series forecasting.
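This recurrence can be sketched in a few lines of NumPy. The weight names (`W_xh`, `W_hh`) and sizes below are illustrative, not from any particular library:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The hidden state from the previous step (h_prev) is fed back in,
    # together with the current input x_t, through a single tanh layer.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5
W_xh = rng.normal(size=(input_size, hidden_size)) * 0.1
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(seq_len, input_size)):
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # output of step t-1 feeds step t
print(h.shape)  # (4,)
```

The loop is the whole idea: each step reuses the hidden state produced by the step before it, which is how information from earlier inputs reaches later predictions.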
The major appeal of RNNs is that they use previous values to predict the present one, but they suffer from a drawback: as the number of previous steps grows, an RNN tends to perform poorly. An example makes this clear. In the sentence "The fish are in the water", if we want to predict the last word, "water", the immediately preceding words are enough and no further context is needed. In such cases the gap between the relevant information and the prediction is small, and the RNN performs well. RNNs handle short-term dependencies well.
In the case of long-term dependencies, however, an RNN performs poorly and becomes unable to learn to connect the information. Consider "I grew up in Germany… I speak fluent German". To predict the last word of this language-model example, we need context from much earlier in the sequence; as that gap grows, the RNN fails.
To overcome these shortcomings of RNNs, especially the vanishing gradient problem, the LSTM was introduced. LSTMs were explicitly designed to avoid the long-term dependency problem; remembering information for long periods of time is their default behaviour.
A standard RNN cell has only a single tanh layer in its repeating module, while an LSTM cell consists of four interacting layers: three sigmoid gates and one tanh layer.
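Those four layers can be sketched as one NumPy function. The weight layout below (one stacked matrix `W` sliced into four blocks) is a common convention but an assumption here, not from the article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # All four layers read the same [h_prev, x_t] concatenation.
    z = np.concatenate([h_prev, x_t]) @ W + b
    H = h_prev.size
    f = sigmoid(z[0*H:1*H])   # forget gate   (sigmoid layer 1)
    i = sigmoid(z[1*H:2*H])   # input gate    (sigmoid layer 2)
    g = np.tanh(z[2*H:3*H])   # candidate values (tanh layer)
    o = sigmoid(z[3*H:4*H])   # output gate   (sigmoid layer 3)
    c = f * c_prev + i * g    # cell state carries long-term memory
    h = o * np.tanh(c)        # new hidden state
    return h, c

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 4
W = rng.normal(size=(hidden_size + input_size, 4 * hidden_size)) * 0.1
b = np.zeros(4 * hidden_size)

h = c = np.zeros(hidden_size)
for x_t in rng.normal(size=(6, input_size)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

The key design choice is the cell state `c`: it is updated additively (`f * c_prev + i * g`) rather than being squashed through a tanh at every step, which is what lets gradients survive over long sequences.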