It can be hard to figure out from the code alone how to apply these models to real-life data (and that is normal), so here are some visualizations of the real-time evaluation:
The top chart shows the original data with true anomalies and detected anomalies. On the bottom chart, we can see the model's error together with the purple static threshold line.
And here is the visualization of the same process with the dynamic threshold.
As you can see, the dynamic threshold adapts to the dispersion of the error: it stays low while the error varies only slightly and rises when the error becomes more volatile.
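A dynamic threshold like this can be computed in several ways; a common one is a rolling mean plus a multiple of the rolling standard deviation of the error. Here is a minimal sketch of that idea (the window size, multiplier `k`, and the synthetic error series are illustrative assumptions, not the tutorial's actual values):

```python
import numpy as np

def dynamic_threshold(errors, window=30, k=3.0):
    """Rolling threshold: mean + k * std over a trailing window of errors.

    The window excludes the current point so a spike cannot raise its
    own threshold; the first `window` points have no threshold yet.
    """
    errors = np.asarray(errors, dtype=float)
    thresh = np.full(len(errors), np.inf)  # no detection during warm-up
    for i in range(window, len(errors)):
        w = errors[i - window:i]
        thresh[i] = w.mean() + k * w.std()
    return thresh

# Synthetic prediction errors: a stable noise pattern with one spike at index 50.
base = [0.10, 0.12, 0.11, 0.13, 0.12] * 10
errors = base + [0.90] + base
anomalies = np.asarray(errors) > dynamic_threshold(errors, window=30, k=3.0)
# Only the spike at index 50 exceeds the adaptive threshold.
```

Note how the threshold rises right after the spike, since the spike inflates the rolling standard deviation; this is exactly the adaptation to error dispersion described above.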
Finally, we can compare the metrics to make sure we were right to rank the LSTM first. We use the F2-score to decide which model is best; precision and recall are shown separately to highlight the strengths and weaknesses of each model.
However, ARIMA performs slightly better with the static threshold, while the neural networks, especially the LSTM, outperform it with the dynamic threshold.
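For reference, the F2-score weights recall twice as heavily as precision, which suits anomaly detection, where a missed anomaly is usually costlier than a false alarm. A minimal pure-Python sketch (the toy labels are illustrative, not the tutorial's data):

```python
def f_beta(y_true, y_pred, beta=2.0):
    """F-beta score from binary label sequences; beta > 1 favors recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Toy example: 3 true anomalies, 2 caught, 1 false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f_beta(y_true, y_pred, beta=2.0)  # precision = recall = 2/3
```

The same score is available as `fbeta_score(y_true, y_pred, beta=2)` in scikit-learn if you prefer not to roll your own.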
Lastly, I would like to emphasize that these models can already be put into production with relatively little effort.
Nevertheless, these models are far from their limits and can be enhanced via:
- Increasing the amount of training data
- Adding other metrics, such as memory or network usage
- Combining LSTM and CNN architectures
- Feature engineering
Thank you very much for your attention. I hope this tutorial gave you some understanding of the approach and a few hints for your own implementation.
And don’t stop looking for anomalies!