If you are into deep learning and machine learning, you will often face the problem of getting good-quality data and using it to train neural networks. With recent developments in the cryptocurrency market, a hot topic is applying deep learning models to trading: predicting price trends with those models and trading automatically with bots.
Why not make some money while doing some neural network magic, right? Wrong. It might sound easy, but it is actually much harder than you think. Most importantly, forget about getting a high-performing model without good-quality data. Deep learning is different from traditional machine learning: it depends heavily on how much data you have and how good that data is.
When I first wanted to train a model using Python and Keras, I ran into this problem and started by gathering data. That data is useful for all sorts of purposes. Beyond predicting future prices, deep learning can also uncover price differences between exchanges, revealing arbitrage opportunities before they disappear.
Here We Go:
1. To start gathering data, the first step is fetching it from the exchanges. That means we have to connect to at least a few exchanges and pull this data into memory. To do that, you first have to reach these exchanges through their APIs. It is fairly simple if you go through each API's documentation, but that consumes a lot of time, and we need at least 10 different exchanges to unravel the connections between price movements across them. (For example, when the price starts dipping on a major exchange like GDAX, it might take some time to affect smaller exchanges in other countries.)
In this case your friend is an amazing Python library called CCXT, which already has most of these APIs embedded in it. You just call a function and voila! The data you need is there. Here is the link: https://github.com/ccxt/ccxt
2. You need to write this data somewhere and store it in an orderly fashion so you can use it later for your money-printing trading bot 🙂 For this you can run a MySQL server on DigitalOcean pretty cheaply and then issue SQL write commands from your Python code. I hate meddling with raw SQL commands, so I went with a SQL library called dataset by pudo, which you can check out here: https://github.com/pudo/dataset
3. I assume you don't want to run your laptop 24/7, so you need to set up a server. You have two options: expensive and easy, or cheap and fast but harder to set up. That means Heroku, or Dokku on DigitalOcean. I chose the DigitalOcean path and set up a Dokku server, because for database-intensive applications Heroku gets pretty expensive pretty quickly. Have a look at the Dokku tutorial from DigitalOcean: https://www.digitalocean.com/community/tutorials/how-to-use-the-digitalocean-dokku-application
With these 3 steps you will be ready to start your own data collection server. Check out the DigitalOcean tutorials for the basics of server setup and learn how the Procfile and Heroku buildpacks work. (Since this topic is vast, I skip it in this tutorial.) The Procfile and runtime files are already included in the GitHub version, so all you have to do is set up the server and enter your MySQL login credentials.
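For reference, a Procfile for a long-running collector like this is typically a single worker line. This is a generic example, assuming the entry point is called main.py, which may differ from the actual repo:

```
worker: python main.py
```

The `worker` process type (rather than `web`) tells Heroku or Dokku that the script runs continuously in the background and does not serve HTTP traffic.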
When we look at our structure, it is a fairly simple architecture:
I will not go into a very detailed review of the whole code, but I want to show you some functions from it:
I used tickers to fetch data and then built candles in the database from that data. There is already a candle (OHLCV) function inside the CCXT library, but it doesn't work on many exchanges, so tickers were the way to go. (Even though some exchanges don't support tickers either, their availability was much higher than the built-in OHLCV function's.) Here is the code snippet from the main.py file:
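The snippet is embedded in the original post; as a rough illustration of the ticker-to-candle idea, here is a hypothetical sketch (function and field names are my own, not from main.py) that groups stored ticker rows into 5-minute buckets and reduces each bucket's last prices into open/high/low/close:

```python
from collections import defaultdict

PERIOD_MS = 5 * 60 * 1000  # 5-minute granularity, matching the post

def build_candles(rows):
    """Group ticker rows by (exchange, symbol, 5-min bucket) and reduce
    each group's `last` prices into open/high/low/close values."""
    buckets = defaultdict(list)
    for r in sorted(rows, key=lambda r: r['timestamp']):
        start = r['timestamp'] // PERIOD_MS * PERIOD_MS  # bucket start time
        buckets[(r['exchange'], r['symbol'], start)].append(r['last'])
    candles = []
    for (exchange, symbol, start), prices in sorted(buckets.items()):
        candles.append({
            'exchange': exchange, 'symbol': symbol, 'start': start,
            'open': prices[0], 'high': max(prices),
            'low': min(prices), 'close': prices[-1],
        })
    return candles
```

Because the candles are derived from tickers rather than an exchange's OHLCV endpoint, this works on any exchange that can report a last price, which is exactly why ticker availability mattered.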
You may also have noticed a variable called delta. There is another function inside the main.py file which calculates price changes over 7-day and 24-hour timespans. Here you can have a look at the delta function. It uses the data already saved in our SQL server, so it will take some time before this function works properly. Using it is optional.
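Again, the actual delta function is embedded in the post; a minimal sketch of the idea (my own hypothetical implementation, not the real main.py code) compares the latest price against the newest row at least 24 hours and 7 days old, returning None until enough history has been collected:

```python
DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds

def deltas(rows):
    """Return the % price change over the last 24h and 7d for a list of
    ticker rows (each with 'timestamp' in ms and 'last')."""
    rows = sorted(rows, key=lambda r: r['timestamp'])
    now = rows[-1]

    def pct_change(ms_ago):
        cutoff = now['timestamp'] - ms_ago
        past = [r for r in rows if r['timestamp'] <= cutoff]
        if not past:
            return None  # not enough history collected yet
        ref = past[-1]['last']
        return (now['last'] - ref) / ref * 100

    return {'delta_24h': pct_change(DAY_MS),
            'delta_7d': pct_change(7 * DAY_MS)}
```

This also shows why the post calls the function optional: with a fresh database the reference rows simply don't exist yet, so the deltas are undefined until a day (or a week) of tickers has accumulated.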
You are probably wondering how it performs after a few weeks. I have been running the script for nearly 3 months now, and it has created a massive 250 MB CSV file with a dizzying number of columns, full of nothing but numbers. The granularity is 5 minutes, and there are 16 different exchanges, each with 10 to 20 different pairs, generating hundreds of thousands of rows. Here is how it looks when exported from the MySQL server: