Credit: BecomingHuman
What is a proper way to do it?
As I mentioned before, it’s pretty easy to get MLflow running on your local machine: you pip install it and you are almost done. But what if you need a more, let’s say, production-ready setup? Well, nothing is closer to production than having an MLflow remote tracking server with either SFTP or S3 storage configured. And to be honest, that’s where the existing stories lack a lot of information and code.
Before we dig into the code, let’s have a look at the architecture:
The whole idea, to mimic a real production environment, is to have the user using the services in a remote fashion. Hence, I built the environment using Docker containers, with one container per service.
Although one might say that training models from within a Docker container on a MacBook is not better than training on the MacBook itself, it was done this way to ease the communication between the services.
The idea behind the diagram above is to have the user run a few commands and get a whole environment ready to be used, or tested. After cloning the repository, what one has to do is the following:
- Open a terminal window
- Type docker-compose up
- Open a terminal tab
- Type ./scripts/copy_known_hosts.sh
- Type ./scripts/create_experiments.sh
- Go to http://localhost:9000 and create a bucket called ai-models
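If you want to confirm that the containers came up before opening the browser, a small check like the one below can help. It is only a sketch: the ports 9000 and 8888 come from the URLs in this story, while the function names and the service labels are my own and are not part of the repository.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)


def check_services(services, host="localhost", timeout=5.0):
    """Map each (hypothetical) service label to True/False depending on reachability."""
    return {name: wait_for_port(host, port, timeout) for name, port in services.items()}


# Ports taken from the URLs in this story; the labels are descriptive only.
SERVICES = {"storage console": 9000, "JupyterLab": 8888}
```

Calling check_services(SERVICES) after docker-compose up returns a dictionary telling you which of the two ports answered. It does not verify that the bucket or the experiments exist, only that something is listening.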
After that, just go to JupyterLab (http://localhost:8888) and play around with the one existing notebook there.
This gives you a whole environment, from remote tracking server to storage (with two different protocols). Not to mention the JupyterLab playground.
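To give an idea of what talking to such an environment looks like from a notebook, here is a minimal sketch. The environment variable names are the ones MLflow and its S3 client read, but everything else is an assumption: the tracking server URL, the credentials, the experiment name and the logged values are placeholders, and the notebook in the repository may do it differently.

```python
import os


def mlflow_client_env(tracking_uri, s3_endpoint_url, access_key, secret_key):
    """Build the environment variables an MLflow client reads for a remote setup."""
    return {
        "MLFLOW_TRACKING_URI": tracking_uri,        # where runs and metrics are sent
        "MLFLOW_S3_ENDPOINT_URL": s3_endpoint_url,  # only needed for non-AWS S3 storage
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    }


def log_example_run(env, experiment_name):
    """Log one placeholder parameter and metric to the remote server (requires mlflow)."""
    import mlflow  # imported lazily so the helper above works without mlflow installed

    os.environ.update(env)
    mlflow.set_tracking_uri(env["MLFLOW_TRACKING_URI"])
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.01)  # placeholder value
        mlflow.log_metric("accuracy", 0.9)       # placeholder value


# Assumed values: the tracking port and the credentials are NOT given in this story.
example_env = mlflow_client_env(
    tracking_uri="http://localhost:5000",
    s3_endpoint_url="http://localhost:9000",
    access_key="<your-access-key>",
    secret_key="<your-secret-key>",
)
```

With real values filled in, log_example_run(example_env, "my-experiment") would create a run on the remote server instead of writing to a local mlruns directory.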
How to get to production from here?
That’s an easy one. If you already have an S3/SFTP compatible storage, just configure the MLflow image to use that storage instead of the one from the example.
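Pointing the server at your own storage mostly comes down to the artifact root passed to mlflow server. The sketch below only builds the command line. The flags are real mlflow server options, but the SQLite backend store, the SFTP user, host and path are hypothetical values I made up for illustration, so adapt them to whatever the image in your setup actually uses.

```python
import shlex


def mlflow_server_command(backend_store_uri, artifact_root, host="0.0.0.0", port=5000):
    """Return the argv list for starting an MLflow tracking server."""
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,  # where runs, params and metrics live
        "--default-artifact-root", artifact_root,  # where model files are uploaded
        "--host", host,
        "--port", str(port),
    ]


# S3 variant, using the bucket name from this story.
s3_cmd = mlflow_server_command("sqlite:///mlflow.db", "s3://ai-models/")

# SFTP variant; user, host and path are hypothetical.
sftp_cmd = mlflow_server_command(
    "sqlite:///mlflow.db", "sftp://user@sftp-host/mlflow-artifacts"
)

print(shlex.join(s3_cmd))
print(shlex.join(sftp_cmd))
```

Only the artifact root changes between the two variants, which is why swapping the example storage for your own is a small change.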
If you don’t have storage, then get some! You cannot use your local disk as storage. However, if you cannot afford to pay for storage, just use the setup I created and make sure you back up the storage directory to an external disk so you can restore it if you need to.
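A backup like that can be as simple as a timestamped archive. Here is a minimal sketch using only the Python standard library. The function names are mine, and the paths you pass in (the storage directory of the setup and the mount point of your external disk) are up to you, since they depend on how you run the containers.

```python
import shutil
import time
from pathlib import Path


def backup_directory(source_dir, backup_root):
    """Archive source_dir into backup_root as <name>-<timestamp>.tar.gz and return its path."""
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"storage directory not found: {source}")
    target_root = Path(backup_root).resolve()
    target_root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_base = target_root / f"{source.name}-{stamp}"
    # root_dir/base_dir keep the top-level folder name inside the archive.
    archive = shutil.make_archive(
        str(archive_base), "gztar", root_dir=str(source.parent), base_dir=source.name
    )
    return Path(archive)


def restore_directory(archive_path, restore_root):
    """Unpack a backup archive under restore_root."""
    shutil.unpack_archive(str(archive_path), str(restore_root))
```

Stop the containers, or at least make sure nothing is being written, before archiving. Otherwise the archive might capture a half-uploaded model.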
All in all, do not get locked in because it doesn’t take much to get a proper system running. In a team of five, at least one person should know how things work.
Are you still reading?
Thanks a lot for your time. I hope you have enjoyed it and that it can save you some time and help you to get your models properly versioned. Now, to the code and some more reading, please follow this link: Artificial Intelligence Engineering.
I have some other stories on DL + NLP + AWS that you might want to look at. So, stop reading now, clap and go to the next story. 😉
Credit: BecomingHuman By: Wilder Rodrigues