From Bayesian statistics to product managers, a machine learning project has a lot of pieces, and they all have to work together for the project to succeed. Your team will be made up of people with different specialties, and that’s one of the things that makes machine learning so cool. Whenever something cool happens, people go a little crazy with it, and machine learning is no different. Some businesses have bought into the idea that if they have enough data, they can throw a team at it and get incredibly valuable results.
Some companies go as far as buying massive amounts of data before they know what they’re going to use it for. But machine learning isn’t the answer to every problem, especially if you aren’t asking the right questions of your data. To give you a practical understanding of how a machine learning project flows, we’ll cover five basic steps you need to include.
Having a lot of data and hoping something useful pops out of it isn’t a good strategy. The first step is to pin down a specific issue you’re trying to learn more about or predict. Skip this step and you could wind up spending an obscene amount of time and money while everyone spins their wheels. Machine learning, at the moment, doesn’t “think” in general terms. You can throw information at a person all day and they’ll start to find patterns in it, whether those patterns are real or not.
You can’t do that with a program yet. You still have to tell it exactly what you are looking for based on the information you are giving it. Remember, data is not a solution. It’s a tool. Throwing data at a machine learning team without a specific purpose is like using bananas to wire circuits. You’ll definitely get something, but it’s probably not going to be what you expected.
Once you have a defined problem, your team can get to work. Next they need to figure out what data they need and how to get it from what’s available. These could be things like the properties of a car. Say you work for a car dealership that wants to know how to maximize its profits on a certain model of car. You might look at the demographics of customers who have bought that car before, the time of day, the weather, and maybe the most popular colors.
Those are just a few quick examples of parameters you might use for your algorithms. This step is crucial: it determines how much data and what kind of data you need, and how long it will take to train your machine learning model. Adding relevant parameters can make your results more accurate, but more isn’t automatically better. Irrelevant parameters add noise the model can latch onto (overfitting), and too many parameters can make your model super slow, which leads to other problems.
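To make “parameters” a bit more concrete, here is a minimal Python sketch of how a few of the dealership parameters above might be turned into numbers a model can consume. The `Sale` record, its fields, and the category lists are all hypothetical; categorical values like weather and color are one-hot encoded, which is one common choice among several.

```python
from dataclasses import dataclass

@dataclass
class Sale:
    # Hypothetical record layout; real dealership data will look different.
    buyer_age: int
    hour_of_day: int
    weather: str
    color: str
    profit: float  # the target we want to predict, not a feature

# Hypothetical category lists; each entry becomes its own feature column.
WEATHER = ["clear", "rain", "snow"]
COLORS = ["black", "red", "silver", "white"]

def one_hot(value, categories):
    """1.0 in the column matching `value`, 0.0 elsewhere (all zeros if unseen)."""
    return [1.0 if value == c else 0.0 for c in categories]

def to_features(sale):
    """Turn one sale into a flat numeric feature vector."""
    return ([float(sale.buyer_age), float(sale.hour_of_day)]
            + one_hot(sale.weather, WEATHER)
            + one_hot(sale.color, COLORS))

if __name__ == "__main__":
    s = Sale(buyer_age=42, hour_of_day=14, weather="snow", color="red", profit=1800.0)
    print(to_features(s))  # [42.0, 14.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

Notice that every new weather or color category adds another column, which is one way the parameter count (and the training time) quietly creeps up.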
You know what problem you’re trying to solve and you know what parameters you need to start working on it. Now you need to get the right training data. It wouldn’t do you any good to use data from Tennessee if you are selling cars in Alaska. Your data has to align with the issue you are trying to solve. This is the time for a company to look into buying data or collecting it from users.
Your machine learning model is only going to be as good as the data you give it. Most people focus on the algorithms because they’re cutting edge, and they overlook the importance of good, clean data. Data quality tends to get ignored until the project is in full swing, and it can really bite you if you wait too long to fix it.
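Here is a minimal Python sketch of the kind of cleanup this step involves, assuming sales records arrive as plain dictionaries. The field names, the `clean` helper, and the sample records are hypothetical, invented for illustration. It drops incomplete rows, rows from the wrong market (the Tennessee-versus-Alaska problem), and exact duplicates.

```python
# Hypothetical schema: every record must have these fields filled in.
REQUIRED = ("region", "model", "price", "sold_at")

def clean(records, region):
    """Keep complete, non-duplicate records from the market we actually sell in."""
    seen = set()
    cleaned = []
    for r in records:
        # Drop rows with a missing or empty required field.
        if any(r.get(k) in (None, "") for k in REQUIRED):
            continue
        # Drop rows from other markets (Tennessee data won't help in Alaska).
        if r["region"] != region:
            continue
        # Drop exact repeats of the required fields (e.g. a sale logged twice).
        key = tuple(r[k] for k in REQUIRED)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned

# Toy records invented for illustration.
sample = [
    {"region": "AK", "model": "truck", "price": 41000, "sold_at": "2023-01-05"},
    {"region": "AK", "model": "truck", "price": 41000, "sold_at": "2023-01-05"},  # duplicate
    {"region": "TN", "model": "truck", "price": 39000, "sold_at": "2023-01-06"},  # wrong market
    {"region": "AK", "model": "truck", "price": None, "sold_at": "2023-01-07"},   # missing price
    {"region": "AK", "model": "sedan", "price": 35000, "sold_at": "2023-01-08"},
]

if __name__ == "__main__":
    print(len(clean(sample, "AK")))  # 2
```

Real cleanup usually goes further (outliers, inconsistent units, mislabeled rows), but even simple filters like these are far cheaper to run early than to retrofit once training is underway.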
This is where your smart machine learning people come in. They should be able to take the parameters you’re working with and figure out which algorithms to use or how to tweak them to fit your needs. They’ll compare candidate algorithms on training time, error rates, and predicted values. Depending on the time and resources they have, they’ll run small tests to choose the algorithm that does the best job.
After more statistical analysis, they’ll start hammering out the math behind your model. Then they’ll do more testing and analysis. Once they have the error margin within an acceptable range and they’ve talked through the details with the rest of the team, the developers take over. On some teams, the machine learning people write the code for the algorithms themselves.
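A small comparison like the one described above might look something like this minimal Python sketch. It trains two deliberately simple candidates (a baseline that always predicts the average, and a one-feature least-squares line) on toy numbers invented for illustration, then reports each one’s mean absolute error on a separate validation set along with its training time. The function names are hypothetical; real teams would compare far richer models, usually with a library rather than hand-rolled code.

```python
import time

def fit_mean(xs, ys):
    """Baseline: always predict the average of the training targets."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        raise ValueError("need at least two distinct x values")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    b = my - a * mx
    return lambda x: a * x + b

def mean_abs_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

def compare(candidates, train, valid):
    """Train each candidate, then report (name, validation error, training seconds)."""
    results = []
    for name, fit in candidates:
        start = time.perf_counter()
        model = fit(*train)
        elapsed = time.perf_counter() - start
        results.append((name, mean_abs_error(model, *valid), elapsed))
    return sorted(results, key=lambda r: r[1])  # lowest validation error first

# Toy data: (inputs, targets), invented for illustration.
train = ([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
valid = ([6, 7], [13, 15])

if __name__ == "__main__":
    for name, err, secs in compare([("mean", fit_mean), ("linear", fit_line)], train, valid):
        print(f"{name}: MAE={err:.2f}, trained in {secs * 1e6:.0f} microseconds")
```

Keeping a trivial baseline in the comparison is a cheap sanity check: if a fancy algorithm can’t beat “always guess the average,” something is wrong with the data or the setup.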
This is when things get more familiar. You still go through similar processes with your development team. There will be sprints and code reviews and deployments. This group connects all of the theory that led up to this moment. They’ll write the code that actually trains the machine learning model and they’ll connect it with the data used for training. At this point things are really moving and it’s harder to make fundamental changes to the project.
When the development team is finished, you should have software that gives you predictions or optimizations based on the input parameters you give it. That’s what this whole process boils down to. What you decide to do with the information the software spits out depends on the problem you set out to solve.
You made it! The project is done and you have some interesting results to look at. First, check that your results make sense by testing your model on completely new data it wasn’t trained on. This can happen slowly over time as you collect more user data, or you can speed it up by evaluating on a separate dataset that you kept out of training.
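Both options can be sketched in a few lines of Python. This is a minimal illustration, assuming your records fit in a list; `holdout_split` and `time_split` are hypothetical helper names. The first sets aside a random slice of the data before training; the second mimics waiting for new user data by training on older records and testing on newer ones.

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and set aside a fraction the model never trains on."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def time_split(rows, cutoff, key):
    """Train on records before the cutoff and test on the ones that arrived later."""
    train = [r for r in rows if key(r) < cutoff]
    test = [r for r in rows if key(r) >= cutoff]
    return train, test

if __name__ == "__main__":
    train, test = holdout_split(range(100))
    print(len(train), len(test))  # 80 20
```

The key rule in either case is that the test rows stay out of training entirely; if the model only looks good on data it has already seen, it hasn’t really held up.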
Make sure your results make sense to you and the rest of the team before you do any big presentations. Keep your tests relevant to the initial problem as well. We aren’t quite advanced enough in machine learning for a piece of software to think critically, so it’s up to you to do a sanity check.
There are many more nuts and bolts behind this process, and they all get interesting really fast. Plus, with libraries like TensorFlow and Brain.js, it’s easier for web developers to start testing the machine learning waters. The field is wide open for anyone brave enough (or crazy enough) to jump in.
Do you think machine learning is worth all of the hype? I think it gives us some new solutions to old problems, and with more time it’ll get better. But that tends to lead to philosophical questions: Can a machine think? And how far should we really go with machine learning?