Some companies create dedicated departments or teams to handle the Machine Learning (ML) part of their products. I understand the theory behind this: you assemble a team with ML experience and let them write the ML-specific code for the different products. The question is, does this actually work? Let’s go back to the basics to answer it.
Machine Learning is commonly divided into two main parts: Supervised Learning and Unsupervised Learning. Supervised Learning is used when you have training data with both inputs and outputs and you want to learn the relationship between them, for example in sentiment analysis and image classification. Unsupervised Learning, on the other hand, is used to find structure in the data without labeled outputs, for example in clustering and anomaly detection. I will not go into these in detail, so if you want to learn more you can find a link at the bottom of the story.
This story focuses on Supervised Learning, since it is the more relevant of the two here.
Machine Learning development cycle
The model development cycle can be broken down into the following steps:
- Gathering data
- Data preparation
- Data wrangling
- Analysing the data
- Training the model
- Testing the model
Note that some of these steps are often repeated until we get the desired result. I will not go into them in detail here; see the links at the bottom of the story if you want to learn more.
When you are in the beginning phase it is easy to be eager to start building and training the model, but you need to hold back. There are a lot of steps to walk through before you get to that point. To actually get there you need strong knowledge of the domain, and some statistical knowledge doesn’t hurt! Without this knowledge it will be hard to understand the data and, in the end, to understand what the algorithm does, and this is dangerous. If we don’t understand on some level what the algorithm is doing, it will be hard for us to evaluate it.
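To make the cycle a bit more concrete, here is a minimal sketch of the steps above in plain Python. Everything in it is an assumption made up for illustration: the data is simulated (temperature, sales) pairs, the `-999.0` "sensor offline" value is a hypothetical sentinel, and the "model" is just a straight line fitted with least squares. It is not how any particular team or library does it, only a way to see where each step sits.

```python
import random
from statistics import mean

# Hypothetical sentinel value; knowing that it exists is domain knowledge.
SENSOR_OFFLINE = -999.0


def gather_data(n=200, seed=0):
    """Gathering data: simulate (temperature, sales) pairs with a linear relation."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        temperature = rng.uniform(0.0, 30.0)
        sales = 3.0 * temperature + 10.0 + rng.gauss(0.0, 2.0)
        rows.append((temperature, sales))
    # Two readings were logged while the (hypothetical) sensor was offline.
    for i in (5, 17):
        rows[i] = (SENSOR_OFFLINE, rows[i][1])
    return rows


def prepare(rows):
    """Data preparation and wrangling: drop rows that carry the sentinel value."""
    return [(x, y) for x, y in rows if x != SENSOR_OFFLINE]


def analyse(rows):
    """Analysing the data: a tiny summary that makes odd values visible."""
    xs = [x for x, _ in rows]
    return {"rows": len(rows), "min_x": min(xs), "max_x": max(xs)}


def split(rows, test_ratio=0.2, seed=1):
    """Hold back part of the data so the model is tested on unseen rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def train(rows):
    """Training the model: fit y = slope * x + intercept with least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def evaluate(model, rows):
    """Testing the model: mean absolute error on the held-out rows."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in rows)


if __name__ == "__main__":
    raw = gather_data()
    print("raw summary:", analyse(raw))
    clean = prepare(raw)
    print("clean summary:", analyse(clean))
    train_rows, test_rows = split(clean)
    model = train(train_rows)
    print("slope, intercept:", model)
    print("mean absolute error:", evaluate(model, test_rows))
    print("slope when trained on raw data:", train(raw)[0])
```

The interesting part is `prepare`. Nothing in the numbers themselves says that `-999.0` is a broken reading rather than a real measurement; someone who knows the domain has to tell you. Training on the raw rows still runs without errors, it just gives a line that has little to do with the real relationship.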
In my previous company, they understood that ML is the next big thing and that it could bring great value to their products, but they didn’t have people with that kind of knowledge. They started a new team in the organization and recruited people to handle the company’s ML development. The team was assembled and eager to start working, so they reached out to the product teams to see if anyone was interested in starting an ML project. So far so good. They sold the idea that the product team would not have to do anything; the ML team would handle everything. This sounds too good to be true! One project jumped on this and started working, or at least the ML team did. They worked for a while, and the product team was forced to help them a lot, which wasn’t the deal… After a while, the project was shut down since it did not work out. Why did this not work? There can be multiple factors, but let me walk through my theory!
As stated above, the ML development cycle includes a lot of steps before the actual implementation. You need to analyze the data, clean it, and so on. The people in the ML team had knowledge of statistics and ML, but what they lacked was domain knowledge. You can come a long way with ML knowledge, but you need domain knowledge to actually visualize and clean the data. So telling the product team that they will not have to do anything is not the right way to sell it. You need to work closely together to make this work, or you basically need to be the same team. This is why, in my opinion, the project failed. I have no intention of flaming my old company. Failed projects happen! As I understand it, they have learned from this mistake and are working better with their ML projects now. The most important thing is to learn from your mistakes!
When working with ML projects, don’t just outsource this functionality and think that the other team or company will handle it on their own little island. They need to have some domain knowledge, or at least work closely with the people who do. In my opinion, the best thing is if the product team can handle this by themselves, but of course not all teams have Data Scientists or ML developers. The second-best thing is to merge the teams for the duration of the project and have them work closely together. Don’t sell this by saying that the product team can spend their time on other things. The product team will have to spend a lot of time to make this work!
This conclusion is just my theory, but hopefully you can see how I connect it to the ML development cycle, and hopefully you agree.