Have you heard of the Darwin Awards? Jump on YouTube and have a look; they are darkly funny. It is a tongue-in-cheek honor for people who attempt something they think is cool in the most spectacular way possible: one takes a selfie with a wounded bear, another bolts an engine to a skateboard. These bold stunts lead to serious mistakes with serious consequences and plenty of sarcastic comments. Spoiler alert: unfortunately, they all die. You do not want your startup to “die” from machine learning mistakes.
Over the last 25 years, I have seen people make mistakes thousands of times, but I have never seen a machine make one. Nowadays, a mistake in a machine learning project can cost a business millions of dollars and years of wasted work. For this reason, I have collected the most common machine learning mistakes here, grouped by data, metrics, validation, and technology.
The likelihood of making a mistake when working with data is high; it is easier to cross a minefield unscathed than to get through dataset preparation without a slip. The most common data mistakes are:
- Unprocessed data. Raw, unprocessed data is garbage: you cannot be sure that a model trained on it is adequate. Only preprocessed data should form the basis of an AI project.
- Anomalies. Review the data for deviations and anomalies and eliminate them. Removing such errors is a priority in any machine learning project, because data can be incomplete or incorrect, and some information may simply be missing for a certain period of time.
- Insufficient data. It is easy to run ten experiments and get a result, but still not the right one. A small and unbalanced dataset leads to conclusions that are far from the truth. If you have to train a network to distinguish Brilliant Penguins from Spectacled ones, a few bear photos will not fly, even alongside thousands of penguin pictures.
- Too much data. Sometimes limiting the amount of data is the only correct decision, for example when you want the most objective picture possible of a person's future actions. Our world and the people in it are incredibly unpredictable, so predicting someone's behavior today from how they acted in 1998 is like reading tea leaves: the result will be far from reality.
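The first two points above can be sketched in a few lines. This is a minimal, hypothetical preprocessing step (the `clean` helper and the toy numbers are my own illustration, not from any particular project): drop missing values, then remove outliers with Tukey's IQR fences.

```python
import numpy as np

def clean(values):
    """Drop missing entries and IQR outliers -- a minimal preprocessing sketch."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]                      # remove missing values
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's fences
    return x[(x >= lo) & (x <= hi)]

raw = [4.1, 3.9, np.nan, 4.3, 120.0, 4.0, 3.8]  # 120.0 is an obvious anomaly
print(clean(raw))                                # the NaN and the 120.0 are gone
```

Real pipelines are far more involved, but even a crude filter like this is better than feeding raw records straight into training.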
Accuracy is a significant metric in machine learning, but a pointless chase after absolute accuracy can sink an AI project, particularly if the goal is a predictive recommendation system. Accuracy can obviously reach an incredible 99% if an online grocery store offers the customer milk: the buyer will accept it and the recommendation system will “work”. But he would have bought milk anyway, so such a recommendation makes little sense. For a city dweller who buys milk every day, the system needs an individual approach and should promote goods that were not previously in the cart.
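The milk example is easy to reproduce with made-up numbers (the labels below are my own illustration, not real shop data): a “recommend milk to everyone” baseline scores 99% accuracy while telling you nothing about the customers who would not have bought it.

```python
import numpy as np

# Hypothetical labels: did the shopper buy milk? (1 = yes; heavily imbalanced)
y_true = np.array([1] * 99 + [0])
y_pred = np.ones_like(y_true)        # trivial baseline: always predict "buys milk"

accuracy = (y_pred == y_true).mean()
print(f"accuracy = {accuracy:.2f}")  # 0.99, yet the model carries no information

# The minority class is what matters, and the baseline never finds it:
recall_minority = ((y_pred == 0) & (y_true == 0)).sum() / (y_true == 0).sum()
print(recall_minority)               # 0.0
```

This is why a recommendation model should be judged against the trivial always-buy baseline (lift, minority-class recall), not by raw accuracy alone.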
A child learning the alphabet gradually masters letters, simple words, and phrases, learning and processing information at a certain level. At the same time, a scientific paper is incomprehensible to the toddler, even though its words consist of the same letters he has learned.
An AI model likewise learns from a particular set of records. However, the project must not verify the quality of the model on those same records. To estimate the model, you need data specifically set aside for evaluation that was never used in training; that is how the most accurate assessment of model quality is achieved.
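The rule above is the classic holdout split. A minimal sketch with toy, randomly generated data (the 80/20 ratio and the array shapes are illustrative assumptions): shuffle the indices once, then never let the held-out rows touch training.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # toy feature matrix
y = rng.integers(0, 2, size=100)     # toy labels

idx = rng.permutation(len(X))        # shuffle before splitting
cut = int(0.8 * len(X))              # 80% train, 20% evaluation
train_idx, test_idx = idx[:cut], idx[cut:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(len(X_train), len(X_test))     # 80 20
```

Libraries such as scikit-learn wrap this in `train_test_split`, and cross-validation generalizes it, but the principle is the same: the evaluation rows must be disjoint from the training rows.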
The choice of technology in an AI project is another common mistake. Its consequences are rarely fatal, but they hurt the project's efficiency and push back its deadlines.
No wonder there is hardly a more hyped topic in machine learning than neural networks, thanks to their reputation as a universal algorithm suitable for any task. However, this tool is not the most effective or the fastest one for every task.
The clearest example is Kaggle competitions, where neural networks do not always come first. On the contrary, random forests often have a better chance of winning, mainly on tabular data.
Neural networks are more commonly used to analyze visual information, speech, and other complex data.
Nowadays, reaching for a neural network is the path of least resistance. Even so, the project team should clearly understand which algorithms are suitable for a particular task.
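As a hedged illustration of the tabular-data point, here is a tree-ensemble baseline using scikit-learn's bundled breast-cancer dataset (the dataset choice and hyperparameters are my assumptions, not from the article). A random forest like this is a few lines of code and a strong first benchmark before anyone reaches for a neural network.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Small tabular dataset: 569 rows, 30 numeric features
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)
print(f"holdout accuracy: {model.score(X_te, y_te):.3f}")
```

If a neural network cannot clearly beat a baseline like this on your tabular problem, the extra complexity is probably not worth it.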
I strongly believe the hype around machine learning is not wrong, exaggerated, or unfounded. Machine learning is another technical tool that makes our lives easier and more comfortable and gradually changes them for the better.
For many large projects, this article may be just a nostalgic look back at the mistakes they made, survived, and overcame on the way to becoming a product company.
But for those just starting their AI business, it is a chance to understand why taking a selfie with a wounded bear is not the best idea, and how to stay off the endless lists of “dead” startups.