While studying Data Science, I met a batch of new algorithms and libraries useful for data analysis and prediction. All of them have their pros and cons, which is no surprise. But only one algorithm has a reputation for being lazy and greedy from the first word. You can read that description in different ways, and we could even debate which algorithm is the laziest! But today I want to talk about the k-nearest neighbors algorithm, where the symbol k means the number of nearest neighbors. A lot of blogs start from its weaknesses and forget about its advantages.
KNN is an effective classification and regression algorithm that uses nearby points to generate a prediction.
Why do we call KNN lazy?
A KNN model has no real training phase. It doesn't learn a discriminative function from the training data; it simply memorizes all of the training data instead. That makes "training" very fast, but it brings a real problem: the algorithm becomes memory- and time-greedy at the testing phase.
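To see what "lazy" means in code, here is a minimal sketch of a KNN classifier (my own toy class, not scikit-learn's API): `fit` does nothing but store the data, and all the real work happens in `predict`.

```python
from collections import Counter
import math

class LazyKNN:
    """Toy k-nearest-neighbors classifier, just to show the 'lazy' idea."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorizing the data -- no function is learned.
        self.X, self.y = X, y

    def predict(self, point):
        # All the real work happens here, at prediction time:
        # compute the distance to every stored point and take a majority vote.
        dists = sorted(
            (math.dist(point, x), label) for x, label in zip(self.X, self.y)
        )
        top_k = [label for _, label in dists[: self.k]]
        return Counter(top_k).most_common(1)[0][0]

knn = LazyKNN(k=3)
knn.fit(
    [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)],
    ["a", "a", "a", "b", "b", "b"],
)
print(knn.predict((0.5, 0.5)))  # "a" -- surrounded by class "a" points
print(knn.predict((5.5, 5.5)))  # "b"
```

Notice that `fit` is instant no matter how big the dataset is; the price is paid later, on every single prediction.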
Why does it happen with KNN?
The KNN algorithm assumes that similar things exist close to each other.
If you are similar to your neighbors, then you are one of them.
- All of the training data must be present in memory to calculate the closest k neighbors.
- The cost of calculating the distance between the new point and every existing point can become high.
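The second point is easy to demonstrate. In this sketch (assuming NumPy and a brute-force search), every single prediction has to scan all N stored training points, so each query costs O(N·d) distance work:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 20))  # N = 10,000 memorized points, d = 20 features
query = rng.random(20)              # one new point to classify

# One prediction = one full scan of the training set:
# a distance from the query to every stored point.
dists = np.linalg.norm(X_train - query, axis=1)  # shape (10000,)
nearest = np.argsort(dists)[:5]                  # indices of the 5 nearest neighbors
print(dists.shape, nearest)
```

Real libraries soften this with space-partitioning structures (KD-trees, ball trees), but the lazy trade-off stays the same: cheap training, expensive queries.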
Does KNN have anything good?
- We can start straight from the data because this algorithm makes no assumptions about it.
- It is easy to understand, so it is a simple algorithm.
- We can use it for both classification and regression (for regression, when the output variable takes continuous values; for classification, when it takes class labels).
- Usually, this algorithm has high accuracy (relatively).
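The classification/regression point deserves a quick illustration. In this hedged sketch (a toy function of mine, not a library API), the same neighbor search powers both tasks: a majority vote gives a class label, and an average of the neighbors' values gives a continuous prediction.

```python
import math

def knn_predict(X, y, point, k=3, task="classification"):
    """Find the k nearest neighbors of `point`, then vote or average."""
    dists = sorted((math.dist(point, x), label) for x, label in zip(X, y))
    top_k = [label for _, label in dists[:k]]
    if task == "regression":
        return sum(top_k) / k                 # continuous output: mean of neighbors
    return max(set(top_k), key=top_k.count)   # class label: majority vote

X = [(1,), (2,), (3,), (10,), (11,), (12,)]

# Same points, two kinds of targets:
print(knn_predict(X, ["low"] * 3 + ["high"] * 3, (2.5,)))                      # "low"
print(knn_predict(X, [1.0, 2.0, 3.0, 10.0, 11.0, 12.0], (2.5,), task="regression"))  # 2.0
```

Only the last step changes between the two tasks; the distance computation is identical.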
Where can we use KNN?
- KNN is one of the most popular algorithms for text categorization and text mining.
- Agriculture. KNN can help with forecasting when we need to predict climate. It can also be used for estimating soil water parameters, evaluating forest inventories, and estimating forest variables.
- Finance. For example, we can collect someone's financial characteristics and compare them with people who have similar features in a database. We can use this information to predict credit ratings, manage loans, profile bank customers, analyze money laundering, etc.
- Medicine. Algorithms based on clinical and demographic variables can identify risk factors for cancer, predict hospitalization due to a heart attack, etc. Or we can estimate the amount of glucose in the blood of a diabetic person from the infrared absorption spectrum of that person's blood. (Hello, new Apple Watch!)
And these are only a few examples that I can remember from real life. The KNN algorithm is a robust and versatile classifier that is simple to use and easy to understand.
In this blog, I didn't give you any hard definitions or formulas, just my thoughts about one algorithm and its reputation. Thank you for reading!