A look beyond the common datasets for Machine Learning
When you start out in Machine Learning, you usually work with the common datasets such as MNIST, Iris, the 20 newsgroups, … But there are hundreds of rare and interesting datasets that can be found online. At Immune Technology Institute we have asked our teachers to put together a list of the strangest datasets they have found. Here we go!!
This repository contains a record of historical marijuana prices, which vary significantly from state to state. The open question here is how the data was collected.
Although it may seem like a useless dataset, it may be very relevant in the times we live in, as many countries are considering legalizing marijuana.
If you have never asked yourself, as is normal, what the optimal length of chopsticks is, no worries: someone has already asked this question. A research team tried to evaluate the effects of chopstick length on the food-serving performance of adults and children, and created this dataset to find the optimal length of chopsticks.
They concluded that the food-pinching performance was considerably affected by the length of the chopsticks. The researchers suggested that families with children should provide both 240 and 180 mm long chopsticks. In addition, restaurants could provide 210 mm long chopsticks, considering the trade-offs between ergonomics and cost.
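To get a feel for how such an "optimal length" could be read off the data, here is a minimal sketch in Python. It assumes each record is a pair of chopstick length and a pinching score; the rows below are made up for illustration and are not taken from the real dataset, and `best_length` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Made-up illustrative rows: (chopstick length in mm, pinching score).
# These numbers are NOT taken from the real dataset.
rows = [
    (180, 24.0), (180, 26.0),
    (210, 25.0), (210, 27.0),
    (240, 28.0), (240, 30.0),
    (270, 23.0), (270, 25.0),
]

def best_length(rows):
    """Return (length, mean score) for the length with the highest mean score."""
    by_length = defaultdict(list)
    for length_mm, score in rows:
        by_length[length_mm].append(score)
    # Average the scores recorded for each length.
    means = {length: mean(scores) for length, scores in by_length.items()}
    best = max(means, key=means.get)
    return best, means[best]

print(best_length(rows))
```

Comparing group means like this is only a first look; the researchers' own analysis may well have used a different method.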
This dataset contains images of more than 3,500 rice grains from 2 different species. Different properties were extracted from each grain of rice, such as:
- The longest line that can be drawn on the rice grain
- The shortest line that can be drawn on the rice grain
- The perimeter of each grain.
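The properties listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grain is given as an ordered list of boundary points (a hypothetical `outline`), and the "shortest line" is taken here to be the width measured perpendicular to the longest line. The dataset's authors may have computed these values differently.

```python
from itertools import combinations
from math import dist

def grain_features(outline):
    """outline: ordered list of (x, y) points along the grain boundary."""
    # Perimeter: sum of edge lengths around the closed outline.
    closed = zip(outline, outline[1:] + outline[:1])
    perimeter = sum(dist(p, q) for p, q in closed)
    # Longest line: the largest distance between any two boundary points.
    a, b = max(combinations(outline, 2), key=lambda pair: dist(*pair))
    major = dist(a, b)
    # Shortest line (assumption): extent perpendicular to the longest line.
    ux, uy = (b[0] - a[0]) / major, (b[1] - a[1]) / major
    offsets = [-(x - a[0]) * uy + (y - a[1]) * ux for x, y in outline]
    minor = max(offsets) - min(offsets)
    return {"major_axis": major, "minor_axis": minor, "perimeter": perimeter}

# A made-up, diamond-shaped "grain" used purely as an example.
toy_grain = [(-3, 0), (0, 1), (3, 0), (0, -1)]
print(grain_features(toy_grain))
```

In practice the outline would come from segmenting each image first, which is a separate step not covered here.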
Did you know that the most popular dog name in Sweden is Molly?
This dataset collects the most popular dog names in Sweden in 2018 by number of animals. Bella ranked as the second most popular name, with almost six thousand animals, followed by Charlie, with approximately 4,600.
I am pretty sure that Sheldon will love this one… This dataset contains details of various nations and their flags, such as:
- The religion of each country.
- The predominant colour in the flag.
- Whether the flag contains a crescent moon, or sun or star symbols.
- If it contains an eagle, a tree, …
It could be interesting to try predicting the religion of a country from its size and the colours in its flag.
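As a rough idea of what such a prediction could look like, here is a minimal nearest-neighbour sketch in plain Python. Everything in it is an assumption made for illustration: the rows are invented and do not describe real countries, the feature encoding (a size bucket, a predominant colour, a crescent flag) is hypothetical, and the labels are placeholders.

```python
from collections import Counter

# Toy rows, invented for illustration; they do not describe real countries.
# Each row: ((size bucket, predominant colour, has crescent), religion label)
train = [
    (("small", "green", True), "religion_a"),
    (("large", "green", True), "religion_a"),
    (("small", "red", False), "religion_b"),
    (("large", "blue", False), "religion_b"),
    (("small", "blue", False), "religion_b"),
]

def hamming(u, v):
    """Count the positions where two feature tuples differ."""
    return sum(a != b for a, b in zip(u, v))

def predict(features, train, k=3):
    """Majority vote among the k training rows closest to `features`."""
    nearest = sorted(train, key=lambda row: hamming(features, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(predict(("small", "green", True), train))
print(predict(("large", "red", False), train))
```

With the real dataset you would want to hold out some countries for testing, since a model like this can easily memorise a table that small.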