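Ever wondered what the term “generalization” means for ML models? A model generalizes when it performs well on data it was not trained on. In this post I use a stricter working criterion: a model counts as generalized when it scores better on the testing data than on the training data.
How do you find such a model? One simple way is to change the random state used when splitting the data into training and validation sets. A different random state changes which rows land in each set, and therefore both scores, so you can keep the splits for which the testing score beats the training score.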
Let’s take the iris dataset as an example. It has 150 rows and four features: sepal length, sepal width, petal length, and petal width. The labels are Setosa, Versicolor, and Virginica.
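As a quick check of the description above, here is a minimal sketch that loads the dataset. It assumes you are using the copy of iris bundled with scikit-learn (`load_iris`); the original post does not say where its data came from.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of the iris dataset.
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)                   # rows and feature columns
print(iris.feature_names)        # sepal/petal length and width
print(list(iris.target_names))   # the three species labels
```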
Loop the random state over the range 0 to 99, and for each value split the data, train a model, and calculate its training and testing scores. Inside the loop, add a condition: if the score on the testing data is better than the score on the training data, append the random state, training score, and testing score to a list called scores.
Every model whose random state ends up in scores meets the criterion, because it performs better on the testing set than on the training set. Since this is a classification problem, we will use accuracy as the metric. For a regression problem you can use an error metric such as RMSE instead, in which case the goal is reversed: the testing error should be lower than the training error.
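The loop described above could be sketched roughly as follows. The classifier (`LogisticRegression`), `max_iter=1000`, and the 80/20 split (`test_size=0.2`) are assumptions made for illustration; the post does not say which model or split size it used.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

scores = []
for random_state in range(100):  # random states 0 to 99
    # Assumption: 20% of the rows are held out for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )

    # Assumption: logistic regression stands in for the post's model.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # For classifiers, score() returns mean accuracy.
    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)

    # Keep only the splits where testing accuracy beats training accuracy.
    if test_score > train_score:
        scores.append((random_state, train_score, test_score))

print(len(scores), "of 100 random states met the condition")
```

Each entry in `scores` is a `(random_state, train_score, test_score)` tuple, so the split can be reproduced later by passing the same `random_state` to `train_test_split`.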
Sort scores in descending order of testing score to get the generalized model with the highest accuracy on the testing data.
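The sorting step might look like the sketch below. It assumes each entry in `scores` is a `(random_state, train_score, test_score)` tuple. The helper name `sort_by_test_score` is hypothetical, and the numbers in `example_scores` are made-up placeholders, not results from the iris run.

```python
from operator import itemgetter


def sort_by_test_score(scores):
    """Return the entries ordered by testing score, highest first."""
    # Index 2 of each tuple holds the testing score.
    return sorted(scores, key=itemgetter(2), reverse=True)


# Placeholder values for illustration only (not real results).
example_scores = [(3, 0.95, 0.97), (7, 0.96, 1.00), (11, 0.94, 0.96)]

ranked = sort_by_test_score(example_scores)
best_random_state, best_train, best_test = ranked[0]
print(best_random_state)  # -> 7
```

Because Python's `sorted` is stable, entries with the same testing score stay in their original order, so among tied entries the lowest random state comes first.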
With random state 0, I am getting the highest accuracy on testing data, so I will be selecting the model with random state 0.
The GitHub link for this tutorial is as below: