Let’s face it: even before we were properly exposed to data science, we had probably heard both of these terms, overfitting and underfitting. These two deserve to be regarded as the guiding philosophy of machine learning because every machine learning model in existence is subject to the trade-off between them, which in turn dictates its performance. Every machine learning algorithm therefore seeks to create models that offer the best possible trade-off between the two.
Whenever we model any data using machine learning, the end objective is that the trained model should be able to correctly predict the output label when a previously unseen set of inputs is provided to it. So how does the model achieve this? It does so by learning a decision mapping or simply a function from the set of inputs to the output label during the training process.
Therefore, the validity and performance of our model can only be realized when it is evaluated on previously unseen data. So how can we say whether a model will give good predictions for previously unseen data? It depends on the ‘generalizability’ of the model: if the decision mapping learned during training remains valid for previously unseen data as well, so that the model produces correct predictions for it, the model can be regarded as generalizable.
As we will learn, both overfitting and underfitting are hindrances to a model’s generalizability; a perfectly generalized model would suffer from neither. In reality, though, it’s impossible to achieve a perfectly generalized model with no overfitting and no underfitting. Instead, we rely on a trade-off between them, striving to reduce both to the point where we achieve a ‘best fit’: the maximum possible generalizability for a model.
Before understanding overfitting and underfitting, we must understand what a model is. In the realm of statistics and data science,
A model can be understood as an abstract representation of the real world which is created only using the data that we are provided, which is otherwise called a ‘sample’.
As an analogy, if we want to make a generic model of a tangible physical classroom, then each physical aspect of the classroom, such as the number of benches, the number of desks, the dimensions of the whiteboard, and so on, is the information or the data associated with it which we can use to model it.
A model can also be thought of as a mathematical function that maps a set of inputs to an output. This set of inputs and the output are different ‘aspects’ of our model and through machine learning, we attempt to establish a relationship between the set of inputs and the output. As an example, given the number of benches and desks in a classroom, we can easily establish a relationship that will compute the number of students who can attend the class simultaneously.
So how does this notion extend to the idea of overfitting and underfitting? Let’s consider a scenario where a student needs a tailor-made school uniform. For this purpose, the tailor first needs some information about the student’s physique so that the uniform will fit properly. But there’s a catch: although the tailor is extremely skillful and can take very accurate measurements and tailor the dress perfectly to that data, the fabric used by the tailor is known to shrink by some amount when washed. Because of this, there is always a degree of uncertainty about how well the dress will fit the student, since the amount of shrinkage can’t be determined in advance.
So how can the tailor accommodate this uncertainty while tailoring the uniform so that it still fits the student? If the tailor makes the uniform very loose compared to the measurements, then even after shrinking it will loosely fit, or ‘underfit’, the student. If the tailor makes the uniform exactly as per the measurements, then after shrinking it is bound to be too tight, or ‘overfit’, the student. So what’s the solution? The tailor should leave only a judicious margin for shrinkage while tailoring the uniform so that even after washing it offers a perfect fit, the ‘best fit’, for the student.
Let us perform a simple experiment. To understand the notion of underfitting and overfitting, we will try to fit a few regression models to a set of data points. Our very first step will be to import a few Python libraries to enable us to fit the regression models and plot them:
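The original import cell isn’t reproduced here; a minimal sketch of what it might have contained, assuming NumPy for the fitting and Matplotlib for the plotting (the original article may well have used scikit-learn instead):

```python
# Hypothetical imports -- a reconstruction, since the original code cell
# is not shown. NumPy handles the polynomial fits; Matplotlib draws plots.
import numpy as np
import matplotlib
matplotlib.use("Agg")                    # non-interactive backend for scripts
import matplotlib.pyplot as plt
from numpy.polynomial import Polynomial  # well-conditioned polynomial fitting
```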
Let us get ourselves some sample data points to fit the model upon:
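The article’s actual data points aren’t shown; purely for illustration, we can synthesize a comparable sample by adding Gaussian noise to a quadratic trend (the trend, noise level, and seed below are arbitrary assumptions, not the original data):

```python
import numpy as np

# Synthetic stand-in for the article's sample data points.
rng = np.random.default_rng(42)                      # fixed seed: reproducible
x = np.linspace(0, 10, 12)                           # 12 evenly spaced inputs
y = 0.3 * (x - 4) ** 2 + rng.normal(0, 1.5, x.size)  # quadratic trend + noise
```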
Here’s how they appear in a scatterplot:
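The plotting step might look like the following sketch (the data generation is repeated so the snippet runs on its own; the output file name is hypothetical):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                    # render off-screen
import matplotlib.pyplot as plt

# Re-create the synthetic sample so this snippet is self-contained.
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 12)
y = 0.3 * (x - 4) ** 2 + rng.normal(0, 1.5, x.size)

fig, ax = plt.subplots()
ax.scatter(x, y)                         # the raw data points
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("data_scatter.png")          # hypothetical output file name
```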
We will, first of all, fit a basic linear regression line to these data points and calculate the regression formula, represented by ‘a’ here, using the fit model’s coefficients and y-intercept. Thereafter, we plot the regression line along with the data points:
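A sketch of this step, assuming `numpy.polynomial.Polynomial` for the fit (the article may have used scikit-learn’s `LinearRegression` and assembled ‘a’ from the coefficients and intercept; the `r_squared` helper and the data here are illustrative stand-ins):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from numpy.polynomial import Polynomial

# Synthetic stand-in for the article's data points.
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 12)
y = 0.3 * (x - 4) ** 2 + rng.normal(0, 1.5, x.size)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

a = Polynomial.fit(x, y, deg=1)          # 'a' is the fitted line, callable
score_linear = r_squared(y, a(x))        # training-set r-squared

# Plot the regression line along with the data points.
fig, ax = plt.subplots()
ax.scatter(x, y)
xs = np.linspace(0, 10, 200)
ax.plot(xs, a(xs))
fig.savefig("linear_fit.png")            # hypothetical output file name
print(f"linear r-squared: {score_linear:.2f}")
```

A least-squares line with an intercept always yields a training r-squared between 0 and 1; the 0.85 quoted below comes from the article’s own data, so this synthetic sample will give a different number.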
Upon executing the above piece of code we obtain the following result:
A basic linear regression line like the one shown above seems to model our data points reasonably well, but it clearly doesn’t do a very good job of capturing the overall trend; we obtain an r-squared score of 0.85. But can we do better? Let’s try to fit a polynomial regression line of degree 2 and plot it:
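Continuing the same sketch on the synthetic data, the degree-2 fit only changes the `deg` argument (preamble repeated so the snippet stands alone):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 12)
y = 0.3 * (x - 4) ** 2 + rng.normal(0, 1.5, x.size)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

p1 = Polynomial.fit(x, y, deg=1)         # linear baseline
p2 = Polynomial.fit(x, y, deg=2)         # quadratic model
score_deg1 = r_squared(y, p1(x))
score_deg2 = r_squared(y, p2(x))

fig, ax = plt.subplots()
ax.scatter(x, y)
xs = np.linspace(0, 10, 200)
ax.plot(xs, p2(xs))
fig.savefig("quadratic_fit.png")         # hypothetical output file name
print(f"deg 1: {score_deg1:.2f}, deg 2: {score_deg2:.2f}")
```

Because the degree-2 model nests the linear one, its training r-squared can never be lower, which is exactly the trap that tempts us to keep raising the degree.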
The above piece of code generates the following plot:
This time we obtain an r-squared score of 0.87. Although the score hasn’t improved by much, it now appears that increasing the degree of the polynomial increases the r-squared score and captures the overall trend of the data more accurately. Thus, one may argue that we should keep increasing the degree, since it improves the score. Let’s find out what happens then. Let’s try to fit a polynomial regression line of degree 9 and plot it:
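The same sketch with `deg=9` (again on the synthetic stand-in data, so the exact score will differ from the article’s):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 12)
y = 0.3 * (x - 4) ** 2 + rng.normal(0, 1.5, x.size)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

p2 = Polynomial.fit(x, y, deg=2)
p9 = Polynomial.fit(x, y, deg=9)         # 10 coefficients for only 12 points
score_deg2 = r_squared(y, p2(x))
score_deg9 = r_squared(y, p9(x))

fig, ax = plt.subplots()
ax.scatter(x, y)
xs = np.linspace(0, 10, 400)
ax.plot(xs, p9(xs))
fig.savefig("degree9_fit.png")           # hypothetical output file name
print(f"deg 2: {score_deg2:.2f}, deg 9: {score_deg9:.2f}")
```

With 10 free coefficients and only 12 observations, the curve can thread through nearly every point, so the training score approaches 1 regardless of the noise.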
Our model produces an r-squared score of 0.99 this time! That appears to be an astoundingly good regression model with such an impressive score. Or is it?
As we read earlier, the goodness of any model is determined by its generalizability. So, to determine how generalized this model is, let’s add five additional observations to our synthetic dataset which supposedly belong to the same distribution as the original data sample. It is worth noting that the model hasn’t been trained on these data points. Here’s how the model behaves for the newly added data points:
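A sketch of this evaluation step on the synthetic data; the five extra inputs below are hypothetical points drawn from the same quadratic-plus-noise process, so the resulting score will differ from the article’s:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 12)
y = 0.3 * (x - 4) ** 2 + rng.normal(0, 1.5, x.size)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

p9 = Polynomial.fit(x, y, deg=9)             # trained on the 12 points only

# Five unseen observations from the same underlying process.
x_new = np.array([0.4, 2.7, 5.1, 7.3, 9.6])
y_new = 0.3 * (x_new - 4) ** 2 + rng.normal(0, 1.5, x_new.size)

score_unseen = r_squared(y_new, p9(x_new))   # can go negative
print(f"deg-9 r-squared on unseen points: {score_unseen:.2f}")
```

An r-squared below zero simply means the model predicts the unseen points worse than a horizontal line at their mean would.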
So what just happened? We had obtained the best r-squared score with the polynomial regression line of degree 9 earlier, yet it failed to model any of these new points, and this time we receive a negative r-squared score, which indicates an extremely poor model. Thus, it can be concluded that this is not a generalized model; although it performs supremely well when modeling the training data, it is unable to model any new data point it hasn’t been trained on. Such a model can be regarded as an overfitting model, or a high-variance model.
According to Wikipedia, overfitting refers to
“the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably.”
As evident in our experiment, the polynomial regression model with degree 9 conformed to the training data to such an extent that it lost its capability to generalize; in other words, the model picked up the random fluctuations around the overall trend, i.e. the noise in the data. Hence, the model was unable to predict the previously unseen points correctly, as it had not learned the general trend from the data but had only picked up the noise. This particular situation is regarded as overfitting.
In this particular case, the model kept coming up with ever more complex decision rules with the objective of modeling all the training data points perfectly. But in the process, it totally disregarded the notion of generalizing, and hence those decision rules failed to model the unseen data points.
Now one may come up with the intuition that complex decision rules always lead to overfitting, and hence one should stick to the simplest possible decision rules in the hope that they will help the model generalize very well. But is that the case? Let’s find out.
This time we will try to model the same five additional observations using the linear regression model, and by that logic they should be mapped extremely well, since the simpler model should be more generalized. But instead, this happens:
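Evaluating the linear fit on the same five hypothetical unseen points, continuing the synthetic sketch from before:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 12)
y = 0.3 * (x - 4) ** 2 + rng.normal(0, 1.5, x.size)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

p1 = Polynomial.fit(x, y, deg=1)             # the simple linear model

# The same five unseen observations as before.
x_new = np.array([0.4, 2.7, 5.1, 7.3, 9.6])
y_new = 0.3 * (x_new - 4) ** 2 + rng.normal(0, 1.5, x_new.size)

score_unseen_linear = r_squared(y_new, p1(x_new))
print(f"linear r-squared on unseen points: {score_unseen_linear:.2f}")
```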
We obtain an r-squared score of 0.65 this time, and as is evident, the new data points still haven’t been modeled correctly by the linear regression model; instead of improving, the performance has degraded by some amount. Thus, this model can be regarded as an underfitting model, or a highly biased model.
According to Wikipedia,
Underfitting occurs when a statistical model or machine learning algorithm cannot adequately capture the underlying structure of the data.
As evident in our experiment as well, linear regression was never able to model the data correctly: it didn’t fit the training data well, and it also failed to model the unseen data.
Usually, a model is found to be underfitting when there isn’t enough training data for it to learn from, or when the model itself is unable to capture the trend in the data due to its underlying nature.
So what is the solution then? The only possible solution to this dilemma is that we meet somewhere in between where the model neither overfits nor underfits and we have a model that has a “good fit”. If we try to model our five unseen data points using the polynomial regression model of degree 2, we obtain the following result:
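The same sketch, evaluating the degree-2 model on the five hypothetical unseen points; since the synthetic trend really is quadratic, this model should track them closely, though the exact score will differ from the article’s 0.90:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 12)
y = 0.3 * (x - 4) ** 2 + rng.normal(0, 1.5, x.size)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

p2 = Polynomial.fit(x, y, deg=2)             # the 'good fit' candidate

# The same five unseen observations as before.
x_new = np.array([0.4, 2.7, 5.1, 7.3, 9.6])
y_new = 0.3 * (x_new - 4) ** 2 + rng.normal(0, 1.5, x_new.size)

score_unseen_deg2 = r_squared(y_new, p2(x_new))
print(f"deg-2 r-squared on unseen points: {score_unseen_deg2:.2f}")
```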
We obtain an r-squared score of 0.90! This time the model is able to correctly predict and model the overall trend of the data, which is confirmed by the increase in the r-squared score after the addition of the five unseen data points.
The decision mapping learned by this particular model is generalized enough so that it can map the data points that it hasn’t been trained upon as well.
Overfitting and underfitting are two governing forces that dictate every aspect of a machine learning model. Although there’s no silver bullet to evade them and directly achieve a good bias-variance trade-off, we are continually evolving and adapting our machine learning techniques at the data level as well as the algorithmic level so that we can develop models that are less prone to overfitting and underfitting.