Monday, April 19, 2021
  • Setup menu at Appearance » Menus and assign menu to Top Bar Navigation
Advertisement
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News
No Result
View All Result
NikolaNews
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News
No Result
View All Result
NikolaNews
No Result
View All Result
Home Data Science

Model Selection: Adjusted Coefficient of Determination-Variance Tradeoff

July 13, 2020
in Data Science
Model Selection: Adjusted Coefficient of Determination-Variance Tradeoff
586
SHARES
3.3k
VIEWS
Share on FacebookShare on Twitter

In my previous article, we analyzed the COVID-19 data of Turkey and selected the cubic model for predicting the spread of disease. In this article, we will show in detail why we selected the cubic model for prediction and see whether our decision was right or not.

When we analyze the regression trend models we should consider overfitting and underfitting situations; underfitting indicates high bias and low variance while overfitting indicates low bias and high variance.

You might also like

DSC Weekly Digest 12 April 2021

6 Limitations of Desktop System That QuickBooks Hosting Helps Overcome

Robust Artificial Intelligence of Document Attestation to Ensure Identity Theft

Bias is the difference between the expected value of fitted values and observed values:

The variance of fitted values is the expected value of squared deviation from the mean of fitted values:

E[(hat y-E(hat y))^2]

The adjusted coefficient of determination is used in the different degrees of polynomial trend regression models comparing. In the below formula p denotes the number of explanatory terms and n denotes the number of observations.

bar R^2=1-frac{SSE} {SST} (frac {n-1} {n-p-1})

SSE is the residual sum of squares:

sum (y-hat y)^2

SST is the total sum of squares:

sum (y-bar y)^2

When we examine the above formulas, we can notice the similarity between SSE and bias. We can easily say that if bias decreases, SSE will decrease and the adjusted coefficient of determination will increase. So we will use the adjusted R-squared instead of bias to balance with variance and find the optimal degree of the polynomial regression.

The dataset and data frame(tur) we are going to use can be found from the previous article. First of all, we will create all the polynomial regression models which we’re going to compare.

a <- 1:15 models <- list()  
for (i in seq_along(a)){
models[[i]] <- assign(paste0('model_', i),lm(new_cases ~ poly(index, i), data=tur))
}

The variances of fitted values of all the degrees of polynomial regression models:

variance <- c()  
for (i in seq_along(a)) {
variance[i] <- mean((models[[i]][["fitted.values"]]-mean(models[[i]][["fitted.values"]]))^2)
}

To create an adjusted R-squared object we first create a summary object of the trend regression models; because the adj.r.squared feature is calculated in summary function.

models_summ <- list() 
adj_R_squared <- c()
for (i in seq_along(a)) {
models_summ[[i]] <- summary(models[[i]])
adj_R_squared[i] <- (models_summ[[i]][["adj.r.squared"]])
}

Before analyzing numeric results of variance and adjusted R-squared we will show all the trend regression lines in separate plots for comparing models.

# Facetting models by degree for finding the best fit

library(ggplot2)
dat <- do.call(rbind, lapply(1:15, function(d) {
x <- tur$index
preds <- predict(lm(new_cases ~ poly(x, d), data=tur), newdata=data.frame(x=x))
data.frame(cbind(y=preds, x=x, degree=d)) }))

ggplot(dat, aes(x,y)) +
geom_point(data=tur, aes(index, new_cases)) +
geom_line(color="steelblue", lwd=1.1)+
facet_wrap(~ degree,nrow = 3)

When we examine the above plots, we should pay attention to the curved of the tails; because it indicates overfitting which shows extreme sensitivity to the observed data points. In light of this approach, the second and third-degree of models appear to be more convenient to the data.

Let’s examine the adjusted R-squared-variance tradeoff on the plot we created below.

library(gridExtra) 

plot_variance <- ggplot(df_tradeoff,aes(degree,variance))+
geom_line(size=2,colour="orange")+
scale_x_continuous(breaks = 1:15)+
theme_bw(base_line_size = 2)

plot_adj.R.squared <- ggplot(df_tradeoff,aes(degree,adj_R_squared))+
geom_line(size=2,colour="steelblue")+
labs(y="Adjusted R-Squared")+
scale_x_continuous(breaks = 1:15)+
theme_bw(base_line_size = 2)

grid.arrange(plot_variance,plot_adj.R.squared,ncol=1)

As we can see above, adjusted R-squared and variance have very similar trend lines. The more adjusted R-squared means the more complexity and low bias, but we have to take into account the variance; otherwise, we fall into the overfitting trap. So we need to look for low bias (or high adjusted R-squared) and low variance as much as possible for optimal selection.

When we examine the variance plot, we can see that there is not much difference between second and third-degree; but after the third degree, there seems to be a more steep increase; it’s some kind of breaking. This might lead to overfitting. Thus the most reasonable option seems to be the 3rd degree. This approach is not a scientific fact but could be used for an optimal solution.

Original article can be found here


Credit: Data Science Central By: Selcuk Disci

Previous Post

Fujitsu Develops World's First AI Technology to Accurately Capture Characteristics of High-Dimensional Data

Next Post

This botnet has surged back into action spreading a new ransomware campaign via phishing emails

Related Posts

DSC Weekly Digest 01 March 2021
Data Science

DSC Weekly Digest 12 April 2021

April 14, 2021
6 Limitations of Desktop System That QuickBooks Hosting Helps Overcome
Data Science

6 Limitations of Desktop System That QuickBooks Hosting Helps Overcome

April 13, 2021
Robust Artificial Intelligence of Document Attestation to Ensure Identity Theft
Data Science

Robust Artificial Intelligence of Document Attestation to Ensure Identity Theft

April 13, 2021
Trends in custom software development in 2021
Data Science

Trends in custom software development in 2021

April 13, 2021
Epoch and Map of the Energy Transition through the Consensus Validator
Data Science

Epoch and Map of the Energy Transition through the Consensus Validator

April 13, 2021
Next Post
This botnet has surged back into action spreading a new ransomware campaign via phishing emails

This botnet has surged back into action spreading a new ransomware campaign via phishing emails

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recommended

Plasticity in Deep Learning: Dynamic Adaptations for AI Self-Driving Cars

Plasticity in Deep Learning: Dynamic Adaptations for AI Self-Driving Cars

January 6, 2019
Microsoft, Google Use Artificial Intelligence to Fight Hackers

Microsoft, Google Use Artificial Intelligence to Fight Hackers

January 6, 2019

Categories

  • Artificial Intelligence
  • Big Data
  • Blockchain
  • Crypto News
  • Data Science
  • Digital Marketing
  • Internet Privacy
  • Internet Security
  • Learn to Code
  • Machine Learning
  • Marketing Technology
  • Neural Networks
  • Technology Companies

Don't miss it

Machine Learning Helps Optimize Therapeutic Antibodies
Machine Learning

Machine Learning Helps Optimize Therapeutic Antibodies

April 18, 2021
Researchers at MIT DAI Lab Have Recently Built Cardea: A Machine Learning Framework That Turns Health Care Data Into Insights
Machine Learning

Researchers at MIT DAI Lab Have Recently Built Cardea: A Machine Learning Framework That Turns Health Care Data Into Insights

April 18, 2021
Automating Drug Discovery With Machine Learning
Machine Learning

Automating Drug Discovery With Machine Learning

April 18, 2021
Twitter aims to fight bias by examining its own machine learning algorithms
Machine Learning

Twitter aims to fight bias by examining its own machine learning algorithms

April 18, 2021
Make Machine Learning Interpretable with Shapash
Machine Learning

Make Machine Learning Interpretable with Shapash

April 18, 2021
Why the Patent Classification System Needs an Update
Machine Learning

Why the Patent Classification System Needs an Update

April 18, 2021
NikolaNews

NikolaNews.com is an online News Portal which aims to share news about blockchain, AI, Big Data, and Data Privacy and more!

What’s New Here?

  • Machine Learning Helps Optimize Therapeutic Antibodies April 18, 2021
  • Researchers at MIT DAI Lab Have Recently Built Cardea: A Machine Learning Framework That Turns Health Care Data Into Insights April 18, 2021
  • Automating Drug Discovery With Machine Learning April 18, 2021
  • Twitter aims to fight bias by examining its own machine learning algorithms April 18, 2021

Subscribe to get more!

© 2019 NikolaNews.com - Global Tech Updates

No Result
View All Result
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News

© 2019 NikolaNews.com - Global Tech Updates