NikolaNews

Advanced cross-validation tips for time series

March 29, 2019
in Data Science

Credit: Data Science Central

This article was written by Datapred.

In a previous post, we explained the concept of cross-validation for time series, aka backtesting, and why proper backtests matter for time series modeling.

The goal here is to dig deeper and discuss a few coding tips that will help you cross-validate your predictive models correctly.

Introduction – The problem of future leakage 

The key to efficient time series modeling is not model sophistication, but avoiding “future leakage”: information that belongs on the right (test) side of the moving train/test partition but leaks to the left (training) side, thus corrupting the model and making its backtest performance look better than it really is.

The problem is that future leakage, while easy to understand, is often hard to detect.
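To make this concrete, here is a small pandas sketch of our own (not from the original article) showing how a single keyword argument can introduce future leakage. A centered rolling mean, handy for smoothing plots, makes the feature at time t depend on the value at t + 1. The helper names `rolling_feature` and `affected_positions` are hypothetical; the sketch perturbs one observation and reports which feature values move.

```python
import numpy as np
import pandas as pd


def rolling_feature(series, center):
    # 3-step rolling mean; with center=True the value at t averages t - 1, t and t + 1
    return series.rolling(window=3, center=center).mean()


def affected_positions(series, position, center):
    """Positions whose feature value changes when only series[position] is altered."""
    perturbed = series.copy()
    perturbed.iloc[position] += 100.0
    before = rolling_feature(series, center).to_numpy()
    after = rolling_feature(perturbed, center).to_numpy()
    # equal_nan=True so the leading/trailing NaNs of the rolling window count as unchanged
    return np.flatnonzero(~np.isclose(before, after, equal_nan=True)).tolist()


y = pd.Series(np.arange(10.0))
print(affected_positions(y, 6, center=False))  # [6, 7, 8]: only present and later features move
print(affected_positions(y, 6, center=True))   # [5, 6, 7]: the feature at t = 5 sees the future
```

A trailing window (`center=False`, the pandas default) only moves features at or after the perturbed position, which is what a leak-free feature should do.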

 

First life saver – Training window management

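The first priority to avoid future leakage is to make sure your model stops training as soon as it catches up with the prediction target, i.e. that at each step of the backtest it only learns from observations whose target values would already have been known when the forecast is made.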

That is quite simple in principle, but remembering and coding it every time you build a machine learning solution for time series is cumbersome and risky. It is much better to automate it, if you can.
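For a forecast made at origin t with horizon h, a training row i is only usable if its target (the value at i + h) is already known at t, which leaves a gap of h rows between the end of the training window and the predicted row. Below is a minimal NumPy sketch of a splitter that automates this, assuming a hypothetical layout where row i holds features known at time i and, as its target, the series value h steps later; the function name and parameters are our own, not from the original article.

```python
import numpy as np


def backtest_splits(n_samples, horizon, min_train, step=1):
    """Yield (train_idx, test_idx) pairs for a rolling-origin backtest.

    Assumed layout (hypothetical): row i holds features observed at time i and,
    as its target, the series value at time i + horizon. At origin t, only rows
    with i + horizon <= t have a known target, so training stops at row t - horizon.
    """
    first_origin = min_train + horizon - 1  # smallest t leaving min_train usable rows
    for origin in range(first_origin, n_samples, step):
        train_idx = np.arange(0, origin - horizon + 1)  # rows 0 .. t - horizon
        test_idx = np.array([origin])                   # the forecast made at origin t
        yield train_idx, test_idx


for train_idx, test_idx in backtest_splits(n_samples=10, horizon=3, min_train=4):
    print(train_idx.tolist(), "->", test_idx.tolist())
```

scikit-learn's `TimeSeriesSplit` offers a comparable safeguard through its `gap` parameter (available since version 0.24), which excludes a number of samples between the end of each training set and the test set.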

 

Second life saver – Feature shifting

Managing the prediction target correctly is not enough – you also need to handle your features correctly. With time series, this often involves lots of data shifting along the time axis.

The reason is that some features, while technically in the past, may include information about the future. For example, a marketing plan disclosed last month may specify the company’s marketing spend for the next 12 months. In that case:

  • The feature “marketing plan”, while time-stamped at t - 30 days, actually informs us today about the next 12 months.
  • So you want to shift that feature to today, and use it to predict the next few months.

Such shifts complicate training window management considerably, thus increasing the danger of future leakage.

Coding them from scratch with Python or R is challenging, especially with multiple project contributors and/or for solutions that require hyperparameter optimization. Ideally, you want to automate and reuse the shifts as much as possible.
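A minimal pandas sketch of such a reusable shift for the marketing-plan example, assuming a hypothetical `plans` table with one row per (disclosure date, covered month) pair; `plan_feature` and the column names `published`, `month` and `spend` are illustrations, not part of the original article. For each forecast origin, it looks up the spend planned for the target month in the most recent plan already disclosed at that origin, so a plan published later can never leak into an earlier forecast.

```python
import pandas as pd


def plan_feature(plans, origins, horizon):
    """Planned spend for the month `horizon` months after each forecast origin,
    taken from the most recent plan published on or before that origin.

    `plans` (hypothetical) has columns: published (disclosure date),
    month (month-start timestamp the spend applies to) and spend.
    """
    values = []
    for origin in origins:
        target_month = origin + pd.DateOffset(months=horizon)
        # Only plans already disclosed at the origin may be used: no future leakage.
        known = plans[(plans["published"] <= origin) & (plans["month"] == target_month)]
        if known.empty:
            values.append(float("nan"))
        else:
            values.append(known.sort_values("published")["spend"].iloc[-1])
    return pd.Series(values, index=origins, name=f"planned_spend_h{horizon}")


# Plan A (disclosed mid-December) covers 2019; plan B (disclosed February 10) revises March onward.
plan_a = pd.DataFrame({"published": pd.Timestamp("2018-12-15"),
                       "month": pd.date_range("2019-01-01", periods=12, freq="MS"),
                       "spend": 100.0})
plan_b = pd.DataFrame({"published": pd.Timestamp("2019-02-10"),
                       "month": pd.date_range("2019-03-01", periods=12, freq="MS"),
                       "spend": 150.0})
plans = pd.concat([plan_a, plan_b], ignore_index=True)
origins = pd.date_range("2019-01-01", periods=4, freq="MS")
print(plan_feature(plans, origins, horizon=2).tolist())  # [100.0, 100.0, 150.0, 150.0]
```

In this toy example, the February 1 forecast still uses plan A because plan B was only disclosed on February 10; picking plan B there would be exactly the kind of future leakage described above.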

If you can’t do that, the next section may save your life.

 

Third life saver – The zero test

Knowing how to avoid future leakage is great, but how can you quickly check that your code is safe?

For that, we use a simple and effective technique that we call the “zero test”. It consists of running your model twice: once with the regular target, and once with a copy of the target in which all values from a certain date onward are set to zero:

    artificial_target = real_target.copy()
    artificial_target.loc[zero_date:] = 0  # assumes real_target is a pandas Series indexed by date

The predictions of the resulting machine learning pipelines should be identical up to zero_date + [prediction horizon], and differ after that. If they start differing before that date, congratulations – you have detected future leakage.
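The zero test can be sketched in a few lines of NumPy. This is our own minimal version, assuming integer time positions instead of dates and a hypothetical pipeline signature `pipeline(y, t, horizon)` that returns the forecast of `y[t + horizon]` made at origin `t`. The two toy pipelines are illustrations only; the second one leaks on purpose through a mean computed over the whole series.

```python
import numpy as np


def safe_pipeline(y, t, horizon, window=5):
    # Naive forecast of y[t + horizon]: mean of the last `window` values known at t
    return y[max(0, t - window + 1): t + 1].mean()


def leaky_pipeline(y, t, horizon, window=5):
    recent = y[max(0, t - window + 1): t + 1].mean()
    overall = y.mean()  # BUG: averages the whole series, including values after t
    return 0.5 * recent + 0.5 * overall


def zero_test(pipeline, y, zero_idx, horizon, min_train):
    """Return the target index of the first forecast that reveals future leakage, or None.

    Forecasts made at origins t < zero_idx only "know" y[:t + 1], which is the
    same in the real and zeroed series, so they must not change. They cover
    target indices up to zero_idx - 1 + horizon.
    """
    y_zero = y.copy()
    y_zero[zero_idx:] = 0.0  # artificial target: everything from zero_idx on set to zero
    for t in range(min_train - 1, min(zero_idx, len(y) - horizon)):
        if not np.isclose(pipeline(y, t, horizon), pipeline(y_zero, t, horizon)):
            return t + horizon
    return None


y = np.arange(1.0, 31.0)
print(zero_test(safe_pipeline, y, zero_idx=20, horizon=3, min_train=5))   # None
print(zero_test(leaky_pipeline, y, zero_idx=20, horizon=3, min_train=5))  # 7
```

Because `leaky_pipeline` uses the mean of the full series, its very first forecast already changes when the tail is zeroed; normalization or scaling statistics fitted on the whole dataset tend to betray themselves in the same way.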

 

To read the whole article, with examples and illustrations, click here.

 

Credit: Data Science Central By: Andrea Manero-Bastin

© 2019 NikolaNews.com - Global Tech Updates

No Result
View All Result
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News

© 2019 NikolaNews.com - Global Tech Updates