Saturday, April 17, 2021
  • Setup menu at Appearance » Menus and assign menu to Top Bar Navigation
Advertisement
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News
No Result
View All Result
NikolaNews
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News
No Result
View All Result
NikolaNews
No Result
View All Result
Home Data Science

Reimagining Reinforcement Learning – Upside Down

December 25, 2019
in Data Science
Reimagining Reinforcement Learning – Upside Down
588
SHARES
3.3k
VIEWS
Share on FacebookShare on Twitter

Summary:  For all the hype around winning game play and self-driving cars, traditional Reinforcement Learning (RL) has yet to deliver as a reliable tool for ML applications.  Here we explore the main drawbacks as well as an innovative approach to RL that dramatically reduces the training compute requirement and time to train.

 

You might also like

DSC Weekly Digest 12 April 2021

6 Limitations of Desktop System That QuickBooks Hosting Helps Overcome

Robust Artificial Intelligence of Document Attestation to Ensure Identity Theft

Ever since Reinforcement Learning (RL) was recognized as a legitimate third style of machine learning alongside supervised and unsupervised learning we’ve been waiting for that killer app to prove its value.

Yes RL has had some press-worthy wins in game play (Alpha Go), self-driving cars (not here yet), drone control, and even dialogue systems like personal assistants but the big breakthrough isn’t here yet.

RL ought to be our go-to solution for any problem requiring sequential decisions and these individual successes might make you think that RL is ready for prime time but the reality is that it’s not.

 

Shortcomings of Reinforcement Learning

Romain Laroche, a Principal Researcher in RL at Microsoft points out several critical shortcomings.  And while there are several, the most severe problems to be overcome Laroche points out are these:

  • “They are largely unreliable. Even worse, two runs with different random seeds can yield very different results because of the stochasticity in the reinforcement learning process.”
  • “They require billions of samples to obtain their results and extracting such astronomical numbers of samples in real world applications isn’t feasible.”

In fact, if you read our last blog closely about the barriers to continuously improving AI, you would have seen that the increasing compute power necessary to improve the most advanced algorithms is rapidly approaching the point of becoming uneconomic.  And, that the most compute hungry among the examples tracked by OpenAI is AlphaGoZero, an RL game play algorithm requiring orders of magnitude more compute than the next closest deep learning application.

While Laroche’s research has lately focused on the reliability problem and he’s making some headway, if we don’t solve the compute requirement problem RL can’t take its rightful place as an important ML tool.

 

Upside Down Reinforcement Learning (UDRL)

Two recent papers out of AI research organizations in Switzerland describe a unique and unexpected approach to this by literally turning the RL learning process upside down (Upside Down Reinforcement Learning UDRL).  Jürgen Schmidhuber and his colleagues say:

“Traditional Reinforcement Learning (RL) algorithms either predict rewards with value functions or maximize them using policy search. We study an alternative: Upside-Down Reinforcement Learning (Upside-Down RL or UDRL), that solves RL problems primarily using supervised learning techniques.”

 

In its traditional configuration using value functions or policy search the RL algorithm essentially conducts a completely random search of the state space to find an optimum solution.  The fact that it is in fact a random search accounts for the extremely large compute requirement for training.  The more sequential steps in the learning process, the greater the search and compute requirement.

The new upside down approach introduces gradient descent from supervised learning which promises to make training orders of magnitude more efficient.

 

How It Works

Using rewards as inputs, UDRL observes commands as a combination of desired rewards and time horizons.  For example “get so much reward within so much time” and then “get even more reward within even less time”.

As in traditional RL UDRL learns by simply interacting with its state space except that these unique commands now create learning based on gradient descent using these self-generated commands.

In short this means training occurs against trials that were previously considered successful (gradient descent) as opposed to completely random exploration.

On complex problems with many sequential steps UDRL proved both more accurate and most important much quicker to train than traditional RL (see steepness of green line on the left in the chart from the paper below).

 

This does leave open the exploration/exploitation question since gradient descent techniques can hang on local optima, underfitting or overfitting.

 

A Main Application – Learn by Imitation

One of the most interesting applications of UDRL that speaks directly to reduced training time and compute is its ability to be used in train-by-example, or train-by-imitation techniques in robotics.

For example, a robotic arm could be manipulated by a human through the steps of a complex operation such as the assembly of an entire electronic device.  The process would be repeated several times but each would be regarded as a successful example. 

A video of the process would be divided into separate frames for training and as input to an RNN model resulting in supervised learning which the robot must learn to imitate.  The task will be the command in the form of a reward which the robot will map with an action.

While this is new research yet to be proven in commercial application, the reduction in required compute and time to train goes a long way to resolving one of RLs biggest drawbacks.

 

Other articles by Bill Vorhies

 

About the author:  Bill is Contributing Editor for Data Science Central.  Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.  His articles have been read more than 2 million times.

He can be reached at:

[email protected] or [email protected]


Credit: Data Science Central By: William Vorhies

Previous Post

Machine Learning Algorithms Help Couples Conceive

Next Post

Twitter bans animated PNG files from tweets because of epilepsy safety concerns

Related Posts

DSC Weekly Digest 01 March 2021
Data Science

DSC Weekly Digest 12 April 2021

April 14, 2021
6 Limitations of Desktop System That QuickBooks Hosting Helps Overcome
Data Science

6 Limitations of Desktop System That QuickBooks Hosting Helps Overcome

April 13, 2021
Robust Artificial Intelligence of Document Attestation to Ensure Identity Theft
Data Science

Robust Artificial Intelligence of Document Attestation to Ensure Identity Theft

April 13, 2021
Trends in custom software development in 2021
Data Science

Trends in custom software development in 2021

April 13, 2021
Epoch and Map of the Energy Transition through the Consensus Validator
Data Science

Epoch and Map of the Energy Transition through the Consensus Validator

April 13, 2021
Next Post
Twitter bans animated PNG files from tweets because of epilepsy safety concerns

Twitter bans animated PNG files from tweets because of epilepsy safety concerns

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recommended

Plasticity in Deep Learning: Dynamic Adaptations for AI Self-Driving Cars

Plasticity in Deep Learning: Dynamic Adaptations for AI Self-Driving Cars

January 6, 2019
Microsoft, Google Use Artificial Intelligence to Fight Hackers

Microsoft, Google Use Artificial Intelligence to Fight Hackers

January 6, 2019

Categories

  • Artificial Intelligence
  • Big Data
  • Blockchain
  • Crypto News
  • Data Science
  • Digital Marketing
  • Internet Privacy
  • Internet Security
  • Learn to Code
  • Machine Learning
  • Marketing Technology
  • Neural Networks
  • Technology Companies

Don't miss it

Teslafan, a Blockchain-Powered Machine Learning Technology Project, Receives Investment Prior to the ICO
Machine Learning

Teslafan, a Blockchain-Powered Machine Learning Technology Project, Receives Investment Prior to the ICO

April 17, 2021
The “Blue Brain” Project-A mission to build a simulated Brain | by The A.I. Thing | Mar, 2021
Neural Networks

The “Blue Brain” Project-A mission to build a simulated Brain | by The A.I. Thing | Mar, 2021

April 17, 2021
A new collective to fight adtech fraud: Friday’s daily brief
Digital Marketing

A new collective to fight adtech fraud: Friday’s daily brief

April 17, 2021
Cyberattack on UK university knocks out online learning, Teams and Zoom
Internet Security

Cyberattack on UK university knocks out online learning, Teams and Zoom

April 17, 2021
SBI Sumishin Net Bank partners with DLT Labs on supply chain financing network
Blockchain

SBI Sumishin Net Bank partners with DLT Labs on supply chain financing network

April 16, 2021
Machine learning approach identifies more than 400 genes tied to schizophrenia
Machine Learning

Machine learning models may predict criminal offenses related to psychiatric disorders

April 16, 2021
NikolaNews

NikolaNews.com is an online News Portal which aims to share news about blockchain, AI, Big Data, and Data Privacy and more!

What’s New Here?

  • Teslafan, a Blockchain-Powered Machine Learning Technology Project, Receives Investment Prior to the ICO April 17, 2021
  • The “Blue Brain” Project-A mission to build a simulated Brain | by The A.I. Thing | Mar, 2021 April 17, 2021
  • A new collective to fight adtech fraud: Friday’s daily brief April 17, 2021
  • Cyberattack on UK university knocks out online learning, Teams and Zoom April 17, 2021

Subscribe to get more!

© 2019 NikolaNews.com - Global Tech Updates

No Result
View All Result
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News

© 2019 NikolaNews.com - Global Tech Updates