Monday, March 8, 2021
  • Setup menu at Appearance » Menus and assign menu to Top Bar Navigation
Advertisement
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News
No Result
View All Result
NikolaNews
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News
No Result
View All Result
NikolaNews
No Result
View All Result
Home Data Science

Introduction to Various Reinforcement Learning Algorithms

September 13, 2019
in Data Science
Introduction to Various Reinforcement Learning Algorithms
588
SHARES
3.3k
VIEWS
Share on FacebookShare on Twitter

This article was written by Steeve Huang.

 

You might also like

An Easy Way to Solve Complex Optimization Problems in Machine Learning

A Plethora of Machine Learning Articles: Part 2

The Effect IoT Has Had on Software Testing

Reinforcement Learning (RL) refers to a kind of Machine Learning method in which the agent receives a delayed reward in the next time step to evaluate its previous action. It was mostly used in games (e.g. Atari, Mario), with performance on par with or even exceeding humans. Recently, as the algorithm evolves with the combination of Neural Networks, it is capable of solving more complex tasks, such as the pendulum problem.

 

Although there are a great number of RL algorithms, there does not seem to be a comprehensive comparison between each of them. It gave me a hard time when deciding which algorithms to be applied to a specific task. This article aims to solve this problem by briefly discussing the RL setup, and providing an introduction for some of the well-known algorithms.

 

Reinforcement Learning 101

Typically, a RL setup is composed of two components, an agent and an environment.

Then environment refers to the object that the agent is acting on (e.g. the game itself in the Atari game), while the agent represents the RL algorithm. The environment starts by sending a state to the agent, which then based on its knowledge to take an action in response to that state. After that, the environment send a pair of next state and reward back to the agent. The agent will update its knowledge with the reward returned by the environment to evaluate its last action. The loop keeps going on until the environment sends a terminal state, which ends to episode.

Most of the RL algorithms follow this pattern. In the following paragraphs, I will briefly talk about some terms used in RL to facilitate our discussion in the next section.

Definition

  1. Action (A): All the possible moves that the agent can take
  2. State (S): Current situation returned by the environment.
  3. Reward (R): An immediate return send back from the environment to evaluate the last action.
  4. Policy (π): The strategy that the agent employs to determine next action based on the current state.
  5. Value (V): The expected long-term return with discount, as opposed to the short-term reward R. Vπ(s) is defined as the expected long-term return of the current state sunder policy π.
  6. Q-value or action-value (Q): Q-value is similar to Value, except that it takes an extra parameter, the current action a. Qπ(s, a) refers to the long-term return of the current state s, taking action a under policy π.

Model-free v.s. Model-based

The model stands for the simulation of the dynamics of the environment. That is, the model learns the transition probability T(s1|(s0, a)) from the pair of current state s0 and action a to the next state s1. If the transition probability is successfully learned, the agent will know how likely to enter a specific state given current state and action. However, model-based algorithms become impractical as the state space and action space grows (S * S * A, for a tabular setup).

On the other hand, model-free algorithms rely on trial-and-error to update its knowledge. As a result, it does not require space to store all the combination of states and actions. All the algorithms discussed in the next section fall into this category.

On-policy v.s. Off-policy

An on-policy agent learns the value based on its current action a derived from the current policy, whereas its off-policy counter part learns it based on the action a* obtained from another policy. In Q-learning, such policy is the greedy policy. (We will talk more on that in Q-learning and SARSA)

 

To read the whole article, with illustrations, click here. This article describes the following techniques:

  • Q-Learning
  • State-Action-Reward-State-Action (SARSA)
  • Deep Q Network (DQN)
  • Deep Deterministic Policy Gradient (DDPG)


Credit: Data Science Central By: Andrea Manero-Bastin

Previous Post

Employing AI to Enhance Returns on 5G Network Investments

Next Post

'Screaming car wreck' of internet routing needs a fire brigade: Geoff Huston

Related Posts

An Easy Way to Solve Complex Optimization Problems in Machine Learning
Data Science

An Easy Way to Solve Complex Optimization Problems in Machine Learning

March 8, 2021
A Plethora of Machine Learning Articles: Part 2
Data Science

A Plethora of Machine Learning Articles: Part 2

March 4, 2021
The Effect IoT Has Had on Software Testing
Data Science

The Effect IoT Has Had on Software Testing

March 3, 2021
Why Cloud Data Discovery Matters for Your Business
Data Science

Why Cloud Data Discovery Matters for Your Business

March 2, 2021
DSC Weekly Digest 01 March 2021
Data Science

DSC Weekly Digest 01 March 2021

March 2, 2021
Next Post
‘Screaming car wreck’ of internet routing needs a fire brigade: Geoff Huston

'Screaming car wreck' of internet routing needs a fire brigade: Geoff Huston

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recommended

Plasticity in Deep Learning: Dynamic Adaptations for AI Self-Driving Cars

Plasticity in Deep Learning: Dynamic Adaptations for AI Self-Driving Cars

January 6, 2019
Microsoft, Google Use Artificial Intelligence to Fight Hackers

Microsoft, Google Use Artificial Intelligence to Fight Hackers

January 6, 2019

Categories

  • Artificial Intelligence
  • Big Data
  • Blockchain
  • Crypto News
  • Data Science
  • Digital Marketing
  • Internet Privacy
  • Internet Security
  • Learn to Code
  • Machine Learning
  • Marketing Technology
  • Neural Networks
  • Technology Companies

Don't miss it

Dataiku named as Gartner Leader for Data Science and Machine Learning
Machine Learning

Dataiku named as Gartner Leader for Data Science and Machine Learning

March 8, 2021
Bill establishing cyber abuse takedown scheme for adults enters Parliament
Internet Security

eSafety defends detail of Online Safety Bill as the ‘sausage that’s being made’

March 8, 2021
An Easy Way to Solve Complex Optimization Problems in Machine Learning
Data Science

An Easy Way to Solve Complex Optimization Problems in Machine Learning

March 8, 2021
Machine Learning Patentability In 2019: 5 Cases Analyzed And Lessons Learned Part 4 – Intellectual Property
Machine Learning

Podcast: Non-Binding Guidance: FDA Regulatory Developments In AI And Machine Learning – Food, Drugs, Healthcare, Life Sciences

March 8, 2021
Here’s an adorable factory game about machine learning and cats
Machine Learning

Here’s an adorable factory game about machine learning and cats

March 8, 2021
How Machine Learning Is Changing Influencer Marketing
Machine Learning

How Machine Learning Is Changing Influencer Marketing

March 8, 2021
NikolaNews

NikolaNews.com is an online News Portal which aims to share news about blockchain, AI, Big Data, and Data Privacy and more!

What’s New Here?

  • Dataiku named as Gartner Leader for Data Science and Machine Learning March 8, 2021
  • eSafety defends detail of Online Safety Bill as the ‘sausage that’s being made’ March 8, 2021
  • An Easy Way to Solve Complex Optimization Problems in Machine Learning March 8, 2021
  • Podcast: Non-Binding Guidance: FDA Regulatory Developments In AI And Machine Learning – Food, Drugs, Healthcare, Life Sciences March 8, 2021

Subscribe to get more!

© 2019 NikolaNews.com - Global Tech Updates

No Result
View All Result
  • AI Development
    • Artificial Intelligence
    • Machine Learning
    • Neural Networks
    • Learn to Code
  • Data
    • Blockchain
    • Big Data
    • Data Science
  • IT Security
    • Internet Privacy
    • Internet Security
  • Marketing
    • Digital Marketing
    • Marketing Technology
  • Technology Companies
  • Crypto News

© 2019 NikolaNews.com - Global Tech Updates