Reinforcement Learning




Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.

Reinforcement learning differs from standard supervised learning in that correct input/output pairs need not be presented, and sub-optimal actions need not be explicitly corrected. Instead, the focus is on performance, which requires balancing exploration (of uncharted territory) against exploitation (of current knowledge).
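One simple way to picture the exploration/exploitation balance is an epsilon-greedy rule. This strategy is not described in the text above; it is used here only as an illustrative sketch. The names `epsilon_greedy`, `value_estimates`, and `epsilon` are hypothetical.

```python
import random

def epsilon_greedy(value_estimates, epsilon, rng=random):
    """Return an action index.

    With probability `epsilon` a random action is tried (exploration);
    otherwise the action with the highest current estimate is chosen
    (exploitation). `value_estimates` is an assumed list holding the
    agent's current guess of each action's value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(value_estimates))  # explore
    # exploit: index of the largest estimate
    return max(range(len(value_estimates)), key=value_estimates.__getitem__)
```

With `epsilon = 0` the rule only exploits current knowledge; with `epsilon = 1` it only explores. Values in between trade one off against the other.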

Basic Reinforcement

Basic reinforcement learning is modeled as a Markov decision process (MDP), which consists of:

  • a set of environment and agent states, \(S\);
  • a set of actions, \(A\), of the agent;
  • \(P_{a}(s,s')=\Pr(s_{t+1}=s' \mid s_{t}=s, a_{t}=a)\), the probability of transition from state \(s\) to state \(s'\) under action \(a\);
  • \(R_{a}(s,s')\), the immediate reward after the transition from \(s\) to \(s'\) under action \(a\);
  • rules that describe what the agent observes.
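The components listed above can be sketched as a small data structure. This is a minimal illustration, not a definitive implementation: the class name `MDP`, the dictionary layouts for `P` and `R`, the `step` helper, and the two-state `toy` example with its numbers are all assumptions made for the sketch.

```python
import random
from dataclasses import dataclass

@dataclass
class MDP:
    states: list   # S
    actions: list  # A
    P: dict        # P[(s, a)] -> {s_next: probability}, i.e. P_a(s, s')
    R: dict        # R[(s, a, s_next)] -> immediate reward R_a(s, s')

    def step(self, s, a, rng=random):
        """Sample s' from P_a(s, .) and return (s', R_a(s, s'))."""
        dist = self.P[(s, a)]
        next_states = list(dist)
        weights = [dist[n] for n in next_states]
        s_next = rng.choices(next_states, weights=weights, k=1)[0]
        # Transitions missing from R are assumed to give zero reward.
        return s_next, self.R.get((s, a, s_next), 0.0)

# Hypothetical two-state example (values chosen only for illustration).
toy = MDP(
    states=["low", "high"],
    actions=["wait", "work"],
    P={
        ("low", "wait"): {"low": 1.0},
        ("low", "work"): {"high": 0.8, "low": 0.2},
        ("high", "wait"): {"high": 0.5, "low": 0.5},
        ("high", "work"): {"high": 1.0},
    },
    R={("low", "work", "high"): 1.0, ("high", "work", "high"): 2.0},
)
```

Calling `toy.step("low", "work")` samples the next state according to the transition probabilities and returns it together with the immediate reward for that transition.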

The rules are often stochastic. The observation typically includes the scalar immediate reward associated with the last transition. In many works, the agent is assumed to observe the current environmental state, in which case it has full observability; otherwise it has partial observability. Sometimes the set of actions available to the agent is restricted (for example, a zero balance cannot be reduced).
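The restricted action set and the two observability cases can be sketched as follows. The account-balance setting extends the zero-balance example; the action names, the `(balance, market_trend)` state layout, and the functions `available_actions` and `observe` are hypothetical and chosen only for illustration.

```python
def available_actions(balance):
    """Return the actions permitted for a given account balance."""
    actions = ["deposit", "hold"]
    if balance > 0:
        # A zero balance cannot be reduced, so "withdraw" is only
        # offered when the balance is positive.
        actions.append("withdraw")
    return actions

def observe(state, reward, fully_observable=True):
    """Return what the agent sees after a transition.

    The observation always carries the scalar immediate reward. Under
    full observability the whole state is visible; under partial
    observability only the balance is (an assumed split of the state).
    """
    balance, market_trend = state
    visible = state if fully_observable else (balance,)
    return {"reward": reward, "state": visible}
```

Here `available_actions(0)` omits `"withdraw"`, and `observe((3, "up"), 1.0, fully_observable=False)` hides the second component of the state from the agent.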