Markov Decision Processes and Dynamic Programming
Formalising the RL interface
Markov Decision Process (MDP)
a mathematical formulation of the agent-environment interaction
the objective and how to achieve it
a simplifying assumption
assume the environment is fully observable
Almost all RL problems can be formalised as MDPs, e.g.,
- Optimal control primarily deals with continuous MDPs
- Partially observable problems can be converted into MDPs
- Bandits are MDPs with one state
Joint Distributions
Alternative Definition
Markov Property The future is independent of the past given the present
Markov Property in a MDP
Formalising the objective
Returns
Discounted Return
Most Markov decision processes are discounted
Policies
Goal of an RL agent
Value Functions
value function & (state-)action values
Optimal Value Function
solution | state-value function | action-value function
Optimal Policy
Define a partial ordering over policies
Theorem (Optimal Policies)
Finding an Optimal Policy
Bellman Equations
Value Function
Action values | state-action values
Bellman Equations
The Bellman Optimality Equations
Problems in RL prediction vs control
Bellman Equation in Matrix Form
There are iterative methods for larger problems
- Dynamic programming
- Monte-Carlo evaluation
- Temporal-Difference learning
Solving the Bellman Optimality Equation
The Bellman optimality equation is non-linear
Cannot use the same direct matrix solution as for policy optimisation (in general)
Many iterative solution methods
Using models / dynamic programming
Value iteration
Policy iteration
Using samples
Monte Carlo
Q-learning
Sarsa
Dynamic Programming
Dynamic programming refers to a collection of algorithms that can be used
to compute optimal policies given a perfect model of the environment as a
Markov decision process (MDP). — Sutton & Barto 2018
dynamic programming methods:
solve MDPs
two important parts: policy evaluation and policy improvement
Policy evaluation:
- this algorithm always converges
Policy Improvement:
Policy Iteration
Value Iteration
…
Preliminaries → Functional Analysis
Normed Vector Spaces
Contraction Mapping
Fixed point
Banach Fixed Point Theorem
…- Title:
- Author: wy
- Created at : 2023-07-20 15:03:26
- Updated at : 2023-07-22 22:39:00
- Link: https://yuuee-www.github.io/blog/2023/07/20/RL/step3/RLstep3/
- License: This work is licensed under CC BY-NC-SA 4.0.