RL - step3 - RLstep3
Markov Decision Processes and Dynamic Programming
Formalising the RL interface
Markov Decision Process (MDP)
- a mathematical formulation of the agent-environment interaction
- formalises both the objective and how to achieve it
- a simplifying assumption: the environment is fully observable
Almost all RL problems can be formalised as MDPs, e.g.,
- optimal control primarily deals with continuous MDPs
- partially observable problems can be converted into MDPs
- bandits are MDPs with one state
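In the notation used below (Sutton & Barto style), an MDP can be written as a tuple:

$$\mathcal{M} = (\mathcal{S}, \mathcal{A}, p, \gamma)$$

where $\mathcal{S}$ is the set of states, $\mathcal{A}$ the set of actions, $p(r, s' \mid s, a)$ the joint dynamics distribution, and $\gamma \in [0, 1]$ the discount factor.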

Joint Distributions

Alternative Definition
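The two definitions can be stated as follows (standard notation). The joint distribution specifies the dynamics completely:

$$p(r, s' \mid s, a) = \Pr(R_{t+1} = r, S_{t+1} = s' \mid S_t = s, A_t = a)$$

Alternatively, split the dynamics into state-transition probabilities and an expected-reward function:

$$\mathcal{P}^a_{ss'} = \sum_r p(r, s' \mid s, a), \qquad r(s, a) = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$$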

Markov Property: the future is independent of the past given the present.

Markov Property in an MDP
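Formally, a state $S_t$ is Markov if and only if

$$\Pr(S_{t+1} \mid S_t, A_t) = \Pr(S_{t+1} \mid S_1, A_1, \ldots, S_t, A_t)$$

In an MDP this holds by construction: the dynamics $p(r, s' \mid s, a)$ depend only on the current state and action, so the state captures all relevant information from the history.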

Formalising the objective
Returns

Discounted Return

Most Markov decision processes are discounted ($\gamma < 1$): discounting keeps returns finite in cyclic or infinite-horizon processes and is mathematically convenient.
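In symbols (with terminal time $T$ for episodic tasks and discount factor $\gamma \in [0, 1]$):

$$G_t = R_{t+1} + R_{t+2} + \cdots + R_T \quad \text{(undiscounted return)}$$

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \quad \text{(discounted return)}$$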

Policies
Goal of an RL agent
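A (possibly stochastic) policy maps states to distributions over actions,

$$\pi(a \mid s) = \Pr(A_t = a \mid S_t = s)$$

and the goal of an RL agent is to find a policy that maximises the expected return $\mathbb{E}_\pi[G_t]$.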


Value Functions
value function & (state-)action values
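The standard definitions:

$$v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s], \qquad q_\pi(s, a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$$

i.e. the expected return from state $s$ (respectively, from taking action $a$ in state $s$) and following $\pi$ thereafter.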

Optimal Value Function
The solution to an MDP can be expressed via the optimal state-value function or the optimal action-value function.
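The optimal value functions are the best achievable over all policies:

$$v_*(s) = \max_\pi v_\pi(s), \qquad q_*(s, a) = \max_\pi q_\pi(s, a)$$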

Optimal Policy
Define a partial ordering over policies

Theorem (Optimal Policies)
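In standard form, the ordering and the theorem are:

$$\pi \geq \pi' \iff v_\pi(s) \geq v_{\pi'}(s) \ \text{for all } s$$

Theorem: for any finite MDP there exists an optimal policy $\pi_*$ with $\pi_* \geq \pi$ for all $\pi$; all optimal policies achieve the optimal state-value function, $v_{\pi_*} = v_*$, and the optimal action-value function, $q_{\pi_*} = q_*$. One such policy is obtained by acting greedily with respect to $q_*$.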

Finding an Optimal Policy

Bellman Equations
Value Function

Action values (also called state-action values)

Bellman Equations
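Written out, the Bellman expectation equations decompose the value functions recursively:

$$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(r, s' \mid s, a) \left[ r + \gamma v_\pi(s') \right]$$

$$q_\pi(s, a) = \sum_{s', r} p(r, s' \mid s, a) \left[ r + \gamma \sum_{a'} \pi(a' \mid s') \, q_\pi(s', a') \right]$$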

The Bellman Optimality Equations
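These replace the expectation over the policy with a maximisation over actions:

$$v_*(s) = \max_a \sum_{s', r} p(r, s' \mid s, a) \left[ r + \gamma v_*(s') \right]$$

$$q_*(s, a) = \sum_{s', r} p(r, s' \mid s, a) \left[ r + \gamma \max_{a'} q_*(s', a') \right]$$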


Problems in RL: prediction vs. control
- prediction: evaluate the future for a given policy (estimate $v_\pi$)
- control: optimise the future, i.e. find the best policy (estimate $v_*$ and $\pi_*$)

Bellman Equation in Matrix Form
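For a fixed policy $\pi$, let $P^\pi$ be the induced state-transition matrix and $r^\pi$ the vector of expected rewards. The Bellman expectation equation is then linear,

$$v_\pi = r^\pi + \gamma P^\pi v_\pi$$

and can be solved directly:

$$v_\pi = (I - \gamma P^\pi)^{-1} r^\pi$$

The direct solution costs $O(n^3)$ for $n$ states, so it is only practical for small problems.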

There are iterative methods for larger problems:
- dynamic programming
- Monte-Carlo evaluation
- Temporal-Difference learning
Solving the Bellman Optimality Equation
- The Bellman optimality equation is non-linear
- Cannot use the direct matrix solution that works for policy evaluation (in general)
- Many iterative solution methods:
  - using models / dynamic programming: value iteration, policy iteration
  - using samples: Monte Carlo, Q-learning, Sarsa
Dynamic Programming
Dynamic programming refers to a collection of algorithms that can be used
to compute optimal policies given a perfect model of the environment as a
Markov decision process (MDP). — Sutton & Barto 2018
Dynamic programming methods:
- solve MDPs (given a perfect model)
- have two important parts: policy evaluation and policy improvement
Policy evaluation: compute $v_\pi$ for a given policy $\pi$, e.g. by iteratively applying the Bellman expectation backup $v \leftarrow r^\pi + \gamma P^\pi v$.
- this iterative algorithm always converges to $v_\pi$ (for $\gamma < 1$, or when every episode is guaranteed to terminate)
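As a concrete sketch (the 2-state MDP and all numbers below are made up for illustration), iterative policy evaluation repeatedly applies the Bellman expectation backup and, since the backup is a contraction, converges to the same answer as the direct matrix solution:

```python
import numpy as np

# Hypothetical 2-state MDP under a fixed policy pi:
# P[s, s'] is the induced state-transition matrix, r[s] the expected reward.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, -1.0])
gamma = 0.9

# Iterative policy evaluation: v <- r + gamma * P v until convergence.
v = np.zeros(2)
for _ in range(10_000):
    v_new = r + gamma * P @ v
    if np.max(np.abs(v_new - v)) < 1e-10:
        v = v_new
        break
    v = v_new

# Direct matrix solution of the linear Bellman equation: v = (I - gamma P)^{-1} r.
v_exact = np.linalg.solve(np.eye(2) - gamma * P, r)
print(np.allclose(v, v_exact, atol=1e-8))  # True
```

The direct solve is cubic in the number of states, which is why the iterative backup (and the sample-based methods listed above) matter for large problems.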
Policy improvement: construct a better policy by acting greedily with respect to the evaluated values, $\pi'(s) = \operatorname{argmax}_a q_\pi(s, a)$; the policy improvement theorem guarantees $\pi' \geq \pi$.

Policy Iteration
Value Iteration
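A minimal sketch of value iteration, again on a made-up 2-state, 2-action MDP: apply the Bellman optimality backup until the values stop changing, then read off a greedy policy.

```python
import numpy as np

# Hypothetical MDP: P[a, s, s'] transition probabilities, R[a, s] expected rewards.
P = np.array([[[0.8, 0.2],
               [0.1, 0.9]],   # action 0
              [[0.5, 0.5],
               [0.6, 0.4]]])  # action 1
R = np.array([[1.0, 0.0],
              [2.0, -1.0]])
gamma = 0.9

# Value iteration: v(s) <- max_a [ R(a, s) + gamma * sum_s' P(a, s, s') v(s') ]
v = np.zeros(2)
for _ in range(10_000):
    q = R + gamma * P @ v      # q[a, s]: one Bellman optimality backup per action
    v_new = q.max(axis=0)
    if np.max(np.abs(v_new - v)) < 1e-10:
        v = v_new
        break
    v = v_new

# Extract a deterministic optimal policy by acting greedily w.r.t. the values.
policy = (R + gamma * P @ v).argmax(axis=0)
print(v, policy)
```

Unlike policy iteration, value iteration never represents an explicit policy during the sweeps; the greedy policy is extracted once at the end.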


…
Preliminaries → Functional Analysis
Normed Vector Spaces

Contraction Mapping

Fixed point

Banach Fixed Point Theorem
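These pieces fit together to prove convergence of the DP algorithms above: the Banach fixed-point theorem states that a contraction mapping on a complete normed vector space has a unique fixed point, and that iterating the mapping from any starting point converges to it. The Bellman expectation and optimality operators are $\gamma$-contractions in the max-norm,

$$\| T v - T v' \|_\infty \leq \gamma \, \| v - v' \|_\infty$$

so for $\gamma < 1$ policy evaluation converges to the unique fixed point $v_\pi$ and value iteration converges to $v_*$.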

…
Title: RL - step3 - RLstep3
Author: wy
Created at: 2023-07-20 15:03:26
Updated at: 2023-07-22 22:39:00
Link: https://yuuee-www.github.io/blog/2023/07/20/RL/step3/RLstep3/
License: This work is licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0).