RL - step3 - RLstep3

wy

Markov Decision Processes and Dynamic Programming

Formalising the RL interface

Markov Decision Process (MDP)

a mathematical formulation of the agent-environment interaction

makes precise the objective and how to achieve it

a simplifying assumption (for now): the environment is fully observable

Almost all RL problems can be formalised as MDPs, e.g.,

  • Optimal control primarily deals with continuous MDPs

  • Partially observable problems can be converted into MDPs

  • Bandits are MDPs with one state

Joint Distributions

Alternative Definition
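The two definitions can be written out as follows. This is a sketch in the notation of Sutton & Barto, which is an assumption here: the joint form specifies one distribution over reward and next state, and the alternative form splits it into a transition distribution and an expected reward.

```latex
\begin{align*}
% Joint-distribution form: an MDP is a tuple (\mathcal{S}, \mathcal{A}, p, \gamma)
p(r, s' \mid s, a) &= \Pr\{R_{t+1} = r,\ S_{t+1} = s' \mid S_t = s,\ A_t = a\} \\
% Alternative form: (\mathcal{S}, \mathcal{A}, p, r, \gamma), obtained by marginalising
p(s' \mid s, a) &= \sum_{r} p(r, s' \mid s, a) \\
r(s, a) &= \mathbb{E}[R_{t+1} \mid S_t = s,\ A_t = a] = \sum_{r} r \sum_{s'} p(r, s' \mid s, a)
\end{align*}
```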

Markov Property: the future is independent of the past given the present

Markov Property in an MDP
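In symbols, one common way to state this (assuming H_t denotes the full history S_0, A_0, R_1, ..., S_t, a name introduced here for illustration) is that conditioning on the whole history adds nothing beyond the current state and action:

```latex
\begin{equation*}
p(r, s' \mid S_t, A_t) = p(r, s' \mid H_t, A_t)
\end{equation*}
```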


Formalising the objective

Returns

Discounted Return

Most Markov decision processes are discounted
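The discounted return is usually defined as G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ..., with discount factor γ in [0, 1]. A minimal sketch for a finite reward sequence, where the function name and the example rewards are hypothetical:

```python
def discounted_return(rewards, gamma):
    """Return G_t = R_{t+1} + gamma * R_{t+2} + ... for a finite reward list."""
    g = 0.0
    # Accumulate backwards using the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical rewards R_{t+1}, R_{t+2}, R_{t+3}.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

The backward recursion is the same identity that the Bellman equations later turn into an equation over expected values.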

Policies

Goal of an RL agent

Value Functions

value function & (state-)action values
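In the usual notation (assumed here), both value functions are expected returns under a policy π, and the state value is the policy-weighted average of the action values:

```latex
\begin{align*}
v_\pi(s) &= \mathbb{E}[G_t \mid S_t = s,\ \pi] \\
q_\pi(s, a) &= \mathbb{E}[G_t \mid S_t = s,\ A_t = a,\ \pi] \\
v_\pi(s) &= \sum_{a} \pi(a \mid s)\, q_\pi(s, a)
\end{align*}
```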

Optimal Value Function

solution | state-value function | action-value function
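A sketch of the standard definitions, in assumed notation: the optimal value functions take, state by state, the best value achievable by any policy.

```latex
\begin{align*}
v_*(s) &= \max_{\pi} v_\pi(s) \\
q_*(s, a) &= \max_{\pi} q_\pi(s, a)
\end{align*}
```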

Optimal Policy

Define a partial ordering over policies

Theorem (Optimal Policies)
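As the ordering and theorem are usually stated (the symbols below are assumed notation): one policy is at least as good as another if its value is at least as high in every state, and for any MDP there is an optimal policy that is at least as good as all others and attains the optimal value functions.

```latex
\begin{align*}
\pi \ge \pi' &\iff v_\pi(s) \ge v_{\pi'}(s) \quad \forall s \\
\exists\, \pi_* &: \ \pi_* \ge \pi \quad \forall \pi \\
v_{\pi_*}(s) &= v_*(s), \qquad q_{\pi_*}(s, a) = q_*(s, a)
\end{align*}
```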

Finding an Optimal Policy
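Given the optimal action values, an optimal policy can be read off by acting greedily in every state. A minimal sketch, assuming q* is already known; the table `q_star`, its state and action names, and the function name are hypothetical:

```python
def greedy_policy_from_q(q):
    """Pick, in every state, an action with the largest action value."""
    return {
        state: max(action_values, key=action_values.get)
        for state, action_values in q.items()
    }


# Hypothetical optimal action values q*(s, a) for a two-state, two-action MDP.
q_star = {
    "s0": {"stay": 17.1, "move": 19.0},
    "s1": {"stay": 20.0, "move": 17.1},
}

pi_star = greedy_policy_from_q(q_star)
print(pi_star)  # {'s0': 'move', 's1': 'stay'}
```

Ties are broken in favour of the first action listed, which is one arbitrary choice among several valid ones.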

Bellman Equations

Value Function

Action values | state-action values

Bellman Equations

The Bellman Optimality Equations
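For reference, a sketch of the four equations in the expected-reward notation r(s, a), p(s' | s, a) (an assumption about the notation). The first two are the Bellman expectation equations for a fixed policy π, the last two the Bellman optimality equations.

```latex
\begin{align*}
v_\pi(s) &= \sum_{a} \pi(a \mid s) \Big[ r(s, a) + \gamma \sum_{s'} p(s' \mid s, a)\, v_\pi(s') \Big] \\
q_\pi(s, a) &= r(s, a) + \gamma \sum_{s'} p(s' \mid s, a) \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a') \\
v_*(s) &= \max_{a} \Big[ r(s, a) + \gamma \sum_{s'} p(s' \mid s, a)\, v_*(s') \Big] \\
q_*(s, a) &= r(s, a) + \gamma \sum_{s'} p(s' \mid s, a) \max_{a'} q_*(s', a')
\end{align*}
```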


Problems in RL: prediction vs control

Bellman Equation in Matrix Form
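For a fixed policy π the Bellman expectation equation is linear, v = r_π + γ P_π v, so for small problems it can be solved directly as v = (I - γ P_π)^(-1) r_π. A minimal pure-Python sketch; the two-state matrices `p_pi` and `r_pi` are hypothetical, and a real implementation would more likely call a linear-algebra library.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate_policy_exactly(p_pi, r_pi, gamma):
    """Solve (I - gamma * P_pi) v = r_pi for v."""
    n = len(r_pi)
    a = [
        [(1.0 if i == j else 0.0) - gamma * p_pi[i][j] for j in range(n)]
        for i in range(n)
    ]
    return solve_linear(a, r_pi)


# Hypothetical policy-induced transition matrix and expected rewards.
p_pi = [[0.5, 0.5], [0.5, 0.5]]
r_pi = [0.5, 1.0]

v = evaluate_policy_exactly(p_pi, r_pi, gamma=0.9)
print(v)  # close to [7.25, 7.75]
```

The cost of a direct solve grows roughly cubically with the number of states, which is why larger problems call for iterative methods.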

There are iterative methods for larger problems

  • Dynamic programming

  • Monte-Carlo evaluation

  • Temporal-Difference learning

Solving the Bellman Optimality Equation

The Bellman optimality equation is non-linear

Cannot use the same direct matrix solution as for policy evaluation (in general)

Many iterative solution methods

Using models / dynamic programming

Value iteration

Policy iteration

Using samples

Monte Carlo

Q-learning

Sarsa

Dynamic Programming

"Dynamic programming refers to a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment as a Markov decision process (MDP)." (Sutton & Barto 2018)

Dynamic programming methods solve MDPs and have two important parts: policy evaluation and policy improvement

Policy evaluation:

  • turning the Bellman expectation equation into an iterative update always converges to v_π when γ < 1
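A minimal sketch of iterative policy evaluation, assuming a hypothetical two-state MDP stored as `P[s][a] = [(probability, next_state, reward), ...]`. The table layout, the names and the stopping tolerance are all illustrative choices.

```python
GAMMA = 0.9

# Hypothetical toy MDP: P[s][a] = list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}


def policy_evaluation(policy, P, gamma, tol=1e-10):
    """Repeat v <- r_pi + gamma * P_pi v until the largest change is below tol."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = {
            s: sum(
                policy[s][a] * prob * (reward + gamma * v[s2])
                for a in P[s]
                for prob, s2, reward in P[s][a]
            )
            for s in P
        }
        delta = max(abs(new_v[s] - v[s]) for s in P)
        v = new_v
        if delta < tol:
            return v


# Uniform random policy: policy[s][a] is the probability of action a in state s.
uniform = {s: {a: 0.5 for a in P[s]} for s in P}
v = policy_evaluation(uniform, P, GAMMA)
print(v)  # close to {0: 7.25, 1: 7.75}
```

Each sweep shrinks the error by at least a factor γ, so the number of sweeps grows as γ approaches 1.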

Policy Improvement:
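Policy improvement is usually done by acting greedily with respect to the current value estimate: compute q(s, a) by a one-step lookahead and pick the best action in each state. A minimal sketch on a hypothetical two-state MDP; `v_uniform` holds the values of the uniform random policy on this toy model.

```python
GAMMA = 0.9

# Hypothetical toy MDP: P[s][a] = list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}


def greedy_policy(v, P, gamma):
    """Return a deterministic policy {state: action} that is greedy w.r.t. v."""
    policy = {}
    for s in P:
        # One-step lookahead: q(s, a) = sum_{s'} p(s'|s,a) * (r + gamma * v(s')).
        q = {
            a: sum(prob * (reward + gamma * v[s2]) for prob, s2, reward in P[s][a])
            for a in P[s]
        }
        policy[s] = max(q, key=q.get)
    return policy


# Values of the uniform random policy on this toy MDP.
v_uniform = {0: 7.25, 1: 7.75}
improved = greedy_policy(v_uniform, P, GAMMA)
print(improved)  # {0: 1, 1: 0}
```

The improved policy moves from state 0 to state 1 and then stays there, which is at least as good as the random policy in every state.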

Policy Iteration
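Policy iteration alternates the two parts, evaluating the current policy and then improving it greedily, until the policy stops changing. A minimal sketch on a hypothetical two-state MDP; the helper names and the starting policy are illustrative assumptions.

```python
GAMMA = 0.9

# Hypothetical toy MDP: P[s][a] = list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}


def evaluate(policy, P, gamma, tol=1e-10):
    """Iterative policy evaluation for a deterministic policy {state: action}."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = {
            s: sum(prob * (reward + gamma * v[s2]) for prob, s2, reward in P[s][policy[s]])
            for s in P
        }
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v


def improve(v, P, gamma):
    """Greedy policy improvement with respect to v."""
    policy = {}
    for s in P:
        q = {
            a: sum(prob * (reward + gamma * v[s2]) for prob, s2, reward in P[s][a])
            for a in P[s]
        }
        policy[s] = max(q, key=q.get)
    return policy


def policy_iteration(P, gamma):
    policy = {s: min(P[s]) for s in P}  # arbitrary start: lowest-numbered action
    while True:
        v = evaluate(policy, P, gamma)
        new_policy = improve(v, P, gamma)
        if new_policy == policy:  # stable: the policy is greedy w.r.t. its own values
            return policy, v
        policy = new_policy


policy, v = policy_iteration(P, GAMMA)
print(policy, v)  # policy {0: 1, 1: 0}, values close to {0: 19.0, 1: 20.0}
```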

Value Iteration
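Value iteration turns the Bellman optimality equation into an update: each sweep replaces v(s) with the best one-step lookahead value. A minimal sketch on a hypothetical two-state MDP, with an illustrative stopping tolerance:

```python
GAMMA = 0.9

# Hypothetical toy MDP: P[s][a] = list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}


def value_iteration(P, gamma, tol=1e-10):
    """Repeat v(s) <- max_a sum_{s'} p(s'|s,a) * (r + gamma * v(s')) until stable."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = {
            s: max(
                sum(prob * (reward + gamma * v[s2]) for prob, s2, reward in P[s][a])
                for a in P[s]
            )
            for s in P
        }
        delta = max(abs(new_v[s] - v[s]) for s in P)
        v = new_v
        if delta < tol:
            return v


v_star = value_iteration(P, GAMMA)
print(v_star)  # close to {0: 19.0, 1: 20.0}
```

Unlike policy iteration, no explicit policy is kept during the sweeps; a greedy policy can be extracted from the final values.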

Preliminaries → Functional Analysis

Normed Vector Spaces

Contraction Mapping
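A sketch of the standard definition, in assumed notation: an operator T on a normed vector space is an α-contraction if it brings any two points closer together by at least a fixed factor.

```latex
\begin{equation*}
\| T u - T v \| \le \alpha \, \| u - v \| \quad \forall u, v \in \mathcal{X}, \qquad \alpha \in [0, 1)
\end{equation*}
```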

Fixed point

Banach Fixed Point Theorem
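The theorem says that a contraction on a complete normed space has exactly one fixed point, and that repeatedly applying the map from any starting point converges to it. A minimal numerical illustration on the real line, where the map `T` and the helper name are hypothetical:

```python
def fixed_point_iteration(T, x0, tol=1e-12, max_iters=10_000):
    """Apply T repeatedly until successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iters):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("did not converge within max_iters")


def T(x):
    """A contraction with modulus 0.5; its unique fixed point is x = 2."""
    return 0.5 * x + 1.0


# Different starting points end up at the same fixed point.
a = fixed_point_iteration(T, x0=10.0)
b = fixed_point_iteration(T, x0=-100.0)
print(a, b)  # both close to 2.0
```

The same argument, applied to the Bellman operators with modulus γ in the max norm, is what justifies the convergence of policy evaluation and value iteration.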


  • Title: RL - step3 - RLstep3

  • Author: wy

  • Created at: 2023-07-20 15:03:26

  • Updated at: 2023-07-22 22:39:00

  • Link: https://yuuee-www.github.io/blog/2023/07/20/RL/step3/RLstep3/

  • License: This work is licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0).