RL - step4 - RLstep4


Model-Free Prediction

Monte Carlo Algorithms

Bandits

Bandits with States

Value Function Approximation

Agent state update

Linear Function Approximation

Feature Vectors

Linear Value Function Approximation

Table Lookup Features
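
Table lookup can be read as a special case of linear value function approximation: with one-hot features, the weight vector is just the table of state values. A minimal sketch, assuming a small finite state space indexed by integers; the names `one_hot` and `linear_value` are illustrative and do not come from the original notes.

```python
def one_hot(state, n_states):
    """Table-lookup feature vector: 1 at the index of the state, 0 elsewhere."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]


def linear_value(weights, features):
    """Linear value estimate v(s) = w . x(s)."""
    return sum(w * x for w, x in zip(weights, features))


# With one-hot features, each weight is exactly the value of one state.
weights = [0.5, -1.0, 2.0]
values = [linear_value(weights, one_hot(s, 3)) for s in range(3)]
```

Here `values` reproduces `weights` entry by entry, which is why a tabular method can be treated as a linear method with a particular feature choice.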

Monte Carlo Algorithms

Bandits with States

Monte-Carlo Policy Evaluation
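
Monte-Carlo policy evaluation estimates the value of a state as the average of the sampled returns observed from it. The sketch below is one possible every-visit variant with incremental mean updates; the episode format (a list of `(state, reward)` pairs) and the function name `mc_policy_evaluation` are assumptions made for illustration.

```python
from collections import defaultdict


def mc_policy_evaluation(episodes, gamma=1.0):
    """Every-visit Monte Carlo prediction with incremental mean updates.

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received after leaving that state.
    """
    values = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Walk backwards so the return G_t accumulates in a single pass.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            counts[state] += 1
            values[state] += (g - values[state]) / counts[state]
    return dict(values)
```

For example, two episodes `[("A", 0.0), ("B", 1.0)]` and `[("A", 0.0), ("B", 0.0)]` give both states an estimated value of 0.5 with `gamma=1.0`.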

Temporal-Difference Learning

Temporal Difference Learning by Sampling Bellman Equations

Temporal difference learning
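
A tabular TD(0) update moves the current estimate towards the one-step target r + γV(s'), bootstrapping from the value of the next state instead of waiting for the full return. A minimal sketch, assuming values are stored in a plain dict and unseen states default to 0; `td0_update` is a hypothetical helper name.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0, done=False):
    """One tabular TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    # A terminal next state contributes no bootstrap value.
    bootstrap = 0.0 if done else values.get(next_state, 0.0)
    td_error = reward + gamma * bootstrap - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * td_error
    return td_error
```

The returned TD error is the quantity that the step size `alpha` scales, so it is often useful to log it when checking convergence.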

Dynamic Programming Backup | Monte-Carlo Backup | Temporal-Difference Backup

Bootstrapping and Sampling

Temporal difference learning

Bias/Variance Trade-Off

….

Batch Learning

….

Between MC and TD: Multi-Step TD
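
The n-step return sits between the TD(0) target (n = 1) and the Monte Carlo return (n reaching the end of the episode): it sums n discounted rewards and then bootstraps from the value estimate at step t + n. A sketch under assumed indexing conventions, where `rewards[k]` holds R_{k+1} and `values[k]` holds the current estimate of V(S_k); the function name is illustrative.

```python
def n_step_return(rewards, values, t, n, gamma=1.0):
    """n-step return: n discounted rewards plus gamma^n * V(S_{t+n}).

    If the episode ends within n steps, no bootstrap term is added and the
    result equals the Monte Carlo return from time t.
    """
    T = len(rewards)            # episode length; S_T is terminal
    end = min(t + n, T)
    g = 0.0
    for k in range(t, end):
        g += gamma ** (k - t) * rewards[k]
    if end < T:                 # bootstrap only from a non-terminal state
        g += gamma ** (end - t) * values[end]
    return g
```

With `rewards = [1.0, 1.0, 1.0]`, `values = [0.0, 5.0, 10.0]` and `gamma = 0.5`, the 1-step return from `t = 0` is `1 + 0.5 * 5 = 3.5`, while any `n >= 3` gives the Monte Carlo return `1.75`.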

Mixed Multi-Step Returns

Eligibility traces

….


Model-Free Control

Optimise the value function of an unknown MDP

Model-Free Policy Iteration Using Action-Value Function

Generalised Policy Iteration with Action-Value Function

Monte-Carlo Generalized Policy Iteration

Model-free control

Greedy in the Limit with Infinite Exploration (GLIE)
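
One common way to obtain a GLIE policy is ε-greedy exploration with ε decaying to zero, for example ε_k = 1/k for episode k. The sketch below assumes action values are given as a list indexed by action; `glie_epsilon` and `epsilon_greedy` are illustrative names, and the 1/k schedule is just one possible choice.

```python
import random


def glie_epsilon(k):
    """Exploration rate for episode k (k >= 1); decays to zero as k grows."""
    return 1.0 / k


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Early episodes explore almost uniformly, and the policy becomes greedy in the limit as `glie_epsilon(k)` shrinks.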

Temporal-Difference Learning For Control

Updating Action-Value Functions with SARSA
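
SARSA is on-policy: the target uses the action actually selected in the next state, so each update consumes the tuple (S, A, R, S', A'). A minimal tabular sketch, assuming action values live in a dict keyed by `(state, action)` with unseen pairs defaulting to 0; `sarsa_update` is a hypothetical helper name.

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0, done=False):
    """One SARSA update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    # The target bootstraps from the action a_next that the policy really took.
    target = r if done else r + gamma * q.get((s_next, a_next), 0.0)
    td_error = target - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error
```

In a control loop, `a_next` would typically come from the same ε-greedy policy that is being improved.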

Off-policy TD and Q-learning

We discussed several dynamic programming algorithms

TD learning

On and Off-Policy Learning

Off-Policy Learning

Q-Learning Control Algorithm
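
Q-learning differs from SARSA only in its target: it bootstraps from the maximum action value in the next state, regardless of which action the behaviour policy takes next, which is what makes it off-policy. A minimal tabular sketch under the same assumed dict layout keyed by `(state, action)`; the `actions` argument and the function name are illustrative.

```python
def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=1.0, done=False):
    """One Q-learning update: bootstrap from the greedy action in s_next."""
    best_next = 0.0 if done else max(q.get((s_next, b), 0.0) for b in actions)
    td_error = r + gamma * best_next - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error
```

The behaviour policy can stay exploratory (for example ε-greedy) while the learned values track the greedy target policy.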

….

Q-learning overestimation
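
The overestimation comes from taking a max over noisy estimates: even when every true action value is 0, the expected maximum of the estimates is positive. The simulation below is a rough illustration under assumed conditions (zero-mean Gaussian noise, 10 actions, a fixed seed); it is not taken from the original notes.

```python
import random


def mean_max_estimate(n_actions=10, n_trials=10000, noise_std=1.0, seed=0):
    """Average of max_a Q(a) when every true value is 0 and Q is noisy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        noisy_q = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        total += max(noisy_q)
    return total / n_trials


# True max_a q(a) is 0, yet the average of the max over estimates is positive.
bias = mean_max_estimate()
```

With a single action the bias vanishes, since there is no max to select the luckiest noise sample.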

  • Title: RL - step4 - RLstep4
  • Author: wy
  • Created at: 2023-07-22 22:44:28
  • Updated at: 2023-07-23 15:35:12
  • Link: https://yuuee-www.github.io/blog/2023/07/22/RL/step4/RLstep4/
  • License: This work is licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0).