Model-Free Prediction

Monte Carlo Algorithms

Bandits

Bandits with States

Value Function Approximation

Agent state update

Linear Function Approximation

Feature Vectors

Linear Value Function Approximation

Table Lookup Features

Monte Carlo Algorithms

Bandits with States

Monte-Carlo Policy Evaluation

Temporal-Difference Learning

Temporal Difference Learning by Sampling Bellman Equations

Temporal difference learning

Dynamic Programming Backup | Monte-Carlo Backup | Temporal-Difference Backup

Bootstrapping and Sampling

Temporal difference learning

Bias/Variance Trade-Off

….

Batch Learning

….

Between MC and TD: Multi-Step TD

…

Mixed Multi-Step Returns

…

Eligibility traces

….

Model-Free Control

Optimise the value function of an unknown MDP

Model-Free Policy Iteration Using Action-Value Function

Generalised Policy Iteration with Action-Value Function

Monte-Carlo Generalised Policy Iteration

Model-free control

Greedy in the Limit with Infinite Exploration (GLIE)

Temporal-Difference Learning For Control

Updating Action-Value Functions with SARSA

Off-policy TD and Q-learning

We discussed several dynamic programming algorithms

TD learning

On- and Off-Policy Learning

Off-Policy Learning

Q-Learning Control Algorithm

….

Q-learning overestimation

…

Author: wy
Created at: 2023-07-22 22:44:28
Updated at: 2023-07-23 15:35:12
Link: https://yuuee-www.github.io/blog/2023/07/22/RL/step4/RLstep4/
License: This work is licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0).

RL - step4 - RLstep4
