Model-Free Prediction
Monte Carlo Algorithms
Bandits
Bandits with States
Value Function Approximation
Agent state update
Linear Function Approximation
Feature Vectors
Linear Value Function Approximation
Table Lookup Features
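As a rough illustration of the headings above, the sketch below treats the value estimate as a dot product between a weight vector and a feature vector, and shows that one-hot ("table lookup") features reduce the linear update to an ordinary per-state table update. The function names `linear_value`, `linear_update` and `one_hot` are hypothetical and chosen only for this example.

```python
def linear_value(weights, features):
    """Value estimate v(s) = w . x(s) for one state's feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def linear_update(weights, features, target, alpha):
    """One gradient step toward `target`: w <- w + alpha * (target - v) * x."""
    error = target - linear_value(weights, features)
    return [w + alpha * error * x for w, x in zip(weights, features)]


def one_hot(index, n):
    """Table lookup features: a vector with a single 1 at the state's index."""
    return [1.0 if i == index else 0.0 for i in range(n)]


# With one-hot features only the weight of the visited state changes,
# so the weight vector behaves like a lookup table of state values.
weights = [0.0, 0.0, 0.0]
weights = linear_update(weights, one_hot(1, 3), target=2.0, alpha=0.5)
```

Here the target is left abstract: it could be a Monte Carlo return or a bootstrapped TD target, depending on the algorithm.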
Monte Carlo Algorithms
Bandits with States
Monte-Carlo Policy Evaluation
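A minimal sketch of every-visit Monte-Carlo policy evaluation, assuming episodes are already collected under the policy being evaluated. The episode layout (a list of `(state, reward)` pairs, where the reward is the one received after leaving that state) and the name `mc_policy_evaluation` are assumptions made for this example.

```python
from collections import defaultdict


def mc_policy_evaluation(episodes, gamma=1.0):
    """Estimate state values as the running mean of sampled returns."""
    values = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Walk the episode backwards so the return accumulates incrementally.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            counts[state] += 1
            # Incremental mean: V <- V + (G - V) / N
            values[state] += (g - values[state]) / counts[state]
    return dict(values)


episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("B", 0.0)],
]
estimates = mc_policy_evaluation(episodes, gamma=1.0)
```

Because the update uses full returns, it needs complete episodes and does not bootstrap from other value estimates.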
Temporal-Difference Learning
Temporal Difference Learning by Sampling Bellman Equations
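The sketch below shows one possible tabular TD(0) update, in which the sampled Bellman target `reward + gamma * V(next_state)` replaces the full return. The dictionary-based value table and the name `td0_update` are assumptions for illustration only.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=1.0, terminal=False):
    """Apply one TD(0) step in place and return the TD error."""
    # Bootstrap from the current estimate of the next state,
    # unless the transition ended the episode.
    bootstrap = 0.0 if terminal else gamma * values.get(next_state, 0.0)
    td_error = reward + bootstrap - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * td_error
    return td_error


values = {"A": 0.0, "B": 1.0}
delta = td0_update(values, "A", reward=0.0, next_state="B",
                   alpha=0.5, gamma=1.0)
```

Unlike the Monte Carlo estimate, this update can be applied after every single transition, before the episode ends.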
Temporal difference learning
Dynamic Programming Backup | Monte-Carlo Backup | Temporal-Difference Backup
Bootstrapping and Sampling
Temporal difference learning
Bias/Variance Trade-Off
….
Batch Learning
….
Between MC and TD: Multi-Step TD
…
Mixed Multi-Step Returns
…
Eligibility traces
….
Model-Free Control
Optimise the value function of an unknown MDP
Model-Free Policy Iteration Using Action-Value Function
Generalised Policy Iteration with Action-Value Function
Monte-Carlo Generalised Policy Iteration
Model-free control
Greedy in the Limit with Infinite Exploration (GLIE)
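One common way to satisfy the GLIE conditions is epsilon-greedy exploration with epsilon decaying as 1/k over episodes. The sketch below assumes that particular schedule and a dictionary of action values keyed by `(state, action)`; the names `epsilon_greedy` and `glie_epsilon` are hypothetical.

```python
import random


def glie_epsilon(k):
    """Exploration rate for episode k (k >= 1); decays to zero as k grows."""
    return 1.0 / k


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


q = {("s", "left"): 0.2, ("s", "right"): 0.7}
greedy_action = epsilon_greedy(q, "s", ["left", "right"], epsilon=0.0)
```

Every action keeps a non-zero selection probability in each episode, while the policy becomes greedy in the limit.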
Temporal-Difference Learning For Control
Updating Action-Value Functions with SARSA
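A minimal sketch of a single tabular SARSA update, assuming action values are stored in a dictionary keyed by `(state, action)`. The target uses the action actually selected in the next state, which is what makes the method on-policy. The name `sarsa_update` is chosen only for this example.

```python
def sarsa_update(q, s, a, r, s_next, a_next,
                 alpha=0.1, gamma=0.99, terminal=False):
    """Apply one SARSA step in place and return the TD error."""
    # Bootstrap from the value of the next action the agent will take.
    bootstrap = 0.0 if terminal else gamma * q.get((s_next, a_next), 0.0)
    td_error = r + bootstrap - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error


q_sarsa = {("s1", "left"): 2.0, ("s1", "right"): 4.0}
sarsa_delta = sarsa_update(q_sarsa, "s0", "go", 1.0, "s1", "left",
                           alpha=0.5, gamma=0.5)
```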
Off-policy TD and Q-learning
We discussed several dynamic programming algorithms
TD learning
On and Off-Policy Learning
Off-Policy Learning
Q-Learning Control Algorithm
….
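For comparison with SARSA, the sketch below shows one possible tabular Q-learning update. It bootstraps from the maximum action value in the next state, regardless of which action the behaviour policy takes next, which is what makes it off-policy. The dictionary layout and the name `q_learning_update` are assumptions for illustration.

```python
def q_learning_update(q, s, a, r, s_next, actions,
                      alpha=0.1, gamma=0.99, terminal=False):
    """Apply one Q-learning step in place and return the TD error."""
    if terminal:
        best_next = 0.0
    else:
        # Greedy target: maximise over all actions available in s_next.
        best_next = max(q.get((s_next, b), 0.0) for b in actions)
    td_error = r + gamma * best_next - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error


q_table = {("s1", "left"): 2.0, ("s1", "right"): 4.0}
q_delta = q_learning_update(q_table, "s0", "go", 1.0, "s1",
                            ["left", "right"], alpha=0.5, gamma=0.5)
```

Because the same noisy estimates are used both to select and to evaluate the maximising action, this max-based target tends to be biased upwards.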
Q-learning overestimation
…
- Title:
- Author: wy
- Created at : 2023-07-22 22:44:28
- Updated at : 2023-07-23 15:35:12
- Link: https://yuuee-www.github.io/blog/2023/07/22/RL/step4/RLstep4/
- License: This work is licensed under CC BY-NC-SA 4.0.