
Model-Free Prediction
Monte Carlo Algorithms

Bandits

Bandits with States

Value Function Approximation

Agent state update

Linear Function Approximation
Feature Vectors

Linear Value Function Approximation
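
As a rough illustration of this heading, a linear value function can be written as v_w(s) = w · x(s), where x(s) is the feature vector of state s. The sketch below is a minimal example, not the notes' own implementation: the function names, the step size `alpha`, and the example numbers are all assumptions made for illustration.

```python
import numpy as np

def linear_value(w, x):
    """Linear value estimate v_w(s) = w . x(s) for feature vector x = x(s)."""
    return float(np.dot(w, x))

def sgd_update(w, x, target, alpha):
    """One stochastic gradient step towards `target`.

    For a linear function the gradient of v_w(s) with respect to w is x(s),
    so the update is w + alpha * (target - v_w(s)) * x(s).
    """
    return w + alpha * (target - linear_value(w, x)) * x

# Hypothetical example: three features, weights initialised to zero.
w = np.zeros(3)
x = np.array([1.0, 0.0, 2.0])
w = sgd_update(w, x, target=1.0, alpha=0.1)
```

The `target` could be a Monte-Carlo return or a TD target; the update rule has the same shape in both cases.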

Table Lookup Features
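
Table lookup can be seen as a special case of linear function approximation in which each state has a one-hot feature vector, so the weight vector is simply the value table. The snippet below is a small sketch of that idea; `one_hot` and the example weights are hypothetical names and numbers chosen for illustration.

```python
import numpy as np

def one_hot(state, n_states):
    """Table lookup feature vector: 1 at the state's index, 0 elsewhere."""
    x = np.zeros(n_states)
    x[state] = 1.0
    return x

n_states = 4
w = np.array([0.5, -1.0, 2.0, 0.0])  # one weight per state

# With one-hot features, w . x(s) just reads out entry s of the table.
values = [float(w @ one_hot(s, n_states)) for s in range(n_states)]
```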

Monte Carlo Algorithms
Bandits with States

Monte-Carlo Policy Evaluation
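
A minimal sketch of Monte-Carlo policy evaluation, assuming the every-visit variant with an incremental mean: the value of a state is estimated as the average of the sampled returns observed from it. The episode format (a list of `(state, reward)` pairs, where the reward is the one received after leaving that state) is an assumption made for this example.

```python
from collections import defaultdict

def mc_policy_evaluation(episodes, gamma=1.0):
    """Every-visit Monte-Carlo estimate of state values from complete episodes."""
    values = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Walk backwards so the return can be accumulated in one pass.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            counts[state] += 1
            # Incremental mean: move the estimate towards the sampled return.
            values[state] += (g - values[state]) / counts[state]
    return dict(values)

# Two hypothetical episodes generated by some fixed policy.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("B", 0.0)],
]
v = mc_policy_evaluation(episodes)
```

Note that the estimate can only be updated once an episode has terminated, because the full return is needed.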

Temporal-Difference Learning
Temporal Difference Learning by Sampling Bellman Equations
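
A hedged sketch of the tabular TD(0) update that this heading refers to: instead of waiting for the full return, the target is a sampled one-step Bellman backup, r + γ v(s'). The dictionary representation and the parameter names are assumptions for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0, done=False):
    """Apply one tabular TD(0) update in place and return the TD error."""
    # Bootstrap from the current estimate of the next state (0 if terminal).
    bootstrap = 0.0 if done else values.get(next_state, 0.0)
    td_error = reward + gamma * bootstrap - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * td_error
    return td_error

# Hypothetical transition A -> B with reward 0.
values = {"A": 0.0, "B": 1.0}
delta = td0_update(values, "A", 0.0, "B", alpha=0.5, gamma=0.9)
```

Unlike the Monte-Carlo update, this can be applied after every single transition, at the cost of a target that depends on the current (possibly inaccurate) estimate.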

Temporal difference learning

Dynamic Programming Backup | Monte-Carlo Backup | Temporal-Difference Backup

Bootstrapping and Sampling

Temporal difference learning


Bias/Variance Trade-Off

….
Batch Learning
….
Between MC and TD: Multi-Step TD
…
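
To make the multi-step idea concrete, here is a small sketch of an n-step return: n sampled rewards followed by a bootstrapped value estimate. The function name and arguments are hypothetical; with one reward it gives the TD(0) target, and with all rewards of an episode and a bootstrap value of 0 it gives the Monte-Carlo return.

```python
def n_step_return(rewards, bootstrap_value, gamma=1.0):
    """Discounted sum of `rewards`, then bootstrap from `bootstrap_value`.

    Computes r_1 + gamma * r_2 + ... + gamma^(n-1) * r_n + gamma^n * bootstrap_value.
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical 2-step return with gamma = 0.5 and v(s_{t+2}) = 10.
g2 = n_step_return([1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```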
Mixed Multi-Step Returns
…
Eligibility traces
….
Model-Free Control
Optimise the value function of an unknown MDP
Model-Free Policy Iteration Using Action-Value Function

Generalised Policy Iteration with Action-Value Function

Monte-Carlo Generalised Policy Iteration

Model-free control

Greedy in the Limit with Infinite Exploration (GLIE)
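
One common way to satisfy GLIE is ε-greedy exploration with ε decaying as 1/k, where k is the episode index: every action keeps being tried, while the policy becomes greedy in the limit. The sketch below assumes that schedule; the function names are illustrative only.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy action: index of the largest action value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])

def glie_epsilon(episode_index):
    """Decaying schedule epsilon_k = 1 / k for episode index k >= 1."""
    return 1.0 / episode_index

# Hypothetical usage: action selection in the 4th episode.
action = epsilon_greedy([0.0, 2.0, 1.0], glie_epsilon(4))
```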


Temporal-Difference Learning For Control
Updating Action-Value Functions with SARSA
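
A minimal sketch of the tabular SARSA update, assuming action values stored in a dictionary keyed by `(state, action)`. The target uses the action actually selected in the next state, which is what makes SARSA on-policy. All names and numbers here are illustrative assumptions.

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0, done=False):
    """Apply one SARSA update to q in place and return the TD error."""
    # Bootstrap from the next state-action pair chosen by the behaviour policy.
    bootstrap = 0.0 if done else q.get((s_next, a_next), 0.0)
    td_error = r + gamma * bootstrap - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error

# Hypothetical transition (s0, right) -> reward 1 -> (s1, left).
q = {("s1", "left"): 2.0}
delta = sarsa_update(q, "s0", "right", 1.0, "s1", "left", alpha=0.5, gamma=0.5)
```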


Off-policy TD and Q-learning
We discussed several dynamic programming algorithms

TD learning

On and Off-Policy Learning

Off-Policy Learning

Q-Learning Control Algorithm
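
A hedged sketch of the tabular Q-learning update. It differs from SARSA only in the target: it bootstraps from the maximum action value in the next state, regardless of which action the behaviour policy takes next, which makes it off-policy. The dictionary layout and parameter names are assumptions for this example.

```python
def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=1.0, done=False):
    """Apply one Q-learning update to q in place and return the TD error."""
    # Bootstrap from the greedy (maximal) action value in the next state.
    best_next = 0.0 if done else max(q.get((s_next, b), 0.0) for b in actions)
    td_error = r + gamma * best_next - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error

# Hypothetical transition (s0, left) -> reward 1 -> s1.
actions = ["left", "right"]
q = {("s1", "left"): 2.0, ("s1", "right"): 4.0}
delta = q_learning_update(q, "s0", "left", 1.0, "s1", actions, alpha=0.5, gamma=0.5)
```

Because the same noisy estimates are used both to select and to evaluate the maximising action, this target tends to be biased upwards.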

….
Q-learning overestimation
…
- Author: wy
- Created at : 2023-07-22 22:44:28
- Updated at : 2023-07-23 15:35:12
- Link: https://yuuee-www.github.io/blog/2023/07/22/RL/step4/RLstep4/
- License: This work is licensed under CC BY-NC-SA 4.0.