Model-Free Prediction

Monte Carlo Algorithms

Bandits

Bandits with States

Value Function Approximation

Agent state update

Linear Function Approximation

Feature Vectors

Linear Value Function Approximation

Table Lookup Features
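
To see why table lookup is a special case of linear value function approximation, here is a minimal sketch. It assumes one-hot feature vectors over a small, enumerated state space; the helper names `one_hot` and `linear_value` and the weight values are made up for illustration.

```python
def one_hot(state, n_states):
    """Table lookup features: x(s) has a 1 in position s and 0 elsewhere."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]


def linear_value(w, x):
    """Linear value estimate v_w(s) = w . x(s)."""
    return sum(wi * xi for wi, xi in zip(w, x))


n_states = 4
w = [0.5, -1.0, 2.0, 0.0]  # illustrative weights, one per state

# With one-hot features the dot product just picks out w[s],
# so the weight vector plays the role of the value table.
values = [linear_value(w, one_hot(s, n_states)) for s in range(n_states)]
```

With these features each weight is exactly the value estimate of one state, which is why the tabular case fits inside the linear framework.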

Monte Carlo Algorithms

Bandits with States

Monte-Carlo Policy Evaluation
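
A rough sketch of Monte-Carlo policy evaluation, assuming the first-visit variant with an incremental mean. The episode format (a list of `(state, reward)` pairs, where the reward is received on leaving that state) and the function name `mc_policy_evaluation` are assumptions made for this example.

```python
from collections import defaultdict


def mc_policy_evaluation(episodes, gamma=1.0):
    """First-visit Monte Carlo: average the return that follows the
    first visit to each state, over complete episodes."""
    value = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        # Work backwards so each return is reward + gamma * (later return).
        g = 0.0
        returns = []
        for state, reward in reversed(episode):
            g = reward + gamma * g
            returns.append((state, g))
        returns.reverse()

        seen = set()
        for state, g in returns:
            if state in seen:
                continue  # only the first visit in this episode counts
            seen.add(state)
            counts[state] += 1
            # Incremental mean: V <- V + (G - V) / N
            value[state] += (g - value[state]) / counts[state]
    return dict(value)


# Two toy episodes that visit A and then B before terminating.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("B", 0.0)],
]
v = mc_policy_evaluation(episodes)
```

Note that the estimate is only updated once an episode has finished, since the full return is needed.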

Temporal-Difference Learning

Temporal Difference Learning by Sampling Bellman Equations

Temporal difference learning
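
As a small illustration, the tabular TD(0) update moves the estimate of the current state towards the one-step target `reward + gamma * v(next_state)`. This is a sketch under assumed names (`td0_update`, a plain dict as the value table) and made-up transitions, not a full agent.

```python
def td0_update(value, state, reward, next_state,
               alpha=0.1, gamma=1.0, terminal=False):
    """One tabular TD(0) step; returns the TD error."""
    # Bootstrap from the current estimate of the next state
    # (a terminal state is worth 0 by definition).
    bootstrap = 0.0 if terminal else value.get(next_state, 0.0)
    td_error = reward + gamma * bootstrap - value.get(state, 0.0)
    value[state] = value.get(state, 0.0) + alpha * td_error
    return td_error


value = {"A": 0.0, "B": 0.0}
# B -> terminal with reward 1: v(B) moves halfway to the target 1.
td0_update(value, "B", 1.0, None, alpha=0.5, terminal=True)
# A -> B with reward 0: the target is the current guess v(B).
td0_update(value, "A", 0.0, "B", alpha=0.5)
```

Unlike Monte Carlo, the update can be applied after every single transition, because the target uses the existing estimate instead of the full return.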

Dynamic Programming Backup | Monte-Carlo Backup | Temporal-Difference Backup

Bootstrapping and Sampling

Temporal difference learning

Bias/Variance Trade-Off

….

Batch Learning

….

Between MC and TD: Multi-Step TD
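
A minimal sketch of the n-step return that sits between the one-step TD target (n = 1) and the Monte-Carlo return (n reaching the end of the episode). The function name `n_step_return` and the example numbers are assumptions for illustration.

```python
def n_step_return(rewards, bootstrap_value, gamma, n):
    """n-step return:
    R_{t+1} + gamma R_{t+2} + ... + gamma^(n-1) R_{t+n} + gamma^n v(S_{t+n}).

    rewards[k] is R_{t+k+1}; bootstrap_value is the current estimate v(S_{t+n}).
    """
    g = 0.0
    for k in range(n):
        g += (gamma ** k) * rewards[k]
    return g + (gamma ** n) * bootstrap_value


rewards = [1.0, 1.0, 1.0]
# Same rewards and bootstrap estimate, increasing n:
g1 = n_step_return(rewards, bootstrap_value=4.0, gamma=0.5, n=1)
g2 = n_step_return(rewards, bootstrap_value=4.0, gamma=0.5, n=2)
g3 = n_step_return(rewards, bootstrap_value=4.0, gamma=0.5, n=3)
```

As n grows, the target relies more on sampled rewards and less on the bootstrapped estimate.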

Mixed Multi-Step Returns

Eligibility traces

….


Model-Free Control

Optimise the value function of an unknown MDP

Model-Free Policy Iteration Using Action-Value Function

Generalised Policy Iteration with Action-Value Function

Monte-Carlo Generalized Policy Iteration

Model-free control

Greedy in the Limit with Infinite Exploration (GLIE)
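
One common way to satisfy GLIE is epsilon-greedy exploration with epsilon decaying as 1/k over episodes k. The sketch below assumes that schedule; the helper names `glie_epsilon` and `epsilon_greedy_probs` are hypothetical.

```python
def glie_epsilon(k):
    """Exploration rate for episode k (k >= 1), decaying to zero."""
    return 1.0 / k


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities under epsilon-greedy for one state."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    # Every action gets epsilon / n; the greedy one gets the rest on top.
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


q_values = [0.0, 1.0]  # illustrative action values for a single state
first = epsilon_greedy_probs(q_values, glie_epsilon(1))  # fully random
later = epsilon_greedy_probs(q_values, glie_epsilon(4))  # mostly greedy
```

Every action keeps a non-zero probability at any finite k, while the policy becomes greedy in the limit.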

Temporal-Difference Learning For Control

Updating Action-Value Functions with SARSA
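
A sketch of the SARSA update, which bootstraps from the action actually taken in the next state (hence on-policy). The dict-based table, the function name `sarsa_update`, and the example transition are assumptions for illustration.

```python
def sarsa_update(q, s, a, r, s_next, a_next,
                 alpha=0.1, gamma=1.0, terminal=False):
    """One SARSA step on the tuple (S, A, R, S', A'); returns the TD error."""
    # Bootstrap from q(S', A'), where A' is the action the policy chose.
    bootstrap = 0.0 if terminal else q.get((s_next, a_next), 0.0)
    td_error = r + gamma * bootstrap - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error


q = {("s1", "left"): 2.0, ("s1", "right"): 4.0}
# The behaviour policy happened to pick "left" in s1, so SARSA uses q(s1, left).
sarsa_update(q, "s0", "go", 1.0, "s1", "left", alpha=0.5, gamma=0.5)
```

Here the target is 1 + 0.5 * 2 = 2, even though "right" has the higher estimate, because SARSA evaluates the policy it is following.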

Off-policy TD and Q-learning

We discussed several dynamic programming algorithms

TD learning

On and Off-Policy Learning

Off-Policy Learning

Q-Learning Control Algorithm
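
A sketch of the Q-learning update, which bootstraps from the greedy (maximising) action in the next state regardless of what the behaviour policy does next. The function name `q_learning_update`, the dict-based table, and the example transition are assumptions for illustration.

```python
def q_learning_update(q, s, a, r, s_next, actions,
                      alpha=0.1, gamma=1.0, terminal=False):
    """One Q-learning step; returns the TD error."""
    # Off-policy target: max over next actions, not the action actually taken.
    if terminal:
        bootstrap = 0.0
    else:
        bootstrap = max(q.get((s_next, b), 0.0) for b in actions)
    td_error = r + gamma * bootstrap - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error


q = {("s1", "left"): 2.0, ("s1", "right"): 4.0}
q_learning_update(q, "s0", "go", 1.0, "s1", ["left", "right"],
                  alpha=0.5, gamma=0.5)
```

The target here is 1 + 0.5 * max(2, 4) = 3, so the learned values track the greedy policy while the agent is free to explore.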

….

Q-learning overestimation
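
The overestimation comes from taking a max over noisy estimates: even if every action is truly worth the same, the largest estimate tends to be too high. The simulation below is a toy illustration with assumed settings (10 actions, Gaussian noise, a fixed seed), not a Q-learning run.

```python
import random


def max_of_noisy_estimates(true_values, noise_std, rng):
    """Max over action-value estimates that are the truth plus zero-mean noise."""
    return max(v + rng.gauss(0.0, noise_std) for v in true_values)


rng = random.Random(0)      # fixed seed so the run is repeatable
true_values = [0.0] * 10    # every action is truly worth 0, so the true max is 0
trials = 2000

# Average of the max estimate across many independent noisy tables.
average_max = sum(
    max_of_noisy_estimates(true_values, 1.0, rng) for _ in range(trials)
) / trials
```

Each individual estimate is unbiased, yet `average_max` comes out clearly above the true maximum of 0. This is the bias that the Q-learning target inherits from its max operator.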

  • Title:
  • Author: wy
  • Created at : 2023-07-22 22:44:28
  • Updated at : 2023-07-23 15:35:12
  • Link: https://yuuee-www.github.io/blog/2023/07/22/RL/step4/RLstep4/
  • License: This work is licensed under CC BY-NC-SA 4.0.