• Off-policy and multi-step learning: One-step off-policy, Multi-step off-policy, Off-policy corrections for policy gradients
  • Approximate Dynamic Programming: Under the 2 sources of error (estimation + function approximation), what can we say about resu...
  • Policy Gradients and Actor Critics: Model-based RL, Value-based RL, Policy-based RL, Policy-Based Reinforcement Learning, model-fr...
  • rl-Convergence and divergence: Convergence Questions, Convergence of MC, Convergence of TD, Theorem: TD is not a gradien...
  • Function approximation in reinforcement learning (And deep reinforcement learning): Value Function Approximation, Agent state...
  • Model-Free Prediction: Monte Carlo Algorithms, Bandits, Bandits with States, Value Function Approximation, Agent state upd...
  • Markov Decision Processes and Dynamic Programming: Formalising the RL interface, Markov Decision Process (MDP), a mathematical f...
  • see notes
  • Exploration and Exploitation: Setting. Learning agents need to trade off two things. Exploitation: Maximise performance based on...
  • Topics: OOD, Image Processing, RL-step1, RL-step2, RL-step3, RL-step4, RL-step5, RL-step6, RL-step7, RL-step8, RL-step9, RL-step10, RL...