rl-Convergence and divergence
Convergence Questions
Convergence of MC
Convergence of TD
Theorem:
TD is not a gradient
Example of divergence
use TD only on this transition
Residual Bellman up...
Function approximation in reinforcement learning (And deep reinforcement learning)
Value Function Approximation
Agent state update
Classes of Function Approximation
(Deep) neural nets often perf...
Model-Free Prediction
Monte Carlo Algorithms
Bandits
Bandits with States
Value Function Approximation
Agent state update
Linear Function Approximation
Feature Vectors
Linear Value Function Ap...
Markov Decision Processes and Dynamic Programming
Formalising the RL interface
Markov Decision Process (MDP)
a mathematical formulation of the agent-environment interaction
the objective and how to...
see notes
Title:
Author: wy
Created at: 2023-07-10 16:21:37
**Updated at:** 2023-07-10 16:24:07
**Link:** https://yuuee-www.gi...
Exploration and Exploitation
Setting
Learning agents need to trade off two things
Exploitation: Maximise performance based on current knowledge
Exploration: Increase knowledge
The Multi-Armed Bandi...
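The trade-off between exploitation and exploration can be illustrated with a small bandit simulation. The sketch below uses epsilon-greedy action selection, which is one common strategy and not necessarily the one the post goes on to cover. The function name `epsilon_greedy_bandit`, the Gaussian reward model, and all parameter values are assumptions made for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a hypothetical Gaussian multi-armed bandit.

    true_means: assumed mean reward of each arm (unknown to the agent).
    Returns the agent's value estimates, pull counts, and total reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: pick a random arm to increase knowledge.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the arm that looks best so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: unit-variance Gaussian around the true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    est, cnt, total = epsilon_greedy_bandit([0.1, 0.5, 0.9])
    print("estimates:", est)
    print("pull counts:", cnt)
```

Setting `epsilon=0.0` gives a purely greedy agent that never deliberately explores, while `epsilon=1.0` gives uniformly random exploration that ignores what has been learned.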
Topics
OOD
Image Processing
RL-step1
RL-step2
RL-step3
RL-step4
RL-step5
RL-step6
RL-step7
RL-step8
RL-step9
RL-step10
RL-SAC
env
Miscellanies
Entropy
AUC (AUROC)
AUC vs Acc
T...
UCL-Year3
See notion.
Title: Image Processing Notes
Author: wy
Created at: 2023-07-03 20:07:59
**Updated at:** 2023-07-07 10:52:01
...
About me
Hi, I’m Yue Wang, a Computer Science student passionate about Machine Learning and Computer Vision.
Education
UCL
Master of Engineering with Honours - MEng (Hons)
2020 - 2024
Publicat...
Information theory (Shannon)
- Information provides an answer to a question (e.g. whether a coin will land heads or tails).
- The information conveyed by a message x depends on its probability p(x)...
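As a rough illustration of how information depends on probability, the sketch below uses the standard Shannon definition of self-information, I(x) = -log2 p(x), measured in bits. The excerpt above is cut off before giving a formula, so this definition and the helper names `self_information` and `entropy` are assumptions, not taken from the post.

```python
import math


def self_information(p):
    """Shannon self-information in bits of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def entropy(probs):
    """Expected self-information (in bits) of a discrete distribution."""
    return sum(p * self_information(p) for p in probs if p > 0)


if __name__ == "__main__":
    # A fair coin flip: each outcome has probability 0.5.
    print(self_information(0.5))   # 1.0 bit per outcome
    print(entropy([0.5, 0.5]))     # 1.0 bit on average
    # A rarer outcome carries more information.
    print(self_information(0.25))  # 2.0 bits
```

Under this definition a certain outcome (p = 1) carries no information, and less likely outcomes carry more.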