hola - Ruby's Blog

rl-Convergence and divergence

Convergence Questions Convergence of MC Convergence of TD Theorem: TD is not a gradient Example of divergence use TD only on this transition Residual Bellman up...

2023-07-23

RL - step5 - RLstep5

Function approximation in reinforcement learning (And deep reinforcement learning) Value Function Approximation Agent state update Classes of Function Approximation (Deep) neural nets often perf...

2023-07-23

RL - step4 - RLstep4

Model-Free Prediction Monte Carlo Algorithms Bandits Bandits with States Value Function Approximation Agent state update Linear Function Approximation Feature Vectors Linear Value Function Ap...

2023-07-22

Markov Decision Processes and Dynamic Programming Formalising the RL interface Markov Decision Process (MDP) a mathematical formulation of the agent-environment interaction the objective and how to...

2023-07-20

RL - Intro - RLIntro

See notes.

2023-07-10

Exploration and Exploitation

Setting: Learning agents need to trade off two things. Exploitation: maximise performance based on current knowledge. Exploration: increase knowledge. The Multi-Armed Bandi...

2023-07-10

Topics

OOD Image Processing RL-step1 RL-step2 RL-step3 RL-step4 RL-step5 RL-step6 RL-step7 RL-step8 RL-step9 RL-step10 RL-SAC env Miscellanies Entropy AUC (AUROC) AUC vs Acc

2023-07-05

Image Processing Notes

UCL-Year3 See notion.

2023-07-03

personal

About me: Hi, I’m Yue Wang, a Computer Science student passionate about Machine Learning and Computer Vision. Education: UCL, Master of Engineering with Honours - MEng (Hons), 2020 - 2024. Publicat...

2023-07-03

Entropy

Information theory (Shannon): Information provides an answer to a question (e.g. whether a coin will land heads or tails). The information conveyed by a message x depends on its probability p(x)...

2023-06-21