  • rl-Convergence and divergence

    rl-Convergence and divergence Convergence Questions Convergence of MC Convergence of TD Theorem: TD is not a gradient Example of divergence use TD only on this transition Residual Bellman up...
  • RL - step5 - RLstep5

    Function approximation in reinforcement learning (And deep reinforcement learning) Value Function Approximation Agent state update Classes of Function Approximation (Deep) neural nets often perf...
  • RL - step4 - RLstep4

    Model-Free Prediction Monte Carlo Algorithms Bandits Bandits with States Value Function Approximation Agent state update Linear Function Approximation Feature Vectors Linear Value Function Ap...
  • RL - step3 - RLstep3

    Markov Decision Processes and Dynamic Programming Formalising the RL interface Markov Decision Process (MDP) a mathematical formulation of the agent-environment interaction the objective and how to...
  • RL - Intro - RLIntro

    See notes. Title: Author: wy Created at: 2023-07-10 16:21:37 Updated at: 2023-07-10 16:24:07 Link: https://yuuee-www.gi...
  • Exploration and Exploitation

    Exploration and Exploitation Setting Learning agents need to trade off two things Exploitation: Maximise performance based on current knowledge Exploration: Increase knowledge The Multi-Armed Bandi...
  • Topics

    Topics OOD Image Processing RL-step1 RL-step2 RL-step3 RL-step4 RL-step5 RL-step6 RL-step7 RL-step8 RL-step9 RL-step10 RL-SAC env Miscellanies Entropy AUC (AUROC) AUC vs Acc T...
  • Image Processing Notes

    UCL-Year3 See notion. Title: Image Processing Notes Author: wy Created at: 2023-07-03 20:07:59 Updated at: 2023-07-07 10:52:01 ...
  • personal

    About me Hi, I’m Yue Wang, a Computer Science student passionate about Machine Learning and Computer Vision. Education UCL Master of Engineering with Honours - MEng (Hons) 2020 - 2024 Publicat...
  • Entropy

    Information theory (Shannon) - Information provides an answer to a question (e.g. whether a coin will land heads or tails). - The information conveyed by a message x depends on its probability p(x)...