Using a Jump Server for Internal Network Access
In scenarios where direct access to internal network resources is restricted due to security policies, a jump server (also known as a bastion host) ac...
Learning Notebook on Hugging Face and TOFU
This notebook is used to explore the functionalities of Hugging Face, a leading platform for natural language processing (NLP), and TOFU, a framework focus...
Learning Notes on Optimal Transport
1. Introduction
OT allows us to define meaningful distances between point clouds (or datasets), and is hence applicable in most ML settings.
2. Mathematical Formulation
2.1 Mo...
Hexo Blog Setup Issues Summary
Issue 1: Web Page Fails to Render After Deployment
Description
After deploying the Hexo blog to a hosting platform such as GitHub Pages, the web page may not render prop...
env
TensorFlow - GPU - docker
docker
https://docs.docker.com/engine/install/ubuntu/#set-up-the-repository
tensorflow gpu
https://www.tensorflow.org/install/docker?hl=zh-cn
https://www.tensorflow.org/...
Soft Actor-Critic (SAC)
Maximum entropy reinforcement learning
Off-policy
Stochastic policy rather than a deterministic policy (under a deterministic policy, only one action is considered optimal in each state)
Code:
rail-berkeley...
Deep RL
Recap: Value function approximation
Deep value function approximation
JAX
Deep Q-learning
Deep Q-learning in JAX
General Value Functions
The reward hypothesis (Sutton and Barto 2018)...
Off-policy and multi-step learning
One-step off-policy
Multi-step off-policy
Off-policy corrections for policy gradients
Title:
Author: wy
Created at : 2023-07-23 18:50:21
...
Approximate Dynamic Programming
Given the two sources of error (estimation and function approximation), what can we say about the resulting estimates?
The Bellman Optimality Operator
The Bellman Expectatio...
Policy Gradients and Actor Critics
Model-based RL, Value-based RL, Policy-based RL
Policy-Based Reinforcement Learning
model-free reinforcement learning
previous
Now,
parametrise the policy directly...