hola - Ruby's Blog

Access to Internal Server From Jump Server

Using a Jump Server for Internal Network Access. In scenarios where direct access to internal network resources is restricted due to security policies, a jump server (also known as a bastion host) ac...

2024-07-13
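
As a rough illustration of the jump-server entry above, the sketch below builds an OpenSSH command that reaches an internal host through a bastion using the `-J` (ProxyJump) option. The helper name, host names, and user names are hypothetical placeholders and are not taken from the post itself.

```python
import shlex
from typing import List, Optional


def build_jump_command(jump: str, target: str, remote_cmd: Optional[str] = None) -> List[str]:
    """Return an argv list equivalent to `ssh -J <jump> <target> [remote_cmd]`."""
    # -J tells OpenSSH to connect to the jump host first, then hop to the target.
    argv = ["ssh", "-J", jump, target]
    if remote_cmd is not None:
        argv.append(remote_cmd)
    return argv


if __name__ == "__main__":
    # Hypothetical hosts, for illustration only.
    argv = build_jump_command("alice@bastion.example.com", "alice@10.0.0.5", "hostname")
    print(shlex.join(argv))
```

The list can be passed to `subprocess.run` unchanged; building it as a list rather than a string avoids shell-quoting problems.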

summer notes2

Learning Notebook on Hugging Face and TOFU. This notebook is used to explore the functionalities of Hugging Face, a leading platform for natural language processing (NLP), and TOFU, a framework focus...

2024-07-13

summer notes

Learning Notes on Optimal Transport. 1. Introduction: OT makes it possible to define meaningful distances between point clouds (or datasets), and is therefore applicable in most ML settings. 2. Mathematical Formulation: 2.1 Mo...

2024-07-12
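
To make the "distance between point clouds" idea from the optimal transport entry above concrete, here is a minimal sketch for the simplest case: two one-dimensional point clouds of equal size with uniform weights. In that special case the optimal plan matches sorted points in order, so the 1-Wasserstein distance is the mean absolute difference after sorting. The function name is an assumption for illustration; the post's own formulation is truncated and may be more general.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1D point clouds (uniform weights)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("point clouds must be non-empty and of equal size")
    # In 1D, the optimal transport plan pairs the i-th smallest x with the i-th smallest y.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


if __name__ == "__main__":
    print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))
```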

hexo_blog_issues

Hexo Blog Setup Issues Summary. Issue 1: Web Page Fails to Render After Deployment. Description: After deploying the Hexo blog to a hosting platform such as GitHub Pages, the web page may not render prop...

2024-07-12

env

env. TensorFlow - GPU - docker. docker: https://docs.docker.com/engine/install/ubuntu/#set-up-the-repository tensorflow gpu: https://www.tensorflow.org/install/docker?hl=zh-cn https://www.tensorflow.org/...

2023-07-24

Soft Actor-Critic (SAC)

Soft Actor-Critic (SAC). Maximum entropy reinforcement learning; off-policy; uses a stochastic policy rather than a deterministic policy (under which only one action is considered optimal in each state). Codes: rail-berkeley&#...

2023-07-23
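
The maximum-entropy idea mentioned in the SAC entry above can be illustrated in a small discrete-action setting. This is a sketch of the general principle only, not of SAC itself, which targets continuous actions with learned actor and critic networks. Given action values Q and a temperature `alpha` (both hypothetical inputs here), the entropy-regularised optimal policy is a softmax over Q / alpha, and the corresponding soft value is alpha times the log-sum-exp of Q / alpha.

```python
import math


def soft_policy(q_values, alpha):
    """Softmax policy pi(a) proportional to exp(Q(a) / alpha)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / alpha) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_value(q_values, alpha):
    """Soft state value: alpha * log(sum_a exp(Q(a) / alpha))."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]
    print(soft_policy(q, alpha=0.5))
    print(soft_value(q, alpha=0.5))
```

As `alpha` shrinks toward zero the policy concentrates on the greedy action, recovering the deterministic case in which only one action is treated as optimal.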

Deep RL

Deep RL. Recap: Value function approximation; Deep value function approximation; JAX; Deep Q-learning; Deep Q-learning in JAX. General Value Functions: The reward hypothesis (Sutton and Barto 2018)...

2023-07-23

Off-policy and multi-step learning

Off-policy and multi-step learning. One-step off-policy; Multi-step off-policy; Off-policy corrections for policy gradients

2023-07-23

Approximate Dynamic Programming

Approximate Dynamic Programming. Under the two sources of error (estimation + function approximation), what can we say about the resulting estimates? The Bellman Optimality Operator; The Bellman Expectatio...

2023-07-23
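
The Bellman optimality operator named in the entry above can be sketched on a toy problem. The two-state deterministic MDP below, its rewards, and the discount factor are invented purely for illustration. The sketch applies the exact operator (no estimation or function-approximation error) repeatedly, which converges because the operator is a contraction for a discount factor below 1.

```python
GAMMA = 0.5  # discount factor (assumed for this toy example)

# TRANSITIONS[state][action] = (reward, next_state); a made-up deterministic MDP.
TRANSITIONS = {
    0: {0: (0.0, 0), 1: (1.0, 1)},
    1: {0: (2.0, 1), 1: (0.0, 0)},
}


def bellman_optimality(v):
    """Apply (T v)(s) = max_a [ r(s, a) + gamma * v(s') ] to every state."""
    return [
        max(r + GAMMA * v[s_next] for r, s_next in TRANSITIONS[s].values())
        for s in sorted(TRANSITIONS)
    ]


def value_iteration(n_iters=100):
    """Iterate the operator from v = 0; the iterates approach its fixed point."""
    v = [0.0] * len(TRANSITIONS)
    for _ in range(n_iters):
        v = bellman_optimality(v)
    return v


if __name__ == "__main__":
    print(value_iteration())
```

For this toy MDP the fixed point can be checked by hand: staying in state 1 forever is worth 2 / (1 - 0.5) = 4, and moving from state 0 to state 1 is worth 1 + 0.5 * 4 = 3.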

RL - step7 - RLstep7

Policy Gradients and Actor Critics. Model-based RL; Value-based RL; Policy-based RL. Policy-Based Reinforcement Learning: model-free reinforcement learning; previous. Now, parametrise the policy directly...

2023-07-23