Soft Actor-Critic (SAC)
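Maximum entropy reinforcement learning: the agent maximizes expected return plus the entropy of its policy
Off-policy: it can learn from transitions collected by earlier policies
Stochastic policy rather than a deterministic one (a deterministic policy treats only one action as optimal in each state)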
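
The contrast between a stochastic maximum-entropy policy and a deterministic one can be made concrete with a small sketch. It assumes a single state with a handful of discrete actions, made-up Q-values, and a fixed temperature `alpha`; SAC itself targets continuous actions with neural-network actors and critics, so this only illustrates the objective, not the algorithm. All function names here are hypothetical.

```python
import math

def soft_policy(q_values, alpha):
    """Boltzmann policy pi(a) proportional to exp(Q(a) / alpha).

    For discrete actions this is the policy that maximizes
    E_pi[Q] + alpha * H(pi). alpha is an assumed fixed temperature.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def greedy_policy(q_values):
    """Deterministic policy: all probability on one best action."""
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(q_values))]

def entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) log pi(a)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def soft_value(q_values, alpha):
    """Entropy-regularized value: E_pi[Q] + alpha * H(pi)."""
    pi = soft_policy(q_values, alpha)
    expected_q = sum(p * q for p, q in zip(pi, q_values))
    return expected_q + alpha * entropy(pi)

# Illustrative Q-values: two near-optimal actions and one bad action.
q = [1.0, 0.9, -2.0]
alpha = 0.5

pi_soft = soft_policy(q, alpha)
pi_greedy = greedy_policy(q)

print("soft policy:  ", [round(p, 3) for p in pi_soft])
print("greedy policy:", pi_greedy)
print("entropy soft / greedy:", entropy(pi_soft), entropy(pi_greedy))
print("soft value:", soft_value(q, alpha))
```

The soft policy keeps substantial probability on both near-optimal actions, while the greedy policy commits to one of them and has zero entropy. Lowering `alpha` moves the soft policy toward the greedy one.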
Code:
- rail-berkeley/softlearning (TensorFlow)
- [rail-berkeley/rlkit](https://github.com/rail-berkeley/rlkit) (PyTorch)
- vitchyr/rlkit
- openai/spinningup
- hill-a/stable-baselines
- Author: wy
- Created at : 2023-07-23 20:16:31
- Updated at : 2023-07-24 17:09:37
- Link: https://yuuee-www.github.io/blog/2023/07/23/RL/step11/RLstep11/
- License: This work is licensed under CC BY-NC-SA 4.0.