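- bandit.py: K-armed bandit
- td.py: Temporal-difference algorithms, including one-step Sarsa, multi-step Sarsa, and Q-Learning
- dyna-q.py: Dyna-Q algorithm
- dqn.py: DQN algorithm and two of its extensions, Double DQN and Dueling DQN
- reinforce.py: Policy gradient algorithm (REINFORCE)
- actor_critic.py: Actor-Critic algorithm
- trpo.py: TRPO algorithm
- ppo.py: PPO algorithm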
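
As a rough illustration of how the temporal-difference methods listed for td.py differ, one-step Sarsa and Q-Learning use the same update rule and differ only in the bootstrap target. The sketch below is not taken from the repository: the function names, the tabular `Q` array layout (states by actions), and the hyperparameters `alpha` and `gamma` are assumptions made for the example.

```python
import numpy as np


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One-step Sarsa (on-policy): bootstrap from the action actually chosen next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q-Learning (off-policy): bootstrap from the greedy action in the next state."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


# Hypothetical 2-state, 2-action table used only to contrast the two targets.
Q0 = np.zeros((2, 2))
Q0[1] = [1.0, 2.0]

Q_sarsa = sarsa_update(Q0.copy(), s=0, a=0, r=1.0, s_next=1, a_next=0)
Q_qlearn = q_learning_update(Q0.copy(), s=0, a=0, r=1.0, s_next=1)
```

With the same transition, Sarsa moves `Q[0, 0]` toward `1 + 0.9 * 1.0` because the next action was action 0, while Q-Learning moves it toward `1 + 0.9 * 2.0` because it takes the maximum over next actions. Terminal-state handling is omitted to keep the sketch short.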
haukzero/rl-basic-learn
About
Basic reinforcement learning algorithms [K-armed bandit | Sarsa | Q-Learning | Dyna-Q | DQN | REINFORCE | TRPO | PPO]