Work In Progress
Prioritised experience replay [1], persistent advantage learning [2], bootstrapped [3], dueling [4], double [5] deep Q-network [6] for the Arcade Learning Environment [7]. Or PERPALB(triple-D)QN for short...
Run `th main.lua` to run headless, or `qlua main.lua` to display the game. The main options are `-game` to choose the ROM (see the ROM directory for more details) and `-mode`, which is either `train` or `eval`. Saliency maps [8] can also be visualised, optionally using guided [9] or "deconvnet" [10] backpropagation. Saliency map modes are selected at runtime, so they can be applied retrospectively to saved models.
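For example, a headless training run and a displayed evaluation run might look as follows (a sketch: the ROM name and the `-saliency` flag/value are assumptions for illustration; check `main.lua` for the exact option names):

```sh
# Train headlessly on an assumed ROM
th main.lua -game space_invaders -mode train
# Evaluate with the game displayed, visualising saliency maps with
# (assumed) guided backpropagation
qlua main.lua -game space_invaders -mode eval -saliency guided
```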
To run experiments based on hyperparameters specified in the individual papers, use `./run.sh <paper> <game> <args>`. For more details, see the script itself. By default the code trains on a demo environment called Catch; use `./run.sh demo -gpu 0` to run the demo with good default parameters. Note that `main.lua` uses CUDA by default if available, but the Catch network is small enough that it runs faster on the CPU.
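A paper-specific run might look like the following sketch (the `prioritised` identifier is a guess; the valid `<paper>` names are listed in `run.sh`):

```sh
# Hypothetical: train on Breakout with the hyperparameters from the
# prioritised experience replay paper
./run.sh prioritised breakout -gpu 0
```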
In training mode, quitting with `Ctrl+C` is caught, and you will be asked whether you would like to save the agent. Note that the saved agent includes a copy of the experience replay memory, so it will total ~7GB. The main script also automatically saves the weights of the best-performing DQN (according to the average validation score).
In evaluation mode you can create recordings with `-record true` (requires FFmpeg); this does not require using `qlua`. Recordings will be stored in the videos directory.
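For instance, a recording run might look like this (the ROM name is an assumed example):

```sh
# Evaluate and record a video (requires FFmpeg); output is written to
# the videos directory
th main.lua -game breakout -mode eval -record true
```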
Requires Torch7, and uses CUDA if available. Also requires the following extra luarocks packages:
- luaposix
- moses
- logroll
- classic
- torchx
- dpnn
- nninit
- xitari
- alewrap
- rlenvs
xitari, alewrap and rlenvs can be installed using the following commands:
```sh
luarocks install https://raw.githubusercontent.com/Kaixhin/xitari/master/xitari-0-0.rockspec
luarocks install https://raw.githubusercontent.com/Kaixhin/alewrap/master/alewrap-0-0.rockspec
luarocks install https://raw.githubusercontent.com/Kaixhin/rlenvs/master/rocks/rlenvs-scm-1.rockspec
```
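The remaining packages should be installable from the public LuaRocks server; this is an assumption, so consult each project's repository if a rock is missing:

```sh
# Assumed: these rocks are published on the public LuaRocks server
for rock in luaposix moses logroll classic torchx dpnn nninit; do
  luarocks install "$rock"
done
```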
- Thanks to Georg Ostrovski for confirmation on network usage in advantage operators, and for a note on the interaction with Double DQN.
[1] Prioritized Experience Replay
[2] Increasing the Action Gap: New Operators for Reinforcement Learning
[3] Deep Exploration via Bootstrapped DQN
[4] Dueling Network Architectures for Deep Reinforcement Learning
[5] Deep Reinforcement Learning with Double Q-learning
[6] Playing Atari with Deep Reinforcement Learning
[7] The Arcade Learning Environment: An Evaluation Platform for General Agents
[8] Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
[9] Striving for Simplicity: The All Convolutional Net
[10] Visualizing and Understanding Convolutional Networks