This repository contains a reproducibility and extension study of the paper "Fairness for Cooperative Multi-Agent Learning with Equivariant Policies". The goal was to analyse the paper's reproducibility and to study whether the authors' claims generalise to a reduced, simpler world that avoids the need for long training times.
To clone the repository with the simple_particle_envs submodule:
git clone --recurse-submodules <link_to_repository>
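If the repository was cloned without `--recurse-submodules`, the `simple_particle_envs` directory will exist but be empty. The sketch below checks for that case and prints the standard Git command that fetches the submodule afterwards. It assumes it is run from the repository root; the helper name `submodule_populated` is made up for this example.

```shell
# Succeed if the given directory exists and contains at least one entry.
submodule_populated() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Only print a hint; nothing is fetched automatically.
if ! submodule_populated simple_particle_envs; then
    echo "simple_particle_envs is missing or empty; run: git submodule update --init --recursive"
fi
```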
To install the recommended Anaconda environment (Python 3.9.15), use the environment file that matches your operating system:
conda env create -f environment_windows.yml
conda env create -f environment_linux.yml
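Only one of the two commands above is needed. As a minimal sketch, the choice can be scripted from the output of `uname -s`. This assumes a POSIX shell (on Windows, for example Git Bash, MSYS2, or Cygwin); the function name `pick_env_file` is hypothetical, and the snippet only prints the command rather than running it.

```shell
# Map an OS name, as printed by `uname -s`, to one of the two environment files.
pick_env_file() {
    case "$1" in
        Linux*) echo "environment_linux.yml" ;;
        MINGW*|MSYS*|CYGWIN*) echo "environment_windows.yml" ;;
        *) echo "no environment file for OS: $1" >&2; return 1 ;;
    esac
}

# Print the matching command for the current machine.
if env_file=$(pick_env_file "$(uname -s)"); then
    echo "conda env create -f $env_file"
fi
```

The repository ships environment files for Windows and Linux only, so any other OS name is reported as an error.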
To set up the multi-agent environments:
cd simple_particle_envs
pip install -e .
To verify installation, run:
xvfb-run -a python baselines/ --mode test --render
To train a Fair-E model, run:
python --env simple_torus --algorithm ddpg_symmetric
To train a Fair-E model with equivariance and shared reward, run:
python --env simple_torus --algorithm ddpg_symmetric --equivariant --collaborative
To train a Fair-ER model, run:
python --env simple_torus --algorithm ddpg_speed_fair --lambda_coeff 0.5
- The control parameter of fairness can be adjusted in
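To compare several values of the fairness coefficient, the Fair-ER command can be wrapped in a small loop. The sketch below is a dry run that only prints the commands. `TRAIN_SCRIPT` is a placeholder that must be set to the repository's training script, and the coefficient values other than 0.5 are illustrative assumptions, not values taken from the study.

```shell
# Placeholder: set TRAIN_SCRIPT to the repository's training script before use.
TRAIN_SCRIPT="${TRAIN_SCRIPT:-path/to/training_script.py}"

# Print one Fair-ER training command per coefficient passed as an argument.
fair_er_commands() {
    for lam in "$@"; do
        echo "python $TRAIN_SCRIPT --env simple_torus --algorithm ddpg_speed_fair --lambda_coeff $lam"
    done
}

# Illustrative sweep; replace the values with the ones you want to compare.
fair_er_commands 0.1 0.5 0.9
```

Removing the `echo` would run the training jobs one after another, each with its own coefficient.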
To resume training from a checkpoint, run:
python --env simple_torus --algorithm ddpg_symmetric --checkpoint_path /path/to/model/checkpoints
To train with a varying number of evaders and pursuers, set the --nb_agents and --nb_prey flags:
python --env simple_torus --algorithm ddpg_symmetric --nb_agents 5 --nb_prey 1
Evaluation must use the same flags that were used for training. For example, if you trained with the --equivariant and --collaborative flags, set them again when running the evaluation.
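One way to keep the two invocations consistent is to define the shared flags once and reuse them. The following is a sketch, not part of the repository: `TRAIN_SCRIPT` and `EVAL_SCRIPT` are placeholders for the actual script paths, `model_flags` is a hypothetical helper, and the commands are printed rather than executed.

```shell
# Flags used at training time that have to be repeated at evaluation time.
model_flags() {
    echo "--equivariant --collaborative"
}

# Placeholders: point these at the repository's training and evaluation scripts.
TRAIN_SCRIPT="${TRAIN_SCRIPT:-path/to/training_script.py}"
EVAL_SCRIPT="${EVAL_SCRIPT:-path/to/eval_script.py}"

# Both commands take their model flags from the same helper.
echo "python $TRAIN_SCRIPT --env simple_torus --algorithm ddpg_symmetric $(model_flags)"
echo "python $EVAL_SCRIPT --env simple_torus --pred_policy ddpg --prey_policy cosine --seed 72 --checkpoint_path /path/to/model/checkpoints $(model_flags)"
```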
To collect trajectories from a trained model, run eval/ or eval/. Here are a few examples:
- Greedy pursuers against random-moving evader:
python eval/ --env simple_torus --pred_policy greedy --prey_policy random --seed 75
- CD-DDPG pursuers (Fair-E) against sophisticated evader:
python eval/ --env simple_torus --pred_policy ddpg --prey_policy cosine --seed 72 --checkpoint_path /path/to/model/checkpoints
- CD-DDPG pursuers (Fair-ER) against sophisticated evader:
python eval/ --env simple_torus --pred_policy ddpg --prey_policy cosine --seed 72 --checkpoint_path /path/to/model/checkpoints
To collect trajectories from a model trained with a varying number of evaders and pursuers, use the simple_torus scenario again. For example, with a Fair-E model:
python eval/ --env simple_torus --pred_policy ddpg --prey_policy cosine --seed 72 --checkpoint_path /path/to/model/checkpoints --nb_agents 5 --nb_prey 1
To create the plots, run the following, replacing (1-5) with the number of the plot you want:
python eval/ --fp path/of/trajectories --plot (1-5)
For 4 agents, choose a plot number from 1 to 3:
python eval/ --fp path/of/trajectories --plot (1-3)
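To produce every plot in one go, the plot number can be iterated in a loop. The sketch below is a dry run that prints one command per plot. `PLOT_SCRIPT` is a placeholder for the plotting script under eval/, and `plot_commands` is a hypothetical helper taking the trajectories path and the highest plot number (5 in the general case, 3 for 4 agents).

```shell
# Placeholder: set PLOT_SCRIPT to the repository's plotting script before use.
PLOT_SCRIPT="${PLOT_SCRIPT:-path/to/plot_script.py}"

# Print one plotting command per plot number from 1 up to $2.
# $1 = path of the collected trajectories, $2 = highest plot number.
plot_commands() {
    n=1
    while [ "$n" -le "$2" ]; do
        echo "python $PLOT_SCRIPT --fp $1 --plot $n"
        n=$((n + 1))
    done
}

plot_commands path/of/trajectories 5
```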