Code for reproducing results in the paper "Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games".
This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property under both full and noisy feedback. In the former setting, players observe the exact gradient vectors of their utility functions; in the latter, they observe only noisy gradient vectors. Even the celebrated Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms may fail to converge to a Nash equilibrium with noisy feedback. In contrast, M2WU exhibits last-iterate convergence to a stationary point near a Nash equilibrium in both feedback settings. We further prove that it converges to an exact Nash equilibrium when the mutation term is iteratively adapted. We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rates.
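For intuition, the M2WU update with full feedback can be sketched as below. This is an illustrative reimplementation, not the code in `run_experiment.py`: it assumes the update multiplies each weight by the exponentiated payoff gradient plus a mutation term `mu * (ref / x - 1)` that pulls the strategy toward a reference strategy, and it uses standard (unbiased) Rock-Paper-Scissors purely for illustration.

```python
import numpy as np

def m2wu_step(x, grad, ref, eta=0.1, mu=0.1):
    """One sketched M2WU step: a multiplicative-weights update on the
    payoff gradient perturbed by a mutation term pulling x toward ref."""
    perturbed = grad + mu * (ref / x - 1.0)
    x_new = x * np.exp(eta * perturbed)
    return x_new / x_new.sum()  # renormalize onto the simplex

# Standard Rock-Paper-Scissors payoff matrix for the row player
# (illustration only; the repository's default game is biased_rps).
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])

x = np.array([0.5, 0.3, 0.2])  # row player's strategy
y = np.array([0.2, 0.5, 0.3])  # column player's strategy
ref = np.full(3, 1.0 / 3.0)    # fixed reference strategy (uniform)

for _ in range(10000):
    gx, gy = A @ y, -A.T @ x   # full-feedback gradients
    x, y = m2wu_step(x, gx, ref), m2wu_step(y, gy, ref)
```

For standard Rock-Paper-Scissors with a uniform reference, the uniform Nash equilibrium is also a stationary point of this update, so the iterates approach (1/3, 1/3, 1/3) rather than cycling as plain MWU does.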
This code is written in Python 3. To install the required dependencies, execute the following command:
$ pip install -r requirements.txt
Alternatively, you can use Docker. Build the container:
$ docker build -t m2wu .
After the build finishes, run the container:
$ docker run -it m2wu
To investigate the performance of M2WU in biased Rock-Paper-Scissors with full feedback, execute one of the following commands:
$ python run_experiment.py --num_trials 10 --T 100000 --random_init_strategy --algorithm m2wu
$ python run_experiment.py --num_trials 10 --T 100000 --random_init_strategy --algorithm m2wu --update_freq 100
In this experiment, the following options can be specified:
- --game: Name of a matrix game. The default value is biased_rps.
- --algorithm: Learning algorithm.
- --num_trials: Number of trials to run experiments. The default value is 1.
- --T: Number of iterations. The default value is 10000.
- --feedback: Feedback type given to players. The default value is full.
- --seed: Random seed. The default value is 0.
- --random_init_strategy: Whether to generate the initial strategy uniformly at random. The default value is False.
- --eta: Learning rate. The default value is 0.1.
- --decay: Whether to use decreasing learning rates. The default value is False.
- --mu: Mutation rate for M2WU. The default value is 0.1.
- --update_freq: Update the reference strategies in M2WU every N iterations. The value None means that the reference strategies are not updated. The default value is None.
To evaluate M2WU in biased Rock-Paper-Scissors with noisy feedback, execute one of the following commands:
$ python run_experiment.py --num_trials 10 --T 1000000 --feedback noisy --algorithm m2wu --eta 0.001 --mu 0.1
$ python run_experiment.py --num_trials 10 --T 1000000 --feedback noisy --algorithm m2wu --eta 0.001 --mu 0.5 --update_freq 20000
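In the noisy setting, each player observes only a noise-corrupted gradient. As a minimal sketch (the actual noise model used by the experiments is defined in the repository; zero-mean Gaussian noise and the `sigma` parameter are assumptions here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_gradient(A, y, sigma=0.1, rng=rng):
    """Full-feedback gradient A @ y corrupted by zero-mean Gaussian
    noise (illustrative noise model; sigma is an assumed parameter)."""
    return A @ y + rng.normal(0.0, sigma, size=A.shape[0])

# Standard Rock-Paper-Scissors matrix, used here only for illustration.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
y = np.full(3, 1.0 / 3.0)
g = noisy_gradient(A, y)  # one noisy observation of A @ y
```

Because the noise is zero-mean, the observed gradients are unbiased estimates of the exact gradients; learning with them typically requires the smaller learning rates used in the commands above.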
If you use our code in your work, please cite our paper:
@article{abe2022m2wu,
  title={Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games},
  author={Abe, Kenshi and Ariu, Kaito and Sakamoto, Mitsuki and Toyoshima, Kentaro and Iwasaki, Atsushi},
  journal={arXiv preprint arXiv:2208.09855},
  year={2022}
}