An experiment applying reinforcement learning techniques to the multi-armed bandit problem.
The multi-armed bandit problem is commonly used to experiment with reinforcement learning techniques. You have one or more bandit machines with arms; having X machines with one arm each is equivalent to having one machine with X arms. Each time you pull an arm, you get a reward. Each arm's reward can follow any kind of distribution, and your algorithm has to maximize the cumulative reward over its pulls.
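To make the setup concrete, here is a minimal sketch of one common strategy, epsilon-greedy. It is not necessarily what the main file in this repository does. The function name `run_epsilon_greedy`, its parameters, and the choice of Gaussian rewards with unit variance are all assumptions made for illustration.

```python
import random


def run_epsilon_greedy(arm_means, steps=1000, epsilon=0.1, seed=0):
    """Play a simulated bandit with an epsilon-greedy policy.

    Assumption for this sketch: each arm pays a Gaussian reward with the
    given mean and a standard deviation of 1. Real arms may follow any
    distribution.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(arm_means[arm], 1.0)

        # Incremental update of the sample mean for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return total_reward, estimates, counts


if __name__ == "__main__":
    total, estimates, counts = run_epsilon_greedy([0.2, 0.5, 1.5])
    print("cumulative reward:", round(total, 2))
    print("estimated means:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

With a small `epsilon` the policy mostly exploits the arm it currently believes is best, while still pulling other arms occasionally so its estimates can improve.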
See the Wikipedia article on the multi-armed bandit for more details about the problem.
See the Wikipedia article on reinforcement learning for more details about reinforcement learning.
Here you can run the main file, study what it does, and modify it however you want.
I will improve the code over time. This is the first version.