Stochastic Multi-Armed Bandits - Implementation of the UCB algorithm for article suggestion to a class of users.
Adversarial Bandits and Experts - Implementation of the Multiplicative Weights algorithm to optimize our investments in an adversarial environment of stocks.
Markov Decision Processes & (Deep) Reinforcement Learning - Modelling a stock environment with MDPs. Developing agents: (i) Policy Iteration (model-based), (ii) Q-Learning (model-free), (iii) Deep Q-Learning (large-scale MDPs or continuous environments).
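
For orientation, here is a minimal sketch of the core update behind each of the three topics above. It is not the code of this project: the function names (`ucb1`, `multiplicative_weights`, `q_update`), the `pull` callback, and the default values of `eta`, `alpha` and `gamma` are all assumptions chosen for illustration. Rewards and losses are assumed to lie in [0, 1], and the multiplicative weights step uses the exponential (Hedge) form of the update.

```python
import math


def ucb1(pull, n_arms, horizon):
    """Run UCB1; `pull(arm)` returns a reward in [0, 1]. Returns pull counts."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            # empirical mean plus an exploration bonus that shrinks with pulls
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts


def multiplicative_weights(loss_rounds, eta=0.1):
    """Hedge-style update; each round is a list of per-expert losses in [0, 1]."""
    weights = [1.0] * len(loss_rounds[0])
    for losses in loss_rounds:
        # experts that incur a loss are down-weighted exponentially
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights]  # normalised allocation over experts


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on a dict-of-lists Q-table, updated in place."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

In the article setting, `pull` would return 1 when the user clicks the suggested article and 0 otherwise. In the stock setting, the vector returned by `multiplicative_weights` can be read as the fraction of capital placed on each stock.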