Deep Deterministic Policy Gradients (DDPG) for learning the thermal control policy.
Deep Deterministic Policy Gradient:
https://spinningup.openai.com/en/latest/algorithms/ddpg.html
Research Paper:
https://arxiv.org/abs/1901.04693
PyTorch based Library:
https://github.com/vitchyr/rlkit