A comparison of RLHF methods for instruction-tuning LLMs, built on top of Hugging Face TRL. You can find the project website here.