Added new Post-training an LLM using GRPO with TRL
recipe 🧑‍🍳
#707