RL Rewards

Note

This is a dramatically oversimplified explanation of RL; please do not treat it as a rigorous or complete account.

In reinforcement learning, agents learn to maximize an expected return over the course of a game. For example, suppose the agent is learning to travel from a start point to a goal along the best possible path. Path A could have per-step rewards of [0, 0, 0, 10], and path B could have rewards of [1, 1, 1, 10]. Summing each vector gives the return for that path: path A has a return of 10 and path B has a return of 13, so path B is the better path and the one the RL agent should learn.
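
As a minimal sketch (not code from this repository), the return of each path is simply the sum of its per-step rewards:

```python
# Per-step rewards for the two example paths.
path_a = [0, 0, 0, 10]
path_b = [1, 1, 1, 10]

# The (undiscounted) return is the sum of the rewards along the path.
return_a = sum(path_a)  # 10
return_b = sum(path_b)  # 13

# Path B yields the higher return, so it is the path the agent should learn.
assert return_b > return_a
```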

Identifying which reward function to use for a given problem is extremely difficult in RL; see, for instance, this old OpenAI blog post. We often want to experiment with which reward function, or combination of reward functions, works best for a given problem.
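
One common way to combine reward functions is to take a weighted sum of individual components. The sketch below is purely illustrative and not part of this repository's API; the names `combine_rewards`, `progress_reward`, and `step_penalty` are hypothetical.

```python
from typing import Callable

# A reward function maps a transition (here, a plain dict) to a scalar reward.
RewardFn = Callable[[dict], float]

def combine_rewards(components: list[tuple[float, RewardFn]]) -> RewardFn:
    """Build a reward function that is a weighted sum of the given components."""
    def combined(transition: dict) -> float:
        return sum(weight * fn(transition) for weight, fn in components)
    return combined

# Hypothetical components: reward progress toward the goal, penalize every step.
progress_reward: RewardFn = lambda t: t["distance_before"] - t["distance_after"]
step_penalty: RewardFn = lambda t: -1.0

reward_fn = combine_rewards([(1.0, progress_reward), (0.1, step_penalty)])
print(reward_fn({"distance_before": 5.0, "distance_after": 4.0}))  # 1.0 - 0.1 = 0.9
```

Reweighting the components is then a matter of changing the coefficients, which makes it easy to sweep over different combinations when experimenting.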
