Replies: 1 comment 1 reply
-
So it turns out this was sort of a dumb question: it works when using the same hardware. However, it would be really useful to be able to render or use a policy on different hardware. I have changed the sharding file from cuda to cpu, but I still get the PRNG key error.
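One thing that may be worth trying before editing sharding metadata by hand: JAX reads the `JAX_PLATFORMS` environment variable at startup, so forcing the whole process onto the CPU backend before any arrays are created sometimes lets a GPU-trained checkpoint load on a CPU-only machine. The script name and flag below are hypothetical placeholders for whatever does the rendering:

```shell
# Force JAX onto the CPU backend for this process only.
# render_policy.py and --checkpoint are illustrative names, not a real CLI.
JAX_PLATFORMS=cpu python render_policy.py --checkpoint /path/to/ckpt
```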
-
So I'm having issues with rendering the output episode after training on GPU clusters, and therefore I'm trying to separate the steps. Furthermore, I would like to load a previously trained checkpoint and either continue training from it or deploy it in a real-time environment.
However, I fail to do all three of the above, since I can't seem to load and then rebuild the policy correctly between files.
Perhaps a notebook doing just that would be great. The locomotion notebook does load a checkpoint, but I can't seem to reproduce that between separate files.
Perhaps the issue is also related to me trying to restore on a different device from the one used for training?
One simple example I tried to use for restoring looks like this:
However, I keep getting errors related to the reset function in the ReachbotGetup class and the PRNG keys. (The reset function is exactly the same as in the getup task for the Go1; ReachbotGetup is also the same as Go1Getup with some values and reward functions changed.)
Traceback (most recent call last):
Also, it appears there is a provision in the train function to directly return the policy if num_timesteps=0, which hints at the use case of just restoring a policy, but I can't figure out how to do it. Help would be greatly appreciated.
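For reference, the general pattern I'm after (train and save in one file, rebuild the policy in another) can be sketched with plain Python. This is only an illustrative sketch, not the brax/playground API: `make_policy`, the affine "network", and the file names are all stand-ins so the example stays self-contained.

```python
import pickle

def save_checkpoint(path, params):
    # In the training script: dump the trained parameter pytree to disk.
    with open(path, "wb") as f:
        pickle.dump(params, f)

def load_checkpoint(path):
    # In the rendering/deployment script: read the parameters back.
    with open(path, "rb") as f:
        return pickle.load(f)

def make_policy(params):
    # Stand-in for the real network factory: here the "policy" is just
    # an affine map act = w * obs + b, so no ML library is needed.
    w, b = params["w"], params["b"]
    return lambda obs: w * obs + b

# Simulate the two-file workflow in one process.
save_checkpoint("/tmp/ckpt.pkl", {"w": 2.0, "b": 1.0})
policy = make_policy(load_checkpoint("/tmp/ckpt.pkl"))
print(policy(3.0))  # prints 7.0
```

The key point of the pattern is that the checkpoint stores only parameters; the second file must independently rebuild the same network structure before it can apply them.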