This repository has been archived by the owner on Jun 13, 2024. It is now read-only.
Thank you for your comment! I have indeed encountered a similar issue. I think another option is to move the policy update before the Q-network update.
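A minimal sketch of this alternative ordering (toy networks and names, not the actual oac-explore code): the policy optimizer steps before the Q optimizer, so `policy_loss.backward()` runs while the Q-network's parameters are still the ones recorded in its graph.

```python
import torch

# Hypothetical toy setup standing in for the trainer's networks.
torch.manual_seed(0)
q = torch.nn.Linear(2, 1)
policy = torch.nn.Linear(2, 2)
q_opt = torch.optim.Adam(q.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(4, 2)

# 1) Policy update first: backward runs before q's parameters are
#    modified in place by the Q optimizer, so no RuntimeError.
policy_loss = -q(policy(obs)).mean()
pi_opt.zero_grad()
policy_loss.backward()
pi_opt.step()

# 2) Q-network update afterwards. Note that policy_loss.backward() also
#    filled q.weight.grad; q_opt.zero_grad() clears it, so the Q step
#    uses only the gradients of q_loss.
q_loss = (q(torch.randn(4, 2)) - torch.randn(4, 1)).pow(2).mean()
q_opt.zero_grad()
q_loss.backward()
q_opt.step()
```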
The following code generates an error in some of the most recent versions of PyTorch:

oac-explore/trainer/trainer.py, lines 146 to 159 (commit cbc0333)

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
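A minimal sketch of the failure mode (hypothetical toy networks, not the actual oac-explore code): `policy_loss` is built from the Q-network's current parameters, the Q optimizer then updates those parameters in place, and the deferred `policy_loss.backward()` raises the RuntimeError in recent PyTorch versions.

```python
import torch

torch.manual_seed(0)
q = torch.nn.Linear(2, 1)
policy = torch.nn.Linear(2, 2)
q_opt = torch.optim.Adam(q.parameters(), lr=1e-3)

obs = torch.randn(4, 2)
policy_loss = -q(policy(obs)).mean()  # graph saves q.weight for backward

# Q-network gradient step: Adam modifies q's parameters in place.
q_loss = (q(torch.randn(4, 2)) - torch.randn(4, 1)).pow(2).mean()
q_opt.zero_grad()
q_loss.backward()
q_opt.step()

try:
    policy_loss.backward()  # needs the pre-step q.weight, which was overwritten
    failed = False
except RuntimeError:
    failed = True  # "one of the variables needed for gradient computation..."
```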
In order to solve it, it is necessary to move these lines

oac-explore/trainer/trainer.py, lines 120 to 124 (commit cbc0333)

between the Q-network gradient steps and the steps on the policy network, as so:
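A sketch of the reordering described above, under the same hypothetical toy setup: the Q optimizer steps first, and `policy_loss` is computed only afterwards, so its graph refers to the already-updated Q parameters and backward succeeds.

```python
import torch

torch.manual_seed(0)
q = torch.nn.Linear(2, 1)
policy = torch.nn.Linear(2, 2)
q_opt = torch.optim.Adam(q.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(4, 2)

# 1) Q-network gradient step first.
q_loss = (q(torch.randn(4, 2)) - torch.randn(4, 1)).pow(2).mean()
q_opt.zero_grad()
q_loss.backward()
q_opt.step()

# 2) Only now build policy_loss from the updated Q-network, then step.
policy_loss = -q(policy(obs)).mean()
pi_opt.zero_grad()
policy_loss.backward()  # no error: q was not modified after this forward pass
pi_opt.step()
```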
Be aware that if you simply use an old version of PyTorch to solve this problem, the behaviour might not be what you expect, since the policy_loss was computed based on a network which no longer exists.