This repository has been archived by the owner on Jun 13, 2024. It is now read-only.
Thank you for your comment! I have indeed encountered a similar issue. I think another option is to move the policy update before the Q-network update.
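A minimal sketch of this alternative ordering (toy networks and names, not the actual oac-explore code): the policy optimizer steps before the Q optimizer, so `policy_loss.backward()` runs while the Q-network's parameters are still the ones recorded in its graph.

```python
import torch

# Hypothetical toy setup standing in for the trainer's networks.
torch.manual_seed(0)
q = torch.nn.Linear(2, 1)
policy = torch.nn.Linear(2, 2)
q_opt = torch.optim.Adam(q.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(4, 2)

# 1) Policy update first: backward runs before q's parameters are
#    modified in place by the Q optimizer, so no RuntimeError.
policy_loss = -q(policy(obs)).mean()
pi_opt.zero_grad()
policy_loss.backward()
pi_opt.step()

# 2) Q-network update afterwards. Note that policy_loss.backward() also
#    filled q.weight.grad; q_opt.zero_grad() clears it, so the Q step
#    uses only the gradients of q_loss.
q_loss = (q(torch.randn(4, 2)) - torch.randn(4, 1)).pow(2).mean()
q_opt.zero_grad()
q_loss.backward()
q_opt.step()
```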
The following code generates an error in some of the most recent versions of PyTorch:

oac-explore/trainer/trainer.py, lines 146 to 159 (commit cbc0333)

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
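A minimal sketch of the failure mode (hypothetical toy networks, not the actual oac-explore code): `policy_loss` is built from the Q-network's current parameters, the Q optimizer then updates those parameters in place, and the deferred `policy_loss.backward()` raises the RuntimeError in recent PyTorch versions.

```python
import torch

torch.manual_seed(0)
q = torch.nn.Linear(2, 1)
policy = torch.nn.Linear(2, 2)
q_opt = torch.optim.Adam(q.parameters(), lr=1e-3)

obs = torch.randn(4, 2)
policy_loss = -q(policy(obs)).mean()  # graph saves q.weight for backward

# Q-network gradient step: Adam modifies q's parameters in place.
q_loss = (q(torch.randn(4, 2)) - torch.randn(4, 1)).pow(2).mean()
q_opt.zero_grad()
q_loss.backward()
q_opt.step()

try:
    policy_loss.backward()  # needs the pre-step q.weight, which was overwritten
    failed = False
except RuntimeError:
    failed = True  # "one of the variables needed for gradient computation..."
```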
In order to solve it, it is necessary to move these lines

oac-explore/trainer/trainer.py, lines 120 to 124 (commit cbc0333)

between the Q-network gradient steps and the steps on the policy network, as so:
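A sketch of the reordering described above, under the same hypothetical toy setup: the Q optimizer steps first, and `policy_loss` is computed only afterwards, so its graph refers to the already-updated Q parameters and backward succeeds.

```python
import torch

torch.manual_seed(0)
q = torch.nn.Linear(2, 1)
policy = torch.nn.Linear(2, 2)
q_opt = torch.optim.Adam(q.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(4, 2)

# 1) Q-network gradient step first.
q_loss = (q(torch.randn(4, 2)) - torch.randn(4, 1)).pow(2).mean()
q_opt.zero_grad()
q_loss.backward()
q_opt.step()

# 2) Only now build policy_loss from the updated Q-network, then step.
policy_loss = -q(policy(obs)).mean()
pi_opt.zero_grad()
policy_loss.backward()  # no error: q was not modified after this forward pass
pi_opt.step()
```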
Be aware that if you simply use an old version of PyTorch to solve this problem, the behaviour might not be what you expect, since the policy_loss was computed based on a network which no longer exists.