
about your gradient #126

Open
BobxmuMa opened this issue Sep 19, 2021 · 2 comments

Comments

@BobxmuMa

First of all, thank you very much for open-sourcing your XNOR-pytorch code. I noticed that when updating the full-precision weights, you multiply the weight gradient by some coefficients:

self.target_modules[index].grad.data = m.add(m_add).mul(1.0-1.0/s[1]).mul(n)
self.target_modules[index].grad.data = self.target_modules[index].grad.data.mul(1e+9)

I could not find any description of these coefficients in the original paper. Could you explain why you transform the gradient this way?
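For context, the XNOR-Net paper's gradient for the full-precision weights combines the straight-through estimator for sign() with the gradient through the scaling factor alpha = ||W||_1 / n. A minimal NumPy sketch of that formula (variable names are mine, not the repo's; the extra (1-1/s[1])*n and 1e+9 coefficients asked about above do not appear here):

```python
import numpy as np

def xnor_weight_grad(w, grad_wb):
    # w: full-precision weights of one filter (flattened)
    # grad_wb: gradient w.r.t. the binarized weights alpha * sign(w)
    n = w.size
    alpha = np.abs(w).mean()      # scaling factor alpha = ||w||_1 / n
    ste = (np.abs(w) <= 1.0)      # straight-through estimator for d sign(w)/dw
    # d(alpha * sign(w))/dw = 1/n + alpha * 1_{|w| <= 1}, since sign(w)^2 = 1
    return grad_wb * (1.0 / n + alpha * ste)
```

For example, with w = [0.5, -2.0] we get alpha = 1.25, and the second weight's STE term is masked out because |w| > 1.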

@zhaoxiangshun

I would like to know as well. If you (the OP) have figured it out, could you please explain? Thanks.

@jiecaoyu
Owner

Hi @BobxmuMa @zhaoxiangshun, this parameter 1e+9 appears in the paper authors' original repo, so I kept it as well. Its main effect is to increase the range of the weights and reduce the effect of weight decay. I suppose using a much smaller weight decay value would have the same effect. I also tested the accuracy with and without this parameter; in my tests, accuracy was higher with it.
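A minimal numerical sketch of this point (assuming PyTorch-style SGD, where weight decay `wd` is folded into the gradient as `grad + wd * w`; the concrete numbers are illustrative, not from the repo): once the data gradient is scaled by 1e+9, the weight-decay term becomes negligible in comparison.

```python
def sgd_step(w, grad, lr=0.01, wd=1e-4):
    # PyTorch-style SGD: weight decay is added to the gradient before the step
    return w - lr * (grad + wd * w)

w, g = 0.02, 0.5

# Fraction of the update contributed by weight decay, with and without
# the 1e+9 gradient scaling (the learning rate would be rescaled to match)
share_plain  = (1e-4 * w) / g          # decay vs. raw gradient
share_scaled = (1e-4 * w) / (g * 1e9)  # decay vs. scaled gradient

print(share_scaled / share_plain)  # decay's relative influence drops by 1e+9
```

This is consistent with the comment above that shrinking the weight-decay value directly should have a similar effect.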
