
Questions #5

Open
ryhara opened this issue Sep 4, 2024 · 0 comments

ryhara commented Sep 4, 2024

Hi, @Chris10M

Thank you for publishing the great paper, code, and dataset. I have a few questions.

  1. Could you share the type and number of GPUs you used, and the execution environment (e.g., the OS)?

  2. Do you still have the training log from when you created best_model_state_dict.pth?
    I would like to check it, because some parameters differ between the paper and the code, and the loss behavior I see is sometimes strange.

  3. Have you ever seen the losses become NaN during training, as in the log below? (A sketch of the guard I have in mind is included after this list.)

INFO train.py(156): epoch: 0, it: 1650/800000, loss_interpen: 0.11, loss_inter_shape: 0.24, loss_inter_transl: 26.03, loss_inter_j3d: 26.45, loss_global_orient: 46.3, loss_hand_pose: 12.34, loss_rj3d: 1.35, loss_j3d: 10.67, loss_shape: 9.12, loss_transl: 10.54, regularizer_loss: 0.0, loss_class_logits: 1.62, loss: 144.76, eta: 13 days, 1:54:30, time: 70.34
INFO train.py(156): epoch: 0, it: 1700/800000, loss_interpen: 0.03, loss_inter_shape: nan, loss_inter_transl: nan, loss_inter_j3d: nan, loss_global_orient: nan, loss_hand_pose: nan, loss_rj3d: nan, loss_j3d: nan, loss_shape: nan, loss_transl: nan, regularizer_loss: nan, loss_class_logits: nan, loss: nan, eta: 13 days, 1:36:20, time: 68.61
INFO train.py(156): epoch: 0, it: 1750/800000, loss_interpen: 0.0, loss_inter_shape: nan, loss_inter_transl: nan, loss_inter_j3d: nan, loss_global_orient: nan, loss_hand_pose: nan, loss_rj3d: nan, loss_j3d: nan, loss_shape: nan, loss_transl: nan, regularizer_loss: nan, loss_class_logits: nan, loss: nan, eta: 13 days, 1:03:36, time: 66.56
...
...
  4. With nn.DataParallel, loss.backward() takes a very long time when using multiple GPUs and training barely proceeds. Is this normal? (A DistributedDataParallel sketch I am considering as an alternative also follows below.)
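
For question 3, this is a minimal sketch of the kind of NaN guard and gradient clipping I have in mind, not the actual code in train.py; `model`, `optimizer`, and `batch` are placeholder names, and I am assuming the model returns a dict of per-term scalar losses:

```python
import torch

def training_step(model, optimizer, batch, max_grad_norm=1.0):
    """One training step with a NaN guard and gradient clipping (sketch only)."""
    optimizer.zero_grad(set_to_none=True)
    losses = model(batch)            # assumption: dict of scalar loss tensors
    loss = sum(losses.values())

    # Skip the update entirely if any term is NaN/Inf, instead of letting it
    # corrupt the weights for the rest of training.
    if not torch.isfinite(loss):
        bad = [name for name, value in losses.items() if not torch.isfinite(value)]
        print(f"skipping batch, non-finite loss terms: {bad}")
        return None

    loss.backward()
    # Clipping often prevents the sudden gradient blow-ups that precede NaNs.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```

Running a few hundred iterations with `torch.autograd.set_detect_anomaly(True)` (slow, but fine for debugging) can also point to the operation that first produces the NaN.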
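
For question 4, this is roughly the DistributedDataParallel setup I am considering instead of nn.DataParallel. It is a self-contained toy example with a dummy linear model and random data, launched via torchrun, not the repository's training script:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Dummy model/data just to show the wiring; the real train.py would
    # plug in its own model, dataset, and losses here.
    model = torch.nn.Linear(64, 3).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    dataset = TensorDataset(torch.randn(1024, 64), torch.randn(1024, 3))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(2):
        sampler.set_epoch(epoch)                 # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad(set_to_none=True)
            loss = torch.nn.functional.mse_loss(model(x), y)
            loss.backward()                      # gradients all-reduced across GPUs here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

My understanding is that nn.DataParallel drives all GPUs from a single process and gathers gradients on GPU 0 every iteration, which can make backward/step very slow, while DDP runs one process per GPU and overlaps gradient all-reduce with backward, so it usually scales much better.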