
Questions #5

Open
ryhara opened this issue Sep 4, 2024 · 0 comments

ryhara commented Sep 4, 2024

Hi, @Chris10M

Thank you for publishing the great paper, code, and dataset. I have a few questions.

  1. Could you share the type and number of GPUs you used, and the execution environment (e.g., the OS)?

  2. Do you still have the training log from when you created best_model_state_dict.pth?
    I would like to check it, because some parameters differ between the paper and the code, and the loss behavior I see is sometimes strange.

  3. Have you ever seen the losses become NaN during training, as in the log below? (A sketch of the guard I have in mind is included after this list.)

INFO train.py(156): epoch: 0, it: 1650/800000, loss_interpen: 0.11, loss_inter_shape: 0.24, loss_inter_transl: 26.03, loss_inter_j3d: 26.45, loss_global_orient: 46.3, loss_hand_pose: 12.34, loss_rj3d: 1.35, loss_j3d: 10.67, loss_shape: 9.12, loss_transl: 10.54, regularizer_loss: 0.0, loss_class_logits: 1.62, loss: 144.76, eta: 13 days, 1:54:30, time: 70.34
INFO train.py(156): epoch: 0, it: 1700/800000, loss_interpen: 0.03, loss_inter_shape: nan, loss_inter_transl: nan, loss_inter_j3d: nan, loss_global_orient: nan, loss_hand_pose: nan, loss_rj3d: nan, loss_j3d: nan, loss_shape: nan, loss_transl: nan, regularizer_loss: nan, loss_class_logits: nan, loss: nan, eta: 13 days, 1:36:20, time: 68.61
INFO train.py(156): epoch: 0, it: 1750/800000, loss_interpen: 0.0, loss_inter_shape: nan, loss_inter_transl: nan, loss_inter_j3d: nan, loss_global_orient: nan, loss_hand_pose: nan, loss_rj3d: nan, loss_j3d: nan, loss_shape: nan, loss_transl: nan, regularizer_loss: nan, loss_class_logits: nan, loss: nan, eta: 13 days, 1:03:36, time: 66.56
...
...
  4. With nn.DataParallel, loss.backward() takes a very long time when using multiple GPUs and training barely proceeds. Is this normal? (A DistributedDataParallel sketch I am considering as an alternative also follows below.)
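
For question 3, this is a minimal sketch of the kind of NaN guard and gradient clipping I have in mind, not the actual code in train.py; `model`, `optimizer`, and `batch` are placeholder names, and I am assuming the model returns a dict of per-term scalar losses:

```python
import torch

def training_step(model, optimizer, batch, max_grad_norm=1.0):
    """One training step with a NaN guard and gradient clipping (sketch only)."""
    optimizer.zero_grad(set_to_none=True)
    losses = model(batch)            # assumption: dict of scalar loss tensors
    loss = sum(losses.values())

    # Skip the update entirely if any term is NaN/Inf, instead of letting it
    # corrupt the weights for the rest of training.
    if not torch.isfinite(loss):
        bad = [name for name, value in losses.items() if not torch.isfinite(value)]
        print(f"skipping batch, non-finite loss terms: {bad}")
        return None

    loss.backward()
    # Clipping often prevents the sudden gradient blow-ups that precede NaNs.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```

Running a few hundred iterations with `torch.autograd.set_detect_anomaly(True)` (slow, but fine for debugging) can also point to the operation that first produces the NaN.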
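
For question 4, this is roughly the DistributedDataParallel setup I am considering instead of nn.DataParallel. It is a self-contained toy example with a dummy linear model and random data, launched via torchrun, not the repository's training script:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Dummy model/data just to show the wiring; the real train.py would
    # plug in its own model, dataset, and losses here.
    model = torch.nn.Linear(64, 3).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    dataset = TensorDataset(torch.randn(1024, 64), torch.randn(1024, 3))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(2):
        sampler.set_epoch(epoch)                 # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad(set_to_none=True)
            loss = torch.nn.functional.mse_loss(model(x), y)
            loss.backward()                      # gradients all-reduced across GPUs here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

My understanding is that nn.DataParallel drives all GPUs from a single process and gathers gradients on GPU 0 every iteration, which can make backward/step very slow, while DDP runs one process per GPU and overlaps gradient all-reduce with backward, so it usually scales much better.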