About training #121
Comments
Please provide more details about your training environment and training logs.
The training speed on Ubuntu with 2 GPUs is lower than on Ubuntu with 1 GPU. Here are some training logs:

```
[2024-12-31 19:32:24,400][ INFO] {'backbone': 'resnet50',
[2024-12-31 19:32:25,021][ INFO] Total params: 40.5M
user-Precision-7920-Tower:24437:24437 [0] NCCL INFO Bootstrap : Using enp0s31f6:192.168.207.78<0>
[2024-12-31 20:12:46,550][ INFO] ***** Evaluation ***** >>>> Class [0 hard roofs] F1: 60.80
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation original ***** >>>> Kappa: 55.87
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation original ***** >>>> OA: 67.73
[2024-12-31 20:12:46,551][ INFO] ***** Evaluation ***** >>>> Class [0 hard roofs] UA: 69.69
```
Sorry for the late response. Are you using an A100 for the training?
I am using a 4090 for the training.
I think the speed is within expectation. From our training log, an A100 GPU completes the first 43 iterations in 1 minute, whereas your first 84 iterations take 5 minutes. Since the iterations per epoch are doubled and the GPU is switched from an A100 to a 4090, I'd guess the roughly 5x longer time per epoch is normal.
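For reference, a minimal sketch of the arithmetic behind that estimate; the iteration counts come from the logs quoted in this thread, and the 2x iterations-per-epoch factor is the assumption stated above, not something measured here:

```python
# Sanity check of the ~5x per-epoch slowdown estimate.
# Numbers are taken from this thread; the 2x iterations/epoch
# factor is the stated assumption for this setup.

a100_iters_per_min = 43          # A100: first 43 iterations in ~1 minute
rtx4090_iters_per_min = 84 / 5   # 4090: first 84 iterations in ~5 minutes

# How much slower each iteration runs on the 4090 vs. the A100.
per_iter_slowdown = a100_iters_per_min / rtx4090_iters_per_min

# Each epoch has twice as many iterations in this configuration.
per_epoch_slowdown = per_iter_slowdown * 2

print(f"per-iteration slowdown: {per_iter_slowdown:.2f}x")  # ~2.56x
print(f"per-epoch slowdown:     {per_epoch_slowdown:.2f}x") # ~5.12x
```

The ~2.6x per-iteration gap times the 2x iteration count gives the ~5x per-epoch figure mentioned above.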
Why does it take more than a week to complete the UniMatch training experiment with one-sixteenth of the data?