Fixed hccl bug during multi-card training on ascend: sdma memory copy failed #5067
main.yml
on: pull_request
Rsync code: 5m 54s
runs on nv step2 / Build-dipu-cuda-latest-target: 4m 23s
runs on nv step2 / Test-dipu-cuda: 21m 14s
runs on nv step2 / Test-one-iter-cuda: 15m 26s
Build-dipu-camb-pt210: 7m 5s
runs on nv step2 / Test-dipu-cuda-latest-target: 6m 41s
runs on nv step2 / Test-dipu-cuda-pt211: 6m 30s
Test-dipu-camb-latest-target: 11m 23s
Test-dipu-camb-pt211: 12m 29s
Test-dipu-ascend-910b: 34s
Test-one-iter-ascend-910b: 32s
Test-dipu-ascend-latest-target-910b: 40s
Test-dicp-on-dipu-ascend: 6m 12s
Annotations
3 errors
Test-one-iter-ascend-910b: Process completed with exit code 1.
Test-one-iter-ascend-910b: Process completed with exit code 1.
Test-dipu-ascend-910b: Process completed with exit code 1.