Four tests fail #19

Open
drew-parsons opened this issue Jul 5, 2023 · 1 comment

Comments

@drew-parsons
Contributor

Building the CombBLAS 2.0 release on Linux (Debian unstable) with OpenMPI 4.1.5, 4 out of 20 tests fail when run via ctest (/usr/bin/ctest --force-new-ctest-process --verbose -j8):

80% tests passed, 4 tests failed out of 20

Total Test time (real) = 970.11 sec

The following tests FAILED:
          8 - Indexing_Test (Failed)
          9 - SpAsgn_Test (Failed)
         15 - FBFS_Test (Failed)
         16 - FMIS_Test (Failed)
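
To rerun only these four tests rather than the whole suite, something like the following might help. It is a minimal sketch: it builds a name filter for ctest's -R option from the summary lines above and prints the resulting command instead of running it. The variable names (summary, regex) are illustrative, and the command is assumed to be run from the build directory.

```shell
# Summary lines copied from the "tests FAILED" list above.
summary='          8 - Indexing_Test (Failed)
          9 - SpAsgn_Test (Failed)
         15 - FBFS_Test (Failed)
         16 - FMIS_Test (Failed)'

# The third whitespace-separated field of each line is the test name;
# join the names with "|" to form a regular expression for `ctest -R`.
regex=$(printf '%s\n' "$summary" | awk '{print $3}' | paste -sd'|' -)

# Print the command (anchored so only exact test names match).
echo "ctest -R '^(${regex})\$' --output-on-failure"
```

The printed command can then be pasted into a shell in the build tree; --output-on-failure limits the log to the tests that actually fail.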

These differ from the failing tests in #15, which seem to be caused by missing files.

The output of the failing tests is:

Indexing_Test:

test 8
      Start  8: Indexing_Test

8: Test command: /usr/bin/mpiexec "-n" "4" "/projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest" "../TESTDATA" "B_100x100.txt" "B_10x30_Indexed.txt" "rand10outta100.txt" "rand30outta100.txt"
8: Working Directory: /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests
8: Test timeout computed to be: 1500
8: Indexing working correctly
8: Elements stored on proc 0: {(0,0.234), (1,0.829), (2,0.221), (3,0.454), (4,0.096), (5,0.399), (6,0.895), (7,0.156), (8,0.052), (9,0.709), (10,0.305), (11,0.669), (12,0.493), (13,0.619), (14,0.736), (15,0.615), (16,0.124), (17,0.831), (18,0.958), (19,0.284), (20,0.411), (21,0.473
8: Elements stored on proc 1: {(0,0.196), (1,0.571), (2,0.482), (3,0.09), (4,0.79), (5,0.939), (6,0.684), (7,0.465), (8,0.236), (9,0.713), (10,0.32), (11,0.748), (12,0.771), (13,0.123), (14,0.79), (15,0.06), (16,0.82), (17,0.506), (18,0.859), (19,0.268), (20,0.49), (21,0.01), (22,0
8: Elements stored on proc 2: {(0,0.159), (1,0.811), (2,0.198), (3,0.163), (4,0.779), (5,0.241), (6,0.623), (7,0.955), (8,0.258), (9,0.861), (10,0.104), (11,0.381), (12,0.657), (13,0.356), (14,0.083), (15,0.712), (16,0.413), (17,0.488), (18,0.646), (19,0.99), (20,0.523), (21,0.034)
8: Elements stored on proc 3: {(0,0.609), (1,0.557), (2,0.926), (3,0.481), (4,0.218), (5,0.92), (6,0.049), (7,0.052), (8,0.424), (9,0.214), (10,0.606), (11,0.385), (12,0.848), (13,0.583), (14,0.586), (15,0.615), (16,0.797), (17,0.48), (18,0.378), (19,0.66), (20,0.169), (21,0.258),
8: COMBBLAS Warning: It is dangerous to create (vector) objects without specifying the communicator, are you sure you want to create this object in MPI_COMM_WORLD?
8: COMBBLAS Warning: It is dangerous to create (vector) objects without specifying the communicator, are you sure you want to create this object in MPI_COMM_WORLD?
8: [sandy:2061479] *** An error occurred in MPI_Alltoallv
8: [sandy:2061479] *** reported by process [2022768641,1]
8: [sandy:2061479] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
8: [sandy:2061479] *** MPI_ERR_TRUNCATE: message truncated
8: [sandy:2061479] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
8: [sandy:2061479] ***    and potentially your MPI job)
8: [sandy:2061523] *** Process received signal ***
8: [sandy:2061523] Signal: Segmentation fault (11)
8: [sandy:2061523] Signal code: Address not mapped (1)
8: [sandy:2061523] Failing at address: 0x78a63cc0
8: [sandy:2061523] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3c510)[0x7fa666c5a510]
8: [sandy:2061523] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x152449)[0x7fa666d70449]
8: [sandy:2061523] [ 2] /usr/lib/x86_64-linux-gnu/libopen-pal.so.40(opal_convertor_pack+0xaf)[0x7fa6670ae0df]
8: [sandy:2061523] [ 3] /usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_datatype_sndrcv+0x1fe)[0x7fa66729055e]
8: [sandy:2061523] [ 4] /usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_alltoallv_intra_basic_linear+0x2bf)[0x7fa6672decaf]
8: [sandy:2061523] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so(ompi_coll_tuned_alltoallv_intra_dec_fixed+0x42)[0x7fa6649e6fa2]
8: [sandy:2061523] [ 6] /usr/lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Alltoallv+0x1b5)[0x7fa667293315]
8: [sandy:2061523] [ 7] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(+0x1b151)[0x562012e18151]
8: [sandy:2061523] [ 8] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(_ZN8combblas11SpParHelper13KeyValuePSortIdiiEESt6vectorISt4pairIT_T0_ESaIS6_EEPS6_T1_PSA_RKP19ompi_communicator_t+0x3bb)[0x562012e3ac7b]
8: [sandy:2061523] [ 9] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(_ZN8combblas14FullyDistSpVecIidE4sortEv+0x284)[0x562012e3af34]
8: [sandy:2061523] [10] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(_Z4TopKIidESt4pairIN8combblas12FullyDistVecIT_S3_EENS2_IS3_T0_EEERNS1_14FullyDistSpVecIS3_S5_EES3_+0x247)[0x562012e3b3a7]
8: [sandy:2061523] [11] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(main+0xa2c)[0x562012e12ebc]
8: [sandy:2061523] [12] /lib/x86_64-linux-gnu/libc.so.6(+0x276ca)[0x7fa666c456ca]
8: [sandy:2061523] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7fa666c45785]
8: [sandy:2061523] [14] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(_start+0x21)[0x562012e13981]
8: [sandy:2061523] *** End of error message ***
8: [sandy:2061354] 2 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
8: [sandy:2061354] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
 3/20 Test  #8: Indexing_Test ....................***Failed   20.56 sec

SpAsgn_Test:

test 9
      Start  9: SpAsgn_Test

9: Test command: /usr/bin/mpiexec "-n" "4" "/projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/SpAsgnTest" "../TESTDATA" "A_100x100.txt" "A_with20x30hole.txt" "dense_20x30matrix.txt" "A_wdenseblocks.txt" "20outta100.txt" "30outta100.txt"
9: Working Directory: /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests
9: Test timeout computed to be: 1500
9: Pruning is working
9: SpAsgn working correctly
9: COMBBLAS Warning: It is dangerous to create (vector) objects without specifying the communicator, are you sure you want to create this object in MPI_COMM_WORLD?
9: COMBBLAS Warning: It is dangerous to create (vector) objects without specifying the communicator, are you sure you want to create this object in MPI_COMM_WORLD?
9: [sandy:2061949] *** An error occurred in MPI_Alltoallv
9: [sandy:2061949] *** reported by process [2060517377,0]
9: [sandy:2061949] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
9: [sandy:2061949] *** MPI_ERR_COUNT: invalid count argument
9: [sandy:2061949] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
9: [sandy:2061949] ***    and potentially your MPI job)
9: [sandy:2061930] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
9: [sandy:2061930] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
11/20 Test  #9: SpAsgn_Test ......................***Failed  155.20 sec

FBFS_Test:

test 15
      Start 15: FBFS_Test

15: Test command: /usr/bin/mpiexec "-n" "4" "/projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/Applications/fbfs" "Gen" "16"
15: Working Directory: /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/Applications
15: Test timeout computed to be: 1500
15: Using synthetic data, which we ALWAYS permute for load balance
15: We only balance the original input, we don't repermute after each filter change
15: BFS is run on UNDIRECTED graph, hence hitting CCs, and TEPS is bidirectional
15: Forcing scale to : 16
15: graph_generation:               1.415538 s
15: Generated renamed edge lists
15: Converted to Boolean and removed 149 loops
15: As a whole: 65536 rows and 65536 columns and 909896 nonzeros
15: I/O (or generation) took 9.55391 seconds
15: As a whole: 65536 rows and 65536 columns and 909896 nonzeros
15: All degrees calculated
15: Load balance: 1.00815
15: [sandy:2062597] *** Process received signal ***
15: Symmetricized
15: --------------------------------------------------------------------------
15: Primary job  terminated normally, but 1 process returned
15: a non-zero exit code. Per user-direction, the job has been aborted.
15: --------------------------------------------------------------------------
15: --------------------------------------------------------------------------
15: mpiexec noticed that process rank 0 with PID 0 on node sandy exited on signal 11 (Segmentation fault).
15: --------------------------------------------------------------------------
 8/20 Test #15: FBFS_Test ........................***Failed   24.88 sec

FMIS_Test:

test 16
      Start 16: FMIS_Test

16: Test command: /usr/bin/mpiexec "-n" "4" "/projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/Applications/fmis" "17"
16: Working Directory: /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/Applications
16: Test timeout computed to be: 1500
16: COMBBLAS Warning: It is dangerous to create (matrix) objects without specifying the communicator, are you sure you want to create this object in MPI_COMM_WORLD?
16: Using synthetic data, which we ALWAYS permute for load balance
16: We only balance the original input, we don't repermute after each filter change
16: BFS is run on UNDIRECTED graph, hence hitting CCs, and TEPS is bidirectional
16: Forcing scale to : 17
16: Generated renamed edge lists
16: graph_generation:               0.647811 s
16: Converted to Boolean and removed 75 loops
16: As a whole: 131072 rows and 131072 columns and 619978 nonzeros
16: Generation took 6.18405 seconds
16: As a whole: 131072 rows and 131072 columns and 619978 nonzeros
16: All degrees calculated
16: Load balance: 1.02317
16: Symmetricized
16: --------------------------------------------------------------------------
16: Primary job  terminated normally, but 1 process returned
16: a non-zero exit code. Per user-direction, the job has been aborted.
16: --------------------------------------------------------------------------
16: --------------------------------------------------------------------------
16: mpiexec noticed that process rank 0 with PID 0 on node sandy exited on signal 11 (Segmentation fault).
16: --------------------------------------------------------------------------
10/20 Test #16: FMIS_Test ........................***Failed   26.22 sec
@drew-parsons
Contributor Author

Actually, these test failures do match the ones reported later in #15, so the failures there don't just affect FreeBSD.
