I rebuilt with --enable-debug, then ran with

[bennet@cavium-hpc ~]$ salloc -N 1 --ntasks-per-node=24
salloc: Pending job allocation 158
salloc: job 158 queued and waiting for resources
salloc: job 158 has been allocated resources
salloc: Granted job allocation 158
[bennet@cavium-hpc ~]$ srun ./test_mpi
The sum = 0.866386
Elapsed time is: 5.426759
The sum = 0.866386
Elapsed time is: 5.424068
The sum = 0.866386
Elapsed time is: 5.426195
The sum = 0.866386
Elapsed time is: 5.426059
The sum = 0.866386
Elapsed time is: 5.423192
The sum = 0.866386
Elapsed time is: 5.426252
The sum = 0.866386
Elapsed time is: 5.425444
The sum = 0.866386
Elapsed time is: 5.423647
The sum = 0.866386
Elapsed time is: 5.426082
The sum = 0.866386
Elapsed time is: 5.425936
The sum = 0.866386
Elapsed time is: 5.423964
Total time is: 59.677830

[bennet@cavium-hpc ~]$ mpirun --mca plm_base_verbose 10 ./test_mpi 2>&1 | tee debug2.log

The zipped debug log should be attached. That run was made after using
systemctl to turn off the firewall on the login node from which mpirun
is executed, as well as on the host on which it runs.

[bennet@cavium-hpc ~]$ mpirun hostname
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------

[bennet@cavium-hpc ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               158  standard     bash   bennet  R      14:30      1 cav01

[bennet@cavium-hpc ~]$ srun hostname
cav01.arc-ts.umich.edu
[ repeated 23 more times ]

As always, your help is much appreciated,

-- bennet

On Sun, Jun 17, 2018 at 1:06 PM r...@open-mpi.org <r...@open-mpi.org> wrote:
>
> Add --enable-debug to your OMPI configure cmd line, and then add --mca
> plm_base_verbose 10 to your mpirun cmd line. For some reason, the remote
> daemon isn't starting - this will give you some info as to why.
>
>
> > On Jun 17, 2018, at 9:07 AM, Bennet Fauber <ben...@umich.edu> wrote:
> >
> > I have a compiled binary that will run with srun but not with mpirun.
> > The attempts to run with mpirun all result in failures to initialize.
> > I have tried this on one node and on two nodes, with the firewall
> > turned on and with it off.
> >
> > Am I missing some command line option for mpirun?
> >
> > OMPI was built with this configure command:
> >
> > $ ./configure --prefix=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b
> >     --mandir=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/share/man
> >     --with-pmix=/opt/pmix/2.0.2 --with-libevent=external
> >     --with-hwloc=external --with-slurm --disable-dlopen
> >     CC=gcc CXX=g++ FC=gfortran
> >
> > All tests from `make check` passed; see below.
> >
> > [bennet@cavium-hpc ~]$ mpicc --show
> > gcc -I/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/include -pthread
> > -L/opt/pmix/2.0.2/lib -Wl,-rpath -Wl,/opt/pmix/2.0.2/lib -Wl,-rpath
> > -Wl,/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib
> > -Wl,--enable-new-dtags
> > -L/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib -lmpi
> >
> > test_mpi was compiled with
> >
> > $ gcc -o test_mpi test_mpi.c -lm
> >
> > This is the runtime library path:
> >
> > [bennet@cavium-hpc ~]$ echo $LD_LIBRARY_PATH
> > /opt/slurm/lib64:/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib64:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib:/opt/slurm/lib64:/opt/pmix/2.0.2/lib:/sw/arcts/centos7/hpc-utils/lib
> >
> > These commands are given in the exact sequence in which they were
> > entered at a console.
> >
> > [bennet@cavium-hpc ~]$ salloc -N 1 --ntasks-per-node=24
> > salloc: Pending job allocation 156
> > salloc: job 156 queued and waiting for resources
> > salloc: job 156 has been allocated resources
> > salloc: Granted job allocation 156
> >
> > [bennet@cavium-hpc ~]$ mpirun ./test_mpi
> > --------------------------------------------------------------------------
> > An ORTE daemon has unexpectedly failed after launch and before
> > communicating back to mpirun. This could be caused by a number
> > of factors, including an inability to create a connection back
> > to mpirun due to a lack of common network interfaces and/or no
> > route found between them. Please check network connectivity
> > (including firewalls and network routing requirements).
> > --------------------------------------------------------------------------
> >
> > [bennet@cavium-hpc ~]$ srun ./test_mpi
> > The sum = 0.866386
> > Elapsed time is: 5.425439
> > The sum = 0.866386
> > Elapsed time is: 5.427427
> > The sum = 0.866386
> > Elapsed time is: 5.422579
> > The sum = 0.866386
> > Elapsed time is: 5.424168
> > The sum = 0.866386
> > Elapsed time is: 5.423951
> > The sum = 0.866386
> > Elapsed time is: 5.422414
> > The sum = 0.866386
> > Elapsed time is: 5.427156
> > The sum = 0.866386
> > Elapsed time is: 5.424834
> > The sum = 0.866386
> > Elapsed time is: 5.425103
> > The sum = 0.866386
> > Elapsed time is: 5.422415
> > The sum = 0.866386
> > Elapsed time is: 5.422948
> > Total time is: 59.668622
> >
> > Thanks, -- bennet
> >
> >
> > make check results
> > ----------------------------------------------
> >
> > make check-TESTS
> > make[3]: Entering directory `/tmp/build/openmpi-3.1.0/ompi/debuggers'
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/ompi/debuggers'
> > PASS: predefined_gap_test
> > PASS: predefined_pad_test
> > SKIP: dlopen_test
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 3
> > # PASS: 2
> > # SKIP: 1
> > # XFAIL: 0
> > # FAIL: 0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > PASS: atomic_cmpset_noinline
> >   - 5 threads: Passed
> > PASS: atomic_cmpset_noinline
> >   - 8 threads: Passed
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 8
> > # PASS: 8
> > # SKIP: 0
> > # XFAIL: 0
> > # FAIL: 0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/class'
> > PASS: ompi_rb_tree
> > PASS: opal_bitmap
> > PASS: opal_hash_table
> > PASS: opal_proc_table
> > PASS: opal_tree
> > PASS: opal_list
> > PASS: opal_value_array
> > PASS: opal_pointer_array
> > PASS: opal_lifo
> > PASS: opal_fifo
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 10
> > # PASS: 10
> > # SKIP: 0
> > # XFAIL: 0
> > # FAIL: 0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > make opal_thread opal_condition
> > make[3]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
> >   CC       opal_thread.o
> >   CCLD     opal_thread
> >   CC       opal_condition.o
> >   CCLD     opal_condition
> > make[3]: Leaving directory `/tmp/build/openmpi-3.1.0/test/threads'
> > make check-TESTS
> > make[3]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 0
> > # PASS: 0
> > # SKIP: 0
> > # XFAIL: 0
> > # FAIL: 0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/datatype'
> > PASS: opal_datatype_test
> > PASS: unpack_hetero
> > PASS: checksum
> > PASS: position
> > PASS: position_noncontig
> > PASS: ddt_test
> > PASS: ddt_raw
> > PASS: unpack_ooo
> > PASS: ddt_pack
> > PASS: external32
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 10
> > # PASS: 10
> > # SKIP: 0
> > # XFAIL: 0
> > # FAIL: 0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/util'
> > PASS: opal_bit_ops
> > PASS: opal_path_nfs
> > PASS: bipartite_graph
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 3
> > # PASS: 3
> > # SKIP: 0
> > # XFAIL: 0
> > # FAIL: 0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/dss'
> > PASS: dss_buffer
> > PASS: dss_cmp
> > PASS: dss_payload
> > PASS: dss_print
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 4
> > # PASS: 4
> > # SKIP: 0
> > # XFAIL: 0
> > # FAIL: 0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
[Attachment: debug2.log.gz (GNU Zip compressed data)]
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
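
For anyone trying to reproduce this: the thread compiles and runs
test_mpi.c but never includes its source. Below is a minimal sketch
consistent with the output logged above. The workload, the iteration
count, and the MPI_SUM reduction are assumptions for illustration, not
the poster's actual code, and an MPI program like this would normally
be built with mpicc (e.g. mpicc -o test_mpi test_mpi.c -lm) rather
than the bare gcc line quoted in the thread.

/*
 * test_mpi.c -- hypothetical reconstruction.  Only the output format
 * ("The sum = ...", "Elapsed time is: ...", "Total time is: ...") is
 * taken from the logs above; everything else is assumed.
 */
#include <stdio.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    long i, n = 100000000;                 /* assumed iteration count */
    double sum = 0.0, t0, elapsed, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* each rank does some math-library work (hence -lm) and times it */
    t0 = MPI_Wtime();
    for (i = 0; i < n; i++)
        sum += cos((double)i / n) / n;
    elapsed = MPI_Wtime() - t0;

    printf("The sum = %f\n", sum);
    printf("Elapsed time is: %f\n", elapsed);

    /* rank 0 prints one aggregate line, as in the posted output */
    MPI_Reduce(&elapsed, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Total time is: %f\n", total);

    MPI_Finalize();
    return 0;
}

Note that the per-rank elapsed times in the logs (about 5.42 s each,
eleven of them) sum to roughly the reported 59.7 s totals, which is
what an MPI_SUM reduction over per-rank times would produce.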