I rebuilt with --enable-debug and reran the test.
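
For reference, the rebuild reused the configure line quoted at the bottom
of this message, with only --enable-debug added; this is a sketch, and the
make step shown is just the usual one, not a paste from my history:

  ./configure --prefix=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b \
      --mandir=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/share/man \
      --with-pmix=/opt/pmix/2.0.2 --with-libevent=external \
      --with-hwloc=external --with-slurm --disable-dlopen \
      --enable-debug CC=gcc CXX=g++ FC=gfortran
  make && make install

The run, under a fresh allocation: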

[bennet@cavium-hpc ~]$ salloc -N 1 --ntasks-per-node=24
salloc: Pending job allocation 158
salloc: job 158 queued and waiting for resources
salloc: job 158 has been allocated resources
salloc: Granted job allocation 158

[bennet@cavium-hpc ~]$ srun ./test_mpi
The sum = 0.866386
Elapsed time is:  5.426759
The sum = 0.866386
Elapsed time is:  5.424068
The sum = 0.866386
Elapsed time is:  5.426195
The sum = 0.866386
Elapsed time is:  5.426059
The sum = 0.866386
Elapsed time is:  5.423192
The sum = 0.866386
Elapsed time is:  5.426252
The sum = 0.866386
Elapsed time is:  5.425444
The sum = 0.866386
Elapsed time is:  5.423647
The sum = 0.866386
Elapsed time is:  5.426082
The sum = 0.866386
Elapsed time is:  5.425936
The sum = 0.866386
Elapsed time is:  5.423964
Total time is:  59.677830

[bennet@cavium-hpc ~]$ mpirun --mca plm_base_verbose 10 ./test_mpi 2>&1 | tee debug2.log

The zipped debug log should be attached.

I ran that after using systemctl to stop the firewall on the login
node from which mpirun is executed, as well as on the compute node on
which the job runs.
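
For what it's worth, the firewall was stopped with roughly the following
on each node (assuming firewalld, since these are CentOS 7 systems; the
exact service name is the only assumption here):

  # on the login node and on cav01
  sudo systemctl stop firewalld
  systemctl status firewalld    # should now report "inactive (dead)"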

[bennet@cavium-hpc ~]$ mpirun hostname
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------

[bennet@cavium-hpc ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               158  standard     bash   bennet  R      14:30      1 cav01
[bennet@cavium-hpc ~]$ srun hostname
cav01.arc-ts.umich.edu
[ repeated 23 more times ]

As always, your help is much appreciated,

-- bennet

On Sun, Jun 17, 2018 at 1:06 PM r...@open-mpi.org <r...@open-mpi.org> wrote:
>
> Add --enable-debug to your OMPI configure cmd line, and then add --mca 
> plm_base_verbose 10 to your mpirun cmd line. For some reason, the remote 
> daemon isn’t starting - this will give you some info as to why.
>
>
> > On Jun 17, 2018, at 9:07 AM, Bennet Fauber <ben...@umich.edu> wrote:
> >
> > I have a compiled binary that will run with srun but not with mpirun.
> > The attempts to run with mpirun all result in failures to initialize.
> > I have tried this on one node, and on two nodes, with firewall turned
> > on and with it off.
> >
> > Am I missing some command line option for mpirun?
> >
> > OMPI built from this configure command
> >
> >  $ ./configure --prefix=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b
> > --mandir=/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/share/man
> > --with-pmix=/opt/pmix/2.0.2 --with-libevent=external
> > --with-hwloc=external --with-slurm --disable-dlopen CC=gcc CXX=g++
> > FC=gfortran
> >
> > All tests from `make check` passed, see below.
> >
> > [bennet@cavium-hpc ~]$ mpicc --show
> > gcc -I/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/include -pthread
> > -L/opt/pmix/2.0.2/lib -Wl,-rpath -Wl,/opt/pmix/2.0.2/lib -Wl,-rpath
> > -Wl,/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib
> > -Wl,--enable-new-dtags
> > -L/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib -lmpi
> >
> > The test_mpi was compiled with
> >
> > $ gcc -o test_mpi test_mpi.c -lm
> >
> > This is the runtime library path
> >
> > [bennet@cavium-hpc ~]$ echo $LD_LIBRARY_PATH
> > /opt/slurm/lib64:/sw/arcts/centos7/gcc_7_1_0/openmpi/3.1.0-b/lib:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib64:/opt/arm/gcc-7.1.0_Generic-AArch64_RHEL-7_aarch64-linux/lib:/opt/slurm/lib64:/opt/pmix/2.0.2/lib:/sw/arcts/centos7/hpc-utils/lib
> >
> >
> > These commands are given in exact sequence in which they were entered
> > at a console.
> >
> > [bennet@cavium-hpc ~]$ salloc -N 1 --ntasks-per-node=24
> > salloc: Pending job allocation 156
> > salloc: job 156 queued and waiting for resources
> > salloc: job 156 has been allocated resources
> > salloc: Granted job allocation 156
> >
> > [bennet@cavium-hpc ~]$ mpirun ./test_mpi
> > --------------------------------------------------------------------------
> > An ORTE daemon has unexpectedly failed after launch and before
> > communicating back to mpirun. This could be caused by a number
> > of factors, including an inability to create a connection back
> > to mpirun due to a lack of common network interfaces and/or no
> > route found between them. Please check network connectivity
> > (including firewalls and network routing requirements).
> > --------------------------------------------------------------------------
> >
> > [bennet@cavium-hpc ~]$ srun ./test_mpi
> > The sum = 0.866386
> > Elapsed time is:  5.425439
> > The sum = 0.866386
> > Elapsed time is:  5.427427
> > The sum = 0.866386
> > Elapsed time is:  5.422579
> > The sum = 0.866386
> > Elapsed time is:  5.424168
> > The sum = 0.866386
> > Elapsed time is:  5.423951
> > The sum = 0.866386
> > Elapsed time is:  5.422414
> > The sum = 0.866386
> > Elapsed time is:  5.427156
> > The sum = 0.866386
> > Elapsed time is:  5.424834
> > The sum = 0.866386
> > Elapsed time is:  5.425103
> > The sum = 0.866386
> > Elapsed time is:  5.422415
> > The sum = 0.866386
> > Elapsed time is:  5.422948
> > Total time is:  59.668622
> >
> > Thanks,    -- bennet
> >
> >
> > make check results
> > ----------------------------------------------
> >
> > make  check-TESTS
> > make[3]: Entering directory `/tmp/build/openmpi-3.1.0/ompi/debuggers'
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/ompi/debuggers'
> > PASS: predefined_gap_test
> > PASS: predefined_pad_test
> > SKIP: dlopen_test
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 3
> > # PASS:  2
> > # SKIP:  1
> > # XFAIL: 0
> > # FAIL:  0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > PASS: atomic_cmpset_noinline
> >    - 5 threads: Passed
> > PASS: atomic_cmpset_noinline
> >    - 8 threads: Passed
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 8
> > # PASS:  8
> > # SKIP:  0
> > # XFAIL: 0
> > # FAIL:  0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/class'
> > PASS: ompi_rb_tree
> > PASS: opal_bitmap
> > PASS: opal_hash_table
> > PASS: opal_proc_table
> > PASS: opal_tree
> > PASS: opal_list
> > PASS: opal_value_array
> > PASS: opal_pointer_array
> > PASS: opal_lifo
> > PASS: opal_fifo
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 10
> > # PASS:  10
> > # SKIP:  0
> > # XFAIL: 0
> > # FAIL:  0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > make  opal_thread opal_condition
> > make[3]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
> >  CC       opal_thread.o
> >  CCLD     opal_thread
> >  CC       opal_condition.o
> >  CCLD     opal_condition
> > make[3]: Leaving directory `/tmp/build/openmpi-3.1.0/test/threads'
> > make  check-TESTS
> > make[3]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/threads'
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 0
> > # PASS:  0
> > # SKIP:  0
> > # XFAIL: 0
> > # FAIL:  0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/datatype'
> > PASS: opal_datatype_test
> > PASS: unpack_hetero
> > PASS: checksum
> > PASS: position
> > PASS: position_noncontig
> > PASS: ddt_test
> > PASS: ddt_raw
> > PASS: unpack_ooo
> > PASS: ddt_pack
> > PASS: external32
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 10
> > # PASS:  10
> > # SKIP:  0
> > # XFAIL: 0
> > # FAIL:  0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/util'
> > PASS: opal_bit_ops
> > PASS: opal_path_nfs
> > PASS: bipartite_graph
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 3
> > # PASS:  3
> > # SKIP:  0
> > # XFAIL: 0
> > # FAIL:  0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================
> > [ elided ]
> > make[4]: Entering directory `/tmp/build/openmpi-3.1.0/test/dss'
> > PASS: dss_buffer
> > PASS: dss_cmp
> > PASS: dss_payload
> > PASS: dss_print
> > ============================================================================
> > Testsuite summary for Open MPI 3.1.0
> > ============================================================================
> > # TOTAL: 4
> > # PASS:  4
> > # SKIP:  0
> > # XFAIL: 0
> > # FAIL:  0
> > # XPASS: 0
> > # ERROR: 0
> > ============================================================================

Attachment: debug2.log.gz (GNU Zip compressed data)

