Hi,

I have installed openmpi-2.0.2rc2 on my "SUSE Linux Enterprise
Server 12 (x86_64)" with Sun C 5.14 beta and gcc-6.2.0. Unfortunately,
I get an error when I run one of my programs (spawn_master): with
openmpi-2.0.2rc2 a spawned process segfaults, with
openmpi-v2.x-201612232156-5ce66b0 the program gets a timeout, and
everything works as expected with openmpi-master-201612232109-67a08e8.
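
As the output of the successful run at the end shows, spawn_master
spawns four copies of spawn_slave and prints the group sizes of the
resulting inter-communicator, while the slaves print their rank, host,
and argv[0]. My exact sources are not attached; the following is only a
minimal sketch of a master that produces this kind of output (the
command name "spawn_slave" and the count of 4 are taken from the output
below, everything else is a generic MPI_Comm_spawn example rather than
my exact code):

/* simplified sketch of a spawn test master; not the exact spawn_master.c */
#include <stdio.h>
#include <mpi.h>

#define NUM_SLAVES 4

int main (int argc, char *argv[])
{
  int  rank, namelen;
  int  ntasks_world, ntasks_local, ntasks_remote;
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  MPI_Comm COMM_CHILD_PROCESSES;

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name (processor_name, &namelen);
  printf ("Parent process %d running on %s\n", rank, processor_name);
  printf ("  I create %d slave processes\n", NUM_SLAVES);

  /* spawn the slave program; errors in the spawn are fatal by default */
  MPI_Comm_spawn ("spawn_slave", MPI_ARGV_NULL, NUM_SLAVES,
                  MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                  &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);

  /* report the sizes of the local and remote groups of the inter-communicator */
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks_world);
  MPI_Comm_size (COMM_CHILD_PROCESSES, &ntasks_local);
  MPI_Comm_remote_size (COMM_CHILD_PROCESSES, &ntasks_remote);
  printf ("Parent process %d: tasks in MPI_COMM_WORLD:                    %d\n"
          "                  tasks in COMM_CHILD_PROCESSES local group:  %d\n"
          "                  tasks in COMM_CHILD_PROCESSES remote group: %d\n",
          rank, ntasks_world, ntasks_local, ntasks_remote);

  MPI_Comm_free (&COMM_CHILD_PROCESSES);
  MPI_Finalize ();
  return 0;
}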

loki spawn 144 ompi_info | grep -e "Open MPI:" -e "C compiler absolute:"
                Open MPI: 2.0.2rc2
     C compiler absolute: /opt/solstudio12.5b/bin/cc


loki spawn 145 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  loki
  System call: open(2)
  Error:       No such file or directory (errno 2)
--------------------------------------------------------------------------
[loki:17855] *** Process received signal ***
[loki:17855] Signal: Segmentation fault (11)
[loki:17855] Signal code: Address not mapped (1)
[loki:17855] Failing at address: 0x8
[loki:17855] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f053d0e9870]
[loki:17855] [ 1] /usr/local/openmpi-2.0.2_64_cc/lib64/openmpi/mca_pml_ob1.so(+0x990ae)[0x7f05325060ae]
[loki:17855] [ 2] /usr/local/openmpi-2.0.2_64_cc/lib64/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_req_start+0x196)[0x7f053250cb16]
[loki:17855] [ 3] /usr/local/openmpi-2.0.2_64_cc/lib64/openmpi/mca_pml_ob1.so(mca_pml_ob1_irecv+0x2f8)[0x7f05324bd3d8]
[loki:17855] [ 4] /usr/local/openmpi-2.0.2_64_cc/lib64/libmpi.so.20(ompi_coll_base_bcast_intra_generic+0x34c)[0x7f053e52300c]
[loki:17855] [ 5] /usr/local/openmpi-2.0.2_64_cc/lib64/libmpi.so.20(ompi_coll_base_bcast_intra_binomial+0x1ed)[0x7f053e523eed]
[loki:17855] [ 6] /usr/local/openmpi-2.0.2_64_cc/lib64/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0x1a3)[0x7f0531ea7c03]
[loki:17855] [ 7] /usr/local/openmpi-2.0.2_64_cc/lib64/libmpi.so.20(ompi_dpm_connect_accept+0xab8)[0x7f053d484f38]
[loki:17845] [[55817,0],0] ORTE_ERROR_LOG: Not found in file ../../openmpi-2.0.2rc2/orte/orted/pmix/pmix_server_fence.c at line 186
[loki:17855] [ 8] /usr/local/openmpi-2.0.2_64_cc/lib64/libmpi.so.20(ompi_dpm_dyn_init+0xcd)[0x7f053d48aeed]
[loki:17855] [ 9] /usr/local/openmpi-2.0.2_64_cc/lib64/libmpi.so.20(ompi_mpi_init+0xf93)[0x7f053d53d5f3]
[loki:17855] [10] /usr/local/openmpi-2.0.2_64_cc/lib64/libmpi.so.20(PMPI_Init+0x8d)[0x7f053db209cd]
[loki:17855] [11] spawn_slave[0x4009cf]
[loki:17855] [12] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f053cd53b25]
[loki:17855] [13] spawn_slave[0x400892]
[loki:17855] *** End of error message ***
[loki:17845] [[55817,0],0] ORTE_ERROR_LOG: Not found in file ../../openmpi-2.0.2rc2/orte/orted/pmix/pmix_server_fence.c at line 186
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[55817,2],0]) is on host: loki
  Process 2 ([[55817,2],1]) is on host: unknown!
  BTLs attempted: self sm tcp vader

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_dpm_dyn_init() failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
loki spawn 146







loki spawn 120 ompi_info | grep -e "Open MPI:" -e "C compiler absolute:"
                Open MPI: 2.0.2a1
     C compiler absolute: /opt/solstudio12.5b/bin/cc
loki spawn 121 which mpiexec
/usr/local/openmpi-2.1.0_64_cc/bin/mpiexec
loki spawn 122 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

[loki:21301] OPAL ERROR: Timeout in file ../../../../openmpi-v2.x-201612232156-5ce66b0/opal/mca/pmix/base/pmix_base_fns.c at line 195
[loki:21301] *** An error occurred in MPI_Comm_spawn
[loki:21301] *** reported by process [3431727105,0]
[loki:21301] *** on communicator MPI_COMM_WORLD
[loki:21301] *** MPI_ERR_UNKNOWN: unknown error
[loki:21301] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[loki:21301] ***    and potentially your MPI job)
loki spawn 123






loki spawn 111 ompi_info | grep -e "Open MPI:" -e "C compiler"
                Open MPI: 3.0.0a1
              C compiler: cc
     C compiler absolute: /opt/solstudio12.5b/bin/cc
  C compiler family name: SUN
      C compiler version: 0x5140
loki spawn 111 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

Parent process 0: tasks in MPI_COMM_WORLD:                    1
                  tasks in COMM_CHILD_PROCESSES local group:  1
                  tasks in COMM_CHILD_PROCESSES remote group: 4

Slave process 1 of 4 running on loki
Slave process 3 of 4 running on loki
Slave process 0 of 4 running on loki
Slave process 2 of 4 running on loki
spawn_slave 2: argv[0]: spawn_slave
spawn_slave 3: argv[0]: spawn_slave
spawn_slave 0: argv[0]: spawn_slave
spawn_slave 1: argv[0]: spawn_slave
loki spawn 112


I would be grateful if somebody could fix these problems. Thank you
very much in advance for any help.


Kind regards

Siegmar
