Works perfectly for me, so I believe this must be an environment issue - I am using gcc 6.0.0 on CentOS7 with x86:
$ mpirun -n 1 -host bend001 --slot-list 0:0-1,1:0-1 --report-bindings ./simple_spawn
[bend001:17599] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: [BB/BB/../../../..][BB/BB/../../../..]
[pid 17601] starting up!
0 completed MPI_Init
Parent [pid 17601] about to spawn!
[pid 17603] starting up!
[bend001:17599] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: [BB/BB/../../../..][BB/BB/../../../..]
[bend001:17599] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: [BB/BB/../../../..][BB/BB/../../../..]
[bend001:17599] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: [BB/BB/../../../..][BB/BB/../../../..]
[pid 17604] starting up!
[pid 17605] starting up!
Parent done with spawn
Parent sending message to child
0 completed MPI_Init
Hello from the child 0 of 3 on host bend001 pid 17603
Child 0 received msg: 38
1 completed MPI_Init
Hello from the child 1 of 3 on host bend001 pid 17604
2 completed MPI_Init
Hello from the child 2 of 3 on host bend001 pid 17605
Child 0 disconnected
Child 2 disconnected
Parent disconnected
Child 1 disconnected
17603: exiting
17605: exiting
17601: exiting
17604: exiting
$

> On May 24, 2016, at 7:18 AM, Siegmar Gross
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> Hi Ralph and Gilles,
>
> the program breaks only if I combine "--host" and "--slot-list". Perhaps this
> information is helpful. I use a different machine now, so that you can see
> that the problem is not restricted to "loki".
>
> pc03 spawn 115 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
>   OPAL repo revision: v1.10.2-201-gd23dda8
>   C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
>
> pc03 spawn 116 uname -a
> Linux pc03 3.12.55-52.42-default #1 SMP Thu Mar 3 10:35:46 UTC 2016 (4354e1d) x86_64 x86_64 x86_64 GNU/Linux
>
> pc03 spawn 117 cat host_pc03.openmpi
> pc03.informatik.hs-fulda.de slots=12 max_slots=12
>
> pc03 spawn 118 mpicc simple_spawn.c
>
> pc03 spawn 119 mpiexec -np 1 --report-bindings a.out
> [pc03:03711] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../..][../../../../../..]
> [pid 3713] starting up!
> 0 completed MPI_Init
> Parent [pid 3713] about to spawn!
> [pc03:03711] MCW rank 0 bound to socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: [../../../../../..][BB/BB/BB/BB/BB/BB]
> [pc03:03711] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [BB/BB/BB/BB/BB/BB][../../../../../..]
> [pc03:03711] MCW rank 2 bound to socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: [../../../../../..][BB/BB/BB/BB/BB/BB]
> [pid 3715] starting up!
> [pid 3716] starting up!
> [pid 3717] starting up!
> Parent done with spawn
> Parent sending message to child
> 0 completed MPI_Init
> Hello from the child 0 of 3 on host pc03 pid 3715
> 1 completed MPI_Init
> Hello from the child 1 of 3 on host pc03 pid 3716
> 2 completed MPI_Init
> Hello from the child 2 of 3 on host pc03 pid 3717
> Child 0 received msg: 38
> Child 0 disconnected
> Child 2 disconnected
> Parent disconnected
> Child 1 disconnected
> 3713: exiting
> 3715: exiting
> 3716: exiting
> 3717: exiting
>
> pc03 spawn 120 mpiexec -np 1 --hostfile host_pc03.openmpi --slot-list 0:0-1,1:0-1 --report-bindings a.out
> [pc03:03729] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: [BB/BB/../../../..][BB/BB/../../../..]
> [pid 3731] starting up!
> 0 completed MPI_Init
> Parent [pid 3731] about to spawn!
> [pc03:03729] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: [BB/BB/../../../..][BB/BB/../../../..]
> [pc03:03729] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: [BB/BB/../../../..][BB/BB/../../../..]
> [pc03:03729] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: [BB/BB/../../../..][BB/BB/../../../..]
> [pid 3733] starting up!
> [pid 3734] starting up!
> [pid 3735] starting up!
> Parent done with spawn
> Parent sending message to child
> 2 completed MPI_Init
> Hello from the child 2 of 3 on host pc03 pid 3735
> 1 completed MPI_Init
> Hello from the child 1 of 3 on host pc03 pid 3734
> 0 completed MPI_Init
> Hello from the child 0 of 3 on host pc03 pid 3733
> Child 0 received msg: 38
> Child 0 disconnected
> Child 2 disconnected
> Child 1 disconnected
> Parent disconnected
> 3731: exiting
> 3734: exiting
> 3733: exiting
> 3735: exiting
>
> pc03 spawn 121 mpiexec -np 1 --host pc03 --slot-list 0:0-1,1:0-1 --report-bindings a.out
> [pc03:03744] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: [BB/BB/../../../..][BB/BB/../../../..]
> [pid 3746] starting up!
> 0 completed MPI_Init
> Parent [pid 3746] about to spawn!
> [pc03:03744] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: [BB/BB/../../../..][BB/BB/../../../..]
> [pc03:03744] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]: [BB/BB/../../../..][BB/BB/../../../..]
> [pid 3748] starting up!
> [pid 3749] starting up!
> [pc03:03749] *** Process received signal ***
> [pc03:03749] Signal: Segmentation fault (11)
> [pc03:03749] Signal code: Address not mapped (1)
> [pc03:03749] Failing at address: 0x8
> [pc03:03749] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7fe6f0d1f870]
> [pc03:03749] [ 1] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7fe6f0f825b0]
> [pc03:03749] [ 2] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7fe6f0f61b08]
> [pc03:03749] [ 3] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7fe6f0f87e8a]
> [pc03:03749] [ 4] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x1a0)[0x7fe6f0fc42ae]
> [pc03:03749] [ 5] a.out[0x400d0c]
> [pc03:03749] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fe6f0989b05]
> [pc03:03749] [ 7] a.out[0x400bf9]
> [pc03:03749] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 3749 on node pc03 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> pc03 spawn 122
>
>
> Kind regards
>
> Siegmar
>
>
> On 05/24/16 15:44, Ralph Castain wrote:
>>
>>> On May 24, 2016, at 6:21 AM, Siegmar Gross
>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>
>>> Hi Ralph,
>>>
>>> I copy the relevant lines to this place, so that it is easier to see what
>>> happens. "a.out" is your program, which I compiled with mpicc.
>>>
>>>>> loki spawn 153 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
>>>>>   OPAL repo revision: v1.10.2-201-gd23dda8
>>>>>   C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
>>>>> loki spawn 154 mpicc simple_spawn.c
>>>
>>>>> loki spawn 155 mpiexec -np 1 a.out
>>>>> [pid 24008] starting up!
>>>>> 0 completed MPI_Init
>>> ...
>>>
>>> "mpiexec -np 1 a.out" works.
>>>
>>>
>>>> I don’t know what “a.out” is, but it looks like there is some memory
>>>> corruption there.
>>>
>>> "a.out" is still your program. I get the same error on different
>>> machines, so it is not very likely that the (hardware) memory
>>> is corrupted.
>>>
>>>>> loki spawn 156 mpiexec -np 1 --host loki --slot-list 0-5 a.out
>>>>> [pid 24102] starting up!
>>>>> 0 completed MPI_Init
>>>>> Parent [pid 24102] about to spawn!
>>>>> [pid 24104] starting up!
>>>>> [pid 24105] starting up!
>>>>> [loki:24105] *** Process received signal ***
>>>>> [loki:24105] Signal: Segmentation fault (11)
>>>>> [loki:24105] Signal code: Address not mapped (1)
>>> ...
>>>
>>> "mpiexec -np 1 --host loki --slot-list 0-5 a.out" breaks with a segmentation
>>> fault. Can I do something so that you can find out what happens?
>>
>> I honestly have no idea - perhaps Gilles can help, as I have no access to
>> that kind of environment. We aren’t seeing such problems elsewhere, so it is
>> likely something local.
>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>> On 05/24/16 15:07, Ralph Castain wrote:
>>>>
>>>>> On May 24, 2016, at 4:19 AM, Siegmar Gross
>>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>
>>>>> Hi Ralph,
>>>>>
>>>>> thank you very much for your answer and your example program.
>>>>>
>>>>> On 05/23/16 17:45, Ralph Castain wrote:
>>>>>> I cannot replicate the problem - both scenarios work fine for me. I’m not
>>>>>> convinced your test code is correct, however, as you call Comm_free on the
>>>>>> inter-communicator but didn’t call Comm_disconnect. Check out the attached
>>>>>> for a correct code and see if it works for you.
>>>>>
>>>>> I thought that I only need MPI_Comm_disconnect if I had established a
>>>>> connection with MPI_Comm_connect before. The man page for MPI_Comm_free
>>>>> states
>>>>>
>>>>> "This operation marks the communicator object for deallocation.
>>>>> The handle is set to MPI_COMM_NULL. Any pending operations that use this
>>>>> communicator will complete normally; the object is actually deallocated
>>>>> only if there are no other active references to it."
>>>>>
>>>>> The man page for MPI_Comm_disconnect states
>>>>>
>>>>> "MPI_Comm_disconnect waits for all pending communication on comm to complete
>>>>> internally, deallocates the communicator object, and sets the handle to
>>>>> MPI_COMM_NULL. It is a collective operation."
>>>>>
>>>>> I don't see a difference for my spawned processes, because both functions
>>>>> will "wait" until all pending operations have finished before the object
>>>>> is destroyed. Nevertheless, perhaps my small example program has worked
>>>>> all these years by chance.
>>>>>
>>>>> However, I don't understand why my program works with
>>>>> "mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master" and breaks with
>>>>> "mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master". You are
>>>>> right, my slot-list is equivalent to "-bind-to none". I could also have used
>>>>> "mpiexec -np 1 --host loki --oversubscribe spawn_master", which works as well.
>>>>
>>>> Well, you are only giving us one slot when you specify "--host loki", and
>>>> then you are trying to launch multiple processes into it. The "slot-list"
>>>> option only tells us which cpus to bind each process to - it doesn't allocate
>>>> process slots. So you have to tell us how many processes are allowed to run
>>>> on this node.
>>>>
>>>>>
>>>>> The program breaks with "There are not enough slots available in the system
>>>>> to satisfy ..." if I only use "--host loki" or different host names without
>>>>> mentioning five host names, using "slot-list", or "oversubscribe".
>>>>> Unfortunately "--host <host name>:<number of slots>" isn't available in
>>>>> openmpi-1.10.3rc2 to specify the number of available slots.
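[Editor's note: the MPI_Comm_free vs. MPI_Comm_disconnect distinction discussed above can be sketched in a few lines. This is a minimal editor's sketch of the spawn/disconnect pattern, not the simple_spawn.c attached to the thread; it must be compiled with mpicc and launched under mpiexec, so it is illustrative only.]

```c
/* Sketch: a parent spawns three copies of itself and then calls
 * MPI_Comm_disconnect on the inter-communicator, as Ralph suggests.
 * MPI_Comm_disconnect waits for pending communication and is collective;
 * MPI_Comm_free only marks the communicator for deallocation. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent job: spawn 3 children running this same binary. */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 3, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&intercomm);   /* instead of MPI_Comm_free */
    } else {
        /* Child job: disconnect from the parent before finalizing. */
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}
```

Run as, e.g., `mpiexec -np 1 ./a.out`; note that spawning the children still requires enough slots, which is the separate issue discussed below in the thread.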
>>>>
>>>> Correct - we did not backport the new syntax
>>>>
>>>>>
>>>>> Your program behaves the same way as mine, so MPI_Comm_disconnect
>>>>> will not solve my problem. I had to modify your program in a negligible
>>>>> way to get it compiled.
>>>>>
>>>>> loki spawn 153 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
>>>>>   OPAL repo revision: v1.10.2-201-gd23dda8
>>>>>   C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
>>>>> loki spawn 154 mpicc simple_spawn.c
>>>>> loki spawn 155 mpiexec -np 1 a.out
>>>>> [pid 24008] starting up!
>>>>> 0 completed MPI_Init
>>>>> Parent [pid 24008] about to spawn!
>>>>> [pid 24010] starting up!
>>>>> [pid 24011] starting up!
>>>>> [pid 24012] starting up!
>>>>> Parent done with spawn
>>>>> Parent sending message to child
>>>>> 0 completed MPI_Init
>>>>> Hello from the child 0 of 3 on host loki pid 24010
>>>>> 1 completed MPI_Init
>>>>> Hello from the child 1 of 3 on host loki pid 24011
>>>>> 2 completed MPI_Init
>>>>> Hello from the child 2 of 3 on host loki pid 24012
>>>>> Child 0 received msg: 38
>>>>> Child 0 disconnected
>>>>> Child 1 disconnected
>>>>> Child 2 disconnected
>>>>> Parent disconnected
>>>>> 24012: exiting
>>>>> 24010: exiting
>>>>> 24008: exiting
>>>>> 24011: exiting
>>>>>
>>>>> Is something wrong with my command line? I didn't use slot-list before,
>>>>> so I'm not sure if I use it in the intended way.
>>>>
>>>> I don’t know what “a.out” is, but it looks like there is some memory
>>>> corruption there.
>>>>
>>>>>
>>>>> loki spawn 156 mpiexec -np 1 --host loki --slot-list 0-5 a.out
>>>>> [pid 24102] starting up!
>>>>> 0 completed MPI_Init
>>>>> Parent [pid 24102] about to spawn!
>>>>> [pid 24104] starting up!
>>>>> [pid 24105] starting up!
>>>>> [loki:24105] *** Process received signal ***
>>>>> [loki:24105] Signal: Segmentation fault (11)
>>>>> [loki:24105] Signal code: Address not mapped (1)
>>>>> [loki:24105] Failing at address: 0x8
>>>>> [loki:24105] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f39aa76f870]
>>>>> [loki:24105] [ 1] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7f39aa9d25b0]
>>>>> [loki:24105] [ 2] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7f39aa9b1b08]
>>>>> [loki:24105] [ 3] *** An error occurred in MPI_Init
>>>>> *** on a NULL communicator
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> *** and potentially your MPI job)
>>>>> [loki:24104] Local abort before MPI_INIT completed successfully; not able to
>>>>> aggregate error messages, and not able to guarantee that all other processes
>>>>> were killed!
>>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7f39aa9d7e8a]
>>>>> [loki:24105] [ 4] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x1a0)[0x7f39aaa142ae]
>>>>> [loki:24105] [ 5] a.out[0x400d0c]
>>>>> [loki:24105] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f39aa3d9b05]
>>>>> [loki:24105] [ 7] a.out[0x400bf9]
>>>>> [loki:24105] *** End of error message ***
>>>>> -------------------------------------------------------
>>>>> Child job 2 terminated normally, but 1 process returned
>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> mpiexec detected that one or more processes exited with non-zero status,
>>>>> thus causing the job to be terminated.
>>>>> The first process to do so was:
>>>>>
>>>>>   Process name: [[49560,2],0]
>>>>>   Exit code:    1
>>>>> --------------------------------------------------------------------------
>>>>> loki spawn 157
>>>>>
>>>>> Hopefully, you will find out what happens. Please let me know if I can
>>>>> help you in any way.
>>>>>
>>>>> Kind regards
>>>>>
>>>>> Siegmar
>>>>>
>>>>>> FWIW: I don’t know how many cores you have on your sockets, but if you
>>>>>> have 6 cores/socket, then your slot-list is equivalent to "--bind-to none",
>>>>>> as the slot-list applies to every process being launched
>>>>>>
>>>>>>> On May 23, 2016, at 6:26 AM, Siegmar Gross
>>>>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I installed openmpi-1.10.3rc2 on my "SUSE Linux Enterprise Server
>>>>>>> 12 (x86_64)" with Sun C 5.13 and gcc-6.1.0. Unfortunately I get
>>>>>>> a segmentation fault for "--slot-list" for one of my small programs.
>>>>>>>
>>>>>>> loki spawn 119 ompi_info | grep -e "OPAL repo revision:" -e "C compiler absolute:"
>>>>>>>   OPAL repo revision: v1.10.2-201-gd23dda8
>>>>>>>   C compiler absolute: /usr/local/gcc-6.1.0/bin/gcc
>>>>>>>
>>>>>>> loki spawn 120 mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master
>>>>>>>
>>>>>>> Parent process 0 running on loki
>>>>>>>   I create 4 slave processes
>>>>>>>
>>>>>>> Parent process 0: tasks in MPI_COMM_WORLD: 1
>>>>>>>   tasks in COMM_CHILD_PROCESSES local group: 1
>>>>>>>   tasks in COMM_CHILD_PROCESSES remote group: 4
>>>>>>>
>>>>>>> Slave process 0 of 4 running on loki
>>>>>>> Slave process 1 of 4 running on loki
>>>>>>> Slave process 2 of 4 running on loki
>>>>>>> spawn_slave 2: argv[0]: spawn_slave
>>>>>>> Slave process 3 of 4 running on loki
>>>>>>> spawn_slave 0: argv[0]: spawn_slave
>>>>>>> spawn_slave 1: argv[0]: spawn_slave
>>>>>>> spawn_slave 3: argv[0]: spawn_slave
>>>>>>>
>>>>>>> loki spawn 121 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master
>>>>>>>
>>>>>>> Parent process 0 running on loki
>>>>>>>   I create 4 slave processes
>>>>>>>
>>>>>>> [loki:17326] *** Process received signal ***
>>>>>>> [loki:17326] Signal: Segmentation fault (11)
>>>>>>> [loki:17326] Signal code: Address not mapped (1)
>>>>>>> [loki:17326] Failing at address: 0x8
>>>>>>> [loki:17326] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f4e469b3870]
>>>>>>> [loki:17326] [ 1] *** An error occurred in MPI_Init
>>>>>>> *** on a NULL communicator
>>>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>>>> *** and potentially your MPI job)
>>>>>>> [loki:17324] Local abort before MPI_INIT completed successfully; not able to
>>>>>>> aggregate error messages, and not able to guarantee that all other processes
>>>>>>> were killed!
>>>>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_proc_self+0x35)[0x7f4e46c165b0]
>>>>>>> [loki:17326] [ 2] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_comm_init+0x68b)[0x7f4e46bf5b08]
>>>>>>> [loki:17326] [ 3] *** An error occurred in MPI_Init
>>>>>>> *** on a NULL communicator
>>>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>>>> *** and potentially your MPI job)
>>>>>>> [loki:17325] Local abort before MPI_INIT completed successfully; not able to
>>>>>>> aggregate error messages, and not able to guarantee that all other processes
>>>>>>> were killed!
>>>>>>> /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(ompi_mpi_init+0xa90)[0x7f4e46c1be8a]
>>>>>>> [loki:17326] [ 4] /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12(MPI_Init+0x180)[0x7f4e46c5828e]
>>>>>>> [loki:17326] [ 5] spawn_slave[0x40097e]
>>>>>>> [loki:17326] [ 6] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f4e4661db05]
>>>>>>> [loki:17326] [ 7] spawn_slave[0x400a54]
>>>>>>> [loki:17326] *** End of error message ***
>>>>>>> -------------------------------------------------------
>>>>>>> Child job 2 terminated normally, but 1 process returned
>>>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>>>> -------------------------------------------------------
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpiexec detected that one or more processes exited with non-zero status,
>>>>>>> thus causing the job to be terminated. The first process to do so was:
>>>>>>>
>>>>>>>   Process name: [[56340,2],0]
>>>>>>>   Exit code:    1
>>>>>>> --------------------------------------------------------------------------
>>>>>>> loki spawn 122
>>>>>>>
>>>>>>> I would be grateful if somebody can fix the problem. Thank you
>>>>>>> very much for any help in advance.
>>>>>>>
>>>>>>> Kind regards
>>>>>>>
>>>>>>> Siegmar
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29281.php
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29284.php
>>>>>
>>>>> <simple_spawn_modified.c>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29300.php
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29301.php
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29304.php
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29307.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29308.php
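[Editor's note: Ralph's point about slots vs. slot-list recurs throughout the thread, so the invocations discussed can be collected in one place. This is an editor's summary sketch using the thread's own host name `loki` and program `spawn_master`; it has not been re-verified against 1.10.3rc2, and the `--host name:slots` syntax mentioned is only in the newer (2.x) series, not 1.10.x.]

```shell
# "--host loki" grants exactly one slot, so MPI_Comm_spawn of 4 children
# fails with "There are not enough slots available in the system".
# "--slot-list" only controls which cpus each process is bound to; it
# does not allocate slots. Ways discussed in the thread to provide slots:

# 1) name the host once per required slot
mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master

# 2) declare slots in a hostfile (as in host_pc03.openmpi above)
echo "loki slots=12 max_slots=12" > myhostfile
mpiexec -np 1 --hostfile myhostfile --slot-list 0:0-5,1:0-5 spawn_master

# 3) allow oversubscription of the single slot
mpiexec -np 1 --host loki --oversubscribe spawn_master
```

Note that per the thread, combining `--host` with `--slot-list` (case 1 plus a slot list, or the single-host form) is exactly the configuration that triggered the segfault under investigation.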