Well, I can reproduce it - but I won’t have time to address it until I return 
later this week.

Whether or not procs get spawned onto a remote host depends on the number of 
local slots. You asked for 8 processes, so if the local node has 8 or more 
slots, all of them will be launched there. If you want to spread them across 
nodes, you need to use --map-by node.
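
For example, something like this (reusing the hosts from your earlier test) 
should put ranks on both machines:

  mpirun -np 8 --host localhost,pachy1 --map-by node hostname

With --map-by node the ranks are assigned round-robin across the listed nodes 
instead of filling up the local slots first.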

Also, FWIW: this job will "hang" because the spawned procs ("hostname") never 
call MPI_Init. You can only use MPI_Comm_spawn to launch MPI processes, as the 
spawning parent will blissfully wait forever for the child processes to call 
MPI_Init and connect back.
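
To illustrate, the spawned program needs to do at least this much for the 
parent's MPI_Comm_spawn to complete (a minimal sketch, not your attached code; 
child.c is just a hypothetical name):

  /* child.c (sketch): a spawned program must call MPI_Init so the
     parent's MPI_Comm_spawn can return. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      MPI_Comm parent;

      MPI_Init(&argc, &argv);          /* the parent blocks until this happens */
      MPI_Comm_get_parent(&parent);    /* intercommunicator back to the spawner */

      if (parent != MPI_COMM_NULL) {
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          printf("child %d reporting back\n", rank);
          /* ...send whatever the parent needs over 'parent'... */
          MPI_Comm_disconnect(&parent);
      }

      MPI_Finalize();
      return 0;
  }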

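A singleton parent along the lines you describe would then look roughly like 
this sketch (the "hostfile" info key is an Open MPI extension rather than part 
of the standard, and "./child" is a placeholder binary, so this is only an 
approximation of your attached master.c):

  /* master sketch: started directly (no mpirun), relies on singleton
     MPI_INIT and spawns 8 children onto the hosts in a hostfile. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      MPI_Comm children;
      MPI_Info info;
      int errcodes[8];

      if (argc < 2) {
          fprintf(stderr, "usage: %s <hostfile>\n", argv[0]);
          return 1;
      }

      MPI_Init(&argc, &argv);                    /* singleton init */

      MPI_Info_create(&info);
      MPI_Info_set(info, "hostfile", argv[1]);   /* Open MPI-specific key */

      MPI_Comm_spawn("./child", MPI_ARGV_NULL, 8, info,
                     0, MPI_COMM_SELF, &children, errcodes);

      /* ...collect whatever the children report back over 'children'... */

      MPI_Comm_disconnect(&children);
      MPI_Info_free(&info);
      MPI_Finalize();
      return 0;
  }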

> On Jan 26, 2015, at 11:29 AM, Evan <evan.sama...@gmail.com> wrote:
> 
> Hi,
> 
> I am using OpenMPI 1.8.4 on a Ubuntu 14.04 machine and 5 Ubuntu 12.04 
> machines.  I am using ssh to launch MPI jobs and I'm able to run simple 
> programs like 'mpirun -np 8 --host localhost,pachy1 hostname' and get the 
> expected output (pachy1 being an entry in my /etc/hosts file).
> 
> I started using MPI_Comm_spawn in my app with the intent of NOT calling 
> mpirun to launch the program that calls MPI_Comm_spawn (my attempt at using 
> the Singleton MPI_INIT pattern described in 10.5.2 of MPI 3.0 standard).  The 
> app needs to launch an MPI job of a given size from a given hostfile, where 
> the job needs to report some info back to the app, so it seemed 
> MPI_Comm_spawn was my best bet.  The app is only rarely going to be used this 
> way, thus mpirun not being used to launch the app that is the parent in the 
> MPI_Comm_spawn operation.  This pattern works fine if the only entries in the 
> hostfile are 'localhost'.  However if I add a host that isn't local I get a 
> segmentation fault from the orted process.
> 
> In any case, I distilled my example down as small as I could.  I've attached 
> the C code of the master and the hostfile I'm using. Here's the output:
> 
> evan@lasarti:~/devel/toy_progs/mpi_spawn$ ./master 
> ~/mpi/test_distributed.hostfile
> [lasarti:32020] [[21014,1],0] FORKING HNP: orted --hnp --set-sid --report-uri 
> 14 --singleton-died-pipe 15 -mca state_novm_select 1 -mca ess_base_jobid 
> 1377173504
> [lasarti:32022] *** Process received signal ***
> [lasarti:32022] Signal: Segmentation fault (11)
> [lasarti:32022] Signal code: Address not mapped (1)
> [lasarti:32022] Failing at address: (nil)
> [lasarti:32022] [ 0] 
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f07af039340]
> [lasarti:32022] [ 1] 
> /opt/openmpi-1.8.4/lib/libopen-pal.so.6(opal_hwloc191_hwloc_get_obj_by_depth+0x32)[0x7f07aea227c2]
> [lasarti:32022] [ 2] 
> /opt/openmpi-1.8.4/lib/libopen-pal.so.6(opal_hwloc_base_get_nbobjs_by_type+0x90)[0x7f07ae9f5430]
> [lasarti:32022] [ 3] 
> /opt/openmpi-1.8.4/lib/openmpi/mca_rmaps_round_robin.so(orte_rmaps_rr_byobj+0x134)[0x7f07ab2fb154]
> [lasarti:32022] [ 4] 
> /opt/openmpi-1.8.4/lib/openmpi/mca_rmaps_round_robin.so(+0x12c6)[0x7f07ab2fa2c6]
> [lasarti:32022] [ 5] 
> /opt/openmpi-1.8.4/lib/libopen-rte.so.7(orte_rmaps_base_map_job+0x21a)[0x7f07af299f7a]
> [lasarti:32022] [ 6] 
> /opt/openmpi-1.8.4/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x6e4)[0x7f07ae9e7034]
> [lasarti:32022] [ 7] 
> /opt/openmpi-1.8.4/lib/libopen-rte.so.7(orte_daemon+0xdff)[0x7f07af27a86f]
> [lasarti:32022] [ 8] orted(main+0x47)[0x400877]
> [lasarti:32022] [ 9] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f07aec84ec5]
> [lasarti:32022] [10] orted[0x4008cb]
> [lasarti:32022] *** End of error message ***
> 
> If I launch 'master.c' using mpirun, I don't get a segmentation fault, but it 
> doesn't seem to be launching the process on anything more than localhost, no 
> matter what hostfile I give it.
> 
> For what it's worth, I fully expected to debug some path issues regarding the 
> binary I wanted to launch with MPI_Comm_spawn once I ran this in a distributed 
> setup, 
> but this error at first glance doesn't appear to have anything to do with 
> that.  I'm sure this is something silly I'm doing wrong, but I don't really 
> know how to debug this further given this error.
> 
> Evan
> 
> P.S. Only including zipped config.log since the "ompi_info -v ompi full 
> --parsable" command I got from http://www.open-mpi.org/community/help/ 
> doesn't seem to work anymore.
> 
> 
> <master.c><test_distributed.hostfile><config.log.tar.bz2>
