Well, I can reproduce it - but I won’t have time to address it until I return later this week.
Whether or not procs get spawned onto a remote host depends on the number of local slots. You asked for 8 processes, so if there are at least 8 slots on the node, it will launch them all on the local node. If you want to spread them across nodes, you need to use --map-by node.

Also, FWIW: this job will "hang" because the spawned procs ("hostname") never call MPI_Init. You can only use MPI_Comm_spawn to launch MPI processes; the spawning parent will blissfully wait forever for the child processes to call MPI_Init and connect back.
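Just to make that concrete, here is a minimal, untested sketch of the pattern (the file names spawn_child.c / spawn_parent.c are placeholders of mine, not your attached master.c). The child does what your "hostname" example was after, except it actually calls MPI_Init:

/* spawn_child.c -- hypothetical spawn target; unlike /bin/hostname it
 * calls MPI_Init, so the parent's MPI_Comm_spawn can complete */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Comm parent;
    char host[256];
    int rank;

    MPI_Init(&argc, &argv);                /* the call hostname never makes */
    MPI_Comm_get_parent(&parent);          /* intercomm back to the spawner */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(host, sizeof(host));
    printf("child %d on %s\n", rank, host);
    MPI_Comm_disconnect(&parent);
    MPI_Finalize();
    return 0;
}

/* spawn_parent.c -- hypothetical parent; run it under mpirun, or run it
 * directly to exercise the singleton-init path */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm children;

    MPI_Init(&argc, &argv);
    /* returns only after all 8 children have entered MPI_Init --
     * pointing this at a non-MPI program blocks forever */
    MPI_Comm_spawn("./spawn_child", MPI_ARGV_NULL, 8, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}

If you want the children to land on other nodes, either add --map-by node to the mpirun command line as noted above, or pass an MPI_Info with one of the placement keys documented in the MPI_Comm_spawn man page (e.g. "hostfile" or "add-host") instead of MPI_INFO_NULL.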
> On Jan 26, 2015, at 11:29 AM, Evan <evan.sama...@gmail.com> wrote:
> 
> Hi,
> 
> I am using Open MPI 1.8.4 on an Ubuntu 14.04 machine and 5 Ubuntu 12.04
> machines. I am using ssh to launch MPI jobs, and I'm able to run simple
> programs like 'mpirun -np 8 --host localhost,pachy1 hostname' and get the
> expected output (pachy1 being an entry in my /etc/hosts file).
> 
> I started using MPI_Comm_spawn in my app with the intent of NOT calling
> mpirun to launch the program that calls MPI_Comm_spawn (my attempt at using
> the singleton MPI_INIT pattern described in 10.5.2 of the MPI 3.0 standard).
> The app needs to launch an MPI job of a given size from a given hostfile,
> where the job needs to report some info back to the app, so it seemed
> MPI_Comm_spawn was my best bet. The app is only rarely going to be used this
> way, hence mpirun is not used to launch the app that is the parent in the
> MPI_Comm_spawn operation. This pattern works fine if the only entries in the
> hostfile are 'localhost'. However, if I add a host that isn't local, I get a
> segmentation fault from the orted process.
> 
> In any case, I distilled my example down as small as I could. I've attached
> the C code of the master and the hostfile I'm using. Here's the output:
> 
> evan@lasarti:~/devel/toy_progs/mpi_spawn$ ./master ~/mpi/test_distributed.hostfile
> [lasarti:32020] [[21014,1],0] FORKING HNP: orted --hnp --set-sid --report-uri 14 --singleton-died-pipe 15 -mca state_novm_select 1 -mca ess_base_jobid 1377173504
> [lasarti:32022] *** Process received signal ***
> [lasarti:32022] Signal: Segmentation fault (11)
> [lasarti:32022] Signal code: Address not mapped (1)
> [lasarti:32022] Failing at address: (nil)
> [lasarti:32022] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f07af039340]
> [lasarti:32022] [ 1] /opt/openmpi-1.8.4/lib/libopen-pal.so.6(opal_hwloc191_hwloc_get_obj_by_depth+0x32)[0x7f07aea227c2]
> [lasarti:32022] [ 2] /opt/openmpi-1.8.4/lib/libopen-pal.so.6(opal_hwloc_base_get_nbobjs_by_type+0x90)[0x7f07ae9f5430]
> [lasarti:32022] [ 3] /opt/openmpi-1.8.4/lib/openmpi/mca_rmaps_round_robin.so(orte_rmaps_rr_byobj+0x134)[0x7f07ab2fb154]
> [lasarti:32022] [ 4] /opt/openmpi-1.8.4/lib/openmpi/mca_rmaps_round_robin.so(+0x12c6)[0x7f07ab2fa2c6]
> [lasarti:32022] [ 5] /opt/openmpi-1.8.4/lib/libopen-rte.so.7(orte_rmaps_base_map_job+0x21a)[0x7f07af299f7a]
> [lasarti:32022] [ 6] /opt/openmpi-1.8.4/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x6e4)[0x7f07ae9e7034]
> [lasarti:32022] [ 7] /opt/openmpi-1.8.4/lib/libopen-rte.so.7(orte_daemon+0xdff)[0x7f07af27a86f]
> [lasarti:32022] [ 8] orted(main+0x47)[0x400877]
> [lasarti:32022] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f07aec84ec5]
> [lasarti:32022] [10] orted[0x4008cb]
> [lasarti:32022] *** End of error message ***
> 
> If I launch 'master.c' using mpirun, I don't get a segmentation fault, but it
> doesn't seem to be launching the process on anything more than localhost, no
> matter what hostfile I give it.
> 
> For what it's worth, I fully expected to have to debug some path issues
> regarding the binary I wanted to launch with MPI_Comm_spawn once I ran this
> distributed, but this error at first glance doesn't appear to have anything
> to do with that. I'm sure this is something silly I'm doing wrong, but I
> don't really know how to debug this further given this error.
> 
> Evan
> 
> P.S. Only including the zipped config.log, since the "ompi_info -v ompi full
> --parsable" command I got from http://www.open-mpi.org/community/help/
> doesn't seem to work anymore.
> 
> <master.c><test_distributed.hostfile><config.log.tar.bz2>