Indeed, I simply commented out all of the MPI_Info code, which is essentially what you did by passing a dummy argument. I'm still not able to get it to succeed.
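For reference, the spawn call I'm left with is essentially the minimal sketch below (the child binary name and the spawn count are just placeholders for this email, not my actual values):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);

    /* No MPI_Info_set / hostfile key any more: the info argument is
     * MPI_INFO_NULL, so the runtime falls back to its default mapping.
     * "./child_mpi_program" and the count of 4 are placeholders. */
    MPI_Comm_spawn("./child_mpi_program", MPI_ARGV_NULL, 4,
                   MPI_INFO_NULL, 0, MPI_COMM_SELF,
                   &intercomm, MPI_ERRCODES_IGNORE);

    /* Communication with the children over 'intercomm' would go here;
     * the spawned program calls MPI_Init itself. */

    MPI_Finalize();
    return 0;
}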
So here we go, my results defy logic. I'm sure this could be my fault... I've only been an occasional user of Open MPI and MPI in general over the years, and I've never used MPI_Comm_spawn before this project. I tested simple_spawn like so:

mpicc simple_spawn.c -o simple_spawn
./simple_spawn

When my default hostfile points to a file that just lists localhost, this test completes successfully. If it points to my hostfile with localhost and 5 remote hosts, here's the output:

evan@lasarti:~/devel/toy_progs/mpi_spawn$ mpicc simple_spawn.c -o simple_spawn
evan@lasarti:~/devel/toy_progs/mpi_spawn$ ./simple_spawn
[pid 5703] starting up!
0 completed MPI_Init
Parent [pid 5703] about to spawn!
[lasarti:05703] [[14661,1],0] FORKING HNP: orted --hnp --set-sid --report-uri 14 --singleton-died-pipe 15 -mca state_novm_select 1 -mca ess_base_jobid 960823296
[lasarti:05705] *** Process received signal ***
[lasarti:05705] Signal: Segmentation fault (11)
[lasarti:05705] Signal code: Address not mapped (1)
[lasarti:05705] Failing at address: (nil)
[lasarti:05705] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fc185dcf340]
[lasarti:05705] [ 1] /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_rmaps_base_compute_bindings+0x650)[0x7fc186033bb0]
[lasarti:05705] [ 2] /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_rmaps_base_map_job+0x939)[0x7fc18602fb99]
[lasarti:05705] [ 3] /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x6e4)[0x7fc18577dcc4]
[lasarti:05705] [ 4] /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_daemon+0xdf8)[0x7fc186010438]
[lasarti:05705] [ 5] orted(main+0x47)[0x400887]
[lasarti:05705] [ 6] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fc185a1aec5]
[lasarti:05705] [ 7] orted[0x4008db]
[lasarti:05705] *** End of error message ***

You can see from the library paths in the message that this particular run IS from the latest snapshot, though the failure happens on v1.8.4 as well. I didn't bother installing the snapshot on the remote nodes, though. Should I do that? It looked to me like the error happened well before we got to a remote node, which is why I didn't.

Your thoughts?

Evan

On Tue, Feb 3, 2015 at 7:40 PM, Ralph Castain <r...@open-mpi.org> wrote:

> I confess I am sorely puzzled. I replaced the Info key with MPI_INFO_NULL,
> but still had to pass a bogus argument to master since you still have the
> Info_set code in there - otherwise, Info_set segfaults due to a NULL
> argv[1]. Doing that (and replacing "hostname" with an MPI example code)
> makes everything work just fine.
>
> I've attached one of our example comm_spawn codes that we test against -
> it also works fine with the current head of the 1.8 code base. I confess
> that some changes have been made since 1.8.4 was released, and it is
> entirely possible that this was a problem in 1.8.4 that has since been
> fixed.
>
> So I'd suggest trying the nightly 1.8 tarball and seeing if it works for
> you. You can download it from here:
>
> http://www.open-mpi.org/nightly/v1.8/
>
> HTH
> Ralph
>
>
> On Tue, Feb 3, 2015 at 6:20 PM, Evan Samanas <evan.sama...@gmail.com>
> wrote:
>
>> Yes, I did. I replaced the info argument of MPI_Comm_spawn with
>> MPI_INFO_NULL.
>>
>> On Tue, Feb 3, 2015 at 5:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> When running your comm_spawn code, did you remove the Info key code? You
>>> wouldn't need to provide a hostfile or hosts any more, which is why it
>>> should resolve that problem.
>>>
>>> I agree that providing either hostfile or host as an Info key will cause
>>> the program to segfault - I'm working on that issue.
>>>
>>>
>>> On Tue, Feb 3, 2015 at 3:46 PM, Evan Samanas <evan.sama...@gmail.com>
>>> wrote:
>>>
>>>> Setting these environment variables did indeed change the way mpirun
>>>> maps things, and I didn't have to specify a hostfile. However, setting
>>>> these for my MPI_Comm_spawn code still resulted in the same segmentation
>>>> fault.
>>>>
>>>> Evan
>>>>
>>>> On Tue, Feb 3, 2015 at 10:09 AM, Ralph Castain <r...@open-mpi.org>
>>>> wrote:
>>>>
>>>>> If you add the following to your environment, you should run on
>>>>> multiple nodes:
>>>>>
>>>>> OMPI_MCA_rmaps_base_mapping_policy=node
>>>>> OMPI_MCA_orte_default_hostfile=<your hostfile>
>>>>>
>>>>> The first tells OMPI to map-by node. The second passes in your default
>>>>> hostfile so you don't need to specify it as an Info key.
>>>>>
>>>>> HTH
>>>>> Ralph
>>>>>
>>>>>
>>>>> On Tue, Feb 3, 2015 at 9:23 AM, Evan Samanas <evan.sama...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Ralph,
>>>>>>
>>>>>> Good to know you've reproduced it. I was experiencing this using
>>>>>> both the hostfile and host keys. A simple comm_spawn was working for
>>>>>> me as well, but it was only launching locally, and I'm pretty sure
>>>>>> each node only has 4 slots given past behavior (the mpirun -np 8
>>>>>> example I gave in my first email launches on both hosts). Is there a
>>>>>> way to specify the hosts I want to launch on without the hostfile or
>>>>>> host key, so I can test remote launch?
>>>>>>
>>>>>> And to the "hostname" response... no wonder it was hanging! I just
>>>>>> constructed that as a basic example. In my real use I'm launching
>>>>>> something that calls MPI_Init.
>>>>>>
>>>>>> Evan