Hmmm, I went to the Open MPI build directories on my two machines,
went into the orte/test/mpi directory, and built the executables on both
machines.  I set the hostfile environment variable to point at my
hostsfile on the "master" machine.

Here's the output:

OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
./simple_spawn
Parent [pid 97504] starting up!
0 completed MPI_Init
Parent [pid 97504] about to spawn!
Parent [pid 97507] starting up!
Parent [pid 97508] starting up!
Parent [pid 30626] starting up!
^C
zsh: interrupt  OMPI_MCA_orte_default_hostfile= ./simple_spawn

I had to ^C to kill the hung process.

When I run using mpirun:

OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
mpirun -np 1 ./simple_spawn
Parent [pid 97511] starting up!
0 completed MPI_Init
Parent [pid 97511] about to spawn!
Parent [pid 97513] starting up!
Parent [pid 30762] starting up!
Parent [pid 30764] starting up!
Parent done with spawn
Parent sending message to child
1 completed MPI_Init
Hello from the child 1 of 3 on host budgeb-sandybridge pid 97513
0 completed MPI_Init
Hello from the child 0 of 3 on host budgeb-interlagos pid 30762
2 completed MPI_Init
Hello from the child 2 of 3 on host budgeb-interlagos pid 30764
Child 1 disconnected
Child 0 received msg: 38
Child 0 disconnected
Parent disconnected
Child 2 disconnected
97511: exiting
97513: exiting
30762: exiting
30764: exiting

As you can see, I'm using Open MPI v1.6.1.  I just did a fresh install
on both machines using the default configure options.

Thanks for all your help.

  Brian

On Tue, Aug 28, 2012 at 4:39 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Looks to me like it didn't find your executable - could be a question of 
> where it exists relative to where you are running. If you look in your OMPI 
> source tree at the orte/test/mpi directory, you'll see an example program 
> "simple_spawn.c" there. Just "make simple_spawn" and execute that with your 
> default hostfile set - does it work okay?
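>
> For reference, here is a minimal self-spawning test in the same spirit as
> simple_spawn.c -- this is only a sketch of the pattern, not the actual file
> from the OMPI source tree:
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv) {
>     MPI_Comm parent, child;
>     int rank;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_get_parent(&parent);
>
>     if (parent == MPI_COMM_NULL) {
>         /* No parent communicator: we are the singleton. Spawn three
>          * copies of this same binary; with the default hostfile envar
>          * set, they may land on the other listed hosts. */
>         MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 3, MPI_INFO_NULL,
>                        0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
>         MPI_Comm_disconnect(&child);
>     } else {
>         /* Spawned child: report in and disconnect from the parent. */
>         printf("child %d reporting\n", rank);
>         MPI_Comm_disconnect(&parent);
>     }
>
>     MPI_Finalize();
>     return 0;
> }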
>
> It works fine for me, hence the question.
>
> Also, what OMPI version are you using?
>
> On Aug 28, 2012, at 4:25 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>
>> I see.  Okay.  So, I just tried removing the check for universe size
>> and setting the universe size to 2.  Here's my output:
>>
>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>> [budgeb-interlagos:29965] [[4156,0],0] ORTE_ERROR_LOG: Fatal in file
>> base/plm_base_receive.c at line 253
>> [budgeb-interlagos:29963] [[4156,1],0] ORTE_ERROR_LOG: The specified
>> application failed to start in file dpm_orte.c at line 785
>>
>> The corresponding run with mpirun still works.
>>
>> Thanks,
>>  Brian
>>
>> On Tue, Aug 28, 2012 at 2:46 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> I see the issue - it's here:
>>>
>>>>  MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>>>
>>>>  if(!flag) {
>>>>      std::cerr << "no universe size" << std::endl;
>>>>      return -1;
>>>>  }
>>>>  universeSize = *puniverseSize;
>>>>  if(universeSize == 1) {
>>>>      std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>>>  }
>>>
>>> The universe size is set to 1 on a singleton because the attribute gets set 
>>> at the beginning of time - we haven't any way to go back and change it. The 
>>> sequence of events explains why. The singleton starts up and sets its 
>>> attributes, including universe_size. It also spins off an orte daemon to 
>>> act as its own private "mpirun" in case you call comm_spawn. At this point, 
>>> however, no hostfile has been read - the singleton is just an MPI proc 
>>> doing its own thing, and the orte daemon is just sitting there on 
>>> "stand-by".
>>>
>>> When your app calls comm_spawn, then the orte daemon gets called to launch 
>>> the new procs. At that time, it (not the original singleton!) reads the 
>>> hostfile to find out how many nodes are around, and then does the launch.
>>>
>>> You are trying to check the number of nodes from within the singleton, 
>>> which won't work - it has no way of discovering that info.
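>>>
>>> As a rough illustration (not code from this thread), the workaround Brian
>>> describes above -- dropping the MPI_UNIVERSE_SIZE check and hard-coding the
>>> number of slaves -- might look like the sketch below. The slave count of 2
>>> and the "./slave_exe" path are assumptions, and note that in this thread the
>>> singleton launch still failed with ORTE_ERROR_LOG errors even after this
>>> change:
>>>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char **argv) {
>>>     MPI_Init(&argc, &argv);
>>>
>>>     const int nSlaves = 2;   /* assumed value; it cannot be derived from
>>>                                 MPI_UNIVERSE_SIZE inside a singleton */
>>>     MPI_Comm everyone;
>>>     MPI_Comm_spawn((char *)"./slave_exe", MPI_ARGV_NULL, nSlaves,
>>>                    MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
>>>                    MPI_ERRCODES_IGNORE);
>>>
>>>     MPI_Comm_disconnect(&everyone);
>>>     MPI_Finalize();
>>>     return 0;
>>> }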
>>>
>>>
>>>
>>>
>>> On Aug 28, 2012, at 2:38 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>
>>>>> cat hostsfile
>>>> localhost
>>>> budgeb-sandybridge
>>>>
>>>> Thanks,
>>>> Brian
>>>>
>>>> On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Hmmm...what is in your "hostsfile"?
>>>>>
>>>>> On Aug 28, 2012, at 2:33 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>>
>>>>>> Hi Ralph -
>>>>>>
>>>>>> Thanks for confirming this is possible.  I'm trying this and currently
>>>>>> failing.  Perhaps there's something I'm missing in the code to make
>>>>>> this work.  Here are the two invocations and their outputs:
>>>>>>
>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>>>>>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>>>>>> cannot start slaves... not enough nodes
>>>>>>
>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>>>>>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 ./master_exe
>>>>>> master spawned 1 slaves...
>>>>>> slave responding...
>>>>>>
>>>>>>
>>>>>> The code:
>>>>>>
>>>>>> //master.cpp
>>>>>> #include <mpi.h>
>>>>>> #include <boost/filesystem.hpp>
>>>>>> #include <iostream>
>>>>>> #include <cstring>   // for memcpy
>>>>>> #include <alloca.h>  // for alloca
>>>>>>
>>>>>> int main(int argc, char **args) {
>>>>>>  int worldSize, universeSize, *puniverseSize, flag;
>>>>>>
>>>>>>  MPI_Comm everyone; //intercomm
>>>>>>  boost::filesystem::path curPath =
>>>>>> boost::filesystem::absolute(boost::filesystem::current_path());
>>>>>>
>>>>>>  std::string toRun = (curPath / "slave_exe").string();
>>>>>>
>>>>>>  int ret = MPI_Init(&argc, &args);
>>>>>>
>>>>>>  if(ret != MPI_SUCCESS) {
>>>>>>      std::cerr << "failed init" << std::endl;
>>>>>>      return -1;
>>>>>>  }
>>>>>>
>>>>>>  MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
>>>>>>
>>>>>>  if(worldSize != 1) {
>>>>>>      std::cerr << "too many masters" << std::endl;
>>>>>>  }
>>>>>>
>>>>>>  MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>>>>>
>>>>>>  if(!flag) {
>>>>>>      std::cerr << "no universe size" << std::endl;
>>>>>>      return -1;
>>>>>>  }
>>>>>>  universeSize = *puniverseSize;
>>>>>>  if(universeSize == 1) {
>>>>>>      std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>>>>>  }
>>>>>>
>>>>>>
>>>>>>  char *buf = (char*)alloca(toRun.size() + 1);
>>>>>>  memcpy(buf, toRun.c_str(), toRun.size());
>>>>>>  buf[toRun.size()] = '\0';
>>>>>>
>>>>>>  MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
>>>>>> 0, MPI_COMM_SELF, &everyone,
>>>>>>                 MPI_ERRCODES_IGNORE);
>>>>>>
>>>>>>  std::cerr << "master spawned " << universeSize-1 << " slaves..."
>>>>>> << std::endl;
>>>>>>
>>>>>>  MPI_Finalize();
>>>>>>
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>>
>>>>>> //slave.cpp
>>>>>> #include <mpi.h>
>>>>>> #include <iostream>  // for std::cerr
>>>>>>
>>>>>> int main(int argc, char **args) {
>>>>>>  int size;
>>>>>>  MPI_Comm parent;
>>>>>>  MPI_Init(&argc, &args);
>>>>>>
>>>>>>  MPI_Comm_get_parent(&parent);
>>>>>>
>>>>>>  if(parent == MPI_COMM_NULL) {
>>>>>>      std::cerr << "slave has no parent" << std::endl;
>>>>>>      MPI_Finalize();
>>>>>>      return -1;
>>>>>>  }
>>>>>>  MPI_Comm_remote_size(parent, &size);
>>>>>>  if(size != 1) {
>>>>>>      std::cerr << "parent size is " << size << std::endl;
>>>>>>  }
>>>>>>
>>>>>>  std::cerr << "slave responding..." << std::endl;
>>>>>>
>>>>>>  MPI_Finalize();
>>>>>>
>>>>>>  return 0;
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Any ideas?  Thanks for any help.
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>> It really is just that simple :-)
>>>>>>>
>>>>>>> On Aug 22, 2012, at 8:56 AM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Okay.  Is there a tutorial or FAQ for setting everything up?  Or is it
>>>>>>>> really just that simple?  I don't need to run a copy of the orte
>>>>>>>> server somewhere?
>>>>>>>>
>>>>>>>> If my current IP is 192.168.0.1,
>>>>>>>>
>>>>>>>> 0 > echo 192.168.0.11 > /tmp/hostfile
>>>>>>>> 1 > echo 192.168.0.12 >> /tmp/hostfile
>>>>>>>> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
>>>>>>>> 3 > ./mySpawningExe
>>>>>>>>
>>>>>>>> At this point, mySpawningExe will be the master, running on
>>>>>>>> 192.168.0.1, and I can have spawned, for example, childExe on
>>>>>>>> 192.168.0.11 and 192.168.0.12?  Or childExe1 on 192.168.0.11 and
>>>>>>>> childExe2 on 192.168.0.12?
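>>>>>>>>
>>>>>>>> For the childExe1/childExe2 variant, one possible approach (a sketch, not
>>>>>>>> something confirmed in this thread) is MPI_Comm_spawn_multiple with a
>>>>>>>> per-command "host" info key; the executable names and addresses below are
>>>>>>>> the hypothetical ones from the example above, and the hosts must also
>>>>>>>> appear in /tmp/hostfile:
>>>>>>>>
>>>>>>>> #include <mpi.h>
>>>>>>>>
>>>>>>>> int main(int argc, char **argv) {
>>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>>>
>>>>>>>>     char *cmds[2]     = { (char *)"./childExe1", (char *)"./childExe2" };
>>>>>>>>     int   maxprocs[2] = { 1, 1 };
>>>>>>>>     MPI_Info infos[2];
>>>>>>>>     MPI_Info_create(&infos[0]);
>>>>>>>>     MPI_Info_create(&infos[1]);
>>>>>>>>     MPI_Info_set(infos[0], (char *)"host", (char *)"192.168.0.11");
>>>>>>>>     MPI_Info_set(infos[1], (char *)"host", (char *)"192.168.0.12");
>>>>>>>>
>>>>>>>>     MPI_Comm children;
>>>>>>>>     MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, maxprocs, infos,
>>>>>>>>                             0, MPI_COMM_SELF, &children,
>>>>>>>>                             MPI_ERRCODES_IGNORE);
>>>>>>>>
>>>>>>>>     MPI_Info_free(&infos[0]);
>>>>>>>>     MPI_Info_free(&infos[1]);
>>>>>>>>     MPI_Comm_disconnect(&children);
>>>>>>>>     MPI_Finalize();
>>>>>>>>     return 0;
>>>>>>>> }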
>>>>>>>>
>>>>>>>> Thanks for the help.
>>>>>>>>
>>>>>>>> Brian
>>>>>>>>
>>>>>>>> On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>> wrote:
>>>>>>>>> Sure, that's still true on all 1.3 or above releases. All you need to 
>>>>>>>>> do is set the hostfile envar so we pick it up:
>>>>>>>>>
>>>>>>>>> OMPI_MCA_orte_default_hostfile=<foo>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Aug 21, 2012, at 7:23 PM, Brian Budge <brian.bu...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi.  I know this is an old thread, but I'm curious if there are any
>>>>>>>>>> tutorials describing how to set this up?  Is this still available on
>>>>>>>>>> newer open mpi versions?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Brian
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain <r...@lanl.gov> wrote:
>>>>>>>>>>> Hi Elena
>>>>>>>>>>>
>>>>>>>>>>> I'm copying this to the user list just to correct a mis-statement on my
>>>>>>>>>>> part in an earlier message that went there. I had stated that a singleton
>>>>>>>>>>> could comm_spawn onto other nodes listed in a hostfile by setting an
>>>>>>>>>>> environment variable that pointed us to the hostfile.
>>>>>>>>>>>
>>>>>>>>>>> This is incorrect in the 1.2 code series. That series does not allow
>>>>>>>>>>> singletons to read a hostfile at all. Hence, any comm_spawn done by a
>>>>>>>>>>> singleton can only launch child processes on the singleton's local host.
>>>>>>>>>>>
>>>>>>>>>>> This situation has been corrected for the upcoming 1.3 code series. For
>>>>>>>>>>> the 1.2 series, though, you will have to do it via an mpirun command
>>>>>>>>>>> line.
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the confusion - I sometimes have too many code families 
>>>>>>>>>>> to keep
>>>>>>>>>>> straight in this old mind!
>>>>>>>>>>>
>>>>>>>>>>> Ralph
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 1/4/08 5:10 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello Ralph,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you very much for the explanations.
>>>>>>>>>>>> But I still do not get it running...
>>>>>>>>>>>>
>>>>>>>>>>>> For the case
>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host 
>>>>>>>>>>>> my_master.exe
>>>>>>>>>>>> everything works.
>>>>>>>>>>>>
>>>>>>>>>>>> For the case
>>>>>>>>>>>> ./my_master.exe
>>>>>>>>>>>> it does not.
>>>>>>>>>>>>
>>>>>>>>>>>> I did:
>>>>>>>>>>>> - created my_hostfile and put it in $HOME/.openmpi/components/;
>>>>>>>>>>>> my_hostfile contains:
>>>>>>>>>>>> bollenstreek slots=2 max_slots=3
>>>>>>>>>>>> octocore01 slots=8  max_slots=8
>>>>>>>>>>>> octocore02 slots=8  max_slots=8
>>>>>>>>>>>> clstr000 slots=2 max_slots=3
>>>>>>>>>>>> clstr001 slots=2 max_slots=3
>>>>>>>>>>>> clstr002 slots=2 max_slots=3
>>>>>>>>>>>> clstr003 slots=2 max_slots=3
>>>>>>>>>>>> clstr004 slots=2 max_slots=3
>>>>>>>>>>>> clstr005 slots=2 max_slots=3
>>>>>>>>>>>> clstr006 slots=2 max_slots=3
>>>>>>>>>>>> clstr007 slots=2 max_slots=3
>>>>>>>>>>>> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I put it in .tcshrc
>>>>>>>>>>>> and then sourced .tcshrc)
>>>>>>>>>>>> - in my_master.cpp I did
>>>>>>>>>>>> MPI_Info info1;
>>>>>>>>>>>> MPI_Info_create(&info1);
>>>>>>>>>>>> char* hostname =
>>>>>>>>>>>> "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02";
>>>>>>>>>>>> MPI_Info_set(info1, "host", hostname);
>>>>>>>>>>>>
>>>>>>>>>>>> _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, info1, 0,
>>>>>>>>>>>> MPI_ERRCODES_IGNORE);
>>>>>>>>>>>>
>>>>>>>>>>>> - After I call the executable, I get this error message:
>>>>>>>>>>>>
>>>>>>>>>>>> bollenstreek: > ./my_master
>>>>>>>>>>>> number of processes to run: 1
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> Some of the requested hosts are not included in the current 
>>>>>>>>>>>> allocation for
>>>>>>>>>>>> the application:
>>>>>>>>>>>> ./childexe
>>>>>>>>>>>> The requested hosts were:
>>>>>>>>>>>> clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02
>>>>>>>>>>>>
>>>>>>>>>>>> Verify that you have mapped the allocated resources properly using 
>>>>>>>>>>>> the
>>>>>>>>>>>> --host specification.
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in 
>>>>>>>>>>>> file
>>>>>>>>>>>> base/rmaps_base_support_fns.c at line 225
>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in 
>>>>>>>>>>>> file
>>>>>>>>>>>> rmaps_rr.c at line 478
>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in 
>>>>>>>>>>>> file
>>>>>>>>>>>> base/rmaps_base_map_job.c at line 210
>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in 
>>>>>>>>>>>> file
>>>>>>>>>>>> rmgr_urm.c at line 372
>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in 
>>>>>>>>>>>> file
>>>>>>>>>>>> communicator/comm_dyn.c at line 608
>>>>>>>>>>>>
>>>>>>>>>>>> Did I miss something?
>>>>>>>>>>>> Thanks for help!
>>>>>>>>>>>>
>>>>>>>>>>>> Elena
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>>>>>> Sent: Tuesday, December 18, 2007 3:50 PM
>>>>>>>>>>>> To: Elena Zhebel; Open MPI Users <us...@open-mpi.org>
>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster 
>>>>>>>>>>>> configuration
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/18/07 7:35 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks a lot! Now it works!
>>>>>>>>>>>>> The solution is to use mpirun -n 1 -hostfile my.hosts *.exe and pass an
>>>>>>>>>>>>> MPI_Info key to the Spawn function!
>>>>>>>>>>>>>
>>>>>>>>>>>>> One more question: is it necessary to start my "master" program with
>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe ?
>>>>>>>>>>>>
>>>>>>>>>>>> No, it isn't necessary - assuming that my_master_host is the first host
>>>>>>>>>>>> listed in your hostfile! If you are only executing one my_master.exe
>>>>>>>>>>>> (i.e., you gave -n 1 to mpirun), then we will automatically map that
>>>>>>>>>>>> process onto the first host in your hostfile.
>>>>>>>>>>>>
>>>>>>>>>>>> If you want my_master.exe to go on someone other than the first host in
>>>>>>>>>>>> the file, then you have to give us the -host option.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are there other possibilities for easy start?
>>>>>>>>>>>>> I would say just to run ./my_master.exe, but then the master process
>>>>>>>>>>>>> doesn't know about the hosts available on the network.
>>>>>>>>>>>>
>>>>>>>>>>>> You can set the hostfile parameter in your environment instead of on the
>>>>>>>>>>>> command line. Just set OMPI_MCA_rds_hostfile_path = my.hosts.
>>>>>>>>>>>>
>>>>>>>>>>>> You can then just run ./my_master.exe on the host where you want the
>>>>>>>>>>>> master to reside - everything should work the same.
>>>>>>>>>>>>
>>>>>>>>>>>> Just as an FYI: the name of that environment variable is going to
>>>>>>>>>>>> change in the 1.3 release, but everything will still work the same.
>>>>>>>>>>>>
>>>>>>>>>>>> Hope that helps
>>>>>>>>>>>> Ralph
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks and regards,
>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>>>>>>> Sent: Monday, December 17, 2007 5:49 PM
>>>>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>; Elena Zhebel
>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster 
>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 12/17/07 8:19 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Ralph,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you for your answer.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm using OpenMPI 1.2.3 (compiler glibc232) on Linux SuSE 10.0.
>>>>>>>>>>>>>> My "master" executable runs only on the one local host; it then spawns
>>>>>>>>>>>>>> "slaves" (with MPI::Intracomm::Spawn).
>>>>>>>>>>>>>> My question was: how do I determine the hosts where these "slaves" will
>>>>>>>>>>>>>> be spawned?
>>>>>>>>>>>>>> You said: "You have to specify all of the hosts that can be used by your
>>>>>>>>>>>>>> job in the original hostfile". How can I specify the hostfile? I cannot
>>>>>>>>>>>>>> find it in the documentation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hmmm...sorry about the lack of documentation. I always assumed that the
>>>>>>>>>>>>> MPI folks in the project would document such things since it has little
>>>>>>>>>>>>> to do with the underlying run-time, but I guess that fell through the
>>>>>>>>>>>>> cracks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are two parts to your question:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. how to specify the hosts to be used for the entire job. I believe
>>>>>>>>>>>>> that is somewhat covered here:
>>>>>>>>>>>>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>>>>>>>>>>>>
>>>>>>>>>>>>> That FAQ tells you what a hostfile should look like, though you may
>>>>>>>>>>>>> already know that. Basically, we require that you list -all- of the
>>>>>>>>>>>>> nodes that both your master and slave programs will use.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. how to specify which nodes are available for the master, and which
>>>>>>>>>>>>> for the slave.
>>>>>>>>>>>>>
>>>>>>>>>>>>> You would specify the host for your master on the mpirun command line
>>>>>>>>>>>>> with something like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>>>>>>>>>>>
>>>>>>>>>>>>> This directs Open MPI to map that specified executable on the specified
>>>>>>>>>>>>> host - note that my_master_host must have been in my_hostfile.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Inside your master, you would create an MPI_Info key "host" that has a
>>>>>>>>>>>>> value consisting of a string "host1,host2,host3" identifying the hosts
>>>>>>>>>>>>> you want your slave to execute upon. Those hosts must have been included
>>>>>>>>>>>>> in my_hostfile. Include that key in the MPI_Info array passed to your
>>>>>>>>>>>>> Spawn.
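>>>>>>>>>>>>>
>>>>>>>>>>>>> In code, that might look roughly like the following sketch (the slave
>>>>>>>>>>>>> executable path and host names are placeholders, the hosts must also
>>>>>>>>>>>>> appear in my_hostfile, and the function is assumed to be called after
>>>>>>>>>>>>> MPI_Init):
>>>>>>>>>>>>>
>>>>>>>>>>>>> #include <mpi.h>
>>>>>>>>>>>>>
>>>>>>>>>>>>> void spawn_slaves(void) {
>>>>>>>>>>>>>     MPI_Info info;
>>>>>>>>>>>>>     MPI_Comm slaves;
>>>>>>>>>>>>>     MPI_Info_create(&info);
>>>>>>>>>>>>>     /* hosts the slaves should land on; must be listed in my_hostfile */
>>>>>>>>>>>>>     MPI_Info_set(info, (char *)"host", (char *)"host1,host2,host3");
>>>>>>>>>>>>>     MPI_Comm_spawn((char *)"./my_slave.exe", MPI_ARGV_NULL, 3, info,
>>>>>>>>>>>>>                    0, MPI_COMM_SELF, &slaves, MPI_ERRCODES_IGNORE);
>>>>>>>>>>>>>     MPI_Info_free(&info);
>>>>>>>>>>>>>     MPI_Comm_disconnect(&slaves);
>>>>>>>>>>>>> }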
>>>>>>>>>>>>>
>>>>>>>>>>>>> We don't currently support providing a hostfile for the slaves (as
>>>>>>>>>>>>> opposed to the host-at-a-time string above). This may become available
>>>>>>>>>>>>> in a future release - TBD.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hope that helps
>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks and regards,
>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: users-boun...@open-mpi.org 
>>>>>>>>>>>>>> [mailto:users-boun...@open-mpi.org] On
>>>>>>>>>>>>>> Behalf Of Ralph H Castain
>>>>>>>>>>>>>> Sent: Monday, December 17, 2007 3:31 PM
>>>>>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>
>>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster
>>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 12/12/07 5:46 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm working on an MPI application where I'm using OpenMPI instead of
>>>>>>>>>>>>>>> MPICH.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In my "master" program I call the function 
>>>>>>>>>>>>>>> MPI::Intracomm::Spawn which
>>>>>>>>>>>>>> spawns
>>>>>>>>>>>>>>> "slave" processes. It is not clear for me how to spawn the 
>>>>>>>>>>>>>>> "slave"
>>>>>>>>>>>>>> processes
>>>>>>>>>>>>>>> over the network. Currently "master" creates "slaves" on the 
>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>> host.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If I use 'mpirun --hostfile openmpi.hosts', then processes are spawned
>>>>>>>>>>>>>>> over the network as expected. But now I need to spawn processes over
>>>>>>>>>>>>>>> the network from my own executable using MPI::Intracomm::Spawn; how can
>>>>>>>>>>>>>>> I achieve that?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm not sure from your description exactly what you are trying to do,
>>>>>>>>>>>>>> nor what environment this is all operating in or what version of Open
>>>>>>>>>>>>>> MPI you are using. Setting aside the environment and version issue, I'm
>>>>>>>>>>>>>> guessing that you are running your executable over some specified set
>>>>>>>>>>>>>> of hosts, but want to provide a different hostfile that specifies the
>>>>>>>>>>>>>> hosts to be used for the "slave" processes. Correct?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If that is correct, then I'm afraid you can't do that in any version
>>>>>>>>>>>>>> of Open MPI today. You have to specify all of the hosts that can be
>>>>>>>>>>>>>> used by your job in the original hostfile. You can then specify a
>>>>>>>>>>>>>> subset of those hosts to be used by your original "master" program, and
>>>>>>>>>>>>>> then specify a different subset to be used by the "slaves" when calling
>>>>>>>>>>>>>> Spawn.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But the system requires that you tell it -all- of the hosts that are
>>>>>>>>>>>>>> going to be used at the beginning of the job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the moment, there is no plan to remove that requirement, though
>>>>>>>>>>>>>> there has been occasional discussion about doing so at some point in
>>>>>>>>>>>>>> the future. No promises that it will happen, though - managed
>>>>>>>>>>>>>> environments, in particular, currently object to the idea of changing
>>>>>>>>>>>>>> the allocation on-the-fly. We may, though, make a provision for purely
>>>>>>>>>>>>>> hostfile-based environments (i.e., unmanaged) at some time in the
>>>>>>>>>>>>>> future.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance for any help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>