In case I need to get this up and running soon (I do need something
working within 2 weeks), can you recommend an older version where this
is expected to work?

Thanks,
  Brian

On Tue, Aug 28, 2012 at 4:58 PM, Brian Budge <brian.bu...@gmail.com> wrote:
> Thanks!
>
> On Tue, Aug 28, 2012 at 4:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> Yeah, I'm seeing the hang as well when running across multiple machines. Let 
>> me dig a little and get this fixed.
>>
>> Thanks
>> Ralph
>>
>> On Aug 28, 2012, at 4:51 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>
>>> Hmmm, I went to the Open MPI build directories on my two machines,
>>> went into the orte/test/mpi directory, and built the executables on both
>>> machines.  I set the hostsfile path in the env variable on the "master"
>>> machine.
>>>
>>> Here's the output:
>>>
>>> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
>>> ./simple_spawn
>>> Parent [pid 97504] starting up!
>>> 0 completed MPI_Init
>>> Parent [pid 97504] about to spawn!
>>> Parent [pid 97507] starting up!
>>> Parent [pid 97508] starting up!
>>> Parent [pid 30626] starting up!
>>> ^C
>>> zsh: interrupt  OMPI_MCA_orte_default_hostfile= ./simple_spawn
>>>
>>> I had to ^C to kill the hung process.
>>>
>>> When I run using mpirun:
>>>
>>> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
>>> mpirun -np 1 ./simple_spawn
>>> Parent [pid 97511] starting up!
>>> 0 completed MPI_Init
>>> Parent [pid 97511] about to spawn!
>>> Parent [pid 97513] starting up!
>>> Parent [pid 30762] starting up!
>>> Parent [pid 30764] starting up!
>>> Parent done with spawn
>>> Parent sending message to child
>>> 1 completed MPI_Init
>>> Hello from the child 1 of 3 on host budgeb-sandybridge pid 97513
>>> 0 completed MPI_Init
>>> Hello from the child 0 of 3 on host budgeb-interlagos pid 30762
>>> 2 completed MPI_Init
>>> Hello from the child 2 of 3 on host budgeb-interlagos pid 30764
>>> Child 1 disconnected
>>> Child 0 received msg: 38
>>> Child 0 disconnected
>>> Parent disconnected
>>> Child 2 disconnected
>>> 97511: exiting
>>> 97513: exiting
>>> 30762: exiting
>>> 30764: exiting
>>>
>>> As you can see, I'm using Open MPI v1.6.1.  I just did a fresh install
>>> on both machines using the default configure options.
>>>
>>> Thanks for all your help.
>>>
>>>  Brian
>>>
>>> On Tue, Aug 28, 2012 at 4:39 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> Looks to me like it didn't find your executable - could be a question of 
>>>> where it exists relative to where you are running. If you look in your 
>>>> OMPI source tree at the orte/test/mpi directory, you'll see an example 
>>>> program "simple_spawn.c" there. Just "make simple_spawn" and execute that 
>>>> with your default hostfile set - does it work okay?
>>>>
>>>> It works fine for me, hence the question.
>>>>
>>>> Also, what OMPI version are you using?
>>>>
>>>> On Aug 28, 2012, at 4:25 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>
>>>>> I see.  Okay.  So, I just tried removing the check for universe size,
>>>>> and set the universe size to 2.  Here's my output:
>>>>>
>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>>>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>>>>> [budgeb-interlagos:29965] [[4156,0],0] ORTE_ERROR_LOG: Fatal in file
>>>>> base/plm_base_receive.c at line 253
>>>>> [budgeb-interlagos:29963] [[4156,1],0] ORTE_ERROR_LOG: The specified
>>>>> application failed to start in file dpm_orte.c at line 785
>>>>>
>>>>> The corresponding run with mpirun still works.
>>>>>
>>>>> Thanks,
>>>>> Brian
>>>>>
>>>>> On Tue, Aug 28, 2012 at 2:46 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> I see the issue - it's here:
>>>>>>
>>>>>>> MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>>>>>>
>>>>>>> if(!flag) {
>>>>>>>     std::cerr << "no universe size" << std::endl;
>>>>>>>     return -1;
>>>>>>> }
>>>>>>> universeSize = *puniverseSize;
>>>>>>> if(universeSize == 1) {
>>>>>>>     std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>>>>>> }
>>>>>>
>>>>>> The universe size is set to 1 on a singleton because the attribute gets 
>>>>>> set at the beginning of time - we haven't any way to go back and change 
>>>>>> it. The sequence of events explains why. The singleton starts up and 
>>>>>> sets its attributes, including universe_size. It also spins off an orte 
>>>>>> daemon to act as its own private "mpirun" in case you call comm_spawn. 
>>>>>> At this point, however, no hostfile has been read - the singleton is 
>>>>>> just an MPI proc doing its own thing, and the orte daemon is just 
>>>>>> sitting there on "stand-by".
>>>>>>
>>>>>> When your app calls comm_spawn, then the orte daemon gets called to 
>>>>>> launch the new procs. At that time, it (not the original singleton!) 
>>>>>> reads the hostfile to find out how many nodes are around, and then does 
>>>>>> the launch.
>>>>>>
>>>>>> You are trying to check the number of nodes from within the singleton, 
>>>>>> which won't work - it has no way of discovering that info.
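>>>>>>
>>>>>> In other words, don't derive the slave count from MPI_UNIVERSE_SIZE when
>>>>>> running as a singleton - pass the count in explicitly and let the daemon
>>>>>> map the children onto the hostfile when the spawn happens. A rough,
>>>>>> untested sketch (the executable name and argument handling are just
>>>>>> placeholders):
>>>>>>
>>>>>> // singleton_spawn_sketch.cpp
>>>>>> #include <mpi.h>
>>>>>> #include <cstdlib>
>>>>>>
>>>>>> int main(int argc, char **argv) {
>>>>>>     // e.g. "./master_exe 3" to request three slaves
>>>>>>     int nSlaves = (argc > 1) ? std::atoi(argv[1]) : 1;
>>>>>>
>>>>>>     MPI_Init(&argc, &argv);
>>>>>>
>>>>>>     // The orte daemon reads OMPI_MCA_orte_default_hostfile when this
>>>>>>     // request arrives, so the children can land on the remote nodes.
>>>>>>     MPI_Comm children;
>>>>>>     MPI_Comm_spawn((char *)"./slave_exe", MPI_ARGV_NULL, nSlaves,
>>>>>>                    MPI_INFO_NULL, 0, MPI_COMM_SELF, &children,
>>>>>>                    MPI_ERRCODES_IGNORE);
>>>>>>
>>>>>>     MPI_Finalize();
>>>>>>     return 0;
>>>>>> }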
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Aug 28, 2012, at 2:38 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>>>
>>>>>>>> cat hostsfile
>>>>>>> localhost
>>>>>>> budgeb-sandybridge
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Brian
>>>>>>>
>>>>>>> On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain <r...@open-mpi.org> 
>>>>>>> wrote:
>>>>>>>> Hmmm...what is in your "hostsfile"?
>>>>>>>>
>>>>>>>> On Aug 28, 2012, at 2:33 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Ralph -
>>>>>>>>>
>>>>>>>>> Thanks for confirming this is possible.  I'm trying this and currently
>>>>>>>>> failing.  Perhaps there's something I'm missing in the code to make
>>>>>>>>> this work.  Here are the two invocations and their outputs:
>>>>>>>>>
>>>>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>>>>>>>>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>>>>>>>>> cannot start slaves... not enough nodes
>>>>>>>>>
>>>>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>>>>>>>>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 
>>>>>>>>>> ./master_exe
>>>>>>>>> master spawned 1 slaves...
>>>>>>>>> slave responding...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The code:
>>>>>>>>>
>>>>>>>>> //master.cpp
>>>>>>>>> #include <mpi.h>
>>>>>>>>> #include <boost/filesystem.hpp>
>>>>>>>>> #include <iostream>
>>>>>>>>> #include <cstring>   // memcpy
>>>>>>>>> #include <alloca.h>  // alloca
>>>>>>>>>
>>>>>>>>> int main(int argc, char **args) {
>>>>>>>>>     int worldSize, universeSize, *puniverseSize, flag;
>>>>>>>>>
>>>>>>>>>     MPI_Comm everyone; // intercommunicator to the spawned slaves
>>>>>>>>>     boost::filesystem::path curPath =
>>>>>>>>>         boost::filesystem::absolute(boost::filesystem::current_path());
>>>>>>>>>
>>>>>>>>>     std::string toRun = (curPath / "slave_exe").string();
>>>>>>>>>
>>>>>>>>>     int ret = MPI_Init(&argc, &args);
>>>>>>>>>
>>>>>>>>>     if(ret != MPI_SUCCESS) {
>>>>>>>>>         std::cerr << "failed init" << std::endl;
>>>>>>>>>         return -1;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
>>>>>>>>>
>>>>>>>>>     if(worldSize != 1) {
>>>>>>>>>         std::cerr << "too many masters" << std::endl;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>>>>>>>>
>>>>>>>>>     if(!flag) {
>>>>>>>>>         std::cerr << "no universe size" << std::endl;
>>>>>>>>>         return -1;
>>>>>>>>>     }
>>>>>>>>>     universeSize = *puniverseSize;
>>>>>>>>>     if(universeSize == 1) {
>>>>>>>>>         std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     // copy the slave path into a mutable, NUL-terminated buffer
>>>>>>>>>     char *buf = (char*)alloca(toRun.size() + 1);
>>>>>>>>>     memcpy(buf, toRun.c_str(), toRun.size());
>>>>>>>>>     buf[toRun.size()] = '\0';
>>>>>>>>>
>>>>>>>>>     MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
>>>>>>>>>                    0, MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);
>>>>>>>>>
>>>>>>>>>     std::cerr << "master spawned " << universeSize-1 << " slaves..."
>>>>>>>>>               << std::endl;
>>>>>>>>>
>>>>>>>>>     MPI_Finalize();
>>>>>>>>>
>>>>>>>>>     return 0;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> //slave.cpp
>>>>>>>>> #include <mpi.h>
>>>>>>>>> #include <iostream>
>>>>>>>>>
>>>>>>>>> int main(int argc, char **args) {
>>>>>>>>>     int size;
>>>>>>>>>     MPI_Comm parent;
>>>>>>>>>     MPI_Init(&argc, &args);
>>>>>>>>>
>>>>>>>>>     MPI_Comm_get_parent(&parent);
>>>>>>>>>
>>>>>>>>>     if(parent == MPI_COMM_NULL) {
>>>>>>>>>         std::cerr << "slave has no parent" << std::endl;
>>>>>>>>>     }
>>>>>>>>>     MPI_Comm_remote_size(parent, &size);
>>>>>>>>>     if(size != 1) {
>>>>>>>>>         std::cerr << "parent size is " << size << std::endl;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     std::cerr << "slave responding..." << std::endl;
>>>>>>>>>
>>>>>>>>>     MPI_Finalize();
>>>>>>>>>
>>>>>>>>>     return 0;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any ideas?  Thanks for any help.
>>>>>>>>>
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>>> wrote:
>>>>>>>>>> It really is just that simple :-)
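>>>>>>>>>>
>>>>>>>>>> For the second case (different executables pinned to specific hosts), one
>>>>>>>>>> option is MPI_Comm_spawn_multiple with a per-command "host" info key. A
>>>>>>>>>> rough, untested sketch using the names from your example below:
>>>>>>>>>>
>>>>>>>>>> // mySpawningExe.cpp (sketch)
>>>>>>>>>> #include <mpi.h>
>>>>>>>>>>
>>>>>>>>>> int main(int argc, char **argv) {
>>>>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>>>>>
>>>>>>>>>>     char *cmds[2]     = {(char *)"./childExe1", (char *)"./childExe2"};
>>>>>>>>>>     int   maxprocs[2] = {1, 1};
>>>>>>>>>>     MPI_Info infos[2];
>>>>>>>>>>
>>>>>>>>>>     // These hosts must also appear in the hostfile pointed to by
>>>>>>>>>>     // OMPI_MCA_orte_default_hostfile (hostnames work the same way).
>>>>>>>>>>     MPI_Info_create(&infos[0]);
>>>>>>>>>>     MPI_Info_set(infos[0], (char *)"host", (char *)"192.168.0.11");
>>>>>>>>>>     MPI_Info_create(&infos[1]);
>>>>>>>>>>     MPI_Info_set(infos[1], (char *)"host", (char *)"192.168.0.12");
>>>>>>>>>>
>>>>>>>>>>     MPI_Comm children;
>>>>>>>>>>     MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, maxprocs, infos,
>>>>>>>>>>                             0, MPI_COMM_SELF, &children,
>>>>>>>>>>                             MPI_ERRCODES_IGNORE);
>>>>>>>>>>
>>>>>>>>>>     MPI_Info_free(&infos[0]);
>>>>>>>>>>     MPI_Info_free(&infos[1]);
>>>>>>>>>>     MPI_Finalize();
>>>>>>>>>>     return 0;
>>>>>>>>>> }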
>>>>>>>>>>
>>>>>>>>>> On Aug 22, 2012, at 8:56 AM, Brian Budge <brian.bu...@gmail.com> 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Okay.  Is there a tutorial or FAQ for setting everything up?  Or is 
>>>>>>>>>>> it
>>>>>>>>>>> really just that simple?  I don't need to run a copy of the orte
>>>>>>>>>>> server somewhere?
>>>>>>>>>>>
>>>>>>>>>>> if my current ip is 192.168.0.1,
>>>>>>>>>>>
>>>>>>>>>>> 0 > echo 192.168.0.11 > /tmp/hostfile
>>>>>>>>>>> 1 > echo 192.168.0.12 >> /tmp/hostfile
>>>>>>>>>>> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
>>>>>>>>>>> 3 > ./mySpawningExe
>>>>>>>>>>>
>>>>>>>>>>> At this point, mySpawningExe will be the master, running on
>>>>>>>>>>> 192.168.0.1, and I can have spawned, for example, childExe on
>>>>>>>>>>> 192.168.0.11 and 192.168.0.12?  Or childExe1 on 192.168.0.11 and
>>>>>>>>>>> childExe2 on 192.168.0.12?
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the help.
>>>>>>>>>>>
>>>>>>>>>>> Brian
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Sure, that's still true on all 1.3 or above releases. All you need 
>>>>>>>>>>>> to do is set the hostfile envar so we pick it up:
>>>>>>>>>>>>
>>>>>>>>>>>> OMPI_MCA_orte_default_hostfile=<foo>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Aug 21, 2012, at 7:23 PM, Brian Budge <brian.bu...@gmail.com> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi.  I know this is an old thread, but I'm curious whether there are any
>>>>>>>>>>>>> tutorials describing how to set this up.  Is this still available in
>>>>>>>>>>>>> newer Open MPI versions?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Brian
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain <r...@lanl.gov> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi Elena
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm copying this to the user list just to correct a 
>>>>>>>>>>>>>> mis-statement on my part
>>>>>>>>>>>>>> in an earlier message that went there. I had stated that a 
>>>>>>>>>>>>>> singleton could
>>>>>>>>>>>>>> comm_spawn onto other nodes listed in a hostfile by setting an environment
>>>>>>>>>>>>>> variable that pointed us to the hostfile.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is incorrect in the 1.2 code series. That series does not 
>>>>>>>>>>>>>> allow
>>>>>>>>>>>>>> singletons to read a hostfile at all. Hence, any comm_spawn done 
>>>>>>>>>>>>>> by a
>>>>>>>>>>>>>> singleton can only launch child processes on the singleton's 
>>>>>>>>>>>>>> local host.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This situation has been corrected for the upcoming 1.3 code 
>>>>>>>>>>>>>> series. For the
>>>>>>>>>>>>>> 1.2 series, though, you will have to do it via an mpirun command 
>>>>>>>>>>>>>> line.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry for the confusion - I sometimes have too many code 
>>>>>>>>>>>>>> families to keep
>>>>>>>>>>>>>> straight in this old mind!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/4/08 5:10 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello Ralph,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you very much for the explanations.
>>>>>>>>>>>>>>> But I still do not get it running...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For the case
>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host 
>>>>>>>>>>>>>>> my_master.exe
>>>>>>>>>>>>>>> everything works.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For the case
>>>>>>>>>>>>>>> ./my_master.exe
>>>>>>>>>>>>>>> it does not.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did:
>>>>>>>>>>>>>>> - create my_hostfile and put it in the 
>>>>>>>>>>>>>>> $HOME/.openmpi/components/
>>>>>>>>>>>>>>> my_hostfile :
>>>>>>>>>>>>>>> bollenstreek slots=2 max_slots=3
>>>>>>>>>>>>>>> octocore01 slots=8  max_slots=8
>>>>>>>>>>>>>>> octocore02 slots=8  max_slots=8
>>>>>>>>>>>>>>> clstr000 slots=2 max_slots=3
>>>>>>>>>>>>>>> clstr001 slots=2 max_slots=3
>>>>>>>>>>>>>>> clstr002 slots=2 max_slots=3
>>>>>>>>>>>>>>> clstr003 slots=2 max_slots=3
>>>>>>>>>>>>>>> clstr004 slots=2 max_slots=3
>>>>>>>>>>>>>>> clstr005 slots=2 max_slots=3
>>>>>>>>>>>>>>> clstr006 slots=2 max_slots=3
>>>>>>>>>>>>>>> clstr007 slots=2 max_slots=3
>>>>>>>>>>>>>>> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I  put it in 
>>>>>>>>>>>>>>> .tcshrc and
>>>>>>>>>>>>>>> then source .tcshrc)
>>>>>>>>>>>>>>> - in my_master.cpp I did
>>>>>>>>>>>>>>> MPI_Info info1;
>>>>>>>>>>>>>>> MPI_Info_create(&info1);
>>>>>>>>>>>>>>> char* hostname =
>>>>>>>>>>>>>>> "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02";
>>>>>>>>>>>>>>> MPI_Info_set(info1, "host", hostname);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, 
>>>>>>>>>>>>>>> info1, 0,
>>>>>>>>>>>>>>> MPI_ERRCODES_IGNORE);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - After I call the executable, I get this error message:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> bollenstreek: > ./my_master
>>>>>>>>>>>>>>> number of processes to run: 1
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>> Some of the requested hosts are not included in the current 
>>>>>>>>>>>>>>> allocation for
>>>>>>>>>>>>>>> the application:
>>>>>>>>>>>>>>> ./childexe
>>>>>>>>>>>>>>> The requested hosts were:
>>>>>>>>>>>>>>> clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Verify that you have mapped the allocated resources properly 
>>>>>>>>>>>>>>> using the
>>>>>>>>>>>>>>> --host specification.
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in 
>>>>>>>>>>>>>>> file
>>>>>>>>>>>>>>> base/rmaps_base_support_fns.c at line 225
>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in 
>>>>>>>>>>>>>>> file
>>>>>>>>>>>>>>> rmaps_rr.c at line 478
>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in 
>>>>>>>>>>>>>>> file
>>>>>>>>>>>>>>> base/rmaps_base_map_job.c at line 210
>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in 
>>>>>>>>>>>>>>> file
>>>>>>>>>>>>>>> rmgr_urm.c at line 372
>>>>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in 
>>>>>>>>>>>>>>> file
>>>>>>>>>>>>>>> communicator/comm_dyn.c at line 608
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Did I miss something?
>>>>>>>>>>>>>>> Thanks for the help!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>>>>>>>>> Sent: Tuesday, December 18, 2007 3:50 PM
>>>>>>>>>>>>>>> To: Elena Zhebel; Open MPI Users <us...@open-mpi.org>
>>>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster 
>>>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 12/18/07 7:35 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks a lot! Now it works!
>>>>>>>>>>>>>>>> The solution is to use mpirun -n 1 -hostfile my.hosts *.exe and pass an
>>>>>>>>>>>>>>>> MPI_Info key to the Spawn function!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One more question: is it necessary to start my "master" 
>>>>>>>>>>>>>>>> program with
>>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host 
>>>>>>>>>>>>>>>> my_master.exe ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No, it isn't necessary - assuming that my_master_host is the 
>>>>>>>>>>>>>>> first host
>>>>>>>>>>>>>>> listed in your hostfile! If you are only executing one 
>>>>>>>>>>>>>>> my_master.exe (i.e.,
>>>>>>>>>>>>>>> you gave -n 1 to mpirun), then we will automatically map that 
>>>>>>>>>>>>>>> process onto
>>>>>>>>>>>>>>> the first host in your hostfile.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you want my_master.exe to go on someone other than the first 
>>>>>>>>>>>>>>> host in the
>>>>>>>>>>>>>>> file, then you have to give us the -host option.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Are there other possibilities for an easier start?
>>>>>>>>>>>>>>>> I would like to just run ./my_master.exe, but then the master process
>>>>>>>>>>>>>>>> doesn't know about the hosts available on the network.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You can set the hostfile parameter in your environment instead 
>>>>>>>>>>>>>>> of on the
>>>>>>>>>>>>>>> command line. Just set OMPI_MCA_rds_hostfile_path = my.hosts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You can then just run ./my_master.exe on the host where you 
>>>>>>>>>>>>>>> want the master
>>>>>>>>>>>>>>> to reside - everything should work the same.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just as an FYI: the name of that environmental variable is 
>>>>>>>>>>>>>>> going to change
>>>>>>>>>>>>>>> in the 1.3 release, but everything will still work the same.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hope that helps
>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks and regards,
>>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>>>>>>>>>> Sent: Monday, December 17, 2007 5:49 PM
>>>>>>>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>; Elena Zhebel
>>>>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster 
>>>>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 12/17/07 8:19 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> 
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello Ralph,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you for your answer.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm using OpenMPI 1.2.3, compiler glibc232, Linux Suse 10.0.
>>>>>>>>>>>>>>>>> My "master" executable runs only on the local host; it then spawns
>>>>>>>>>>>>>>>>> "slaves" (with MPI::Intracomm::Spawn).
>>>>>>>>>>>>>>>>> My question was: how do I determine the hosts on which these "slaves"
>>>>>>>>>>>>>>>>> will be spawned?
>>>>>>>>>>>>>>>>> You said: "You have to specify all of the hosts that can be used by your
>>>>>>>>>>>>>>>>> job in the original hostfile". How can I specify the hostfile? I cannot
>>>>>>>>>>>>>>>>> find it in the documentation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hmmm...sorry about the lack of documentation. I always assumed 
>>>>>>>>>>>>>>>> that the MPI
>>>>>>>>>>>>>>>> folks in the project would document such things since it has 
>>>>>>>>>>>>>>>> little to do
>>>>>>>>>>>>>>>> with the underlying run-time, but I guess that fell through 
>>>>>>>>>>>>>>>> the cracks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There are two parts to your question:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. how to specify the hosts to be used for the entire job. I believe that
>>>>>>>>>>>>>>>> is somewhat covered here:
>>>>>>>>>>>>>>>> http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That FAQ tells you what a hostfile should look like, though 
>>>>>>>>>>>>>>>> you may already
>>>>>>>>>>>>>>>> know that. Basically, we require that you list -all- of the 
>>>>>>>>>>>>>>>> nodes that both
>>>>>>>>>>>>>>>> your master and slave programs will use.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. how to specify which nodes are available for the master, 
>>>>>>>>>>>>>>>> and which for
>>>>>>>>>>>>>>>> the slave.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You would specify the host for your master on the mpirun 
>>>>>>>>>>>>>>>> command line with
>>>>>>>>>>>>>>>> something like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This directs Open MPI to map that specified executable on the specified
>>>>>>>>>>>>>>>> host - note that my_master_host must have been in my_hostfile.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Inside your master, you would create an MPI_Info key "host" that has a
>>>>>>>>>>>>>>>> value consisting of a string "host1,host2,host3" identifying the hosts
>>>>>>>>>>>>>>>> you want your slave to execute upon. Those hosts must have been included
>>>>>>>>>>>>>>>> in my_hostfile. Include that key in the MPI_Info array passed to your
>>>>>>>>>>>>>>>> Spawn.
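>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In code that might look something like this (an untested sketch - the
>>>>>>>>>>>>>>>> slave path, host names, and process count are placeholders; the C++
>>>>>>>>>>>>>>>> MPI::Intracomm::Spawn form takes the same Info argument):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> #include <mpi.h>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> int main(int argc, char **argv) {
>>>>>>>>>>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     MPI_Info info;
>>>>>>>>>>>>>>>>     MPI_Info_create(&info);
>>>>>>>>>>>>>>>>     // every host named here must also be listed in my_hostfile
>>>>>>>>>>>>>>>>     MPI_Info_set(info, (char *)"host", (char *)"host1,host2,host3");
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     MPI_Comm slaves;
>>>>>>>>>>>>>>>>     MPI_Comm_spawn((char *)"./my_slave.exe", MPI_ARGV_NULL, 3, info,
>>>>>>>>>>>>>>>>                    0, MPI_COMM_SELF, &slaves, MPI_ERRCODES_IGNORE);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     MPI_Info_free(&info);
>>>>>>>>>>>>>>>>     MPI_Finalize();
>>>>>>>>>>>>>>>>     return 0;
>>>>>>>>>>>>>>>> }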
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We don't currently support providing a hostfile for the slaves 
>>>>>>>>>>>>>>>> (as opposed
>>>>>>>>>>>>>>>> to the host-at-a-time string above). This may become available 
>>>>>>>>>>>>>>>> in a future
>>>>>>>>>>>>>>>> release - TBD.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hope that helps
>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks and regards,
>>>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>> From: users-boun...@open-mpi.org 
>>>>>>>>>>>>>>>>> [mailto:users-boun...@open-mpi.org] On
>>>>>>>>>>>>>>>>> Behalf Of Ralph H Castain
>>>>>>>>>>>>>>>>> Sent: Monday, December 17, 2007 3:31 PM
>>>>>>>>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>
>>>>>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster
>>>>>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 12/12/07 5:46 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> 
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm working on an MPI application where I'm using OpenMPI instead of
>>>>>>>>>>>>>>>>>> MPICH.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In my "master" program I call the function MPI::Intracomm::Spawn, which
>>>>>>>>>>>>>>>>>> spawns "slave" processes. It is not clear to me how to spawn the "slave"
>>>>>>>>>>>>>>>>>> processes over the network. Currently the "master" creates "slaves" on
>>>>>>>>>>>>>>>>>> the same host.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If I use 'mpirun --hostfile openmpi.hosts' then processes are spawned
>>>>>>>>>>>>>>>>>> over the network as expected. But now I need to spawn processes over the
>>>>>>>>>>>>>>>>>> network from my own executable using MPI::Intracomm::Spawn; how can I
>>>>>>>>>>>>>>>>>> achieve this?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm not sure from your description exactly what you are trying to do,
>>>>>>>>>>>>>>>>> nor what environment this is all operating in or what version of Open
>>>>>>>>>>>>>>>>> MPI you are using. Setting aside the environment and version issue, I'm
>>>>>>>>>>>>>>>>> guessing that you are running your executable over some specified set of
>>>>>>>>>>>>>>>>> hosts, but want to provide a different hostfile that specifies the hosts
>>>>>>>>>>>>>>>>> to be used for the "slave" processes. Correct?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If that is correct, then I'm afraid you can't do that in any 
>>>>>>>>>>>>>>>>> version
>>>>>>>>>>>>>>>>> of Open
>>>>>>>>>>>>>>>>> MPI today. You have to specify all of the hosts that can be 
>>>>>>>>>>>>>>>>> used by
>>>>>>>>>>>>>>>>> your job
>>>>>>>>>>>>>>>>> in the original hostfile. You can then specify a subset of 
>>>>>>>>>>>>>>>>> those hosts
>>>>>>>>>>>>>>>>> to be
>>>>>>>>>>>>>>>>> used by your original "master" program, and then specify a 
>>>>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>> subset
>>>>>>>>>>>>>>>>> to be used by the "slaves" when calling Spawn.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But the system requires that you tell it -all- of the hosts 
>>>>>>>>>>>>>>>>> that are
>>>>>>>>>>>>>>>>> going
>>>>>>>>>>>>>>>>> to be used at the beginning of the job.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> At the moment, there is no plan to remove that requirement, 
>>>>>>>>>>>>>>>>> though
>>>>>>>>>>>>>>>>> there has
>>>>>>>>>>>>>>>>> been occasional discussion about doing so at some point in 
>>>>>>>>>>>>>>>>> the future.
>>>>>>>>>>>>>>>>> No
>>>>>>>>>>>>>>>>> promises that it will happen, though - managed environments, 
>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> particular,
>>>>>>>>>>>>>>>>> currently object to the idea of changing the allocation 
>>>>>>>>>>>>>>>>> on-the-fly. We
>>>>>>>>>>>>>>>>> may,
>>>>>>>>>>>>>>>>> though, make a provision for purely hostfile-based 
>>>>>>>>>>>>>>>>> environments (i.e.,
>>>>>>>>>>>>>>>>> unmanaged) at some time in the future.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks in advance for any help.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>