Hmmm, I went to the Open MPI build directories on my two machines, went into the orte/test/mpi directory, and made the executables on both machines. I pointed the hostfile environment variable at my hostsfile on the "master" machine.
Here's the output:

OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile ./simple_spawn
Parent [pid 97504] starting up!
0 completed MPI_Init
Parent [pid 97504] about to spawn!
Parent [pid 97507] starting up!
Parent [pid 97508] starting up!
Parent [pid 30626] starting up!
^C
zsh: interrupt  OMPI_MCA_orte_default_hostfile= ./simple_spawn

I had to ^C to kill the hung process. When I run using mpirun:

OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile mpirun -np 1 ./simple_spawn
Parent [pid 97511] starting up!
0 completed MPI_Init
Parent [pid 97511] about to spawn!
Parent [pid 97513] starting up!
Parent [pid 30762] starting up!
Parent [pid 30764] starting up!
Parent done with spawn
Parent sending message to child
1 completed MPI_Init
Hello from the child 1 of 3 on host budgeb-sandybridge pid 97513
0 completed MPI_Init
Hello from the child 0 of 3 on host budgeb-interlagos pid 30762
2 completed MPI_Init
Hello from the child 2 of 3 on host budgeb-interlagos pid 30764
Child 1 disconnected
Child 0 received msg: 38
Child 0 disconnected
Parent disconnected
Child 2 disconnected
97511: exiting
97513: exiting
30762: exiting
30764: exiting

As you can see, I'm using Open MPI v1.6.1, freshly installed on both machines with the default configure options.

Thanks for all your help.
Brian

On Tue, Aug 28, 2012 at 4:39 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Looks to me like it didn't find your executable - could be a question of where it exists relative to where you are running. If you look in your OMPI source tree at the orte/test/mpi directory, you'll see an example program "simple_spawn.c" there. Just "make simple_spawn" and execute that with your default hostfile set - does it work okay?
>
> It works fine for me, hence the question.
>
> Also, what OMPI version are you using?
>
> On Aug 28, 2012, at 4:25 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>
>> I see. Okay. So I just tried removing the check for universe size and setting the universe size to 2. Here's my output:
>>
>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>> [budgeb-interlagos:29965] [[4156,0],0] ORTE_ERROR_LOG: Fatal in file base/plm_base_receive.c at line 253
>> [budgeb-interlagos:29963] [[4156,1],0] ORTE_ERROR_LOG: The specified application failed to start in file dpm_orte.c at line 785
>>
>> The corresponding run with mpirun still works.
>>
>> Thanks,
>> Brian
>>
>> On Tue, Aug 28, 2012 at 2:46 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> I see the issue - it's here:
>>>
>>>> MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>>>
>>>> if(!flag) {
>>>>   std::cerr << "no universe size" << std::endl;
>>>>   return -1;
>>>> }
>>>> universeSize = *puniverseSize;
>>>> if(universeSize == 1) {
>>>>   std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>>> }
>>>
>>> The universe size is set to 1 on a singleton because the attribute gets set at the beginning of time - we haven't any way to go back and change it. The sequence of events explains why. The singleton starts up and sets its attributes, including universe_size. It also spins off an orte daemon to act as its own private "mpirun" in case you call comm_spawn.
>>>
>>> At this point, however, no hostfile has been read - the singleton is just an MPI proc doing its own thing, and the orte daemon is just sitting there on "stand-by".
>>>
>>> When your app calls comm_spawn, then the orte daemon gets called to launch the new procs. At that time, it (not the original singleton!) reads the hostfile to find out how many nodes are around, and then does the launch.
>>>
>>> You are trying to check the number of nodes from within the singleton, which won't work - it has no way of discovering that info.
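A minimal sketch of the pattern this implies - added here for illustration, not code from the original messages; the ./slave_exe name and the argv-based count are placeholders. Since MPI_UNIVERSE_SIZE stays at 1 for a singleton, the parent picks the number of children itself and lets the ORTE daemon consult the hostfile named by OMPI_MCA_orte_default_hostfile when MPI_Comm_spawn is actually called:

// spawn_sketch.cpp - hypothetical singleton parent
#include <mpi.h>
#include <cstdlib>
#include <iostream>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  // Child count comes from the command line (default 2), not from
  // MPI_UNIVERSE_SIZE, which remains 1 when started without mpirun.
  int nChildren = (argc > 1) ? std::atoi(argv[1]) : 2;

  // The ORTE daemon reads the default hostfile only at this point.
  char child[] = "./slave_exe";   // placeholder child executable
  MPI_Comm children;
  MPI_Comm_spawn(child, MPI_ARGV_NULL, nChildren, MPI_INFO_NULL,
                 0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

  std::cerr << "spawned " << nChildren << " children" << std::endl;

  MPI_Finalize();
  return 0;
}

Invoked as something like OMPI_MCA_orte_default_hostfile=/path/to/hostsfile ./spawn_sketch 2. Note that this is the same singleton-plus-hostfile case reported failing on 1.6.1 earlier in this thread; the equivalent run under mpirun -np 1 works.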
>>>
>>> On Aug 28, 2012, at 2:38 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>
>>>>> echo hostsfile
>>>> localhost
>>>> budgeb-sandybridge
>>>>
>>>> Thanks,
>>>> Brian
>>>>
>>>> On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Hmmm...what is in your "hostsfile"?
>>>>>
>>>>> On Aug 28, 2012, at 2:33 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>>
>>>>>> Hi Ralph -
>>>>>>
>>>>>> Thanks for confirming this is possible. I'm trying this and currently failing. Perhaps there's something I'm missing in the code to make this work. Here are the two invocations and their outputs:
>>>>>>
>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>>>>>> cannot start slaves... not enough nodes
>>>>>>
>>>>>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 ./master_exe
>>>>>> master spawned 1 slaves...
>>>>>> slave responding...
>>>>>>
>>>>>> The code:
>>>>>>
>>>>>> // master.cpp
>>>>>> #include <mpi.h>
>>>>>> #include <boost/filesystem.hpp>
>>>>>> #include <iostream>
>>>>>> #include <cstring>   // for memcpy
>>>>>> #include <alloca.h>  // for alloca
>>>>>>
>>>>>> int main(int argc, char **args) {
>>>>>>   int worldSize, universeSize, *puniverseSize, flag;
>>>>>>
>>>>>>   MPI_Comm everyone; // intercomm
>>>>>>   boost::filesystem::path curPath =
>>>>>>     boost::filesystem::absolute(boost::filesystem::current_path());
>>>>>>
>>>>>>   std::string toRun = (curPath / "slave_exe").string();
>>>>>>
>>>>>>   int ret = MPI_Init(&argc, &args);
>>>>>>
>>>>>>   if(ret != MPI_SUCCESS) {
>>>>>>     std::cerr << "failed init" << std::endl;
>>>>>>     return -1;
>>>>>>   }
>>>>>>
>>>>>>   MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
>>>>>>
>>>>>>   if(worldSize != 1) {
>>>>>>     std::cerr << "too many masters" << std::endl;
>>>>>>   }
>>>>>>
>>>>>>   MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>>>>>
>>>>>>   if(!flag) {
>>>>>>     std::cerr << "no universe size" << std::endl;
>>>>>>     return -1;
>>>>>>   }
>>>>>>   universeSize = *puniverseSize;
>>>>>>   if(universeSize == 1) {
>>>>>>     std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>>>>>   }
>>>>>>
>>>>>>   char *buf = (char*)alloca(toRun.size() + 1);
>>>>>>   memcpy(buf, toRun.c_str(), toRun.size());
>>>>>>   buf[toRun.size()] = '\0';
>>>>>>
>>>>>>   MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
>>>>>>                  0, MPI_COMM_SELF, &everyone,
>>>>>>                  MPI_ERRCODES_IGNORE);
>>>>>>
>>>>>>   std::cerr << "master spawned " << universeSize-1 << " slaves..."
>>>>>>             << std::endl;
>>>>>>
>>>>>>   MPI_Finalize();
>>>>>>
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>> // slave.cpp
>>>>>> #include <mpi.h>
>>>>>> #include <iostream>  // for std::cerr
>>>>>>
>>>>>> int main(int argc, char **args) {
>>>>>>   int size;
>>>>>>   MPI_Comm parent;
>>>>>>   MPI_Init(&argc, &args);
>>>>>>
>>>>>>   MPI_Comm_get_parent(&parent);
>>>>>>
>>>>>>   if(parent == MPI_COMM_NULL) {
>>>>>>     std::cerr << "slave has no parent" << std::endl;
>>>>>>   }
>>>>>>   MPI_Comm_remote_size(parent, &size);
>>>>>>   if(size != 1) {
>>>>>>     std::cerr << "parent size is " << size << std::endl;
>>>>>>   }
>>>>>>
>>>>>>   std::cerr << "slave responding..." << std::endl;
>>>>>>
>>>>>>   MPI_Finalize();
>>>>>>
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>> Any ideas? Thanks for any help.
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>> It really is just that simple :-)
>>>>>>>
>>>>>>> On Aug 22, 2012, at 8:56 AM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Okay. Is there a tutorial or FAQ for setting everything up? Or is it really just that simple? I don't need to run a copy of the orte server somewhere?
>>>>>>>>
>>>>>>>> If my current IP is 192.168.0.1:
>>>>>>>>
>>>>>>>> 0 > echo 192.168.0.11 > /tmp/hostfile
>>>>>>>> 1 > echo 192.168.0.12 >> /tmp/hostfile
>>>>>>>> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
>>>>>>>> 3 > ./mySpawningExe
>>>>>>>>
>>>>>>>> At this point, mySpawningExe will be the master, running on 192.168.0.1, and I can spawn, for example, childExe on 192.168.0.11 and 192.168.0.12? Or childExe1 on 192.168.0.11 and childExe2 on 192.168.0.12?
>>>>>>>>
>>>>>>>> Thanks for the help.
>>>>>>>>
>>>>>>>> Brian
>>>>>>>>
>>>>>>>> On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>> Sure, that's still true on all 1.3 or above releases. All you need to do is set the hostfile envar so we pick it up:
>>>>>>>>>
>>>>>>>>> OMPI_MCA_orte_default_hostfile=<foo>
>>>>>>>>>
>>>>>>>>> On Aug 21, 2012, at 7:23 PM, Brian Budge <brian.bu...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi. I know this is an old thread, but I'm curious if there are any tutorials describing how to set this up? Is this still available on newer Open MPI versions?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Brian
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain <r...@lanl.gov> wrote:
>>>>>>>>>>> Hi Elena
>>>>>>>>>>>
>>>>>>>>>>> I'm copying this to the user list just to correct a mis-statement on my part in an earlier message that went there. I had stated that a singleton could comm_spawn onto other nodes listed in a hostfile by setting an environmental variable that pointed us to the hostfile.
>>>>>>>>>>>
>>>>>>>>>>> This is incorrect in the 1.2 code series. That series does not allow singletons to read a hostfile at all. Hence, any comm_spawn done by a singleton can only launch child processes on the singleton's local host.
>>>>>>>>>>>
>>>>>>>>>>> This situation has been corrected for the upcoming 1.3 code series. For the 1.2 series, though, you will have to do it via an mpirun command line.
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the confusion - I sometimes have too many code families to keep straight in this old mind!
>>>>>>>>>>>
>>>>>>>>>>> Ralph
>>>>>>>>>>>
>>>>>>>>>>> On 1/4/08 5:10 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello Ralph,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you very much for the explanations. But I still do not get it running...
>>>>>>>>>>>>
>>>>>>>>>>>> For the case
>>>>>>>>>>>>   mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>>>>>>>>>> everything works.
>>>>>>>>>>>>
>>>>>>>>>>>> For the case
>>>>>>>>>>>>   ./my_master.exe
>>>>>>>>>>>> it does not.
>>>>>>>>>>>>
>>>>>>>>>>>> I did:
>>>>>>>>>>>> - created my_hostfile and put it in $HOME/.openmpi/components/. my_hostfile:
>>>>>>>>>>>>   bollenstreek slots=2 max_slots=3
>>>>>>>>>>>>   octocore01 slots=8 max_slots=8
>>>>>>>>>>>>   octocore02 slots=8 max_slots=8
>>>>>>>>>>>>   clstr000 slots=2 max_slots=3
>>>>>>>>>>>>   clstr001 slots=2 max_slots=3
>>>>>>>>>>>>   clstr002 slots=2 max_slots=3
>>>>>>>>>>>>   clstr003 slots=2 max_slots=3
>>>>>>>>>>>>   clstr004 slots=2 max_slots=3
>>>>>>>>>>>>   clstr005 slots=2 max_slots=3
>>>>>>>>>>>>   clstr006 slots=2 max_slots=3
>>>>>>>>>>>>   clstr007 slots=2 max_slots=3
>>>>>>>>>>>> - setenv OMPI_MCA_rds_hostfile_path my_hostfile (I put it in .tcshrc and then sourced .tcshrc)
>>>>>>>>>>>> - in my_master.cpp I did
>>>>>>>>>>>>   MPI_Info info1;
>>>>>>>>>>>>   MPI_Info_create(&info1);
>>>>>>>>>>>>   char* hostname = "clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02";
>>>>>>>>>>>>   MPI_Info_set(info1, "host", hostname);
>>>>>>>>>>>>
>>>>>>>>>>>>   _intercomm = intracomm.Spawn("./childexe", argv1, _nProc, info1, 0, MPI_ERRCODES_IGNORE);
>>>>>>>>>>>>
>>>>>>>>>>>> - After I call the executable, I get this error message:
>>>>>>>>>>>>
>>>>>>>>>>>> bollenstreek: > ./my_master
>>>>>>>>>>>> number of processes to run: 1
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> Some of the requested hosts are not included in the current allocation for the application:
>>>>>>>>>>>>   ./childexe
>>>>>>>>>>>> The requested hosts were:
>>>>>>>>>>>>   clstr002,clstr003,clstr005,clstr006,clstr007,octocore01,octocore02
>>>>>>>>>>>>
>>>>>>>>>>>> Verify that you have mapped the allocated resources properly using the --host specification.
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file base/rmaps_base_support_fns.c at line 225
>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file rmaps_rr.c at line 478
>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file base/rmaps_base_map_job.c at line 210
>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file rmgr_urm.c at line 372
>>>>>>>>>>>> [bollenstreek:21443] [0,0,0] ORTE_ERROR_LOG: Out of resource in file communicator/comm_dyn.c at line 608
>>>>>>>>>>>>
>>>>>>>>>>>> Did I miss something? Thanks for the help!
>>>>>>>>>>>>
>>>>>>>>>>>> Elena
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>>>>>> Sent: Tuesday, December 18, 2007 3:50 PM
>>>>>>>>>>>> To: Elena Zhebel; Open MPI Users <us...@open-mpi.org>
>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/18/07 7:35 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks a lot! Now it works! The solution is to use mpirun -n 1 -hostfile my.hosts *.exe and pass an MPI_Info key to the Spawn function!
>>>>>>>>>>>>>
>>>>>>>>>>>>> One more question: is it necessary to start my "master" program with mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe ?
>>>>>>>>>>>>
>>>>>>>>>>>> No, it isn't necessary - assuming that my_master_host is the first host listed in your hostfile! If you are only executing one my_master.exe (i.e., you gave -n 1 to mpirun), then we will automatically map that process onto the first host in your hostfile.
>>>>>>>>>>>>
>>>>>>>>>>>> If you want my_master.exe to go on someone other than the first host in the file, then you have to give us the -host option.
>>>>>>>>>>>>
>>>>>>>>>>>>> Are there other possibilities for an easy start? I would like to just run ./my_master.exe, but then the master process doesn't know about the hosts available on the network.
>>>>>>>>>>>>
>>>>>>>>>>>> You can set the hostfile parameter in your environment instead of on the command line. Just set OMPI_MCA_rds_hostfile_path = my.hosts.
>>>>>>>>>>>>
>>>>>>>>>>>> You can then just run ./my_master.exe on the host where you want the master to reside - everything should work the same.
>>>>>>>>>>>>
>>>>>>>>>>>> Just as an FYI: the name of that environmental variable is going to change in the 1.3 release, but everything will still work the same.
>>>>>>>>>>>>
>>>>>>>>>>>> Hope that helps
>>>>>>>>>>>> Ralph
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks and regards,
>>>>>>>>>>>>> Elena
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Ralph H Castain [mailto:r...@lanl.gov]
>>>>>>>>>>>>> Sent: Monday, December 17, 2007 5:49 PM
>>>>>>>>>>>>> To: Open MPI Users <us...@open-mpi.org>; Elena Zhebel
>>>>>>>>>>>>> Cc: Ralph H Castain
>>>>>>>>>>>>> Subject: Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 12/17/07 8:19 AM, "Elena Zhebel" <ezhe...@fugro-jason.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Ralph,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you for your answer.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm using OpenMPI 1.2.3, compiler glibc232, Linux Suse 10.0. My "master" executable runs only on the one local host, then it spawns "slaves" (with MPI::Intracomm::Spawn). My question was: how to determine the hosts where these "slaves" will be spawned?
>>>>>>>>>>>>>> You said: "You have to specify all of the hosts that can be used by your job in the original hostfile". How can I specify the host file? I can not find it in the documentation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hmmm...sorry about the lack of documentation. I always assumed that the MPI folks in the project would document such things since it has little to do with the underlying run-time, but I guess that fell through the cracks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are two parts to your question:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. how to specify the hosts to be used for the entire job. I believe that is somewhat covered here: http://www.open-mpi.org/faq/?category=running#simple-spmd-run
>>>>>>>>>>>>>
>>>>>>>>>>>>> That FAQ tells you what a hostfile should look like, though you may already know that. Basically, we require that you list -all- of the nodes that both your master and slave programs will use.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. how to specify which nodes are available for the master, and which for the slave.
>>>>>>>>>>>>>
>>>>>>>>>>>>> You would specify the host for your master on the mpirun command line with something like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe
>>>>>>>>>>>>>
>>>>>>>>>>>>> This directs Open MPI to map that specified executable on the specified host - note that my_master_host must have been in my_hostfile.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Inside your master, you would create an MPI_Info key "host" that has a value consisting of a string "host1,host2,host3" identifying the hosts you want your slave to execute upon. Those hosts must have been included in my_hostfile. Include that key in the MPI_Info array passed to your Spawn.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We don't currently support providing a hostfile for the slaves (as opposed to the host-at-a-time string above). This may become available in a future release - TBD.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hope that helps
>>>>>>>>>>>>> Ralph
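A small illustration of the MPI_Info "host" key described above - again not code from the thread; the hostnames and ./childexe are placeholders, and Elena's message earlier on this page shows the same idea with the C++ bindings. The hosts named in the key must also appear in the hostfile supplied to mpirun:

// spawn_with_hosts.cpp - illustrative only
#include <mpi.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  // Hosts the spawned children may run on; each must also be listed
  // in the hostfile given to mpirun (part of the original allocation).
  MPI_Info info;
  MPI_Info_create(&info);
  char key[]   = "host";
  char hosts[] = "host1,host2,host3";
  MPI_Info_set(info, key, hosts);

  char child[] = "./childexe";   // placeholder child executable
  MPI_Comm children;
  MPI_Comm_spawn(child, MPI_ARGV_NULL, 3, info,
                 0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

  MPI_Info_free(&info);
  MPI_Finalize();
  return 0;
}

Launched with something like: mpirun -n 1 -hostfile my_hostfile ./spawn_with_hosts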
It is not clear for me how to spawn the >>>>>>>>>>>>>>> "slave" >>>>>>>>>>>>>> processes >>>>>>>>>>>>>>> over the network. Currently "master" creates "slaves" on the >>>>>>>>>>>>>>> same >>>>>>>>>>>>>>> host. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If I use 'mpirun --hostfile openmpi.hosts' then processes are >>>>>>>>>>>>>>> spawn >>>>>>>>>>>>>>> over >>>>>>>>>>>>>> the >>>>>>>>>>>>>>> network as expected. But now I need to spawn processes over the >>>>>>>>>>>>>>> network >>>>>>>>>>>>>> from >>>>>>>>>>>>>>> my own executable using MPI::Intracomm::Spawn, how can I >>>>>>>>>>>>>>> achieve it? >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm not sure from your description exactly what you are trying >>>>>>>>>>>>>> to do, >>>>>>>>>>>>>> nor in >>>>>>>>>>>>>> what environment this is all operating within or what version of >>>>>>>>>>>>>> Open >>>>>>>>>>>>>> MPI >>>>>>>>>>>>>> you are using. Setting aside the environment and version issue, >>>>>>>>>>>>>> I'm >>>>>>>>>>>>>> guessing >>>>>>>>>>>>>> that you are running your executable over some specified set of >>>>>>>>>>>>>> hosts, >>>>>>>>>>>>>> but >>>>>>>>>>>>>> want to provide a different hostfile that specifies the hosts to >>>>>>>>>>>>>> be >>>>>>>>>>>>>> used for >>>>>>>>>>>>>> the "slave" processes. Correct? >>>>>>>>>>>>>> >>>>>>>>>>>>>> If that is correct, then I'm afraid you can't do that in any >>>>>>>>>>>>>> version >>>>>>>>>>>>>> of Open >>>>>>>>>>>>>> MPI today. You have to specify all of the hosts that can be used >>>>>>>>>>>>>> by >>>>>>>>>>>>>> your job >>>>>>>>>>>>>> in the original hostfile. You can then specify a subset of those >>>>>>>>>>>>>> hosts >>>>>>>>>>>>>> to be >>>>>>>>>>>>>> used by your original "master" program, and then specify a >>>>>>>>>>>>>> different >>>>>>>>>>>>>> subset >>>>>>>>>>>>>> to be used by the "slaves" when calling Spawn. >>>>>>>>>>>>>> >>>>>>>>>>>>>> But the system requires that you tell it -all- of the hosts that >>>>>>>>>>>>>> are >>>>>>>>>>>>>> going >>>>>>>>>>>>>> to be used at the beginning of the job. >>>>>>>>>>>>>> >>>>>>>>>>>>>> At the moment, there is no plan to remove that requirement, >>>>>>>>>>>>>> though >>>>>>>>>>>>>> there has >>>>>>>>>>>>>> been occasional discussion about doing so at some point in the >>>>>>>>>>>>>> future. >>>>>>>>>>>>>> No >>>>>>>>>>>>>> promises that it will happen, though - managed environments, in >>>>>>>>>>>>>> particular, >>>>>>>>>>>>>> currently object to the idea of changing the allocation >>>>>>>>>>>>>> on-the-fly. We >>>>>>>>>>>>>> may, >>>>>>>>>>>>>> though, make a provision for purely hostfile-based environments >>>>>>>>>>>>>> (i.e., >>>>>>>>>>>>>> unmanaged) at some time in the future. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ralph >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks in advance for any help. 
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Elena