Perhaps it would help if you could give us some idea of the interest here? The prior Mesos integration was done as an academic project, which is why it died once the student graduated.
Is there some long-term interest here? Or is this part of an academic effort?

> On Jun 5, 2016, at 7:22 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> On Jun 5, 2016, at 4:30 PM, Du, Fan <fan...@intel.com> wrote:
>>
>> Thanks for your reply!
>>
>> On 2016/6/5 3:01, Ralph Castain wrote:
>>> The closest thing we have to what you describe is the “orte-dvm” - this
>>> allows one to launch a persistent collection of daemons. You can then
>>> run your applications against it using “mpiexec -hnp <url>”, where the
>>> url is that of the orte-dvm “head” daemon.
>>
>> I tried this; maybe I missed something.
>>
>> On host1:
>> orte-dvm --allow-run-as-root
>> VMURI: 2783903744.0;tcp://192.168.10.55:47325
>> DVM ready
>>
>> On host2:
>> mpiexec -hnp 2783903744.0;tcp://192.168.10.55:47325
>
> Your shell will take the semi-colon to mean the end of the line - you have to
> enclose it all in quotes, e.g.
>
> mpiexec -hnp "2783903744.0;tcp://192.168.10.55:47325"
>
>> OMPI_MCA_orte_hnp_uri=2783903744.0
>> OMPI_MCA_ess=tool
>> [grantleyIPDC01:03305] [[21695,0],0] ORTE_ERROR_LOG: Bad parameter in file
>> base/rml_base_contact.c at line 161
>> -bash: tcp://192.168.10.55:47325: No such file or directory
>>
>> Digging a bit deeper into the code, the URI is expected to contain a job id
>> and a rank id. Also, how do the subsequent orte-dvm daemons learn where the
>> head orte-dvm is? I checked the orte-dvm help; there seems to be no option
>> for that.
>>
>>> If I understand you correctly, however, then you would want the orte-dvm
>>> to assemble itself based on the asynchronous start of the individual
>>> daemons. In other words, Mesos would start a daemon on each node as that
>>> node became available. Then, once all the daemons have been started,
>>> Mesos would execute “mpiexec” to start the application.
>>>
>>> Is that correct?
>>
>> Yes.
>>
>>> If so, then we don’t support that mode today, but it could fairly easily
>>> be added. However, I don’t see why you couldn’t just write a small
>>> standalone tool that collects all the Mesos resources in a file until
>>> all have been assembled, and then executes “mpiexec -hostfile <myfile>”.
>>
>> Because mpiexec will eventually rely on ssh to run the MPI proxy on the hosts,
>
> What’s the problem with that? It’s how many HPC clusters work. Is ssh not
> enabled?
>
>> while in Mesos it works like this: the framework decides which commands to
>> run on which hosts, passes that information to the Mesos master, and the
>> Mesos master then instructs the hosts to run those commands.
>>
>> This is where the Mesos work model doesn't fit into Open MPI.
>
> Easiest thing would be to add a Mesos PLM plugin to OMPI - IIRC, someone once
> did that, but nobody was interested and so it died.
>
>>> Is there some reason this won’t work? It would be much simpler and would
>>> work with any MPI.
>>>
>>> Ralph
>>>
>>>> On Jun 3, 2016, at 5:10 PM, Du, Fan <fan...@intel.com> wrote:
>>>>
>>>> On 2016/6/2 19:14, Gilles Gouaillardet wrote:
>>>>> Hi,
>>>>>
>>>>> may I ask why you need/want to launch orted manually?
>>>>
>>>> Good question.
>>>>
>>>> The intention is to get the orted commands and run orted via Mesos.
>>>> This all comes from how Mesos works. In essence, it offers resources
>>>> (cpu/memory/ports) on a per-host basis to a framework; the framework then
>>>> builds the information about how to run specific tasks and passes it to
>>>> the Mesos master; finally, Mesos instructs the hosts to execute the
>>>> framework's tasks.
>>>>
>>>> Take MPICH2 as an example; the framework supporting MPICH2 works as above:
>>>> 1. The framework gets offers from the Mesos master and tells the Mesos
>>>> master to run a wrapper of the MPICH2 proxy (hydra_pmi_proxy). At this
>>>> point the wrapper waits for the commands with which to execute the proxy.
>>>>
>>>> 2. After launching as many MPICH2 proxy wrappers on hosts as the user
>>>> expects, run the real mpiexec program with '-launcher manual' to grab the
>>>> commands for the proxies, then pass those commands to the proxy wrappers.
>>>> The real MPICH2 proxies finally get launched, and mpiexec proceeds
>>>> normally.
>>>>
>>>> That's why I'm looking for functionality similar to MPICH2's
>>>> '-launcher manual'.
>>>> I'm not a native speaker; I hope I've told the story clearly :)
>>>>
>>>>> unless you are running under a batch manager, Open MPI uses the rsh plm
>>>>> to remotely start orted.
>>>>> basically, it does
>>>>> ssh host orted <orted params>
>>>>>
>>>>> the best I can suggest is you do
>>>>>
>>>>> mpirun --mca orte_rsh_agent myrshagent.sh --mca orte_launch_agent
>>>>> mylaunchagent.sh ...
>>>>>
>>>>> under the hood, mpirun will do
>>>>> myrshagent.sh host mylaunchagent.sh <orted params>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On Thursday, June 2, 2016, Du, Fan <fan...@intel.com> wrote:
>>>>>
>>>>> Hi folks
>>>>>
>>>>> With Open MPI, I can launch an MPI application a.out as follows on host1:
>>>>> mpirun --allow-run-as-root --host host1,host2 -np 4 /tmp/a.out
>>>>>
>>>>> On host2, I saw that a proxy, orted, is spawned:
>>>>> orted --hnp-topo-sig 4N:2S:4L3:20L2:20L1:20C:40H:x86_64 -mca ess env
>>>>> -mca orte_ess_jobid 1275133952 -mca orte_ess_vpid 1 -mca
>>>>> orte_ess_num_procs 2 -mca orte_hnp_uri
>>>>> 1275133952.0;tcp://host1_ip:40024 --tree-spawn -mca plm rsh
>>>>> --tree-spawn
>>>>>
>>>>> It seems mpirun uses ssh as the launcher on my system.
>>>>> What if I want to run orted manually, rather than having mpirun do it
>>>>> automatically? I mean, does mpirun have any option to produce the
>>>>> commands for orted?
>>>>>
>>>>> The MPICH2 implementation has a "-launcher manual" option that makes
>>>>> this work, for example:
>>>>> # mpiexec.hydra -launcher manual -np 4 htop
>>>>> HYDRA_LAUNCH: /usr/local/bin/hydra_pmi_proxy --control-port
>>>>> grantleyIPDC04:34652 --rmk user --launcher manual --demux poll
>>>>> --pgid 0 --retries 10 --usize -2 --proxy-id 0
>>>>> HYDRA_LAUNCH_END
>>>>>
>>>>> Then I can manually run hydra_pmi_proxy with those commands, and
>>>>> mpiexec.hydra will proceed.
>>>>>
>>>>> Thanks!
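
For what it's worth, here is a rough, untested sketch of the small standalone "collect hosts into a file, then run mpiexec -hostfile" tool suggested earlier in the thread, just to make the idea concrete. The script name, the shared file location, and the NHOSTS handling are placeholders - a real Mesos executor would get this information from the framework:

#!/bin/sh
# collect_and_launch.sh - hypothetical name; run once per Mesos offer/executor.
# Each instance appends its hostname to a shared rendezvous file; the last one
# to check in launches the MPI job against the assembled host list.
NHOSTS=${NHOSTS:-4}                           # expected number of hosts (assumption)
HOSTFILE=${HOSTFILE:-/shared/mesos_hosts.txt} # must live on a shared filesystem (assumption)

# append this host under a lock so concurrent executors don't interleave writes
flock "$HOSTFILE" -c "hostname >> $HOSTFILE"

# once all hosts have checked in, start the job (paths match the earlier example)
if [ "$(sort -u "$HOSTFILE" | wc -l)" -ge "$NHOSTS" ]; then
    mpiexec -hostfile "$HOSTFILE" -np "$NHOSTS" /tmp/a.out
fi

Note that this still relies on mpiexec's normal ssh launcher to start the orted daemons on the listed hosts, which is the point made above about ssh being available.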
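
Similarly, a minimal (and equally untested) sketch of the agent approach Gilles describes, which is about the closest analogue to MPICH2's '-launcher manual' that the thread identifies: point mpirun at a wrapper that, instead of ssh-ing to each host, simply records the orted command line so something else - e.g. a Mesos framework - can run it there. The script name and the hand-off file are made up for illustration:

#!/bin/sh
# myrshagent.sh - used in place of ssh; mpirun invokes it as:
#   myrshagent.sh <host> orted <orted params>
# Record the target host and the orted command line instead of executing them,
# so an external launcher can start orted on that host later.
HOST=$1
shift
echo "$HOST: $*" >> /tmp/orted_cmds.txt   # hypothetical hand-off file
# Exit without launching anything; mpirun keeps waiting until the orted
# daemons are actually started (by your framework) and call back to it.

which would be used roughly as in the thread above:

mpirun --mca orte_rsh_agent ./myrshagent.sh --host host1,host2 -np 4 /tmp/a.out

Two caveats: the recorded orte_hnp_uri value contains a semicolon, so it must be quoted when the command is re-run, and with more than a couple of hosts you would also need to disable Open MPI's tree spawn so that every orted is launched through the agent rather than by another orted.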