Re: [OMPI devel] Running on Kubernetes

2018-06-08 Thread Rong Ou
Thanks Ralph! I created an issue to track this: https://github.com/kubeflow/mpi-operator/issues/12. On Mon, May 28, 2018 at 5:25 AM r...@open-mpi.org wrote: > One suggestion: this approach requires that the job be executed using > “mpirun”. Another approach would be to integrate PMIx into

Re: [OMPI devel] Running on Kubernetes

2018-05-28 Thread r...@open-mpi.org
One suggestion: this approach requires that the job be executed using “mpirun”. Another approach would be to integrate PMIx into Kubernetes, thus allowing any job to call MPI_Init regardless of how it was started. The advantage would be that it enables the use of MPI by workflow-based

Re: [OMPI devel] Running on Kubernetes

2018-05-24 Thread Rong Ou
Hi guys, Thanks for all the suggestions! It's been a while, but we finally got it approved for open-sourcing. I've submitted a proposal to kubeflow: https://github.com/kubeflow/community/blob/master/proposals/mpi-operator-proposal.md. In this version we've managed to avoid using ssh, relying on

Re: [OMPI devel] Running on Kubernetes

2018-03-16 Thread r...@open-mpi.org
I haven’t really spent any time with Kubernetes, but it seems to me you could just write a Kubernetes plm (and maybe an odls) component and bypass the ssh stuff completely given that you say there is a launcher API. > On Mar 16, 2018, at 11:02 AM, Jeff Squyres (jsquyres) >
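Ralph's suggestion comes down to swapping the transport a launcher uses to start its per-node daemons, so the rest of the launch logic does not care whether ssh exists. Here is a minimal Python sketch of that idea. The `Launcher`, `SshLauncher`, and `KubectlExecLauncher` classes and `launch_plan` are hypothetical illustrations. Open MPI's real plm framework is written in C and has a different interface. The SSH variant prefixes the daemon command with `ssh <node>`. The Kubernetes variant runs the command inside an existing pod with `kubectl exec -n <namespace> <pod> -- <cmd>`, so the container needs no sshd. The daemon command itself is a placeholder.

```python
from abc import ABC, abstractmethod
from typing import List


class Launcher(ABC):
    """Hypothetical launcher interface: builds the argv that starts a daemon
    on one remote node. This is NOT Open MPI's plm API, only an illustration."""

    @abstractmethod
    def argv(self, node: str, daemon_cmd: List[str]) -> List[str]:
        ...


class SshLauncher(Launcher):
    # Reach the node over SSH, the way an ssh-based launcher would.
    def argv(self, node: str, daemon_cmd: List[str]) -> List[str]:
        return ["ssh", node, *daemon_cmd]


class KubectlExecLauncher(Launcher):
    # Run the daemon inside an existing pod through the Kubernetes API,
    # so no sshd has to run in the container.
    def __init__(self, namespace: str = "default") -> None:
        self.namespace = namespace

    def argv(self, node: str, daemon_cmd: List[str]) -> List[str]:
        return ["kubectl", "exec", "-n", self.namespace, node, "--", *daemon_cmd]


def launch_plan(launcher: Launcher, nodes: List[str],
                daemon_cmd: List[str]) -> List[List[str]]:
    """Return one command line per node. A real launcher would also spawn
    these processes and track their exit status."""
    return [launcher.argv(node, daemon_cmd) for node in nodes]


if __name__ == "__main__":
    # "hypothetical-daemon" stands in for whatever per-node daemon is started.
    for cmd in launch_plan(KubectlExecLauncher("mpi"),
                           ["worker-0", "worker-1"], ["hypothetical-daemon"]):
        print(" ".join(cmd))
```

Keeping the transport behind one interface means only the argv construction differs between the two launchers.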

Re: [OMPI devel] Running on Kubernetes

2018-03-16 Thread Jeff Squyres (jsquyres)
On Mar 16, 2018, at 10:01 AM, Gilles Gouaillardet wrote: > > By default, Open MPI uses the rsh PLM in order to start a job. To clarify one thing here: the name of our plugin is "rsh" for historical reasons, but it defaults to looking for "ssh" first.
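The "rsh plugin that looks for ssh first" behavior amounts to walking an ordered list of candidate agents and taking the first one found on PATH. Here is a minimal Python sketch of that lookup. It assumes the list is written as a colon-separated string such as `ssh : rsh`, which is assumed here to match the default of Open MPI's `plm_rsh_agent` MCA parameter. Check `ompi_info --all` for the actual value in a given installation. `pick_agent` is a hypothetical helper, not Open MPI code. The `which` argument is injectable so the logic can be tested without touching PATH.

```python
import shutil
from typing import Callable, Optional


def pick_agent(agent_list: str,
               which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first agent in a colon-separated list whose executable is
    found, e.g. "ssh : rsh" -> "ssh" if ssh is installed, else "rsh".
    Hypothetical helper illustrating the lookup order; not Open MPI code."""
    for candidate in agent_list.split(":"):
        name = candidate.strip()
        # An entry may carry arguments ("ssh -x"); only the program name is looked up.
        if name and which(name.split()[0]):
            return name
    return None


if __name__ == "__main__":
    print(pick_agent("ssh : rsh"))
```

Following the usual `OMPI_MCA_<param>` environment-variable convention, setting `OMPI_MCA_plm_rsh_agent` is one way to point the rsh PLM at a different agent without changing the launch command.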

Re: [OMPI devel] Running on Kubernetes

2018-03-16 Thread Gilles Gouaillardet
Hi Rong, SSH is safe when properly implemented. That being said, some sites do not allow end users to directly SSH into compute nodes because they do not want them to do anything without the resource manager knowing about it. What is your concern with SSH? You can run a resource manager (such

[OMPI devel] Running on Kubernetes

2018-03-16 Thread Rong Ou
Hi, I've implemented a Kubernetes operator for running Open MPI jobs there. An operator is just a Custom Resource Definition (CRD) for a new type of MPI-aware batch job, and a custom controller that manages the states of those jobs.
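The operator pattern described here (a CRD plus a controller that drives each job's state) centers on a reconcile step that compares the desired spec with what is observed in the cluster and decides what to do next. The Python sketch below is illustrative only. The phase names, the `MPIJobSpec` and `Observed` fields, and the action strings are hypothetical and are not taken from the mpi-operator code. A real controller would read and write Kubernetes objects through the API server instead of returning strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Phase(Enum):
    # Hypothetical job phases recorded in the custom resource's status.
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


@dataclass
class MPIJobSpec:
    workers: int  # desired number of worker pods (hypothetical field)


@dataclass
class Observed:
    existing_workers: int               # worker pods that exist
    ready_workers: int                  # worker pods that are ready
    launcher_started: bool              # launcher pod has been created
    launcher_exit_code: Optional[int]   # None until the launcher finishes


def reconcile(spec: MPIJobSpec, obs: Observed) -> Tuple[Phase, List[str]]:
    """Compare desired and observed state; return the job phase and the
    actions a controller would take to move toward the desired state."""
    if obs.launcher_exit_code is not None:
        # The launcher (the process running mpirun) decides the job outcome.
        phase = Phase.SUCCEEDED if obs.launcher_exit_code == 0 else Phase.FAILED
        return phase, ["delete-workers"]
    if obs.existing_workers < spec.workers:
        # Create only the missing workers, not the ones still starting up.
        return Phase.PENDING, [f"create-workers:{spec.workers - obs.existing_workers}"]
    if obs.ready_workers < spec.workers:
        return Phase.PENDING, []  # wait for all workers to become ready
    if not obs.launcher_started:
        return Phase.RUNNING, ["create-launcher"]
    return Phase.RUNNING, []


if __name__ == "__main__":
    print(reconcile(MPIJobSpec(workers=4),
                    Observed(existing_workers=1, ready_workers=1,
                             launcher_started=False, launcher_exit_code=None)))
```

Because reconcile only looks at current state, the controller can call it repeatedly (on every watch event or resync) and converge without remembering what it did before.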