Re: [OMPI devel] Cross-job disconnect is broken

2015-09-08 Thread Jeff Squyres (jsquyres)
On Sep 8, 2015, at 4:59 PM, George Bosilca wrote:
>
> Why would anyone use connect/accept (or join) between processes on the same
> job? The only environment where such a functionality makes sense is where
> disjoint applications (think computing part and the visualization part) are
> able to

Re: [OMPI devel] Cross-job disconnect is broken

2015-09-08 Thread Ralph Castain
It’s called comm_spawn, which involves the connect/accept code after launch :-)

> On Sep 8, 2015, at 1:59 PM, George Bosilca wrote:
>
> Why would anyone use connect/accept (or join) between processes on the same
> job? The only environment where such a functionality makes sense is where
> dis

Re: [OMPI devel] Cross-job disconnect is broken

2015-09-08 Thread George Bosilca
Why would anyone use connect/accept (or join) between processes on the same job? The only environment where such a functionality makes sense is where disjoint applications (think computing part and the visualization part) are able to connect together. There are applications that use such a model, bu

Re: [OMPI devel] Cross-job disconnect is broken

2015-09-08 Thread Jeff Squyres (jsquyres)
On Sep 7, 2015, at 5:07 PM, Ralph Castain wrote:
>
> * two jobs started by the same mpirun - supported today by ORTE
>
> * two jobs started by different mpiruns - we used to support, but is broken
> in grpcomm/barrier
>
> * two direct-launched jobs - never supported
>
> * one direct-launched

[OMPI devel] Cross-job disconnect is broken

2015-09-07 Thread Ralph Castain
Yo folks

I was working on the PMIx integration in support of connect/accept, and happened to take a closer look at the “disconnect” function we call during finalize. I then realized that we had broken this function for the use-case where two jobs started by different mpiruns connect when we mad