[OMPI devel] Slurm support in master

2015-09-08 Thread Ralph Castain
Hi folks, I’ve poked around this evening and gotten the Slurm support in master to at least build, and mpirun now works correctly under a Slurm job allocation. This should all be committed as soon as auto-testing completes: https://github.com/open-mpi/ompi/pull/877 Howard/Nathan: I believ...
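For anyone wanting to sanity-check a build like this, a minimal MPI hello-world launched inside an existing Slurm allocation is usually enough; the sketch below is generic (the file name and salloc options are illustrative, not taken from PR #877), and the expectation is simply that mpirun picks the node list up from the allocation without a hostfile.

    /* hello_slurm.c - minimal smoke test for mpirun under a Slurm allocation.
     * Hypothetical launch (options illustrative):
     *   salloc -N 2 -n 4
     *   mpirun ./hello_slurm
     * mpirun should read the node list from the Slurm allocation, no hostfile needed. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(host, &len);
        printf("rank %d of %d on %s\n", rank, size, host);
        MPI_Finalize();
        return 0;
    }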

Re: [OMPI devel] Cross-job disconnect is broken

2015-09-08 Thread Jeff Squyres (jsquyres)
On Sep 8, 2015, at 4:59 PM, George Bosilca wrote: > Why would anyone use connect/accept (or join) between processes on the same job? The only environment where such functionality makes sense is where disjoint applications (think the computing part and the visualization part) are able to...

Re: [OMPI devel] Cross-job disconnect is broken

2015-09-08 Thread Ralph Castain
It’s called comm_spawn, which involves the connect/accept code after launch :-) > On Sep 8, 2015, at 1:59 PM, George Bosilca wrote: > Why would anyone use connect/accept (or join) between processes on the same job? The only environment where such functionality makes sense is where dis...
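For context on what exercises "the connect/accept code after launch", here is a minimal parent-side comm_spawn sketch; the child binary name and process count are placeholders, and it is only meant to show where the disconnect path under discussion comes in.

    /* spawn_parent.c - sketch of the comm_spawn/disconnect path being discussed.
     * "./child" and the count of 2 are placeholders, not from this thread. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm children;
        int errcodes[2];

        MPI_Init(&argc, &argv);
        /* Spawning creates a new job; the runtime wires it up with the same
         * connect/accept machinery used by MPI_Comm_connect/MPI_Comm_accept. */
        MPI_Comm_spawn("./child", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &children, errcodes);
        /* ... communicate over the intercommunicator ... */
        MPI_Comm_disconnect(&children);   /* the cross-job disconnect in question */
        MPI_Finalize();
        return 0;
    }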

Re: [OMPI devel] Cross-job disconnect is broken

2015-09-08 Thread George Bosilca
Why would anyone use connect/accept (or join) between processes on the same job? The only environment where such functionality makes sense is where disjoint applications (think the computing part and the visualization part) are able to connect together. There are applications that use such a model, bu...
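The disjoint-application case described here is the standard MPI_Open_port / MPI_Comm_accept / MPI_Comm_connect pattern between two separately started jobs; the sketch below shows only the accepting side, and how the port string reaches the other job (file, name server, copy/paste) is left out as an assumption.

    /* server.c - e.g. the computing side: publishes a port and accepts one client job. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm client;

        MPI_Init(&argc, &argv);
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("port: %s\n", port);   /* hand this string to the other job out of band */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
        /* ... exchange data with the visualization job over the intercommunicator ... */
        MPI_Comm_disconnect(&client);
        MPI_Close_port(port);
        MPI_Finalize();
        return 0;
    }

    /* The other job (e.g. the visualization side) would call, with the same port string:
     *   MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server); */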

Re: [OMPI devel] Cross-job disconnect is broken

2015-09-08 Thread Jeff Squyres (jsquyres)
On Sep 7, 2015, at 5:07 PM, Ralph Castain wrote: > * two jobs started by the same mpirun - supported today by ORTE * two jobs started by different mpiruns - we used to support this, but it is broken in grpcomm/barrier * two direct-launched jobs - never supported * one direct-launched...

Re: [OMPI devel] psm mtl weirdness

2015-09-08 Thread Friedley, Andrew
Hi Howard, Is this new behavior? Do you see the error if you set PSM_DEVICES=shm,self? The PSM MTL should be setting this on its own, but maybe something changed. Andrew From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard Sent: Tuesday, September 8, 2015 10:06 AM To...
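If it helps anyone try Andrew's suggestion, PSM_DEVICES only needs to be in the environment before PSM initializes, so either exporting it before mpirun or setting it programmatically before MPI_Init should do; the snippet below is a generic sketch of the latter, not something taken from the PSM MTL itself.

    /* psm_devices_test.c - sketch for testing the PSM_DEVICES suggestion.
     * Equivalent to exporting PSM_DEVICES=shm,self before launching; setting it
     * here works only because PSM is initialized during MPI_Init. */
    #define _POSIX_C_SOURCE 200112L
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        setenv("PSM_DEVICES", "shm,self", 1);   /* overwrite for the test */
        MPI_Init(&argc, &argv);
        /* ... run the single-node, multi-rank case that was timing out ... */
        MPI_Finalize();
        return 0;
    }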

Re: [OMPI devel] MTT failures since the last few days on ppc64

2015-09-08 Thread Jeff Squyres (jsquyres)
Thanks Adrian; I turned this into https://github.com/open-mpi/ompi/issues/874. > On Sep 8, 2015, at 9:56 AM, Adrian Reber wrote: > For the past few days the MTT runs on my ppc64 systems have been failing with: [bimini:11716] *** Process received signal *** [bimini:11716] Signal: Segmentation fault...

[OMPI devel] psm mtl weirdness

2015-09-08 Thread Howard Pritchard
Hi PSM folks, I'm noticing some weirdness on master using the psm mtl. If I run multi-node, I don't see a problem. If I run on a single node with more than one rank, however, I get a timeout in psm_ep_connect. On ompi-release I also observe this problem, but it seems to be more sp...
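A minimal repro for this kind of report is just a two-rank exchange on one node with the PSM MTL forced; the launch line in the comment below is an assumption about how one might pin the component selection, not a command taken from this thread.

    /* psm_repro.c - tiny two-rank exchange to exercise endpoint connection setup.
     * Hypothetical single-node launch (flags assumed, not from the thread):
     *   mpirun -np 2 --mca pml cm --mca mtl psm ./psm_repro */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, val = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            val = 42;
            MPI_Send(&val, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", val);
        }
        MPI_Finalize();
        return 0;
    }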

[OMPI devel] MTT failures since the last few days on ppc64

2015-09-08 Thread Adrian Reber
For the past few days the MTT runs on my ppc64 systems have been failing with: [bimini:11716] *** Process received signal *** [bimini:11716] Signal: Segmentation fault (11) [bimini:11716] Signal code: Address not mapped (1) [bimini:11716] Failing at address: (nil) [bimini:11716] [ 0] [0x3fffa2bb0448] [bimini:...