Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread r...@open-mpi.org
I don’t think a collision was the issue here. We were taking the mpirun-generated jobid and passing it thru the hash, thus creating an incorrect and invalid value. What I’m more surprised by is that it doesn’t -always- fail. Only thing I can figure is that, unlike with PMIx, the usock oob compon

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread r...@open-mpi.org
It’s okay - it was just confusing. This actually wound up having nothing to do with how the jobid is generated. The root cause of the problem was that we took an mpirun-generated jobid, and then mistakenly passed it back thru a hash function instead of just using it. So we hashed a perfectly goo

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread Joshua Ladd
Ralph, We love PMIx :). In this context, when I say PMIx, I am referring to the PMIx framework in OMPI/OPAL, not the standalone PMIx library. Sorry that wasn't clear. Josh On Thu, Sep 15, 2016 at 10:07 AM, r...@open-mpi.org wrote: > I don’t understand this fascination with PMIx. PMIx didn’t ca

Re: [OMPI devel] toward a unique session directory

2016-09-15 Thread r...@open-mpi.org
Actually, you just use the envar that was previously cited on a different email thread:

    if (NULL != getenv(OPAL_MCA_PREFIX"orte_launch")) {
        /* you were launched by mpirun */
    } else {
        /* you were direct launched */
    }

This is available from time of first instruction, so no worrie
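
For reference, the environment-variable check quoted above could be wrapped in a small helper along these lines. This is only a sketch: the fallback value given to OPAL_MCA_PREFIX ("OMPI_MCA_") and the helper name launched_by_mpirun are assumptions made for illustration and do not come from this thread. Inside the OMPI tree the macro would be supplied by the OPAL headers rather than defined locally.

```c
#include <stdlib.h>

/* Assumption: stand-in definition so the sketch compiles outside the OMPI
 * tree. In a real build this macro would come from the OPAL headers. */
#ifndef OPAL_MCA_PREFIX
#define OPAL_MCA_PREFIX "OMPI_MCA_"
#endif

/* Hypothetical helper: returns 1 if the "orte_launch" envar is present
 * (i.e. the process was started by mpirun), and 0 if it is absent
 * (i.e. the process was direct launched). */
static int launched_by_mpirun(void)
{
    /* Adjacent string literals are concatenated by the compiler, so this
     * looks up OPAL_MCA_PREFIX followed by "orte_launch". */
    return NULL != getenv(OPAL_MCA_PREFIX "orte_launch");
}
```

Because the check only reads the environment, it can be called before any other initialization has run.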

Re: [OMPI devel] toward a unique session directory

2016-09-15 Thread Pritchard Jr., Howard
Hi Gilles, At what point in the job launch do you need to determine whether or not the job was direct launched? Howard -- Howard Pritchard HPC-DES Los Alamos National Laboratory On 9/15/16, 7:38 AM, "devel on behalf of Gilles Gouaillardet" wrote: >Ralph, > >that looks good to me. >

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread r...@open-mpi.org
I don’t understand this fascination with PMIx. PMIx didn’t calculate this jobid - OMPI did. Yes, it is in the opal/pmix layer, but it had -nothing- to do with PMIx. So why do you want to continue to blame PMIx for this problem?? > On Sep 15, 2016, at 4:29 AM, Joshua Ladd wrote: > > Great cat

Re: [OMPI devel] Lots of new features rolled out on github.com today

2016-09-15 Thread Bland, Wesley
I believe if you don’t have a smartphone handy, you can still use other tools to get your auth code. Personally, I use Alfred + the Google Authenticator workflow, but you can also use Authy (https://www.authy.com/blog/introducing-authy-for-your-personal-computer) and I think it will sync with all

Re: [OMPI devel] toward a unique session directory

2016-09-15 Thread Gilles Gouaillardet
Ralph, that looks good to me. can you please remind me how to test if an app was launched by mpirun/orted or direct launched by the RM? right now, which direct launch methods are supported? i am aware of srun (SLURM) and aprun (Cray), are there any others? Cheers, Gilles On Thu, Sep 15, 2016

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread Eric Chamberland
Hi Gilles, On 15/09/16 03:38 AM, Gilles Gouaillardet wrote: Eric, a bug has been identified, and a patch is available at https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1376.patch the bug is specific to singleton mode (e.g. ./a.out vs mpirun -np 1 ./a.out), so if appl

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread Gilles Gouaillardet
I just realized I screwed up my test, and I was missing some relevant info... So on one hand, I fixed a bug in singleton, but on the other hand, I cannot tell whether a collision was involved in this issue. Cheers, Gilles Joshua Ladd wrote: >Great catch, Gilles! Not much of a surprise though. 

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread Joshua Ladd
Great catch, Gilles! Not much of a surprise though. Indeed, this issue has EVERYTHING to do with how PMIx is calculating the jobid, which, in this case, results in hash collisions. ;-P Josh On Thursday, September 15, 2016, Gilles Gouaillardet wrote: > Eric, > > > a bug has been identified, and

Re: [OMPI devel] toward a unique session directory

2016-09-15 Thread r...@open-mpi.org
> On Sep 15, 2016, at 12:51 AM, Gilles Gouaillardet wrote: > > Ralph, > > > > my reply is in the text > > > On 9/15/2016 11:11 AM, r...@open-mpi.org wrote: >> If we are going to make a change, then let’s do it only once. Since we >> introduced PMIx and the concep

Re: [OMPI devel] toward a unique session directory

2016-09-15 Thread Gilles Gouaillardet
Ralph, my reply is in the text On 9/15/2016 11:11 AM, r...@open-mpi.org wrote: If we are going to make a change, then let’s do it only once. Since we introduced PMIx and the concept of the string namespace, the plan has been to switch away from a numerical jobid and to the namespace. This e

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread Gilles Gouaillardet
Eric, a bug has been identified, and a patch is available at https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1376.patch the bug is specific to singleton mode (e.g. ./a.out vs mpirun -np 1 ./a.out), so if applying a patch does not fit your test workflow, it might be

Re: [OMPI devel] OpenMPI 2.x: bug: violent break at beginning with (sequential) runs...

2016-09-15 Thread Gilles Gouaillardet
Ralph, i fixed master at https://github.com/open-mpi/ompi/commit/11ebf3ab23bdaeb0ec96818c119364c6d837cd3b and PR for v2.x at https://github.com/open-mpi/ompi-release/pull/1376 Cheers, Gilles On 9/15/2016 12:26 PM, r...@open-mpi.org wrote: Ah...I take that back. We changed this and now