Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-13 Thread Howard Pritchard
port or Intel's support at > ibsupp...@intel.com. They might have seen this problem before. Since > you're running the RHEL versions of PSM and related software, one thing you > could try is IFS. I think I was running IFS 7.3.0, so that's a difference > bet

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-13 Thread Adrian Reber
L versions of PSM and related software, one thing > > >> you could try is IFS. I think I was running IFS 7.3.0, so that's a > > >> difference between your setup and mine. At the least, it may help > > >> support nail down the issue. > > >> >

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-12 Thread Jeff Squyres (jsquyres)
> >> -Original Message- > >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph > >> Castain > >> Sent: Tuesday, November 11, 2014 2:23 PM > >> To: Open MPI Developers > >> Subject: Re: [OMPI devel] 1.8.3 and PSM errors >

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-12 Thread Howard Pritchard
PSM and OMPI 1.6.5; it fails on 1.8.1 > and 1.8.3. > > > > Andrew > > > >> -Original Message- > >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph > >> Castain > >> Sent: Tuesday, November 11, 2014 2:23 PM > >> To

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-12 Thread Rainer Keller
Andrew > >> -Original Message- >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph >> Castain >> Sent: Tuesday, November 11, 2014 2:23 PM >> To: Open MPI Developers >> Subject: Re: [OMPI devel] 1.8.3 and PSM errors >> >> I

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-11 Thread Howard Pritchard
Hi Folks, I remember in the psm provider for libfabric, that there is a check in the av_insert method for endpoints that had previously been inserted into the av. In the libfabric psm provider, a mask array is created and fed in to the psm_ep_connect call to handle ep's that were already "connect
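
The mask idea described in this message can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: it does not call PSM or libfabric, and the names `build_connect_mask` and `connect_new_endpoints` are hypothetical helpers invented for illustration. It shows how a per-endpoint mask lets a second "add" request skip endpoints that an earlier request already connected.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def build_connect_mask(
    requested: Iterable[Hashable], connected: Set[Hashable]
) -> Tuple[List[Hashable], List[int]]:
    """Return (ids, mask); mask[i] is 1 only if ids[i] still needs connecting.

    Hypothetical helper: models the mask array mentioned in the message,
    not the actual libfabric or PSM code.
    """
    ids = list(requested)
    mask: List[int] = []
    seen = set(connected)  # copy, so the caller's set is not modified here
    for epid in ids:
        if epid in seen:
            mask.append(0)  # already connected (or repeated): skip it
        else:
            mask.append(1)  # new endpoint: include it in the connect call
            seen.add(epid)  # guards against duplicates within one request
    return ids, mask


def connect_new_endpoints(
    requested: Iterable[Hashable], connected: Set[Hashable]
) -> List[Hashable]:
    """Model of an add step that connects only endpoints flagged in the mask."""
    ids, mask = build_connect_mask(requested, connected)
    to_connect = [epid for epid, flag in zip(ids, mask) if flag]
    connected.update(to_connect)  # stand-in for the real connect operation
    return to_connect


if __name__ == "__main__":
    connected: Set[int] = set()
    print(connect_new_endpoints([10, 11, 12], connected))
    # A second request that overlaps the first only connects the new id.
    print(connect_new_endpoints([11, 12, 13], connected))
```

In this model, the second call corresponds to the situation raised elsewhere in the thread, where an add request includes a process that was already passed to an earlier one.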

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-11 Thread George Bosilca
> On Nov 11, 2014, at 17:13 , Jeff Squyres (jsquyres) > wrote: > >> More particularly, it looks like add_procs is being called a second time >> during MPI_Intercomm_create and being passed a process that is already >> connected (passed into the first add_procs call). Is that right? Should

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-11 Thread Friedley, Andrew
> Sent: Tuesday, November 11, 2014 2:23 PM > To: Open MPI Developers > Subject: Re: [OMPI devel] 1.8.3 and PSM errors > > I thought PSM didn’t support dynamic operations such as Intercomm_create > - yes? The PSM security key wouldn’t match between the two jobs, and so > there i

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-11 Thread Ralph Castain
I thought PSM didn’t support dynamic operations such as Intercomm_create - yes? The PSM security key wouldn’t match between the two jobs, and so there is no way for them to communicate. Which is why I thought PSM can’t be used for dynamic operations at all, including comm_spawn and connect/acce

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-11 Thread Jeff Squyres (jsquyres)
On Nov 11, 2014, at 4:56 PM, Friedley, Andrew wrote: > OK, I'm able to reproduce this now, not sure why I couldn't before. I took a > look at the diff of the PSM MTL from 1.6.5 to 1.8.1, and nothing is standing > out to me. > > Question more for the general group: Did anything related to the

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-11 Thread Friedley, Andrew
OK, I'm able to reproduce this now, not sure why I couldn't before. I took a look at the diff of the PSM MTL from 1.6.5 to 1.8.1, and nothing is standing out to me. Question more for the general group: Did anything related to the behavior/usage of MTL add_procs() change in this time window?

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-11 Thread Adrian Reber
d software, one thing > >> you could try is IFS. I think I was running IFS 7.3.0, so that's a > >> difference between your setup and mine. At the least, it may help support > >> nail down the issue. > >> > >> Andrew > >> > >>> -Original Messag

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-11 Thread Ralph Castain
, it may help support nail down >> the issue. >> >> Andrew >> >>> -Original Message- >>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Adrian >>> Reber >>> Sent: Monday, November 10, 2014 12:39 PM >>> To:

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-11 Thread Adrian Reber
help support nail down the issue. > > Andrew > > > -Original Message- > > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Adrian > > Reber > > Sent: Monday, November 10, 2014 12:39 PM > > To: Open MPI Developers > > Subject: Re:

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-10 Thread Friedley, Andrew
g] On Behalf Of Adrian > Reber > Sent: Monday, November 10, 2014 1:19 PM > To: Open MPI Developers > Subject: Re: [OMPI devel] 1.8.3 and PSM errors > > What is IFS? > > On Mon, Nov 10, 2014 at 09:12:41PM +, Friedley, Andrew wrote: > > Hi Adrian, > > > > Y

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-10 Thread Adrian Reber
l [mailto:devel-boun...@open-mpi.org] On Behalf Of Adrian > > Reber > > Sent: Monday, November 10, 2014 12:39 PM > > To: Open MPI Developers > > Subject: Re: [OMPI devel] 1.8.3 and PSM errors > > > > Andrew, > > > > thanks for looking into this. I w

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-10 Thread Friedley, Andrew
h various np from 8 to 32. Your original case: > > > > $ mpirun -np 32 ./mpi_test_suite -t "All,^io,^one-sided" > > > > Runs for a while and eventually hits send cancellation errors. > > > > Any chance you could try updating your infinipath libraries? > > > >

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-10 Thread Adrian Reber
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Adrian > > Reber > > Sent: Monday, October 27, 2014 9:11 AM > > To: Open MPI Developers > > Subject: Re: [OMPI devel] 1.8.3 and PSM errors > > > > This is a simpler test setup: > > >

Re: [OMPI devel] 1.8.3 and PSM errors

2014-10-28 Thread Adrian Reber
rs. > > Any chance you could try updating your infinipath libraries? > > Andrew > > > -Original Message- > > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Adrian > > Reber > > Sent: Monday, October 27, 2014 9:11 AM > > To: Open MPI Developers >

Re: [OMPI devel] 1.8.3 and PSM errors

2014-10-27 Thread Friedley, Andrew
d try updating your infinipath libraries? Andrew > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Adrian > Reber > Sent: Monday, October 27, 2014 9:11 AM > To: Open MPI Developers > Subject: Re: [OMPI devel] 1.8.3 and PSM errors > > This

Re: [OMPI devel] 1.8.3 and PSM errors

2014-10-27 Thread Ralph Castain
Andrew@Intel is looking into it - he has some PSM patches coming that may resolve this already. > On Oct 27, 2014, at 9:10 AM, Adrian Reber wrote: > > This is a simpler test setup: > > On 8 core machines this works: > > $ mpirun -np 8 mpi_test_suite -t "environment" > [...] > Number of fai

Re: [OMPI devel] 1.8.3 and PSM errors

2014-10-27 Thread Adrian Reber
This is a simpler test setup: On 8 core machines this works: $ mpirun -np 8 mpi_test_suite -t "environment" [...] Number of failed tests:0 Using 9 or more cores it fails: $ mpirun -np 9 mpi_test_suite -t "environment" mpi_test_suite:20293 terminated with signal 11 at PC=2b6d107fa9a4 SP=7f

Re: [OMPI devel] 1.8.3 and PSM errors

2014-10-27 Thread Ralph Castain
I’m afraid I can’t quite decipher from all this what actually fails. Of course, PSM doesn’t support dynamic operations like comm_spawn or connect_accept, so if you are running those tests that just won’t work. Is that the heart of the problem here? > On Oct 27, 2014, at 1:40 AM, Adrian Reber