Re: [OMPI users] problem with opal_list_remove_item for openmpi-v2.x-201702010255-8b16747 on Linux

2017-02-03 Thread Jeff Squyres (jsquyres)
I've filed this as https://github.com/open-mpi/ompi/issues/2920. Ralph is just heading out for about a week or so; it may not get fixed until he comes back.
> On Feb 3, 2017, at 2:03 AM, Siegmar Gross wrote:
>
> Hi,
>
> I have installed openmpi-v2.x-201702010255-8b16747 on my "SUSE Linux

[OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread Mark Dixon
Hi, Just tried upgrading from 2.0.1 to 2.0.2 and I'm getting error messages that look like openmpi is using ssh to login to remote nodes instead of qrsh (see below). Has anyone else noticed gridengine integration being broken, or am I being dumb? I built with "./configure --prefix=/apps/dev

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread Reuti
Hi,
> On 03.02.2017 at 17:10, Mark Dixon wrote:
>
> Hi,
>
> Just tried upgrading from 2.0.1 to 2.0.2 and I'm getting error messages that
> look like openmpi is using ssh to login to remote nodes instead of qrsh (see
> below). Has anyone else noticed gridengine integration being broken, or am

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread Mark Dixon
On Fri, 3 Feb 2017, Reuti wrote:
...
SGE on its own is not configured to use SSH? (I mean the entries in `qconf -sconf` for rsh_command resp. daemon).
...
Nope, everything left as the default:
$ qconf -sconf | grep _command
qlogin_command builtin
rlogin_command buil

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
I do see a diff between 2.0.1 and 2.0.2 that might have a related impact. The way we handled the MCA param that specifies the launch agent (ssh, rsh, or whatever) was modified, and I don’t think the change is correct. It basically says that we don’t look for qrsh unless the MCA param has been ch

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread Glenn Johnson
Is this the same issue that was previously fixed in PR-1960?
https://github.com/open-mpi/ompi/pull/1960/files
Glenn
On Fri, Feb 3, 2017 at 10:56 AM, r...@open-mpi.org wrote:
> I do see a diff between 2.0.1 and 2.0.2 that might have a related impact.
> The way we handled the MCA param that spe

Re: [OMPI users] Open MPI over RoCE using breakout cable and switch

2017-02-03 Thread Howard Pritchard
Hello Brendan, Sorry for the delay in responding. I've been on travel the past two weeks. I traced through the debug output you sent. It provided enough information to show that for some reason, when using the breakout cable, Open MPI is unable to complete initialization it needs to use the ope

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
I don’t think so - at least, that isn’t the code I was looking at.
> On Feb 3, 2017, at 9:43 AM, Glenn Johnson wrote:
>
> Is this the same issue that was previously fixed in PR-1960?
>
> https://github.com/open-mpi/ompi/pull/1960/files

Re: [OMPI users] MPI_Comm_spawn question

2017-02-03 Thread r...@open-mpi.org
We know v2.0.1 has problems with comm_spawn, and so you may be encountering one of those. Regardless, there is indeed a timeout mechanism in there. It was added because people would execute a comm_spawn, and it would then hang and eat up their entire allocation time for nothing. In v2.0.2, I see i