Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Paul Hargrove
Gilles, I am NOT seeing the problem with gcc. It is only occurring with the Studio compilers. As I've already reported, I have tried adding either "-mt" or "-mt=yes" to both LDFLAGS and --with-wrapper-ldflags. The "cc" manpage (on the Solaris-10 system I can get to right now) says: -mt Co

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Gilles Gouaillardet
Paul, did you manually set -mt? If I remember correctly, Solaris 11 (at least with the gcc compilers) does not need any flags (except the -D_REENTRANT that is added automatically). Cheers, Gilles On 2014/12/16 12:10, Paul Hargrove wrote: > Gilles, > > I will try the patch when I can. > However, our

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Paul Hargrove
Gilles, I will try the patch when I can. However, our network is undergoing maintenance right now, leaving me unable to reach the necessary hosts. As for -D_REENTRANT, I had already reported having verified in the "make" output that it had been added automatically. Additionally, the docs

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Ralph Castain
My correction - the fix is in the nightly tarball from tonight. You can get it here: openmpi-v1.8.3-272-g4e4f997.tar.bz2 On Mon, Dec 15, 2014 at 2:40 PM, Ralph Castain wrote: > > Hey Tom > > Note that rc2 had a bug in t

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Gilles Gouaillardet
Paul, could you please make sure configure added "-D_REENTRANT" to the CFLAGS? /* otherwise, errno is a global variable instead of a per-thread variable, which can explain some weird behaviour. Note this should already have been fixed */ Assuming -D_REENTRANT is set, could you please give the
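
The errno point in the message above can be checked directly. The following is a minimal sketch, not code from the Open MPI tree: the names `report_errno_address` and `errno_is_per_thread` are hypothetical, and it assumes a POSIX threads environment. It compares the address of errno as seen by a worker thread with the address seen by the caller. With a thread-safe errno the two addresses differ; with a single global errno, which is what Gilles describes for builds lacking -D_REENTRANT, they would be the same.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Worker (hypothetical helper): store the address of this thread's errno
 * in the int* slot passed through arg. */
static void *report_errno_address(void *arg)
{
    int **slot = arg;
    *slot = &errno;
    return NULL;
}

/*
 * Hypothetical check, for illustration only.
 * Returns 1 if errno is a per-thread variable (the worker saw a different
 * address than the caller), 0 if both threads share one global errno,
 * and -1 if the worker thread could not be created or joined.
 */
int errno_is_per_thread(void)
{
    pthread_t tid;
    int *worker_addr = NULL;

    if (pthread_create(&tid, NULL, report_errno_address, &worker_addr) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;

    /* The caller's errno stays live for the whole call, so a per-thread
     * errno in the worker must have had a different address. */
    return worker_addr != &errno;
}
```

Calling `errno_is_per_thread()` from a small `main` built with the same compiler and flags as the failing build would show whether a message such as "accept() failed: Error 0 (0)" could come from reading the wrong errno.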

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Paul Hargrove
I have tried with an oob_tcp_if_include setting so that there is now only 1 interface. Even with just one interface and -mt=yes in both LDFLAGS and wrapper-ldflags I am *still* getting messages like [pcp-j-20:11470] mca_oob_tcp_accept: accept() failed: Error 0 (0).

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Ralph Castain
Hey Tom Note that rc2 had a bug in the out-of-band messaging system - might be what you are hitting. I'd suggest working with rc4. On Mon, Dec 15, 2014 at 12:57 PM, Tom Wurgler wrote: > > I have to take it back. While the first job was less than a node's > worth of cores and ran properly on t

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Paul Hargrove
A little more reading finds that the docs say one needs "-mt" without the "=yes". That will work for both old and new compilers, whereas "-mt=yes" chokes older ones. Also, the man pages say "-mt" must come before "-lpthread" in the link command. -Paul On Mon, Dec 15, 2014 at 12:52 PM, Paul Hargr

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Tom Wurgler
I have to take it back. While the first job was less than a node's worth of cores and ran properly on the cores I wanted, more testing is revealing other problems. Anything that spans more than one node crashes and burns, with a core dump, and nothing in the files to indicate why. Note this i

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Paul Hargrove
On Mon, Dec 15, 2014 at 5:35 AM, Ralph Castain wrote: > > 7. Linkage issue on Solaris-11 reported by Paul Hargrove. Missing the > multi-threaded C libraries, apparently need "-mt=yes" in both compile and > link. Need someone to investigate. The lack of multi-thread libraries is my SPECULATION.

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Tom Wurgler
It seems to be working in rc2 after all. I was still trying to use a rankfile, but it appears that is no longer needed. Thanks! From: devel on behalf of Ralph Castain Sent: Monday, December 15, 2014 8:45 AM To: Open MPI Developers Subject: Re: [OMPI devel] 1.

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-509-g38d6627

2014-12-15 Thread Hjelm, Nathan Thomas
It will take about 5 mins to either fix or determine if more work is needed. From: devel on behalf of Howard Pritchard Sent: Monday, December 15, 2014 10:05:24 AM To: Open MPI Developers Subject: Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch ma

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-509-g38d6627

2014-12-15 Thread Howard Pritchard
I'd prefer Paul's suggestion to disable xpmem for sgi/uv for 1.8.X. Is anyone actually supporting this? Howard 2014-12-15 8:56 GMT-07:00 Nathan Hjelm : > > > Not yet. I am still trying to pinpoint the problem. From what I can tell > the SGI version of XPMEM should be nearly identical to the Cray >

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-509-g38d6627

2014-12-15 Thread Nathan Hjelm
Not yet. I am still trying to pinpoint the problem. From what I can tell the SGI version of XPMEM should be nearly identical to the Cray version. I should have this figured out this week. If I don't get it fixed by Wed I will open a pull request to remove the check for sn/xpmem.h. -Nathan On Fri

Re: [OMPI devel] 1.8.4rc4 now out for testing

2014-12-15 Thread Marco Atzeri
On 12/14/2014 12:06 AM, Ralph Castain wrote: Hi folks I’ve rolled up the bug fixes so far, including the thread-multiple performance fix. So please give this one a whirl http://www.open-mpi.org/software/ompi/v1.8/ Ralph No regression on Cygwin 64 bit Only and usual FAIL: atomic_cmpset_noin

Re: [OMPI devel] 1.8.4rc4 now out for testing

2014-12-15 Thread Adrian Reber
1.8.4rc4 works without errors on my PSM based systems. Adrian On Sat, Dec 13, 2014 at 03:06:07PM -0800, Ralph Castain wrote: > Hi folks > > I’ve rolled up the bug fixes so far, including the thread-multiple > performance fix. So please give this one a whirl > > http://www.open-

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Ralph Castain
Should be there in rc4, and I thought it made it to rc2 for that matter. I'll take a gander. FWIW: I'm working off-list with IBM to tighten the LSF integration so we correctly read and follow their binding directives. This will also be in 1.8.4 as we are in final test with it now. Ralph On Mon,

Re: [OMPI devel] 1.8.4rc Status

2014-12-15 Thread Tom Wurgler
Forgive me if I've missed it, but I believe using physical OR logical core numbering was going to be reimplemented in the 1.8.4 series. I've checked out rc2 and as far as I can tell, it isn't there as yet. Is this correct? thanks! From: devel on behalf o

[OMPI devel] 1.8.4rc Status

2014-12-15 Thread Ralph Castain
Hi folks Trying to summarize the current situation on releasing 1.8.4. Remaining identified issues: 1. TCP/BTL hang under mpi-thread-multiple. Asked George to look into it. 2. hwloc updates required. Brice committed them to the hwloc 1.7 repo. Gilles volunteered to create the PR from there. 3.

Re: [OMPI devel] 1.8.4rc3: WARNING: No loopback interface was found

2014-12-15 Thread Ralph Castain
Yes - it's been fixed in rc4 On Mon, Dec 15, 2014 at 5:16 AM, Eric Chamberland < eric.chamberl...@giref.ulaval.ca> wrote: > > Hi, > > I first saw this message using 1.8.4rc3: > > -- > WARNING: No loopback interface was found.

Re: [OMPI devel] 1.8.4rc3: WARNING: No loopback interface was found

2014-12-15 Thread Eric Chamberland
Forgot this: ompi_info -all : http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.184rc3.txt.gz config.log: http://www.giref.ulaval.ca/~ericc/ompi_bug/config.184rc3.log.gz Eric

[OMPI devel] 1.8.4rc3: WARNING: No loopback interface was found

2014-12-15 Thread Eric Chamberland
Hi, I first saw this message using 1.8.4rc3: -- WARNING: No loopback interface was found. This can cause problems when we spawn processes as they are likely to be unable to connect back to their host daemon. Sadly, it may ta

Re: [OMPI devel] 1.8.4rc4 now out for testing

2014-12-15 Thread Paul Hargrove
On Sun, Dec 14, 2014 at 10:52 PM, Paul Hargrove wrote: > > Solaris-10/SPARC and "--enable-static --disable-shared" appears broken for > C++ apps (but OK for C). > I will report in more details when I have more information. > First the good news: The problem I was experiencing (with the Solaris S

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-15 Thread Pascal Deveze
George, Thanks for the patch. That was the solution. Pascal From: devel [mailto:devel-boun...@open-mpi.org] On behalf of George Bosilca Sent: Saturday, December 13, 2014 08:38 To: Open MPI Developers Subject: Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_i

Re: [OMPI devel] 1.8.4rc4 now out for testing

2014-12-15 Thread Paul Hargrove
My testing on 1.8.4rc4 is not quite done, but is getting close. With two exceptions, so far all looks good to me on almost 60 different platforms. I've retested on my Solaris systems and saw none of the issues I had with rc3. The x86-64/Linux system with mtl:psm is no longer giving a SEGV at exit.