Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Gilles Gouaillardet
Paul, about the second point: mmap is called with the MAP_FIXED flag. Before the fix, the requested address was not aligned on a page boundary, and hence mmap failed. The mmap failure was immediately handled, but for reasons I have not fully investigated yet, this failure was not correctly
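The alignment requirement described above can be sketched in C. This is a minimal illustration, not the coll/ml fix itself: the helper name `align_up_to_page` is hypothetical, and the sketch assumes the page size is a power of two, which it checks.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper (not from the Open MPI sources): round an address up
 * to the next multiple of the system page size, so that it is acceptable
 * as the address argument of mmap() with MAP_FIXED. */
static uintptr_t align_up_to_page(uintptr_t addr)
{
    long page = sysconf(_SC_PAGESIZE);

    /* The mask trick below is only valid for power-of-two page sizes. */
    assert(page > 0 && (page & (page - 1)) == 0);

    uintptr_t mask = (uintptr_t)page - 1;
    return (addr + mask) & ~mask;
}
```

An address that is already page-aligned is returned unchanged; any other address moves up to the next page boundary.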

[OMPI devel] [1.8.2rc3] build failure on OpenBSD (libevent)

2014-08-01 Thread Paul Hargrove
I am seeing the following on OpenBSD/amd64 with "make V=1": Making all in tools/wrappers /bin/sh ../../../libtool --tag=CC --mode=link gcc -std=gnu99 -g -finline-functions -fno-strict-aliasing -pthread -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lutil -lm

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-01 Thread Paul Hargrove
I have confirmed that George's latest version works on both SPARC ABIs. ARMv7 and three MIPS ABIs still pending... -Paul On Fri, Aug 1, 2014 at 9:40 AM, George Bosilca wrote: > Another version of the atomic patch. Paul has tested it on a bunch of > platforms. At this

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Paul Hargrove
Regarding review of the coll/ml fix: While the fix Gilles worked out overnight proved sufficient on Solaris/SPARC, Linux/PPC64 and Linux/IA64, I had two concerns: 1) As I already voiced on the list, I am concerned with the portability of _SC_PAGESIZE vs _SC_PAGE_SIZE (vs get_pagesize()). 2)
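One way to address the `_SC_PAGESIZE` versus `_SC_PAGE_SIZE` concern is to guard each name with the preprocessor. The following is a sketch only: the function name `query_page_size` and the 4096-byte fallback are assumptions made for illustration, not what the reviewed fix does.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical wrapper: ask sysconf() for the page size using whichever
 * name the platform defines, and fall back to an assumed default if
 * neither name is available or the call fails. */
static long query_page_size(void)
{
    long sz = -1;

#if defined(_SC_PAGESIZE)
    sz = sysconf(_SC_PAGESIZE);
#elif defined(_SC_PAGE_SIZE)
    sz = sysconf(_SC_PAGE_SIZE);
#endif

    if (sz <= 0) {
        sz = 4096;  /* assumed fallback, chosen for this sketch only */
    }

    /* Sanity check: page sizes are expected to be powers of two. */
    assert((sz & (sz - 1)) == 0);
    return sz;
}
```

Callers then depend on a single function rather than on a particular sysconf name.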

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-01 Thread Paul Hargrove
MIPS32, MIPS64 and ARMv7 tests are also pending. -Paul On Fri, Aug 1, 2014 at 9:40 AM, George Bosilca wrote: > Another version of the atomic patch. Paul has tested it on a bunch of > platforms. At this point we have confirmation from all architectures except > SPARC (v8+

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-01 Thread George Bosilca
Another version of the atomic patch. Paul has tested it on a bunch of platforms. At this point we have confirmation from all architectures except SPARC (v8+ and v9). George. atomics.patch Description: Binary data On Jul 31, 2014, at 19:13 , George Bosilca wrote: >
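For readers unfamiliar with the RFC's subject, a compare-and-swap that returns the old value can be sketched with C11 atomics. This is not the code from atomics.patch, which uses per-architecture implementations; the name `cas_fetch_old_32` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: store 'desired' into *addr only if *addr
 * currently equals 'expected', and return the value *addr held before
 * the operation in either case. */
static int32_t cas_fetch_old_32(_Atomic int32_t *addr,
                                int32_t expected, int32_t desired)
{
    assert(addr != NULL);

    /* On failure, atomic_compare_exchange_strong writes the current value
     * of *addr into 'expected'. On success 'expected' is left untouched,
     * and it already equals the old value. */
    atomic_compare_exchange_strong(addr, &expected, desired);
    return expected;
}
```

The caller can tell whether the swap happened by comparing the return value with the expected value it passed in, which a boolean-returning compare-and-swap does not allow without a second read.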

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread George Bosilca
I guess (r32401). George. On Fri, Aug 1, 2014 at 12:32 PM, Ralph Castain wrote: > I found the problem - the issue is that assert on the convertor. MPI apps > are setting that convertor, but not non-MPI apps, and so the field is NULL. > Can we remove that assert? > > > On

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32401 - trunk/opal/util

2014-08-01 Thread Ralph Castain
Thanks George! On Aug 1, 2014, at 9:36 AM, svn-commit-mai...@open-mpi.org wrote: > Author: bosilca (George Bosilca) > Date: 2014-08-01 12:36:23 EDT (Fri, 01 Aug 2014) > New Revision: 32401 > URL: https://svn.open-mpi.org/trac/ompi/changeset/32401 > > Log: > No more assert in the proc

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread Ralph Castain
I found the problem - the issue is that assert on the convertor. MPI apps are setting that convertor, but not non-MPI apps, and so the field is NULL. Can we remove that assert? On Aug 1, 2014, at 9:30 AM, George Bosilca wrote: > I missed the fact that the app doesn't

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread George Bosilca
I missed the fact that the app doesn't force it. But if this is indeed the case then it is extremely weird that you are seeing someone else releasing your proc. Regarding the destruction of the proc, the OPAL layer only does it in a single place, when the local proc is set (opal_proc_local_set).

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread Ralph Castain
On Aug 1, 2014, at 8:27 AM, George Bosilca wrote: > This commit brings two things. One is the renaming suggested by Gilles. The > second one is forcing the ORTE process down on the OPAL. This doesn't fit the > current design of the BTL move. The current design assumes

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-08-01 Thread Pritchard Jr., Howard
Sorry, finally got through all this ompi email and see this problem was fixed. -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Pritchard Jr., Howard Sent: Friday, August 01, 2014 8:59 AM To: Open MPI Developers Subject: Re: [OMPI devel] openmpi-1.8.2rc2

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-08-01 Thread Pritchard Jr., Howard
Hi Jeff, Finally got info yesterday about where the newer PGI compilers are hiding out at LANL. I'll check this out today. Howard -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres) Sent: Tuesday, July 29, 2014 5:24 PM To: Open MPI

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Ralph Castain
Oh, should point out: I didn't deal with the potential btl/tcp issue you noted - I defer that to George On Aug 1, 2014, at 7:56 AM, Ralph Castain wrote: > Hi Gilles > > I'm not sure if we have a problem or not - we'll have to wait and see, I > guess. So far, I'm not

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Ralph Castain
Hi Gilles I'm not sure if we have a problem or not - we'll have to wait and see, I guess. So far, I'm not seeing any problems on x86 archs, but that's to be expected and I don't have access to anything else. I fixed the issues you noted plus a few others I found. I imagine we'll discover more

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Ralph Castain
Okay, I fixed those two and will release rc4 once the coll/ml fix has been reviewed. Thanks On Aug 1, 2014, at 2:46 AM, Mike Dubman wrote: > Also, latest commit into openib (origin/v1.8 > https://svn.open-mpi.org/trac/ompi/changeset/32391) broke something: > >

Re: [OMPI devel] [1.8.2rc3] Build failure on FreeBSD (missing header)

2014-08-01 Thread Ralph Castain
Thanks Paul - added and cmr'd On Jul 31, 2014, at 11:23 PM, Paul Hargrove wrote: > > /home/phargrov/OMPI/openmpi-1.8.2rc3-freebsd10-amd64/openmpi-1.8.2rc3/orte/mca/ess/base/ess_base_std_app.c:412:36: > error: use of undeclared identifier 'S_IRUSR' > fd =

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Gilles Gouaillardet
one last point : in orte_process_name_t, jobid and vpid have type orte_jobid_t and orte_vpid_t which really is uint32_t. in orte/util/proc.c, the function pointers opal_process_name_vpid and opal_process_name_jobid return an int32_t should it be an uint32_t instead ? /* and then

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-01 Thread Kawashima, Takahiro
George, I compiled trunk with your patch for SPARCV9/Linux/GCC. I see the following warnings/errors. In file included from opal/include/opal/sys/atomic.h:175, from opal/asm/asm.c:21:

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Mike Dubman
Also, latest commit into openib (origin/v1.8 https://svn.open-mpi.org/trac/ompi/changeset/32391) broke something: *11:45:01* + timeout -s SIGSEGV 3m /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/bin/mpirun -np 8 -mca pml ob1 -mca btl self,openib

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul, I just committed r32393 (and made a CMR for v1.8). Can you please give it a try? In the meantime, I received your email ... sysconf is called directly (i.e. not #ifdef protected) in several other places: $ grep -R sysconf . | grep -v svn | grep -v sysconfdir | grep -v autom4te |grep

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Paul Hargrove
Note that the Solaris unresolved alloca problem George fixed in r32388 is still present in 1.8.2rc3. I have manually confirmed that the same patch resolves the problem in 1.8.2rc3. -Paul On Thu, Jul 31, 2014 at 9:44 PM, Ralph Castain wrote: > Usual place - this is a

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
Hmm, maybe this has nothing to do with big-endian. Below is a backtrace from ring_c on an IA64 platform (definitely little-endian) that looks very similar to me. It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems. So, I wonder if that might be related. -Paul $ mpirun

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
Gilles's findings are consistent with mine which showed the SEGVs to be in the coll/ml code. I've built with --enable-debug and so below is a backtrace (well, two actually) that might be helpful. Unfortunately the output of the two ranks did get slightly entangled. -Paul $ ../INST/bin/mpirun

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul and Ralph, for what it's worth: a) I faced the very same issue on my (slow) qemu-emulated ppc64 VM b) I was able to run very basic programs when passing --mca coll ^ml to mpirun Cheers, Gilles On 2014/08/01 12:30, Ralph Castain wrote: > Yes, I fear this will require some effort to

[OMPI devel] [1.8.2rc3] Build failure on FreeBSD (missing header)

2014-08-01 Thread Paul Hargrove
/home/phargrov/OMPI/openmpi-1.8.2rc3-freebsd10-amd64/openmpi-1.8.2rc3/orte/mca/ess/base/ess_base_std_app.c:412:36: error: use of undeclared identifier 'S_IRUSR' fd = open(myfile, O_CREAT, S_IRUSR); ^ To fix this it was sufficient to add the following 3
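POSIX specifies that `S_IRUSR` is declared in `<sys/stat.h>`, so an "undeclared identifier" error of this kind usually means that header was not included. The excerpt does not show which lines were actually added to ess_base_std_app.c, so the following is only a sketch of a self-contained equivalent of the failing call; the helper name `create_owner_readable` is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>      /* open(), O_CREAT, O_RDONLY */
#include <stddef.h>
#include <sys/stat.h>   /* S_IRUSR */
#include <sys/types.h>
#include <unistd.h>     /* close() */

/* Hypothetical helper: create 'path' if it does not exist, readable by
 * its owner only. Returns 0 on success and -1 on failure. */
static int create_owner_readable(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_CREAT | O_RDONLY, S_IRUSR);
    if (fd < 0) {
        return -1;
    }
    close(fd);
    return 0;
}
```

Some platforms pull in `<sys/stat.h>` indirectly through other headers, which would explain why such a build error shows up on one operating system and not on others.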

[OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Ralph Castain
Usual place - this is a last-chance check, so please hit it. Main change from rc2 is the repairs to the Fortran binding config logic. http://www.open-mpi.org/software/ompi/v1.8/ Ralph

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Paul Hargrove
George's patch worked for me. Now of course since this is a big-endian system things are still busted on trunk, but ring_c is now hung instead of failing at load time. -Paul On Thu, Jul 31, 2014 at 9:30 PM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Paul, > > George just

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Gilles Gouaillardet
Paul, George just made a good point: you should test with his patch first. If it still does not work, could you try to mix GNU and Sun compilers? configure ... CC=/usr/sfw/bin/gcc CXX=/usr/sfw/bin/g++ FC= Cheers, Gilles On 2014/08/01 13:19, Paul Hargrove wrote: > Gilles, > > This test was

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Paul Hargrove
Gilles, This test was using the Solaris Studio Compilers version 12.3. /usr/bin/gcc on this system is "gccfss" which Open MPI does NOT support. There is also a gcc-3.3.2 in /usr/local/bin and gcc-3.4.3 in /usr/sfw/bin Neither includes usable fortran compilers, which is why the Studio compilers

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread George Bosilca
I am afraid the suggestion on the mailing list only addressed half of the problem. Indeed, alloca is used in two files (isend and irecv) while the suggested patch only fixed the one in isend. George. On Fri, Aug 1, 2014 at 12:12 AM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org>
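An unresolved `alloca` symbol at load time can occur when a source file calls alloca without including the header that declares it, because on some platforms that header is what maps the call to a compiler builtin. A minimal sketch of a correctly declared use follows, assuming a platform that provides `<alloca.h>`. The helper name `stack_copy_length` is hypothetical and the code is not taken from pml/ob1.

```c
#include <alloca.h>   /* declares alloca here; some BSDs use <stdlib.h> instead */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build a NUL-terminated copy of the first 'len'
 * bytes of 'src' in stack memory and return the length of that copy. */
static size_t stack_copy_length(const char *src, size_t len)
{
    assert(src != NULL);

    /* Memory from alloca is released automatically when this function
     * returns, so it must never be returned to the caller. */
    char *tmp = alloca(len + 1);
    memcpy(tmp, src, len);
    tmp[len] = '\0';
    return strlen(tmp);
}
```

Every file that calls alloca needs the declaration, which matches George's remark that fixing only one of the two files is not enough.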

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Gilles Gouaillardet
Paul, As Ralph pointed out, this issue was reported last month on the user mailing list. #include did not help : http://www.open-mpi.org/community/lists/users/2014/07/24883.php I will see if I can reproduce and fix this issue on a solaris10 (but x86) VM BTW, are you using the GNU compiler ?

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Paul Hargrove
In general I am only set up to build from tarballs, not svn. However, I can (and will) apply this change manually w/o difficulty. I will report back when I've had a chance to try that. I already have many builds in-flight to test George's atomics patch and am in danger of confusing myself if I am

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Ralph Castain
Yes, I fear this will require some effort to chase all the breakage down given that (to my knowledge, at least) we lack PPC machines in the devel group. On Jul 31, 2014, at 5:46 PM, Paul Hargrove wrote: > On the path to verifying George's atomics patch, I have started just

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Ralph Castain
FWIW: we had Siegmar try that and it didn't solve the problem. Paul? On Jul 31, 2014, at 8:28 PM, svn-commit-mai...@open-mpi.org wrote: > Author: bosilca (George Bosilca) > Date: 2014-07-31 23:28:23 EDT (Thu, 31 Jul 2014) > New Revision: 32388 > URL:

Re: [OMPI devel] trunk link failure on Solaris-10/SPARC

2014-08-01 Thread Ralph Castain
Anything you can suggest to resolve that problem would be most appreciated, Paul. It's been reported before, but we have no idea what it is looking for. On Jul 31, 2014, at 8:15 PM, Paul Hargrove wrote: > > $ INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c' >

Re: [OMPI devel] trunk link failure on Solaris-10/SPARC

2014-08-01 Thread George Bosilca
A missing include. Should be fixed by r32388. Thanks, George. On Thu, Jul 31, 2014 at 11:15 PM, Paul Hargrove wrote: > > $ INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c' > ld.so.1: ring_c: fatal: relocation error: file >

[OMPI devel] trunk link failure on Solaris-10/SPARC

2014-08-01 Thread Paul Hargrove
$ INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c' ld.so.1: ring_c: fatal: relocation error: file /home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u3-v8plus/INST/lib/openmpi/mca_pml_ob1.so: symbol alloca: referenced symbol not found This platform has worked in the past. I will be