Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Paul Hargrove
On Wed, Jul 30, 2014 at 8:53 PM, Paul Hargrove wrote: [...] > I have a clear answer to *what* is different (below) and am next looking > into the why/how now. > It seems that 1.8.1 has included all dependencies into libmpi_usempif08 > while 1.8.2rc2 does not. > [...] The difference appears to s

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread tmishima
Hi Paul, Thank you for your investigation. I'm sure it's very close to fixing the problem, although I can't do that myself. So I must owe you something... Please try Awamori, which is Okinawa's sake and very good on such a hot day. Tetsuya > On Wed, Jul 30, 2014 at 8:53 PM, Paul Hargrove wrote: >

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul and all, For what it's worth, with openmpi 1.8.2rc2 and the Intel Fortran compiler version 14.0.3.174:
$ nm libmpi_usempif08.so | grep -i sizeof
there is no such undefined symbol (mpi_f08_sizeof_). As a temporary workaround, did you try to force the linker to use libforce_usempif08_internal_mo

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul, in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb test program, compile and run nm | grep f08 on the object:
$ cat foo.f90
program foo
use mpi_f08_sizeof
implicit none
real :: x
integer :: size, ierror
call MPI_Sizeof_real_s_4(x, size, ierror)
stop
end program
wi

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Paul Hargrove
Gilles, Just as you speculate, PGI is creating a _-suffixed reference to the module name:
$ pgf90 -c test.f90
$ nm -u test.o | grep f08
U mpi_f08_sizeof_
U mpi_f08_sizeof_mpi_sizeof_real_s_4_
You suggested the following work-around in a previous email:
$ INS

Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

2014-07-31 Thread Kenneth A. Lloyd
Doesn't namespacing obviate the need for this convoluted identifier scheme? See, for example, UML package import and include behaviors. -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Dave Goodell (dgoodell) Sent: Wednesday, July 30, 2014 3:35 PM To: Open MP

[OMPI devel] RFC: Change default behavior of calling ibv_fork_init

2014-07-31 Thread Rolf vandeVaart
WHAT: Change default behavior in openib to not call ibv_fork_init() even if available. WHY: There are some strange interactions with ummunotify that cause errors. In addition, see the additional points below. WHEN: After next weekly meeting, August 5, 2014 DETAILS: This change will just be a co

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm
+2^1000 This information is absolutely necessary at this point. If someone has a better solution they can provide it as an alternative RFC. Until then this is how it should be done... Otherwise we lose uGNI support on the trunk. Because we ARE NOT going to remove the mailbox size optimizatio

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Pritchard Jr., Howard
Hi Folks, I think given the way we want to use the btl's in lower levels like opal, it is pretty disgusting for a btl to need to figure out on its own something like a "global job size". That's not its business. Can't we add some attributes to the component's initialization method that provides

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread George Bosilca
What is your definition of “global job size”? George. On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard wrote: > Hi Folks, > > I think given the way we want to use the btl's in lower levels like opal, > it is pretty disgusting for a btl to need to figure out on its own something > like a "gl

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Ralph Castain
I'd like to suggest an alternative solution. A BTL can exploit whatever data it wants, but should first test if the data is available. If the data is *required*, then the BTL gracefully disqualifies itself. If the data is *desirable* for optimization, then the BTL writer (if they choose) can inc
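The availability check described above can be sketched in C. This is only an illustration under stated assumptions: every name here (`example_job_info_t`, `example_btl_init`, `EXAMPLE_DEFAULT_MAILBOX_SIZE`) is hypothetical and not part of Open MPI, and the fallback and sizing rules are placeholders, since the preview is cut off before it says what a BTL writer would do in the "desirable" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job information handed down by an upper layer.
 * None of these names come from Open MPI. */
typedef struct {
    bool   have_job_size;  /* true if the upper layer supplied job_size */
    size_t job_size;       /* only meaningful when have_job_size is true */
} example_job_info_t;

typedef enum {
    EXAMPLE_BTL_OK,           /* component can be used */
    EXAMPLE_BTL_DISQUALIFIED  /* component opts out gracefully */
} example_btl_status_t;

#define EXAMPLE_DEFAULT_MAILBOX_SIZE 4096u  /* placeholder fallback value */

/* Test for the data first; disqualify only if it is required and missing. */
example_btl_status_t example_btl_init(const example_job_info_t *info,
                                      bool job_size_required,
                                      size_t *mailbox_size)
{
    assert(mailbox_size != NULL);

    if (info == NULL || !info->have_job_size) {
        if (job_size_required) {
            return EXAMPLE_BTL_DISQUALIFIED;  /* required data is missing */
        }
        /* Data was only desirable: fall back to a default (assumption). */
        *mailbox_size = EXAMPLE_DEFAULT_MAILBOX_SIZE;
        return EXAMPLE_BTL_OK;
    }

    /* Data is available: placeholder rule, smaller mailboxes for big jobs. */
    *mailbox_size = (info->job_size > 1024) ? 1024u
                                            : EXAMPLE_DEFAULT_MAILBOX_SIZE;
    return EXAMPLE_BTL_OK;
}
```

A caller would treat `EXAMPLE_BTL_DISQUALIFIED` as "skip this component" rather than as a fatal error, which is the graceful behavior the message asks for.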

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm
The maximum number of peer processes that may be added over the course of the job will suffice. So either the world or universe size. This is a reasonable piece of information to expect the upper layers to provide to the communication layer. And the impact of providing this information is no less

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread George Bosilca
I definitely think you misunderstood the scope of this RFC. The information that is so important to you to configure the mailbox size is available to you when you need it. This information is made available by the PML through the call to add_procs, which comes with all the procs in the MPI_CO

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Pritchard Jr., Howard
Hi George, The ompi_process_info.num_procs thing that seems to have been an object of some contention yesterday. The ugni use of this is cloned off of the way I designed the mpich netmod. Leveraging off size of the job was an easy way to scale the mailbox size. If I'd been asked to have the ne

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm
I do not like the fact that add_procs is called with every proc in the MPI_COMM_WORLD. That needs to change, so, I will not rely on the number of procs being added being the same as the world or universe size. -Nathan On Thu, Jul 31, 2014 at 09:22:00AM -0600, George Bosilca wrote: >I definiti

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Ralph Castain
Like I said, why don't we just do the following: > I'd like to suggest an alternative solution. A BTL can exploit whatever data > it wants, but should first test if the data is available. If the data is > *required*, then the BTL gracefully disqualifies itself. If the data is > *desirable* for

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm
This approach will work now but we need to start thinking about how we want to support multiple simultaneous btl users. Does each user call add_procs with a single module (or set of modules) or does each user call btl_component_init and get their own module? If we do the latter then it might make

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Ralph Castain
Fair enough - yeah, that is an issue I've been avoiding :-) On Jul 31, 2014, at 9:14 AM, Nathan Hjelm wrote: > > This approach will work now but we need to start thinking about how we > want to support multiple simultaneous btl users. Does each user call > add_procs with a single module (or set

Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

2014-07-31 Thread Kenneth A. Lloyd
Yeah, I forgot that pure ANSI C doesn't really have namespaces, other than to fully qualify modules and variables. Bummer. Makes writing large, maintainable middleware more difficult. -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Kenneth A. Lloyd Sent: Th

[OMPI devel] RFC: namespaces to isolate against code moves

2014-07-31 Thread Dave Goodell (dgoodell)
WHAT: Allow reservation of a symbol namespace that is independent of component location. WHY: All of the framework location/abstraction churn over the years has made it challenging to maintain single source versions of MCA components (e.g., the "usnic" BTL) that work with multiple versions of O
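The preview above does not show how the reservation would be implemented. One common C idiom for a location-independent symbol prefix is token pasting, sketched below. All macro and symbol names (`EXAMPLE_NS_PREFIX`, `EXAMPLE_NS`, `example_usnic_max_peers`) and the return value are hypothetical illustrations, not the RFC's actual proposal.

```c
#include <assert.h>

/* Hypothetical reserved prefix: it names the component itself, not the
 * project/framework directory the component currently lives in. */
#define EXAMPLE_NS_PREFIX example_usnic

/* Two-level paste so EXAMPLE_NS_PREFIX is expanded before ## is applied. */
#define EXAMPLE_NS_PASTE2(a, b) a##_##b
#define EXAMPLE_NS_PASTE(a, b)  EXAMPLE_NS_PASTE2(a, b)
#define EXAMPLE_NS(sym)         EXAMPLE_NS_PASTE(EXAMPLE_NS_PREFIX, sym)

/* Expands to: int example_usnic_max_peers(void) */
int EXAMPLE_NS(max_peers)(void)
{
    return 64;  /* arbitrary value for the sketch */
}
```

With this idiom, moving the component to another framework or project would leave every `EXAMPLE_NS(...)` symbol unchanged, because the prefix is defined in one place and does not encode the source location.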

[OMPI devel] Further questions about BTL OPAL move...

2014-07-31 Thread Jeff Squyres (jsquyres)
George -- Got 2 questions for ya: 1. I see some orte_* specific symbols/functions in ompi_mpi_init.c. Was that intentional? Shouldn't that stuff be in the RTE framework, or some such? 2. In tracking down some stuff relating to process names, it looks like names are now being set by ompi/pr

Re: [OMPI devel] Further questions about BTL OPAL move...

2014-07-31 Thread George Bosilca
On Jul 31, 2014, at 18:26 , Jeff Squyres (jsquyres) wrote: > George -- > > Got 2 questions for ya: > > 1. I see some orte_* specific symbols/functions in ompi_mpi_init.c. Was that > intentional? Shouldn’t that stuff be in the RTE framework, or some such? Good catch. Fixed in r32384. > 2.

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread George Bosilca
All, Here is the patch that changes the meaning of the atomics to make them always return the previous value (similar to sync_fetch_and_<*>). I tested this with the following atomics: OS X, gcc style intrinsics and AMD64. I did not change the base assembly files used when GCC style assembly ope
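To illustrate the distinction being discussed, here is a minimal sketch using the GCC/Clang `__sync` builtins. It is not the patch itself: the `example_*` wrapper names are hypothetical, and it assumes a compiler that provides these builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Boolean-style CAS: reports only whether the swap happened. */
static inline int example_cas_bool(volatile int32_t *addr,
                                   int32_t oldval, int32_t newval)
{
    return __sync_bool_compare_and_swap(addr, oldval, newval);
}

/* Value-style CAS: returns what *addr held before the operation,
 * whether or not the swap took place. */
static inline int32_t example_cas_val(volatile int32_t *addr,
                                      int32_t oldval, int32_t newval)
{
    return __sync_val_compare_and_swap(addr, oldval, newval);
}

/* Fetch-and-add: also returns the previous value. */
static inline int32_t example_fetch_add(volatile int32_t *addr, int32_t delta)
{
    return __sync_fetch_and_add(addr, delta);
}
```

With the value-returning form, a caller checks success by comparing the result with `oldval`, and on failure already holds the current contents without issuing a second load.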

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread Paul Hargrove
On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca wrote: > Paul, I know you have a pretty diverse range computers. Can you try to > compile and run a "make check" with the following patch? I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset of those is still supported). The

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread Paul Hargrove
On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove wrote: > > On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca > wrote: > >> Paul, I know you have a pretty diverse range computers. Can you try to >> compile and run a "make check" with the following patch? > > > I will see what I can do for ARMv7, MIP

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread George Bosilca
Awesome, thanks Paul. When the results are in, we will fix whatever is needed for these less common architectures. George. On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove wrote: > > > On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove wrote: > >> >> On Thu, Jul 31, 2014 at 4:13 PM, George Bo

Re: [OMPI devel] Further questions about BTL OPAL move...

2014-07-31 Thread Ralph Castain
On Jul 31, 2014, at 3:41 PM, George Bosilca wrote: > > On Jul 31, 2014, at 18:26 , Jeff Squyres (jsquyres) > wrote: > >> George -- >> >> Got 2 questions for ya: >> >> 1. I see some orte_* specific symbols/functions in ompi_mpi_init.c. Was >> that intentional? Shouldn’t that stuff be in

[OMPI devel] Trunk broken for PPC64?

2014-07-31 Thread Paul Hargrove
On the path to verifying George's atomics patch, I have started just by verifying that I can still build the UNPATCHED trunk on each of the platforms I listed. I have tried two PPC64/Linux systems so far and am seeing the same problem on both. Though I can pass "make check" both platforms SEGV on

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Jeff Squyres (jsquyres)
Many thanks guys, this thread was most helpful in finding the fix. Paul H. nailed 80% of it on the head in the post where he identified the Makefile.am change. That Makefile.am change was due to three things: 1. Fixing a real bug (elsewhere in that commit) 2. My misunderstanding of how module f

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Paul Hargrove
Related question: If I am understanding PGI's list of fixed-TPRs (bugs) then it looks like one (certainly not the only) difference between 13.x and 14.1 is a fix to a problem with PROCEDURE and zero-argument subroutines. As it happens, the configure probe for PROCEDURE is a zero-argument subrout

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Paul Hargrove
Second related issue: Can/should examples/hello_usempif08.f90 be extended to use more of the module such that it would have illustrated the bug found with Tetsuya's example code? I don't know about MTT, but my scripts for testing a release candidate include running "make" in the example subdir.

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Paul Hargrove
Never mind my suggestion to revise examples/hello_usempif08.f90. I've just determined that it is already sufficient to reproduce the problem. (So now I need to see what's wrong in my testing scripts). -Paul On Thu, Jul 31, 2014 at 7:04 PM, Paul Hargrove wrote: > Second related issue: > > Can/sho

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul, the ibm test suite from the non-public ompi-tests repository has several tests for usempif08. Cheers, Gilles On 2014/08/01 11:04, Paul Hargrove wrote: > Second related issue: > > Can/should examples/hello_usempif08.f90 be extended to use more of the > module such that it would have illust

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread Paul Hargrove
George: Have a failure with your patch applied on PPC64/Linux and gcc-4.4.6:
Making all in asm
make[2]: Entering directory `/home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/BLD/opal/asm'
CC asm.lo
In file included from /home/hargrov1/OMPI/openmpi-trunk-linux-ppc64-gcc/openmpi-1.9a1r32369

[OMPI devel] trunk link failure on Solaris-10/SPARC

2014-07-31 Thread Paul Hargrove
$ INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c'
ld.so.1: ring_c: fatal: relocation error: file /home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u3-v8plus/INST/lib/openmpi/mca_pml_ob1.so: symbol alloca: referenced symbol not found
This platform has worked in the past. I will be tr

Re: [OMPI devel] trunk link failure on Solaris-10/SPARC

2014-07-31 Thread George Bosilca
A missing include. Should be fixed by r32388. Thanks, George. On Thu, Jul 31, 2014 at 11:15 PM, Paul Hargrove wrote: > > $ INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c' > ld.so.1: ring_c: fatal: relocation error: file > /home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u

Re: [OMPI devel] trunk link failure on Solaris-10/SPARC

2014-07-31 Thread Ralph Castain
Anything you can suggest to resolve that problem would be most appreciated, Paul. It's been reported before, but we have no idea what it is looking for. On Jul 31, 2014, at 8:15 PM, Paul Hargrove wrote: > > $ INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c' > ld.so.1: ring_c: fatal: re

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-07-31 Thread Ralph Castain
FWIW: we had Siegmar try that and it didn't solve the problem. Paul? On Jul 31, 2014, at 8:28 PM, svn-commit-mai...@open-mpi.org wrote: > Author: bosilca (George Bosilca) > Date: 2014-07-31 23:28:23 EDT (Thu, 31 Jul 2014) > New Revision: 32388 > URL: https://svn.open-mpi.org/trac/ompi/changeset/

Re: [OMPI devel] Trunk broken for PPC64?

2014-07-31 Thread Ralph Castain
Yes, I fear this will require some effort to chase all the breakage down given that (to my knowledge, at least) we lack PPC machines in the devel group. On Jul 31, 2014, at 5:46 PM, Paul Hargrove wrote: > On the path to verifying George's atomics patch, I have started just by > verifying that