[OMPI devel] make install (libtool) failure on Solaris 10 (1.5rc5 and 1.4.3rc1)

2010-08-25 Thread Paul H. Hargrove
This has got to be the stupidest failure I have ever seen! $ make install [...] make[3]: Entering directory `/export/home/phargrov/openmpi-1.5rc5/BLD-gcc-vt/ompi' test -z "/usr/local/pkg/ompi-1.5rc5/lib" || ../../config/install-sh -c -d "/usr/local/pkg/ompi-1.5rc5/lib" /bin/bash ../libtool --

[OMPI devel] atomic_spinlock test failure with xlc/ppc64 (1.5rc5 and 1.4.3rc1)

2010-08-25 Thread Paul H. Hargrove
I know Linux/PPC64 is listed as an under-tested platform, and the BG/P release of XLC is probably not supported at all, but I tested it anyway (on a front-end, not the BG/P compute nodes) and have the following to report. I report here only the 1.5rc5 case, but results are identical with 1.4.3

[OMPI devel] bitbucket announced downtime for upgrade

2010-08-25 Thread Jeff Squyres
For all of you using bitbucket: http://blog.bitbucket.org/2010/08/25/bitbucket-downtime-for-a-hardware-upgrade/ -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/

[OMPI devel] "make check" (libtool?) failure on Solaris/SPARC (1.5rc5 and 1.4.3rc1)

2010-08-25 Thread Paul H. Hargrove
I have been able to configure and build both 1.5rc5 and 1.4.3rc1 on Solaris 10 for SPARC, using Sun C 5.10. I have also built 1.5rc5 w/ gcc-3.3.2 (and expect 1.4.3rc1 to build w/ gcc as well, once I have time). All 3 builds fail "make check" in a way that suggests to me that libtool is not work

[OMPI devel] Checkpoint/restart question

2010-08-25 Thread Tomas Oppelstrup
Hi, I have a question about checkpoint-restart operation with open-mpi. I hope this is an appropriate forum for my question. I do not have access to recompile the kernel or load kernel modules, so I would like to use the condor checkpoint-restart library. Can that be made to work with openmpi's che

Re: [OMPI devel] Problem w/ documented SPARC/gcc flags (1.5rc5 and 1.4.3rc1)

2010-08-25 Thread Paul H. Hargrove
In the message below I fouled up some cut-and-paste. Please mentally replace And have configured (again stopping after the Assembler ABI probe) with gcc-4.3.3 AND Rolf's flags CC=gcc-4.3.3 CXX=g++-4.3.3 CFLAGS=-mv8plus CC=gcc-4.3.3 CXX=g++-4.3.3 CFLAGS=-mv8plus with And have configured (ag

Re: [OMPI devel] Problem w/ documented SPARC/gcc flags (1.5rc5 and 1.4.3rc1)

2010-08-25 Thread Paul H. Hargrove
Trying Rolf's suggestion, I configure 1.4.3rc1 with CFLAGS="-mv8plus -Wa,-xarch=v8plus" CXXFLAGS="-mv8plus -Wa,-xarch=v8plus" I find that I get configure past the v8+/v9 Assembler ABI probe (but didn't wait for the full configure to run). Another datapoint in favor of #2 is that I can succes

Re: [OMPI devel] Problem w/ documented SPARC/gcc flags (1.5rc5 and 1.4.3rc1)

2010-08-25 Thread Rolf vandeVaart
Paul, is it possible for you to try one more thing. Can you reconfigure with CFLAGS="-mv8plus -Wa,-xarch=v8plus" I think this will get past the configure test as the configure test is compiling a piece of assembly, and for some reason, the -mv8plus is not finding its way to the assembler.

[OMPI devel] nit-pick: typo in README (1.4.3rc1 and 1.5rc5)

2010-08-25 Thread Paul H. Hargrove
The following patch applies to both 1.4.3rc1 and 1.5rc5 to fix a typo in the README: --- README.orig 2010-08-25 14:45:09.0 -0700 +++ README 2010-08-25 14:45:20.0 -0700 @@ -69,7 +69,7 @@ - Asynchronous, transparent checkpoint/restart support - Fully coordinate

[OMPI devel] 1.5rc5: attribute((noreturn)) and pointers to functions

2010-08-25 Thread Paul H. Hargrove
Building 1.5rc5 with xlc on linux/ppc I see many instances of the following warnings "../../../../orte/mca/ess/ess.h", line 61.16: 1506-959 (W) The attribute "noreturn" is not a valid type attribute and is ignored. "../../../../orte/mca/errmgr/errmgr.h", line 134.16: 1506-959 (W) The attribute

[OMPI devel] Some positive test results (1.5rc5 and 1.4.3rc1)

2010-08-25 Thread Paul H. Hargrove
I have mostly been sending the negative findings, but rest assured that I have had positive ones too. I won't list them all, but wanted in particular to note that I have built successfully both 1.5rc5 and 1.4.3rc1 for the following transports that may or may not be getting tested by others el

[OMPI devel] Problem w/ documented SPARC/gcc flags (1.5rc5 and 1.4.3rc1)

2010-08-25 Thread Paul H. Hargrove
In both 1.5rc5 and 1.4.3rc1, README says: - Open MPI does not support the Sparc v8 CPU target, which is the default on Sun Solaris. The v8plus (32 bit) or v9 (64 bit) targets must be used to build Open MPI on Solaris. This can be done by including a flag in CFLAGS, CXXFLAGS, FFLAGS, and FCFLA

[OMPI devel] VT "platform" selection needs documentation

2010-08-25 Thread Paul H. Hargrove
I wanted to test builds of OpenMPI 1.5rc5 and 1.4.3rc1 on Linux/PPC64. As it happens the only such host I currently have access to is the front-end for a BG/P. It was NOT my intention to build Open MPI (or VampirTrace) for the BG/P, but VT's configure logic decided I was on a BG/P and so built

Re: [OMPI devel] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Ralph Castain
Could be a bug on our part. If you --enable-debug in your configure, you can then set -mca odls_base_verbose 5 and (amidst a lot of other stuff) you'll see the signal being delivered to the proc if you sent it to mpirun. If you send the signal direct to the proc yourself, we shouldn't touch it.

Re: [OMPI devel] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Steve Wise
On 08/25/2010 12:43 PM, Ralph Castain wrote: On Aug 25, 2010, at 11:26 AM, Steve Wise wrote: On 08/25/2010 11:33 AM, Ralph Castain wrote: We don't use it - mpirun traps it and then propagates it by default to all remote procs. So I should send the signal to the mpirun pro

Re: [OMPI devel] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Ralph Castain
On Aug 25, 2010, at 11:26 AM, Steve Wise wrote: > On 08/25/2010 11:33 AM, Ralph Castain wrote: >> We don't use it - mpirun traps it and then propagates it by default to all >> remote procs. >> >> > > So I should send the signal to the mpirun process? Yes - however, note that it will be pro

Re: [OMPI devel] Suspicious warnings from gcc-4.5.0 (both 1.4.3rc1 and 1.5rc5)

2010-08-25 Thread Jeff Squyres
Ralph pinged Edgar and me about this off-list -- I speculated that that chunk of code could be replaced with: OBJ_RELEASE(ompi_mpi_comm_parent); ompi_mpi_comm_parent = newcomm; ...but that was after only a quick look at the code. :-) There might well have been a good reason why it wasn'

Re: [OMPI devel] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Steve Wise
On 08/25/2010 11:33 AM, Ralph Castain wrote: We don't use it - mpirun traps it and then propagates it by default to all remote procs. So I should send the signal to the mpirun process? What OMPI version is this? 1.4.1 On Aug 25, 2010, at 10:23 AM, Steve Wise wrote: Hey

Re: [OMPI devel] Suspicious warnings from gcc-4.5.0 (both 1.4.3rc1 and 1.5rc5)

2010-08-25 Thread George Bosilca
In this particular case the compiler is both right and wrong. It is right to complain, because as Paul pointed out, there is a free on a non-malloced object (ompi_mpi_comm_null). However, this free is protected by the reference count going to zero, and this should never happen in this particula
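
A minimal sketch of the pattern George describes, assuming a simplified reference-counted object. The names demo_obj_t, demo_retain, demo_release, demo_new and demo_static_null are hypothetical and are not Open MPI's actual object system or its OBJ_RELEASE macro. The static object holds one permanent reference, so a balanced retain/release pair never drives its count to zero and the free() path is never taken for it, even though a compiler that only sees "free() may be reached with the address of a static" can still warn.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object. */
typedef struct demo_obj {
    int refcount;
} demo_obj_t;

/* Counts how often the free path actually ran (for illustration only). */
int demo_free_calls = 0;

/* Take one additional reference. */
void demo_retain(demo_obj_t *o)
{
    o->refcount++;
}

/* Drop one reference; the object is freed only when the count hits zero. */
void demo_release(demo_obj_t *o)
{
    if (--o->refcount == 0) {
        demo_free_calls++;
        free(o);
    }
}

/* Statically allocated object. Its initial reference is never dropped,
   so the count stays at 1 or above and free() is never reached for it. */
demo_obj_t demo_static_null = { 1 };

/* Heap-allocated object with a single owner reference; NULL on failure. */
demo_obj_t *demo_new(void)
{
    demo_obj_t *o = malloc(sizeof(*o));
    if (o != NULL) {
        o->refcount = 1;
    }
    return o;
}
```

Calling demo_retain(&demo_static_null) followed by demo_release(&demo_static_null) leaves the count at 1, while demo_release(demo_new()) frees the heap object. The protection is a runtime invariant rather than something the compiler can prove, which is consistent with the warning being technically justified yet harmless.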

Re: [OMPI devel] 1.5rc5 - warnings from Sun C 5.10

2010-08-25 Thread George Bosilca
Right, that was a pretty intense discussion. However, I don't think we talked about replacing the = by a +. The difference is that = means write and + means read&write. Here is the assembly output from gcc -O3: with output "=m" (*v) | with output "+m" (*v) and input "ir"

[OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r23664

2010-08-25 Thread Jeff Squyres
FYI. This is an ABI-changing commit. We've unfortunately had some F90 API parameters wrong *for years* and no one noticed. I'm inclined not to change this in the 1.4 series, just because it changes the ABI. But it should go into 1.5.0 since we're already breaking ABI from 1.4.x -> 1.5.0. B

Re: [OMPI devel] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Ralph Castain
We don't use it - mpirun traps it and then propagates it by default to all remote procs. What OMPI version is this? On Aug 25, 2010, at 10:23 AM, Steve Wise wrote: > Hey Open MPI wizards, > > I'm trying to debug something in my library that gets loaded into my mpi > processes when they are st

[OMPI devel] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Steve Wise
Hey Open MPI wizards, I'm trying to debug something in my library that gets loaded into my mpi processes when they are started via mpirun. With other MPIs, I've been able to deliver SIGUSR2 to the process and trigger some debug code I have in my library that sets up a handler for SIGUSR2. Ho

Re: [OMPI devel] Suspicious warnings from gcc-4.5.0 (both 1.4.3rc1 and 1.5rc5)

2010-08-25 Thread Paul H. Hargrove
Ralph, This is seen when compiling Open MPI. I suspect that gcc's analysis is seeing a free() call on a value it can prove did not come from malloc() (or equivalent). However, if as you say the value is always NULL, then this would be a false alarm. -Paul Ralph Castain wrote: Hi Paul M

Re: [OMPI devel] 1.5rc5 - warnings from Sun C 5.10

2010-08-25 Thread Rolf vandeVaart
With respect to the warnings with atomic.h, we have been down this road before. Here is the ticket with the background information. https://svn.open-mpi.org/trac/ompi/ticket/1500 Eventually, we decided to just live with the warnings. However, I will take a look at George's two suggestions.

Re: [OMPI devel] Suspicious warnings from gcc-4.5.0 (both 1.4.3rc1 and 1.5rc5)

2010-08-25 Thread Ralph Castain
Hi Paul Much appreciate all your testing! Quick question here: is this on compile, or were you trying to run something? We haven't seen this before, but I'm wondering if it is due to us failing to initialize an object's fields. If so, then it might be we don't see it because those fields usual

[OMPI devel] 1.5rc5: opal_path_nfs test failure on GPFS filesystem

2010-08-25 Thread Paul H. Hargrove
Testing 1.5rc5 on Linux/PPC64 I get a test failure in "make check" that probably relates to the GPFS filesystems used on this machine. Not sure if this is a serious error or just an annoyance: $ cat /etc/SuSE-release SUSE Linux Enterprise Server 10 (ppc) VERSION = 10 PATCHLEVEL = 3 $ uname -

[OMPI devel] 1.5rc5 and 1.4.3rc1: type size warnings in ompi_rb_tree test

2010-08-25 Thread Paul H. Hargrove
Saw this one on Mac OS X 10.6.1 on an x86_64, but not on older Macs w/ their default 32-bit void*. This is present in both the current RCs. I've not checked if this is present on non-Mac systems. $ uname -a; echo; sw_vers; echo; gcc --version Darwin ijog.lbl.gov 10.0.0 Darwin Kernel Version 10.

[OMPI devel] 1.5rc5: warning from gcc-4.0.1 on Mac OS 10.4.11 (x86 and ppc)

2010-08-25 Thread Paul H. Hargrove
I get a warning building 1.5rc5 that appears unique to this Mac OS version. It does NOT occur with Mac OS 10.5.8 for x86 Mac OS 10.6.1 for x86 Mac OS 10.5.8 for ppc Nor does this occur with Open MPI 1.4.3rc1. $ uname -a Darwin irun.lbl.gov 8.11.1 Darwin Kernel Version 8.11.1: Wed Oct 10 18:23

[OMPI devel] Suspicious warnings from gcc-4.5.0 (both 1.4.3rc1 and 1.5rc5)

2010-08-25 Thread Paul H. Hargrove
With both recent RCs I get the following suspicious warnings from gcc-4.5.0 on Linux/ia64 1.4.3rc1: ../../../../../ompi/mca/dpm/orte/dpm_orte.c:963:5: warning: attempt to free a non-heap object 'ompi_mpi_comm_null' ../../../../../ompi/mca/dpm/orte/dpm_orte.c:965:5: warning: attempt to free a