[OMPI devel] 1.7.0rc5 - build failure w/ gcc-3.4.6/x86-64 (regression)

2012-10-30 Thread Paul Hargrove
I have access to a Linux/x86-64 machine running "Red Hat Enterprise Linux AS release 4". It has a pretty old gcc: "$ gcc --version | head -1" reports "gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3)". As shown below, this gcc is rejecting some portion of the atomics. I am certain I've tested ompi-1.5 and 1.6 on thi

[OMPI devel] 1.7.0rc5 - make check failure on OpenBSD-5.1/{i386, amd64}

2012-10-30 Thread Paul Hargrove
My OpenBSD-5.1 testers for both i386 and amd64 are failing the same tests as I reported a few minutes ago with respect to FreeBSD-6.3. Unlike FreeBSD-6.3, this is a "modern" system, OpenBSD 5.1 having been released in Feb 2012. On both platforms I have builds w/ gcc-4.2.1 and with llvm-gcc-4.2.1.

[OMPI devel] 1.7.0rc5 - make check failure on FreeBSD-6.3/amd64

2012-10-30 Thread Paul Hargrove
On my FreeBSD-6.3/amd64 platform I see "make check" failing 3 tests under test/datatype (see below). Of course "make" stops after that, making it possible that additional tests might fail later. However, my records do show that the v1.5 branch was just fine on this machine, as was the trunk on or
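Since "make check" stops at the first failing directory, a minimal sketch for re-running just the reported tests, assuming the standard Automake layout of the Open MPI build tree:

    # Re-run only the datatype unit tests from the build tree
    $ cd test/datatype && make check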

Re: [OMPI devel] Compile-time MPI_Datatype checking

2012-10-30 Thread Jeff Squyres
On Oct 28, 2012, at 10:28 AM, Dmitri Gribenko wrote: > Thank you for the feedback! Hopefully the attached patch fixes both of these. > > 1. There are two helper structs with complex numbers. I predicated > the struct declarations and use to appear only in C99. > > 2. These macros were indeed m

[OMPI devel] 1.7.0rc5

2012-10-30 Thread Ralph Castain
Hi folks We have posted the next release candidate (rc5) for the 1.7.0 release in the usual place: http://www.open-mpi.org/software/ompi/v1.7/ Please put it thru the wringer to help us validate it prior to release later this month. We think this looks pretty complete, pending someone finding a

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27526 - trunk/orte/mca/plm/rsh

2012-10-30 Thread Jeff Squyres
Coolio; thanks for checking into it. On Oct 30, 2012, at 4:16 PM, Ralph Castain wrote: > Actually, now that I look at it, I'm not sure what Jeff is talking about here > is correct. I think Nathan's patch is in fact right. > > Nathan's change doesn't in any way impact what gets passed to remote

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27526 - trunk/orte/mca/plm/rsh

2012-10-30 Thread Nathan Hjelm
On Tue, Oct 30, 2012 at 01:16:17PM -0700, Ralph Castain wrote: > Actually, now that I look at it, I'm not sure what Jeff is talking about here > is correct. I think Nathan's patch is in fact right. > > Nathan's change doesn't in any way impact what gets passed to remote procs. > All it does is m

Re: [OMPI devel] 1.7 rc4 compilation error

2012-10-30 Thread Ralph Castain
Okay, I tracked this silliness down. On odin, my platform file builds both shared and static. It appears that mpicc in that situation defaults to picking the static build, and so I wind up with a static executable. This behavior was unexpected - I thought we would default to dynamic, but support
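To see which link line the wrapper actually produces in a tree built both shared and static, Open MPI's wrapper compilers accept a -showme flag; a diagnostic sketch:

    # Print the full link command mpicc would invoke (no compilation happens)
    $ mpicc -showme:link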

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27526 - trunk/orte/mca/plm/rsh

2012-10-30 Thread Ralph Castain
Actually, now that I look at it, I'm not sure what Jeff is talking about here is correct. I think Nathan's patch is in fact right. Nathan's change doesn't in any way impact what gets passed to remote procs. All it does is modify what gets passed on the *orted's* command line. The orted has no i

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27526 - trunk/orte/mca/plm/rsh

2012-10-30 Thread Ralph Castain
I'll fix it, Jeff - the problem is that the plm/rsh was prepending a "-mca" in those cases, when it shouldn't. Nathan's fix is close - I can fix the rest. On Oct 30, 2012, at 12:52 PM, Jeff Squyres wrote: > WAIT. > > This contradicts the intent of what I said on the call this morning. > > Th

Re: [OMPI devel] [OMPI svn] svn:open-mpi r27451 - in trunk: ompi/mca/allocator/bucket ompi/mca/bcol/basesmuma ompi/mca/bml/base ompi/mca/btl ompi/mca/btl/base ompi/mca/btl/openib ompi/mca/btl/sm ompi/

2012-10-30 Thread Jeff Squyres
No. r27526 is wrong. See http://www.open-mpi.org/community/lists/devel/2012/10/11684.php. On Oct 30, 2012, at 3:50 PM, Nathan Hjelm wrote: > This issue should be resolved thanks to r27526. plm/rsh was incorrectly > interpreting all OMPI_* environment variables as mca parameters (including

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27526 - trunk/orte/mca/plm/rsh

2012-10-30 Thread Jeff Squyres
WAIT. This contradicts the intent of what I said on the call this morning. The point is that any env variable that begins with "OMPI_" is supposed to be propagated out to all the remote processes. It's a cheap/easy way for users to propagate their env variables to remote nodes (without needing
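A minimal sketch of the propagation behavior described here (the variable name is invented for the example):

    # Any env variable whose name starts with OMPI_ is forwarded to remote ranks
    $ export OMPI_MY_TUNING=42
    $ mpirun -np 2 --hostfile nodes ./a.out   # each remote rank sees OMPI_MY_TUNING=42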

Re: [OMPI devel] [OMPI svn] svn:open-mpi r27451 - in trunk: ompi/mca/allocator/bucket ompi/mca/bcol/basesmuma ompi/mca/bml/base ompi/mca/btl ompi/mca/btl/base ompi/mca/btl/openib ompi/mca/btl/sm ompi/

2012-10-30 Thread Nathan Hjelm
This issue should be resolved thanks to r27526. plm/rsh was incorrectly interpreting all OMPI_* environment variables as mca parameters (including OMPI_COMMAND, which was turned into "-mca AND command", and others) and adding them to the orted argv. r27451 and r27456 are now r27527. -Nathan HPC-3, LANL On Wed, Oct
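For context on the intended behavior: only variables carrying the OMPI_MCA_ prefix are meant to map onto -mca arguments. A sketch of that equivalence (the btl parameter is chosen purely for illustration):

    # Setting an MCA parameter via the environment...
    $ export OMPI_MCA_btl=tcp,self
    $ mpirun -np 2 ./a.out
    # ...is equivalent to passing it on the command line:
    $ mpirun -np 2 -mca btl tcp,self ./a.out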

Re: [OMPI devel] sbgp problem

2012-10-30 Thread Edgar Gabriel
as far as I can tell right now, yes, it's the final thing... Thanks Edgar On 10/30/2012 2:05 PM, Ralph Castain wrote: > Grr...bloody verb @##$@$. > > Okay, I'll make that edit. Is that the last thing required to fix this > problem? > > On Oct 30, 2012, at 11:57 AM, Edgar Gabriel wrote: >

Re: [OMPI devel] 1.7 rc4 compilation error

2012-10-30 Thread Ralph Castain
Sure - I can do that. On Oct 30, 2012, at 11:29 AM, Edgar Gabriel wrote: > glad to hear that. However, since we are also having the problem with > the lustre-fs module for static builds, I think it would still make > sense to disable fs/lustre/ for 1.7.0 > > Edgar > > On 10/30/2012 12:34 PM, R

Re: [OMPI devel] sbgp problem

2012-10-30 Thread Ralph Castain
Grr...bloody verb @##$@$. Okay, I'll make that edit. Is that the last thing required to fix this problem? On Oct 30, 2012, at 11:57 AM, Edgar Gabriel wrote: > so the sbgp problem that I mentioned on the call this morning > unfortunately is not resolved by just adding the common/verbs direc

Re: [OMPI devel] process kill signal 59

2012-10-30 Thread Ralph Castain
On Oct 30, 2012, at 11:55 AM, Sandra Guija wrote: > I am able to change the memory size parameters, so if I increase memory size > (currently 2 GB) or add caches, it could be a solution? Could be. > or is the program that is using too much memory? Hard to tell. In the case you show, we are ab

[OMPI devel] sbgp problem

2012-10-30 Thread Edgar Gabriel
so the sbgp problem that I mentioned on the call this morning unfortunately is not resolved by just adding the common/verbs directory into the 1.7 branch. We looked a bit into it, and the problem/difference is in the file ompi/sbgp/ibnet/sbgp_ibnet_component.c, which has the following include

Re: [OMPI devel] process kill signal 59

2012-10-30 Thread Sandra Guija
I am able to change the memory size parameters, so if I increase memory size (currently 2 GB) or add caches, could that be a solution? Or is it the program that is using too much memory? Thanks really for your input, I appreciate it. Sandra Guija

Re: [OMPI devel] process kill signal 59

2012-10-30 Thread Ralph Castain
Yeah, you're using too much memory for the shared memory system. Run with -mca btl ^sm on your cmd line - it'll run slower, but you probably don't have a choice. On Oct 30, 2012, at 11:38 AM, Sandra Guija wrote: > yes I think is related with my program too, when I run 1000x1000 matrix > mult
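A minimal sketch of the suggested workaround, reusing the hostfile and the "magic1" program from the thread (errno 12 from the failed mmap is ENOMEM):

    # Exclude the shared-memory BTL; the remaining transports (e.g. tcp,self) are used
    $ mpirun -np 4 --hostfile nodes --bynode -mca btl ^sm magic1

The ^ prefix tells the MCA framework to exclude a component rather than select it, which is why the job still runs, just more slowly.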

Re: [OMPI devel] process kill signal 59

2012-10-30 Thread Sandra Guija
yes I think it is related with my program too; when I run 1000x1000 matrix multiplication, the program works. When I run the 10,000 matrix only on one machine I got this: mca_common_sm_mmap_init: mmap failed with errno=12; mca_mpool_sm_init: unable to create shared memory mapping (/tmp/openmpi-sessions-mp

Re: [OMPI devel] 1.7 rc4 compilation error

2012-10-30 Thread Edgar Gabriel
glad to hear that. However, since we are also having the problem with the lustre-fs module for static builds, I think it would still make sense to disable fs/lustre/ for 1.7.0 Edgar On 10/30/2012 12:34 PM, Ralph Castain wrote: > I hate odin :-( > > FWIW: it all works fine today, no matter how I

Re: [OMPI devel] 1.7 rc4 compilation error

2012-10-30 Thread Ralph Castain
I hate odin :-( FWIW: it all works fine today, no matter how I configure it. No earthly idea what happened. Ignore these droids On Oct 30, 2012, at 7:28 AM, Edgar Gabriel wrote: > ok, so a couple of things. > > I still think it is the same issue that I observed 1-2 days ago. Could > you

Re: [OMPI devel] process kill signal 59

2012-10-30 Thread Ralph Castain
Ummm...not sure what I can say about that with so little info. It looks like your process died for some reason that has nothing to do with us - a bug in your "magic1" program? On Oct 30, 2012, at 10:24 AM, Sandra Guija wrote: > Hello, > I am running a 10,000x10,000 matrix multiplication

[OMPI devel] process kill signal 59

2012-10-30 Thread Sandra Guija
Hello, I am running a 10,000x10,000 matrix multiplication on 4 processors/1 core and I get the following error: "mpirun -np 4 --hostfile nodes --bynode magic1" reports: mpirun noticed that job rank 1 with PID 635 on node slave1 exited on signal 59 (Real-time signal 25). 2 additional processes aborted (not

Re: [OMPI devel] Linking with slurm pmi when slurm is not in a standard path

2012-10-30 Thread Ralph Castain
I see the problem - it isn't that slurm is not in a standard path, but rather that your slurm library subdir is named "lib64" instead of "lib". The patch looks good to me - I'll submit it for 1.6 and 1.7, plus add it to the trunk. Thanks! Ralph On Oct 30, 2012, at 9:11 AM, guillaume.papa...@ex
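Until the fix is committed, a hedged workaround sketch for trees whose probe only searches lib/ (the prefix path is the reporter's; the --with-pmi flag and the LDFLAGS hint are assumptions about what the local configure accepts):

    # Point configure at the SLURM install and help the linker find lib64/
    $ ./configure --with-slurm=/homes/papaureg/soft/slurm \
                  --with-pmi=/homes/papaureg/soft/slurm \
                  LDFLAGS="-L/homes/papaureg/soft/slurm/lib64"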

[OMPI devel] Linking with slurm pmi when slurm is not in a standard path

2012-10-30 Thread Guillaume . Papaure
Hi, I'm building openmpi with pmi support but there is an error during configure. Currently slurm is not installed in the standard /usr directory. The configure step gives an error like: ./configure --prefix=/home_nfs/papaureg/soft/openmpi-1.9a1hg155f02ad65ba --with-slurm=/homes/papaureg/soft/slurm/

Re: [OMPI devel] 1.7 rc4 compilation error

2012-10-30 Thread Edgar Gabriel
ok, so a couple of things. I still think it is the same issue that I observed 1-2 days ago. Could you try to remove the fs/lustre component from your compilation, e.g. by adding an .ompi_ignore file into that directory, and see whether this fixes the issue? I tried on my machine (no lustre, no ib
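A minimal sketch of the suggested exclusion, assuming a source checkout (the .ompi_ignore marker is honored when autogen regenerates the build system, so autogen must be re-run; autogen.pl is this era's trunk script, and the prefix is invented for the example):

    # Drop the fs/lustre component from the build entirely
    $ touch ompi/mca/fs/lustre/.ompi_ignore
    $ ./autogen.pl && ./configure --prefix=$HOME/ompi-test && make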

Re: [OMPI devel] 1.7 rc4 compilation error

2012-10-30 Thread Edgar Gabriel
ok, I'll look into this. I noticed a problem with static builds on lustre file systems recently, and I was wondering whether it's the same issue or not. But I'll check what's going on. Thanks Edgar On 10/30/2012 7:22 AM, Ralph Castain wrote: > No to Lustre, and I didn't build static > > I'm not s

Re: [OMPI devel] 1.7 rc4 compilation error

2012-10-30 Thread Ralph Castain
No to Lustre, and I didn't build static I'm not sure what, if any, parallel file system might be present. In the case that works, I just built with no configure args other than prefix. ompi_info shows both romio and mpio built, but nothing more about what support they built internally. On Oct

Re: [OMPI devel] 1.7 rc4 compilation error

2012-10-30 Thread Edgar Gabriel
Ralph, just out of curiosity: is there a lustre file system on the machine and is this a static build? Thanks Edgar On 10/29/2012 9:17 PM, Ralph Castain wrote: > Hmmm...I added that directory and tried this on odin (which is an IB-based > machine). Any MPI proc segfaults: > > Core was generated