Re: [OMPI devel] suffix flag problems

2009-09-04 Thread Jeff Squyres
Good call; done. Thanks! On Sep 5, 2009, at 5:01 AM, Paul H. Hargrove wrote: Jeff Squyres wrote: > Excellent suggestion; thanks Paul! You are welcome. My presence on the ompi-devel list isn't totally passive. :-) > > I've incorporated this into OMPI -- Paul, can you confirm that I > obeyed

Re: [OMPI devel] suffix flag problems

2009-09-04 Thread Paul H. Hargrove
Jeff Squyres wrote: Excellent suggestion; thanks Paul! You are welcome. My presence on the ompi-devel list isn't totally passive. :-) I've incorporated this into OMPI -- Paul, can you confirm that I obeyed the license requirements properly? IANAL, but looks all proper to me. May I sug

Re: [OMPI devel] suffix flag problems

2009-09-04 Thread Jeff Squyres
Excellent suggestion; thanks Paul! I've incorporated this into OMPI -- Paul, can you confirm that I obeyed the license requirements properly? https://svn.open-mpi.org/trac/ompi/changeset/21943 On Sep 4, 2009, at 9:01 PM, Paul H. Hargrove wrote: Jeff Squyres wrote: > On Sep 4, 2009, at 7

Re: [OMPI devel] Failed datatype test

2009-09-04 Thread Jeff Squyres
Sorry if I was not clear -- I was building the test/datatype directory manually with "make check" and seeing those warnings. So what I was trying to say was: not only was it bombing, it may have experienced other flavors of bit rot before then. On Sep 4, 2009, at 4:37 PM, Ralph Castain wro

Re: [OMPI devel] Can I have the same node specified multiple times in a host file?

2009-09-04 Thread Ralph Castain
You need to use the sequential mapper instead of the default round-robin one. Set -mca rmaps seq on your cmd line. We will then assign ranks to nodes in sequential order based on the hostfile entries. On Sep 4, 2009, at 11:25 AM, Karl, Robert (RKARL) wrote: I am attempting to force certain pro
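A minimal sketch of the difference Ralph describes, not Open MPI's actual mapper code. The function names, the host names, and the error raised when there are more ranks than hostfile entries are all assumptions made for illustration; the by-node round-robin is deliberately simplified to cycling over distinct hosts.

```python
from collections import OrderedDict


def map_sequential(hostfile_entries, num_ranks):
    """Rank i is placed on the i-th hostfile entry, so a host listed twice is used twice, in order."""
    if num_ranks > len(hostfile_entries):
        # Assumption for this sketch only: refuse to map more ranks than entries.
        raise ValueError("not enough hostfile entries for the requested ranks")
    return {rank: hostfile_entries[rank] for rank in range(num_ranks)}


def map_round_robin_by_node(hostfile_entries, num_ranks):
    """Simplified by-node round robin: duplicates collapse and ranks cycle over distinct hosts."""
    unique_hosts = list(OrderedDict.fromkeys(hostfile_entries))
    return {rank: unique_hosts[rank % len(unique_hosts)] for rank in range(num_ranks)}


# Hypothetical hostfile in which node1 appears twice.
hostfile = ["node1", "node2", "node1", "node3"]
print(map_sequential(hostfile, 4))
print(map_round_robin_by_node(hostfile, 4))
```

With the sequential sketch, ranks 0 and 2 both land on node1 because the hostfile lists it in the first and third positions; the round-robin sketch ignores the repetition.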

Re: [OMPI devel] suffix flag problems

2009-09-04 Thread Paul H. Hargrove
Jeff Squyres wrote: On Sep 4, 2009, at 7:13 AM, David Robertson wrote: Perhaps it should be taken out of the help message in the configure script then. We can't; it's part of the built-in Autoconf options. :-( One can't disable the option, but one can prevent confusing the user by (partially)

[OMPI devel] Can I have the same node specified multiple times in a host file?

2009-09-04 Thread Karl, Robert (RKARL)
I am attempting to force certain processes to run on specific nodes due to the I/O cards that are attached to the specific CPU motherboards (not all boards have the same I/O cards). I am using the -bynode and --hostname options to specify the nodes that I want the processes to run on. There are 4 pr

Re: [OMPI devel] Deadlock on openib when using hindexed types

2009-09-04 Thread Sylvain Jeaugey
Ok, I was wrong, the fix works. Actually, I rebuilt with the latest trunk but openib support was somehow dropped. I was running on tcp. Which brings us to the next issue : tcp is actually not working (I don't know why I was convinced that tcp worked). The fix fixed the problem for openib, bu

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Kenneth Lloyd
Ralph, and all, The Japanese have a term poka-yoke which means "fail-safing". This is an excellent concept to apply. The term does not mean covering all unintended consequences of error and omission, though. If folks are downloading OMPI (or any software) for unauthorized purposes, that seems a

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Sylvain Jeaugey
Understood. So, let's say that we're only implementing a hurdle to discourage users from doing things wrong. I guess the efficiency of this will reside in the message displayed to the user ("You are about to break the entire machine and you will be fined if you try to circumvent this in any way

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Ralph Castain
Just want to make this very clear, since other LANL people are on this list. I am in no way saying that LANL users are ill-intentioned or deliberately attempting to circumvent system restrictions. See my other note for the most common scenarios that lead to this problem and you will see t

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Ralph Castain
I fear you all misunderstood me. This isn't a case of sabotage or nasty users, but simply people who do something that they don't realize can cause a problem. Our example is quite simple. We have IB network for MPI messages, and several Ethernet NICs that are dedicated to system-level funct

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Nadia Derbey
On Fri, 2009-09-04 at 07:50 -0600, Ralph Castain wrote: > Let me point out the obvious since this has plagued us at LANL with > regard to this concept. If a user wants to do something different, all > they have to do is download and build their own copy of OMPI. > > Amazingly enough, that is e

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Sylvain Jeaugey
Looks like users at LANL are not very nice ;) Indeed, this is no hard security. Only a way to prevent users from making mistakes. We often give users special tuning for their application and when they see their application is going faster, they start messing with every parameter hoping that it

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Arthur Huillet
Hi, Ralph Castain wrote: Let me point out the obvious since this has plagued us at LANL with regard to this concept. If a user wants to do something different, all they have to do is download and build their own copy of OMPI. We are well aware of that. It is relatively easy for a user to circu

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Ralph Castain
Let me point out the obvious since this has plagued us at LANL with regard to this concept. If a user wants to do something different, all they have to do is download and build their own copy of OMPI. Amazingly enough, that is exactly what they do. When we build our production versions, we

Re: [OMPI devel] Failed datatype test

2009-09-04 Thread Ralph Castain
Correcting my own statement here: it was bombing on the IU machines that generate the nightly tarball, hence the error reports. I removed it because our automated systems were unable to generate nightly tarballs until it was removed. It also bombed on my Mac when I tried to test it manually

Re: [OMPI devel] Failed datatype test

2009-09-04 Thread Ralph Castain
Before or after I removed the test? If you look at the MTT reports, you'll find that opal_datatype_test bombed on almost every system with an assert failure. On Sep 3, 2009, at 9:53 PM, Jeff Squyres wrote: FWIW, I get a bunch of valid-looking compiler warnings when running "make check" in

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Nadia Derbey
On Fri, 2009-09-04 at 13:55 +0200, Sylvain Jeaugey wrote: > On Fri, 4 Sep 2009, Jeff Squyres wrote: > > > I haven't looked at the code deeply, so forgive me if I'm parsing this > > wrong: > > is the code actually reading the file into one list and then moving the > > values to another list? If

Re: [OMPI devel] Deadlock on openib when using hindexed types

2009-09-04 Thread Sylvain Jeaugey
Hi Rolf, I was indeed running a trunk more than 4 weeks old, but after pulling the latest version (and checking the patch was in the code), it seems to make no difference. However, I know where to look now, thanks! Sylvain On Fri, 4 Sep 2009, Rolf Vandevaart wrote: I think you are runn

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Nadia Derbey
On Fri, 2009-09-04 at 13:34 +0200, Sylvain Jeaugey wrote: > On Fri, 4 Sep 2009, Jeff Squyres wrote: > > > -- > > *** Checking versions > > checking for SVN version... done > > checking Open MPI version... 1.4a1hgf11244ed72b5 > > up to changeset c4b117c5439b > > checking Open MPI release date..

Re: [OMPI devel] Deadlock on openib when using hindexed types

2009-09-04 Thread Rolf Vandevaart
I think you are running into a bug that we saw also and we recently fixed. We would see a hang when we were sending from a contiguous type to a non-contiguous type using a single port over openib. The problem was that the state of the request on the sending side was not being properly updated

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Sylvain Jeaugey
On Fri, 4 Sep 2009, Jeff Squyres wrote: I haven't looked at the code deeply, so forgive me if I'm parsing this wrong: is the code actually reading the file into one list and then moving the values to another list? If so, that seems a little hackish. Can't it just read directly to the target

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Sylvain Jeaugey
On Fri, 4 Sep 2009, Jeff Squyres wrote: -- *** Checking versions checking for SVN version... done checking Open MPI version... 1.4a1hgf11244ed72b5 up to changeset c4b117c5439b checking Open MPI release date... Unreleased developer copy checking Open MPI Subversion repository version... hgf11

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Nadia Derbey
On Fri, 2009-09-04 at 10:05 +0300, Jeff Squyres wrote: > On Sep 3, 2009, at 12:23 PM, Nadia Derbey wrote: > > > What: Define a way for the system administrator to prevent users from > > overwriting the default system-wide MCA parameters settings. > > > > In general, I think this is great st

[OMPI devel] Deadlock on openib when using hindexed types

2009-09-04 Thread Sylvain Jeaugey
Hi all, We're currently working with romio and we hit a problem when exchanging data with hindexed types with the openib btl. The attached reproducer (adapted from romio) is working fine on tcp, blocks on openib when using 1 port but works if we use 2 ports (!). I tested it against the trunk

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Jeff Squyres
On Sep 3, 2009, at 12:23 PM, Nadia Derbey wrote: What: Define a way for the system administrator to prevent users from overwriting the default system-wide MCA parameters settings. In general, I think this is great stuff. I have a few nit picks. (BTW: you might want to run contrib/hg/b

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Jeff Squyres
On Sep 4, 2009, at 8:26 AM, Nadia Derbey wrote: > Can the file name ( openmpi-priv-mca-params.conf ) also be configurable ? No, it isn't, presently, but this can be changed if needed. If it's configurable, it must be configurable at configure time -- not run time -- otherwise, a user co

Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-04 Thread Nadia Derbey
On Thu, 2009-09-03 at 19:29 -0400, Graham, Richard L. wrote: > What happens if $sysconfdir/openmpi-priv-mca-params.conf is missing ? If it is missing, everything works as it does today: any parameter declared in $sysconfdir/openmpi-mca-params.conf is considered as system-wide and can be overwritten as usu
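The precedence described here can be sketched roughly as follows. This is a hedged illustration, not the RFC's patch: the function name `resolve_params`, the dictionary representation, and the assumption that the privileged file holds name/value pairs that always win are hypothetical. The sample values are invented.

```python
def resolve_params(system_params, user_params, priv_params=None):
    """Return the effective MCA parameters under the assumed precedence.

    priv_params=None models a missing openmpi-priv-mca-params.conf:
    user settings then override system-wide ones, as they do today.
    """
    resolved = dict(system_params)   # system-wide defaults
    resolved.update(user_params)     # users may normally override them
    if priv_params is not None:
        # Assumption: privileged entries cannot be overridden by users.
        resolved.update(priv_params)
    return resolved


# Hypothetical values for illustration only.
system = {"btl": "openib,self"}
user = {"btl": "tcp,self"}
print(resolve_params(system, user))                                 # no privileged file
print(resolve_params(system, user, priv_params={"btl": "openib,self"}))
```

Passing `None` rather than an empty dictionary keeps "file missing" distinct from "file present but empty", although under this sketch both give the same result.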

Re: [OMPI devel] more bug/comments for current trunk

2009-09-04 Thread Jeff Squyres
Fixed all in https://svn.open-mpi.org/trac/ompi/changeset/21941. Thanks! On Sep 2, 2009, at 7:38 PM, Lisandro Dalcin wrote: Disclaimer: this is for trunk svn up'ed yesterday. The code below should fail with ERR_COMM, but it succeeds... #include int main(int argc, char **argv) { int *value

Re: [OMPI devel] suffix flag problems

2009-09-04 Thread Jeff Squyres
On Sep 4, 2009, at 7:13 AM, David Robertson wrote: Perhaps it should be taken out of the help message in the configure script then. We can't; it's part of the built-in Autoconf options. :-( -- Jeff Squyres jsquy...@cisco.com

Re: [OMPI devel] suffix flag problems

2009-09-04 Thread David Robertson
Perhaps it should be taken out of the help message in the configure script then. Dave Jeff Squyres wrote: On Sep 3, 2009, at 9:55 PM, David Robertson wrote: We use both the PGI and Intel compilers over an Infiniband cluster and I was trying to find a way to have both orteruns in the path (in

Re: [OMPI devel] suffix flag problems

2009-09-04 Thread Jeff Squyres
On Sep 3, 2009, at 9:55 PM, David Robertson wrote: We use both the PGI and Intel compilers over an Infiniband cluster and I was trying to find a way to have both orteruns in the path (in separate directories) at the same time. I decided to use the --program-suffix option. However, all the sy