Re: [OMPI devel] RTE issue I. Support for non-MPI jobs

2007-12-05 Thread Rolf Vandevaart
Ralph H Castain wrote: I. Support for non-MPI jobs Considerable complexity currently exists in ORTE because of the stipulation in our first requirements document that users be able to mpirun non-MPI jobs - i.e., that we support such calls as "mpirun -n 100 hostname". This creates a situation, ho

Re: [OMPI devel] RTE issue I. Support for non-MPI jobs

2007-12-05 Thread Ralph H Castain
On 12/5/07 7:58 AM, "rolf.vandeva...@sun.com" wrote: > Ralph H Castain wrote: > >> I. Support for non-MPI jobs >> Considerable complexity currently exists in ORTE because of the stipulation >> in our first requirements document that users be able to mpirun non-MPI jobs >> - i.e., that we supp

Re: [OMPI devel] RTE Issue II: Interaction between the ROUTED and GRPCOMM frameworks

2007-12-05 Thread Tim Prins
To me, (c) is a non-starter. I think whenever possible we should be automatically doing the right thing. The user should not need to have any idea how things work inside the library. Between options (a) and (b), I don't really care. (b) would be great if we had an MCA component dependency syste

Re: [OMPI devel] RTE Issue IV: RTE/MPI relative modex responsibilities

2007-12-05 Thread Tim Prins
Well, I think it is pretty obvious that I am a fan of an attribute system :) For completeness, I will point out that we also exchange architecture and hostname info in the modex. Do we really need a complete node map? As far as I can tell, it looks like the MPI layer only needs a list of local

Re: [OMPI devel] RTE Issue III: Collective communications across daemons

2007-12-05 Thread Tim Prins
The latter issue exists for even MPI jobs. Consider the case of a single process job that comm_spawns a child job onto other nodes. The RTE will launch daemons on the new nodes, and then broadcast the "launch procs" command across all the daemons (this is done to exploit a scalable comm procedure)

Re: [OMPI devel] RTE Issue III: Collective communications across daemons

2007-12-05 Thread Ralph H Castain
On 12/5/07 8:56 AM, "Tim Prins" wrote: >> The latter issue exists for even MPI jobs. Consider the case of a single >> process job that comm_spawns a child job onto other nodes. The RTE will >> launch daemons on the new nodes, and then broadcast the "launch procs" >> command across all the daem

Re: [OMPI devel] [ofa-general] uDAPL EVD queue length issue

2007-12-05 Thread Steve Wise
Jon Mason wrote: On Tue, Dec 04, 2007 at 11:40:17AM -0800, Arlin Davis wrote: Jon Mason wrote: While working on OMPI udapl btl, I have noticed some "interesting" behavior. OFA udapl wants the evd queues to be a power of 2 and then will subtract 1 for bookkeeping (i.e., so that internal head and
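
The sizing behavior described in this preview (a power-of-two queue with one entry held back for bookkeeping) can be illustrated with a small sketch. This is not uDAPL code: the function names `usable_entries` and `queue_len_for` are hypothetical, and the sketch assumes exactly one reserved slot, which is all the preview states.

```python
def usable_entries(queue_len: int) -> int:
    """Entries a caller can use in a power-of-two queue that reserves one slot.

    Hypothetical helper: assumes one slot is held back for bookkeeping.
    """
    # A power of two has exactly one bit set, so n & (n - 1) == 0.
    if queue_len < 2 or queue_len & (queue_len - 1):
        raise ValueError("queue_len must be a power of two >= 2")
    return queue_len - 1


def queue_len_for(needed: int) -> int:
    """Smallest power-of-two queue length whose usable entries cover `needed`."""
    if needed < 1:
        raise ValueError("needed must be >= 1")
    # 1 << bit_length() is the smallest power of two strictly greater than
    # `needed`, so subtracting the reserved slot still leaves >= needed.
    return 1 << needed.bit_length()
```

Under these assumptions, a caller that needs exactly 8 entries has to request a 16-entry queue, because an 8-entry queue yields only 7 usable slots.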

Re: [OMPI devel] RTE Issue IV: RTE/MPI relative modex responsibilities

2007-12-05 Thread Ralph H Castain
On 12/5/07 8:48 AM, "Tim Prins" wrote: > Well, I think it is pretty obvious that I am a fan of an attribute system :) > > For completeness, I will point out that we also exchange architecture > and hostname info in the modex. True - except we should note that hostname info is only exchanged i

Re: [OMPI devel] RTE Issue II: Interaction between the ROUTED and GRPCOMM frameworks

2007-12-05 Thread Brian W. Barrett
To me, (a) is dumb and (c) isn't a non-starter. The whole point of the component system is to separate concerns. Routing topology and collective operations are two different concerns. While there's some overlap (a topology-aware collective doesn't make sense when using the unity routing st

Re: [OMPI devel] vt-integration

2007-12-05 Thread Jeff Squyres
I know that OS X's linker is quite different than the Linux linker -- you might want to dig into the ld(1) man page on OS X as a starting point, and/or consult developer.apple.com for more details. On Dec 5, 2007, at 10:04 AM, Matthias Jurenz wrote: Hi Jeff, I have added checks for the fu

Re: [OMPI devel] vt-integration

2007-12-05 Thread Brian W. Barrett
OS X enforces a no duplicate symbol rule when flat namespaces are in use (the default on OS X). If all the libraries are two-level namespace libraries (libSystem.dylib, aka libm.dylib is two-level), then duplicate symbols are mostly ok. Libtool by default forces a flat namespace in sharedlibr

Re: [OMPI devel] vt integration

2007-12-05 Thread Terry Dontje
Have you tested building this with vpath? I am seeing the following errors during make all (while using a vpath directory): Making all in doc gmake[5]: Entering directory `/workspace/tdd/ct7/ompi-ws-vt/ompi-vt-integration/builds/ompi-vt-integration/config-data/SunOS/sparc/2007.12.05/64/ompi/co

Re: [OMPI devel] [ofa-general] uDAPL EVD queue length issue

2007-12-05 Thread Arlin Davis
I'm running OFED 1.2.5 and using Chelsio. From the linux rdma verbs perspective, ibv_create_cq() will create a cq that is >= the requested depth. The fact that mthca always bumps the size up to the next power of 2 isn't something udapl can rely on. It doesn't. uDAPL passes the users reque

Re: [OMPI devel] Using MTT to test the newly added SCTP BTL

2007-12-05 Thread Karol Mroz
Hi... Karol Mroz wrote: > Removal of .ompi_ignore should not create build problems for anyone who > is running without some form of SCTP support. To test this claim, we > built Open MPI with .ompi_ignore removed and no SCTP support on both an > ubuntu linux and an OSX machine. Both builds succeed

[OMPI devel] 32-bit openib is broken on the trunk as of Nov 27th, r16799

2007-12-05 Thread Tim Mattox
Hello, It appears that sometime after r16777, and by r16799, something was broken in the trunk's openib support for 32-bit builds. The 64-bit tests all seem normal, as well as the 32-bit & 64-bit tests on the 1.2 branch on the same machine (odin). See this MTT results page permalink showing t

Re: [OMPI devel] RTE Issue II: Interaction between the ROUTED and GRPCOMM frameworks

2007-12-05 Thread Ralph H Castain
I'm not sure I would call (a) "dumb", but I would agree it isn't a desirable option. ;-) The issue isn't with the current two routed components. The issue arose because additional routed components are about to be committed to the system. None of those added components are fully connected - i.e.,

[OMPI devel] opal_condition

2007-12-05 Thread Tim Prins
Hi, Last night we had one of our threaded builds on the trunk hang when running make check on the test opal_condition in test/threads/. After running the test about 30-40 times, I was only able to get it to hang once. Looking at it in gdb, we get: (gdb) info threads 3 Thread 1084229984 (LW

[OMPI devel] [PATCH] openib btl: remove excess ompi_btl_openib_connect_base_open call

2007-12-05 Thread Jon Mason
There is a double call to ompi_btl_openib_connect_base_open in mca_btl_openib_mca_setup_qps(). It looks like someone just forgot to clean-up the previous call when they added the check for the return code. I ran a quick IMB test over IB to verify everything is still working. Thanks, Jon Index:

Re: [OMPI devel] IB pow wow notes

2007-12-05 Thread Richard Graham
One question: there is a mention of a new pml that is essentially CM+matching. Why is this not just another instance of CM? Rich On 11/26/07 7:54 PM, "Jeff Squyres" wrote: > OMPI OF Pow Wow Notes > 26 Nov 2007 > > --- > >