[OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
Hi all, Not sure if this is an OpenMPI query or a PLPA query, but given that PLPA seems to have some support for it already I thought I'd start here. :-) We run a quad core Opteron cluster with Torque 2.3.x which uses the kernel's cpuset support to constrain a job to just the cores it has been allo

Re: [OMPI devel] [OMPI users] where can i get a tracing tool

2009-07-15 Thread Nikolay Molchanov
Hi Eugene, The FAQ page looks very good! Some links on the left side do not work, but I assume they will work tomorrow, when the real page goes live. Thanks, Nik Eugene Loh wrote: Zou, Lin (GE, Research, Consultant) wrote: Hi all, I want to trace my program, having used vampirTrace to genera

Re: [OMPI devel] [OMPI users] where can i get a tracing tool

2009-07-15 Thread Matthias Jurenz
Hi Eugene, the FAQ page looks very nice. I just sent the following answer to Lin Zou: For a quick view of what is inside the trace you could try 'otfprofile' to generate a tex/ps file with some information. This tool is a component of the latest stand-alone version of the Open Trace Format

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Ralph Castain
Interesting. No, we don't take PLPA cpu sets into account when retrieving the allocation. Just to be clear: from an OMPI perspective, I don't think this is an issue of binding, but rather an issue of allocation. If we knew we had been allocated only a certain number of cores on a node, then

Re: [OMPI devel] [OMPI users] where can i get a tracing tool

2009-07-15 Thread Jeff Squyres
On Jul 15, 2009, at 6:17 AM, Matthias Jurenz wrote: the FAQ page looks very nice. Ditto -- thanks for doing it, Eugene! I just sent the following answer to Lin Zou: Did that go on-list? It would be good to see that stuff in the publicly-searchable web archives. I mention this because

Re: [OMPI devel] [RFC] Move the datatype engine in the OPAL layer

2009-07-15 Thread Jeff Squyres
On Jul 14, 2009, at 1:23 PM, Rainer Keller wrote: https://svn.open-mpi.org/trac/ompi/wiki/HowtoTesting That is most helpful -- thanks! What about the latency issue? > >> Performance tests on the ompi-ddt branch have proven that there is no > >> performance penalties associated with this c

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
- "Ralph Castain" wrote: Hi Ralph, > Interesting. No, we don't take PLPA cpu sets into account when > retrieving the allocation. Understood. > Just to be clear: from an OMPI perspective, I don't think this is an > issue of binding, but rather an issue of allocation. If we knew we had >

Re: [OMPI devel] [OMPI users] where can i get a tracing tool

2009-07-15 Thread Ashley Pittman
On Tue, 2009-07-14 at 18:54 -0700, Eugene Loh wrote: > P.S. Until the page goes live, I'll also leave it at > http://www.osl.iu.edu/~eloh/faq/?category=perftools . Or, check out a > workspace. I'm happy with it. Ashley. -- Ashley Pittman, Bath, UK. Padb - A parallel job inspection tool for

Re: [OMPI devel] [OMPI users] where can i get a tracing tool

2009-07-15 Thread Matthias Jurenz
Hi Jeff, On Wed, 2009-07-15 at 07:13 -0400, Jeff Squyres wrote: > On Jul 15, 2009, at 6:17 AM, Matthias Jurenz wrote: > > > the FAQ page looks very nice. > > > > Ditto -- thanks for doing it, Eugene! > > > I just sent the following answer to Lin Zou: > > > > Did that go on-list? It would be g

[OMPI devel] selectively bind MPI to one HCA out of available ones

2009-07-15 Thread neeraj
Hi all, I have a cluster where both HCAs on each blade are active but connected to different subnets. Is there an option in MPI to select one HCA out of the available ones? I know it can be done by making changes in the Open MPI code, but I need a clean interface, like an option during MPI launch

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Ralph Castain
Hmmm...I believe I made a mis-statement. Shocking to those who know me, I am sure! :-) Just to correct my comments: OMPI knows how many "slots" have been allocated to us, but not which "cores". So I'll assign the correct number of procs to each node, but they won't know that we were allocated core
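To illustrate the distinction Ralph draws between slots and cores: on Linux, a process started inside a Torque cpuset can ask the kernel which cores it is allowed to use, so a launcher could in principle learn which cores (not just how many) it was given. A minimal sketch, assuming Linux (os.sched_getaffinity is not available on every platform); allowed_cores and cores_for_slots are hypothetical helpers, not OMPI or PLPA functions:

```python
import os


def allowed_cores(pid=0):
    """Sorted list of CPU ids the process may run on (pid 0 = the caller).

    Inside a cpuset/cgroup the kernel restricts the affinity mask to the
    cpuset's CPUs, so this reports the cores actually allocated rather
    than every core on the node.
    """
    return sorted(os.sched_getaffinity(pid))


def cores_for_slots(slots, pid=0):
    """Choose cores for `slots` local processes from the allowed set,
    instead of assuming that cores 0..slots-1 belong to the job."""
    cores = allowed_cores(pid)
    if slots > len(cores):
        raise ValueError(f"{slots} slots requested but only {len(cores)} cores allowed")
    return cores[:slots]


if __name__ == "__main__":
    print("allowed cores:", allowed_cores())
```

On a node where the cpuset grants cores 4-7, cores_for_slots(4) would return [4, 5, 6, 7], whereas the slot count alone only says "4".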

Re: [OMPI devel] [OMPI users] where can i get a tracing tool

2009-07-15 Thread Jeff Squyres
On Jul 15, 2009, at 8:57 AM, Matthias Jurenz wrote: I sent the answer directly to the user, 'cause I didn't subscribe to the user-list. I'll do that asap ;-) Thanks -- I appreciate it. I know it's a somewhat high-volume list. I can bounce you the original question so that you can reply

Re: [OMPI devel] [OMPI users] where can i get a tracing tool

2009-07-15 Thread Jeff Squyres
On Jul 15, 2009, at 10:24 AM, Jeff Squyres (jsquyres) wrote: Thanks -- I appreciate it. I know it's a somewhat high-volume list. I can bounce you the original question so that you can reply to it and have it threaded properly. Disregard -- you replied already. Many thanks! -- Jeff Squyres

Re: [OMPI devel] [OMPI users] where can i get a tracing tool

2009-07-15 Thread Matthias Jurenz
On Wed, 2009-07-15 at 10:24 -0400, Jeff Squyres wrote: > On Jul 15, 2009, at 8:57 AM, Matthias Jurenz wrote: > > > I sent the answer directly to the user, 'cause I didn't subscribe to the user-list. I'll do that asap ;-) > > > > Thanks -- I appreciate it. I know it's a somewhat high-vo

Re: [OMPI devel] [OMPI users] where can i get a tracing tool

2009-07-15 Thread Jeff Squyres
On Jul 15, 2009, at 10:37 AM, Matthias Jurenz wrote: Sure! Our SVN ID's are: jurenz and knuepfer Done! You should have write access -- let me know if you don't. I think you guys have seen it before, but here's the wiki page about adding / editing wiki pages: https://

[OMPI devel] Fwd: [all-osl-users] Upgrading of the OSL SVN server

2009-07-15 Thread Jeff Squyres
FYI. Begin forwarded message: From: "DongInn Kim" Date: July 15, 2009 10:39:01 AM EDT To: Subject: Re: [all-osl-users] Upgrading of the OSL SVN server I am sorry that we cannot upgrade subversion this time because of technical issues with the interaction between the new subversion and

[OMPI devel] Fwd: [all-osl-users] Upgrading of the OSL SVN server

2009-07-15 Thread Josh Hursey
FYI. Begin forwarded message: From: DongInn Kim Date: July 15, 2009 10:39:01 AM EDT To: all-osl-us...@osl.iu.edu Subject: Re: [all-osl-users] Upgrading of the OSL SVN server I am sorry that we cannot upgrade subversion this time because of technical issues with the interaction between t

Re: [OMPI devel] Fwd: [all-osl-users] Upgrading of the OSL SVN server

2009-07-15 Thread Holger Mickler
*Quickness competition round 1* Jeff vs. Josh 1 : 0 ;-)) Josh Hursey wrote: > FYI. > > > Begin forwarded message: > >> From: DongInn Kim >> Date: July 15, 2009 10:39:01 AM EDT >> To: all-osl-us...@osl.iu.edu >> Subject: Re: [all-osl-users] Upgrading of the OSL SVN server >> >> I am s

[OMPI devel] DDT and spawn issue?

2009-07-15 Thread Jeff Squyres
I [very briefly] read about the DDT spawn issues, so I went to look at ompi/op/op.c. I notice that there's a new comment above the op datatype<-->op map construction area that says: /* XXX TODO */ svn blame says: 21641 rusraink /* XXX TODO */ r21641 is the big merge from the pa

Re: [OMPI devel] DDT and spawn issue?

2009-07-15 Thread George Bosilca
Yes, this appears to be at least part of the problem Edgar is seeing. We're trying to figure out how most of the tests passed so far with a wrong mapping. Interestingly enough, while the mapping seems wrong, the lookup is symmetric, so most of the time we end up with the correct op by
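A toy illustration of the effect George describes: a datatype-to-slot mapping is wrong, but the table is filled and queried through the same wrong mapping, so the correct op comes back anyway. The maps, kernel names and the swapped slots below are hypothetical and are not the contents of ompi/op/op.c:

```python
# Hypothetical datatype -> slot maps; not Open MPI's actual tables.
INTENDED = {"MPI_INT": 0, "MPI_FLOAT": 1, "MPI_DOUBLE": 2}
BUGGY = {"MPI_INT": 0, "MPI_FLOAT": 2, "MPI_DOUBLE": 1}  # FLOAT/DOUBLE swapped

# The reduction kernel each datatype should end up with.
KERNELS = {"MPI_INT": "sum_int", "MPI_FLOAT": "sum_float", "MPI_DOUBLE": "sum_double"}


def build_table(index_map):
    """Fill the op table, placing each kernel at index_map[dtype]."""
    table = [None] * len(index_map)
    for dtype, kernel in KERNELS.items():
        table[index_map[dtype]] = kernel
    return table


def lookup(table, index_map, dtype):
    """Fetch the kernel for dtype using the given datatype -> slot map."""
    return table[index_map[dtype]]


if __name__ == "__main__":
    table = build_table(BUGGY)
    # Built and queried with the same (wrong) map: every lookup is correct.
    print(all(lookup(table, BUGGY, d) == KERNELS[d] for d in KERNELS))  # True
    # A code path that indexes with the intended map gets the wrong kernel.
    print(lookup(table, INTENDED, "MPI_FLOAT"))  # sum_double
```

In this toy, any test that only goes through build_table and lookup with the same map passes; the swap only surfaces when some other path indexes the same table with a different map.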

Re: [OMPI devel] DDT and spawn issue?

2009-07-15 Thread Ralph Castain
Thanks George!! On Wed, Jul 15, 2009 at 9:57 AM, George Bosilca wrote: > Yes, this appears to be at least partially part of the problem Edgar is > seeing. We're trying to figure out how most of the tests passed so far with > a wrong mapping. Interesting enough, while the mapping seems wrong the >

Re: [OMPI devel] DDT and spawn issue?

2009-07-15 Thread Rainer Keller
Hi Jeff, Ralph and Edgar forwarded an email about this. We (George and myself) are currently looking into it. With the changes we have, I can get IBM/spawn to work only "sometimes"; the rest of the time it segfaults. Thanks, Rainer On Wednesday 15 July 2009 11:50:13 am Jeff Squyres wrote: > I [very br

Re: [OMPI devel] [OMPI users] where can i get a tracing tool

2009-07-15 Thread Eugene Loh
Done. Hit "reload" on the URL below, check out an SVN repository, or wait for these changes to be pushed to the live site. Matthias Jurenz wrote: Could you also mention the tool 'otfprofile' under section 7, please? On Tue, 2009-07-14 at 18:54 -0700, Eugene Loh wrote: P.S. Until the

Re: [OMPI devel] DDT and spawn issue?

2009-07-15 Thread Jeff Squyres
Perhaps we should add a requirement for testing on 2-3 different systems before long-term (or "big change") branches like this come to the trunk? I say this because it seems like at least some of these problems were based on bad luck -- i.e., the stuff worked on the platform that it was be

[OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r21686

2009-07-15 Thread George Bosilca
I have a question regarding the mapping. How can I declare a partial mapping? In fact, I only care about how some of the processes are mapped on some specific nodes. Right now, if the rmaps doesn't contain information about all nodes, we give up (before this patch we segfaulted). Does it m

Re: [OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r21686

2009-07-15 Thread Ralph Castain
The routed comm system relies on each daemon having complete information as to where every process is located, so the expectation was that only full maps would ever be sent. Thus, the nidmap code is set up to always send a full map. I don't know how to even generate a "partial" map. I assume you ar

Re: [OMPI devel] DDT and spawn issue?

2009-07-15 Thread George Bosilca
Actually, I don't think this will help. I looked on MTT and there are no errors related to this (logically, all reductions should have failed) ... and MTT is supposed to run on several platforms. What happens inside is really strange, but as we make the same mistake when we look up the op as he

Re: [OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r21686

2009-07-15 Thread George Bosilca
I think I found a better solution (in r21688). Here is what I was trying to do. I have a more or less homogeneous cluster. In fact all processors are identical, except that some are quad core and some dual core. Of course I care how my processes are mapped on the quad cores, but not reall

Re: [OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r21686

2009-07-15 Thread Ralph Castain
Ah - interesting scenario! Definitely a "bug" in the code, then. What it looks like, though, is that the jdata->num_procs is wrong. There shouldn't be any way that the num_procs in the node array is different than jdata->num_procs. My guess is that the rank_file mapper isn't correctly maintaining

Re: [OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r21686

2009-07-15 Thread Ralph Castain
Found the bug - we indeed failed to update the jdata->num_procs field when adding the non-rf-mapped procs to the job. Fix coming shortly. On Jul 15, 2009, at 2:40 PM, Ralph Castain wrote: Ah - interesting scenario! Definitely a "bug" in the code, then. What it looks like, though, is that
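The invariant behind this fix can be sketched as follows: when a mapper places some ranks from a partial rankfile and then fills the remaining ranks onto nodes with free slots, the job-level process count has to be incremented for every rank placed, so that it always equals the sum over the nodes. The Node/Job classes and map_job are hypothetical stand-ins, loosely named after jdata->num_procs and modeled on George's quad/dual-core scenario; they are not ORTE's rank_file mapper:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    slots: int
    procs: list = field(default_factory=list)


@dataclass
class Job:
    nodes: list
    num_procs: int = 0  # stand-in for jdata->num_procs


def map_job(job, rankfile, total_procs):
    """Place ranks named in `rankfile` (rank -> node name), then put each
    remaining rank on the first node that still has a free slot."""
    by_name = {n.name: n for n in job.nodes}
    for rank, node_name in sorted(rankfile.items()):
        by_name[node_name].procs.append(rank)
        job.num_procs += 1
    for rank in range(total_procs):
        if rank in rankfile:
            continue
        node = next((n for n in job.nodes if len(n.procs) < n.slots), None)
        if node is None:
            raise ValueError(f"no free slot for rank {rank}")
        node.procs.append(rank)
        job.num_procs += 1  # omitting this update leaves num_procs too small


def is_consistent(job):
    """The job-level count must match the per-node process counts."""
    return job.num_procs == sum(len(n.procs) for n in job.nodes)


if __name__ == "__main__":
    job = Job(nodes=[Node("quad0", 4), Node("dual0", 2)])
    map_job(job, {r: "quad0" for r in range(4)}, total_procs=6)
    print(job.num_procs, is_consistent(job))  # 6 True
```

Here only ranks 0-3 are pinned by the rankfile to the quad-core node; ranks 4 and 5 are filled onto the dual-core node, and is_consistent would report False if the second increment were dropped.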

[OMPI devel] MPI_Accumulate() with MPI_PROC_NULL target rank

2009-07-15 Thread Lisandro Dalcin
The MPI-2.1 standard says: "MPI_PROC_NULL is a valid target rank in the MPI RMA calls MPI_ACCUMULATE, MPI_GET, and MPI_PUT. The effect is the same as for MPI_PROC_NULL in MPI point-to-point communication. After any RMA operation with rank MPI_PROC_NULL, it is still necessary to finish the RMA epoc

Re: [OMPI devel] MPI_Accumulate() with MPI_PROC_NULL target rank

2009-07-15 Thread Brian W. Barrett
On Wed, 15 Jul 2009, Lisandro Dalcin wrote: The MPI-2.1 standard says: "MPI_PROC_NULL is a valid target rank in the MPI RMA calls MPI_ACCUMULATE, MPI_GET, and MPI_PUT. The effect is the same as for MPI_PROC_NULL in MPI point-to-point communication. After any RMA operation with rank MPI_PROC_NUL

Re: [OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r21686

2009-07-15 Thread Ralph Castain
Okay, George - this is fixed in r21690. Thanks again Ralph On Jul 15, 2009, at 2:40 PM, Ralph Castain wrote: Ah - interesting scenario! Definitely a "bug" in the code, then. What it looks like, though, is that the jdata->num_procs is wrong. There shouldn't be any way that the num_procs in

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Chris Samuel
- "Ralph Castain" wrote: > Could you check this? You can run a trivial job using the -npernode x > option, where x matched the #cores you were allocated on the nodes. > If you do this, do we bind to the correct cores? Nope, I'm afraid it doesn't - submitted a job asking for 4 cores on one

Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-15 Thread Ralph Castain
Looking at your command line, did you remember to set -mca mpi_paffinity_alone 1? If not, we won't set affinity on the processes. On Jul 15, 2009, at 8:11 PM, Chris Samuel wrote: - "Ralph Castain" wrote: Could you check this? You can run a trivial job using the -npernode x option, wh
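For comparison with Chris's result, a minimal sketch of cpuset-aware binding: local process i is pinned to the i-th core of the affinity mask it inherited from the cpuset, rather than to core i of the node. This assumes Linux and a launcher that supplies the local rank; bind_to_allowed_core is hypothetical and is not Open MPI's paffinity implementation:

```python
import os


def bind_to_allowed_core(local_rank, pid=0):
    """Pin a process to the local_rank-th core of its allowed set.

    The allowed set comes from the inherited affinity mask, which the
    kernel limits to the job's cpuset, so the chosen core is always one
    the cpuset actually granted.
    """
    allowed = sorted(os.sched_getaffinity(pid))
    if not 0 <= local_rank < len(allowed):
        raise ValueError(f"local rank {local_rank} exceeds {len(allowed)} allowed cores")
    core = allowed[local_rank]
    os.sched_setaffinity(pid, {core})
    return core


if __name__ == "__main__":
    print("bound to core", bind_to_allowed_core(0))
```

Run with local ranks 0 to 3 inside a cpuset granting cores 4-7, the four processes would land on cores 4, 5, 6 and 7.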