Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-16 Thread Ralph Castain
Forgot to mention this tip for debugging paffinity: There is a test module in the paffinity framework. The module has mca params that let you define the number of sockets/node (default: 4) and the #cores/socket (also default: 4). So by setting -mca paffinity test and adjusting those two paramet

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-16 Thread Ralph Castain
Well, I guess I got sucked back into paffinity again...sigh. I have committed a solution to this issue in r22984 and r22985. I have tested it against a range of scenarios, but hardly an exhaustive test. So please do stress it. The following comments are by no means intended as criticism, but ra

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-13 Thread Nadia Derbey
On Tue, 2010-04-13 at 01:27 -0600, Ralph Castain wrote:
> On Apr 13, 2010, at 1:02 AM, Nadia Derbey wrote:
>
> > On Mon, 2010-04-12 at 10:07 -0600, Ralph Castain wrote:
> >> By definition, if you bind to all available cpus in the OS, you are
> >> bound to nothing (i.e., "unbound") as your process

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-13 Thread Ralph Castain
On Apr 13, 2010, at 1:02 AM, Nadia Derbey wrote:
> On Mon, 2010-04-12 at 10:07 -0600, Ralph Castain wrote:
>> By definition, if you bind to all available cpus in the OS, you are
>> bound to nothing (i.e., "unbound") as your process runs on any
>> available cpu.
>>
>> PLPA doesn't care, and I

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-13 Thread Nadia Derbey
On Mon, 2010-04-12 at 10:07 -0600, Ralph Castain wrote:
> By definition, if you bind to all available cpus in the OS, you are
> bound to nothing (i.e., "unbound") as your process runs on any
> available cpu.
>
> PLPA doesn't care, and I personally don't care. I was just explaining
> why it gene

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Ralph Castain
Let me put this succinctly - I DO NOT CARE! I wrote this stuff, warning you folks from Sun in particular that you were opening a can of worms. As I said then, I'll do it once, but the vast range of corner cases will make this a nightmare that I will NOT continue to chase. Welcome to YOUR nightm

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Eugene Loh
Ralph Castain wrote: If someone tells us -bind-to-socket, but there is only one socket, then we really cannot bind them to anything. Any check by their code would reveal that they had not, in fact, been bound - raising questions as to whether or not OMPI is performing the request. Our operati

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Ralph Castain
On Apr 12, 2010, at 8:42 AM, Eugene Loh wrote:
> Ralph Castain wrote:
>
>> If someone tells us -bind-to-socket, but there is only one socket, then we
>> really cannot bind them to anything. Any check by their code would reveal
>> that they had not, in fact, been bound - raising questions as to

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Ralph Castain
By definition, if you bind to all available cpus in the OS, you are bound to nothing (i.e., "unbound") as your process runs on any available cpu. PLPA doesn't care, and I personally don't care. I was just explaining why it generates an error in the odls. A user app would detect its binding by (
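The "bound to everything means unbound" definition above can be made concrete with a small check. This is only a sketch and not what the odls or PLPA actually does: the helper name `is_effectively_bound` is hypothetical, and treating `range(os.cpu_count())` as the set of CPUs available in the OS is an assumption that can be wrong under cpusets or with offline CPUs. `os.sched_getaffinity` is Linux-only.

```python
import os


def is_effectively_bound(allowed_cpus, online_cpus):
    """Return True only if the allowed CPUs are a non-empty proper subset
    of the online CPUs. A mask covering every online CPU counts as unbound.
    (Hypothetical helper for illustration; not Open MPI code.)"""
    allowed = set(allowed_cpus)
    online = set(online_cpus)
    return bool(allowed) and allowed < online


if __name__ == "__main__":
    # Linux-only query of the current process's affinity mask.
    if hasattr(os, "sched_getaffinity"):
        mask = os.sched_getaffinity(0)
        # Assumption: logical CPUs are numbered 0..cpu_count()-1 and all online.
        online = range(os.cpu_count() or 1)
        print("bound" if is_effectively_bound(mask, online) else "unbound")
```

On a single-socket node, a bind-to-socket request yields a mask equal to the full CPU set, so a check of this kind reports "unbound" even though the request was honored as far as it could be.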

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Nadia Derbey
On Mon, 2010-04-12 at 07:50 -0600, Ralph Castain wrote:
> Guess I'll jump in here as I finally had a few minutes to look at the code
> and think about your original note. In fact, I believe your original
> statement is the source of contention.
>
> If someone tells us -bind-to-socket, but there

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Eugene Loh
Ralph Castain wrote: If someone tells us -bind-to-socket, but there is only one socket, then we really cannot bind them to anything. Any check by their code would reveal that they had not, in fact, been bound - raising questions as to whether or not OMPI is performing the request. Our operati

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Terry Dontje
Ralph, I guess I am curious why it is that, if there is only one socket, we cannot bind to it. Does PLPA actually error on this, or is this a condition we decided was an error at odls? I am somewhat torn on whether this makes sense. On the one hand it is definitely useless as to the result if y

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Ralph Castain
Guess I'll jump in here as I finally had a few minutes to look at the code and think about your original note. In fact, I believe your original statement is the source of contention. If someone tells us -bind-to-socket, but there is only one socket, then we really cannot bind them to anything.

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Nadia Derbey
On Fri, 2010-04-09 at 14:23 -0400, Terry Dontje wrote:
> Ralph Castain wrote:
> > Okay, just wanted to ensure everyone was working from the same base
> > code.
> >
> > Terry, Brad: you might want to look this proposed change over.
> > Something doesn't quite look right to me, but I haven't

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-09 Thread Terry Dontje
Ralph Castain wrote: Okay, just wanted to ensure everyone was working from the same base code. Terry, Brad: you might want to look this proposed change over. Something doesn't quite look right to me, but I haven't really walked through the code to check it. At first blush I don't really get

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-09 Thread Ralph Castain
Okay, just wanted to ensure everyone was working from the same base code.

Terry, Brad: you might want to look this proposed change over. Something doesn't quite look right to me, but I haven't really walked through the code to check it.

On Apr 9, 2010, at 9:33 AM, Terry Dontje wrote:
> Nadia

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-09 Thread Terry Dontje
Nadia Derbey wrote: On Fri, 2010-04-09 at 08:41 -0600, Ralph Castain wrote: Just to check: is this with the latest trunk? Brad and Terry have been making changes to this section of code, including modifying the PROCESS_IS_BOUND test... Well, it was on the v1.5. But I just checked:

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-09 Thread Nadia Derbey
On Fri, 2010-04-09 at 08:41 -0600, Ralph Castain wrote:
> Just to check: is this with the latest trunk? Brad and Terry have been making
> changes to this section of code, including modifying the PROCESS_IS_BOUND
> test...
>

Well, it was on the v1.5. But I just checked: looks like 1. the ca

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-09 Thread Ralph Castain
Just to check: is this with the latest trunk? Brad and Terry have been making changes to this section of code, including modifying the PROCESS_IS_BOUND test...

On Apr 9, 2010, at 3:39 AM, Nadia Derbey wrote:
> Hi,
>
> I am facing a problem with a test that runs fine on some nodes, and
> fails