Forgot to mention this tip for debugging paffinity:
There is a test module in the paffinity framework. The module has mca params
that let you define the number of sockets/node (default: 4) and the
#cores/socket (also default: 4). So by setting -mca paffinity test and
adjusting those two paramet
Well, I guess I got sucked back into paffinity again...sigh.
I have committed a solution to this issue in r22984 and r22985. I have tested
it against a range of scenarios, but hardly an exhaustive test. So please do
stress it.
The following comments are by no means intended as criticism, but ra
On Tue, 2010-04-13 at 01:27 -0600, Ralph Castain wrote:
> On Apr 13, 2010, at 1:02 AM, Nadia Derbey wrote:
>
> > On Mon, 2010-04-12 at 10:07 -0600, Ralph Castain wrote:
> >> By definition, if you bind to all available cpus in the OS, you are
> >> bound to nothing (i.e., "unbound") as your process
On Apr 13, 2010, at 1:02 AM, Nadia Derbey wrote:
> On Mon, 2010-04-12 at 10:07 -0600, Ralph Castain wrote:
>> By definition, if you bind to all available cpus in the OS, you are
>> bound to nothing (i.e., "unbound") as your process runs on any
>> available cpu.
>>
>>
>> PLPA doesn't care, and I
On Mon, 2010-04-12 at 10:07 -0600, Ralph Castain wrote:
> By definition, if you bind to all available cpus in the OS, you are
> bound to nothing (i.e., "unbound") as your process runs on any
> available cpu.
>
>
> PLPA doesn't care, and I personally don't care. I was just explaining
> why it gene
Let me put this succinctly - I DO NOT CARE!
I wrote this stuff, warning you folks from Sun in particular that you were
opening a can of worms. As I said then, I'll do it once, but the vast range of
corner cases will make this a nightmare that I will NOT continue to chase.
Welcome to YOUR nightm
Ralph Castain wrote:
If someone tells us -bind-to-socket, but there is only one socket, then we
really cannot bind them to anything. Any check by their code would reveal that
they had not, in fact, been bound - raising questions as to whether or not OMPI
is performing the request. Our operati
On Apr 12, 2010, at 8:42 AM, Eugene Loh wrote:
> Ralph Castain wrote:
>
>> If someone tells us -bind-to-socket, but there is only one socket, then we
>> really cannot bind them to anything. Any check by their code would reveal
>> that they had not, in fact, been bound - raising questions as to
By definition, if you bind to all available cpus in the OS, you are bound to
nothing (i.e., "unbound") as your process runs on any available cpu.
PLPA doesn't care, and I personally don't care. I was just explaining why it
generates an error in the odls.
A user app would detect its binding by (
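
For readers following the thread, the "bound to all available cpus means unbound" rule can be sketched as a plain set comparison. This is only an illustration of the reasoning quoted above, not the check in the odls or in PLPA. The function names and the contiguous socket-to-cpu layout are assumptions made up for this example.

```python
def is_effectively_bound(binding, available):
    """Return True only if `binding` is a strict subset of `available`.

    Hypothetical helper: a binding that covers every available cpu
    restricts nothing, so it is treated as "unbound".
    """
    binding = set(binding)
    available = set(available)
    if not binding:
        raise ValueError("binding must name at least one cpu")
    if not binding <= available:
        raise ValueError("binding names cpus that are not available")
    return binding < available


def socket_cpus(socket, cores_per_socket):
    """Cpu ids on one socket, assuming cores are numbered contiguously."""
    start = socket * cores_per_socket
    return set(range(start, start + cores_per_socket))


if __name__ == "__main__":
    # Node with 4 sockets of 4 cores: binding to socket 0 is a real restriction.
    four_socket_node = set(range(16))
    print(is_effectively_bound(socket_cpus(0, 4), four_socket_node))

    # Node with a single socket: binding to that socket covers every cpu,
    # so the process is indistinguishable from an unbound one.
    one_socket_node = socket_cpus(0, 4)
    print(is_effectively_bound(socket_cpus(0, 4), one_socket_node))
```

Under these assumptions the single-socket case comes out as not bound, which is the situation being debated for -bind-to-socket on a one-socket node.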
On Mon, 2010-04-12 at 07:50 -0600, Ralph Castain wrote:
> Guess I'll jump in here as I finally had a few minutes to look at the code
> and think about your original note. In fact, I believe your original
> statement is the source of contention.
>
> If someone tells us -bind-to-socket, but there
Ralph Castain wrote:
If someone tells us -bind-to-socket, but there is only one socket, then we
really cannot bind them to anything. Any check by their code would reveal that
they had not, in fact, been bound - raising questions as to whether or not OMPI
is performing the request. Our operati
Ralph, I guess I am curious why is it that if there is only one socket
we cannot bind to it? Does plpa actually error on this or is this a
condition we decided was an error at odls?
I am somewhat torn on whether this makes sense. On the one hand it is
definitely useless as to the result if y
Guess I'll jump in here as I finally had a few minutes to look at the code and
think about your original note. In fact, I believe your original statement is
the source of contention.
If someone tells us -bind-to-socket, but there is only one socket, then we
really cannot bind them to anything.
On Fri, 2010-04-09 at 14:23 -0400, Terry Dontje wrote:
> Ralph Castain wrote:
> > Okay, just wanted to ensure everyone was working from the same base
> > code.
> >
> >
> > Terry, Brad: you might want to look this proposed change over.
> > Something doesn't quite look right to me, but I haven't
Ralph Castain wrote:
Okay, just wanted to ensure everyone was working from the same base code.
Terry, Brad: you might want to look this proposed change over.
Something doesn't quite look right to me, but I haven't really walked
through the code to check it.
At first blush I don't really get
Okay, just wanted to ensure everyone was working from the same base code.
Terry, Brad: you might want to look this proposed change over. Something
doesn't quite look right to me, but I haven't really walked through the code to
check it.
On Apr 9, 2010, at 9:33 AM, Terry Dontje wrote:
> Nadia
Nadia Derbey wrote:
On Fri, 2010-04-09 at 08:41 -0600, Ralph Castain wrote:
Just to check: is this with the latest trunk? Brad and Terry have been making
changes to this section of code, including modifying the PROCESS_IS_BOUND
test...
Well, it was on the v1.5. But I just checked:
On Fri, 2010-04-09 at 08:41 -0600, Ralph Castain wrote:
> Just to check: is this with the latest trunk? Brad and Terry have been making
> changes to this section of code, including modifying the PROCESS_IS_BOUND
> test...
>
>
Well, it was on the v1.5. But I just checked: looks like
1. the ca
Just to check: is this with the latest trunk? Brad and Terry have been making
changes to this section of code, including modifying the PROCESS_IS_BOUND
test...
On Apr 9, 2010, at 3:39 AM, Nadia Derbey wrote:
> Hi,
>
> I am facing a problem with a test that runs fine on some nodes, and
> fails