Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-12 Thread Jeff Squyres
On Apr 12, 2010, at 11:10 AM, Oliver Geisler wrote:
> > Is the /tmp filesystem on NFS by any chance?
>
> Yes, /tmp is on NFS .. those are diskless nodes all without disks and
> no swap space mounted.
Ah, that could do it. Open MPI's shared memory files are under /tmp. So if /tmp is NFS, you

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Ralph Castain
Let me put this succinctly - I DO NOT CARE! I wrote this stuff, warning you folks from Sun in particular that you were opening a can of worms. As I said then, I'll do it once, but the vast range of corner cases will make this a nightmare that I will NOT continue to chase. Welcome to YOUR nightm

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Eugene Loh
Ralph Castain wrote: If someone tells us -bind-to-socket, but there is only one socket, then we really cannot bind them to anything. Any check by their code would reveal that they had not, in fact, been bound - raising questions as to whether or not OMPI is performing the request. Our operati

Re: [OMPI devel] bind-to-board [was: problem when binding to socket on a single socket node]

2010-04-12 Thread Ralph Castain
Get a life :-)
On Apr 12, 2010, at 11:56 AM, Eugene Loh wrote:
> Ralph Castain wrote:
>
>> On Apr 12, 2010, at 8:42 AM, Eugene Loh wrote:
>>
>>> Ralph Castain wrote:
>>> If someone tells us -bind-to-socket, but there is only one socket, then we really cannot bind them to anyth

[OMPI devel] bind-to-board [was: problem when binding to socket on a single socket node]

2010-04-12 Thread Eugene Loh
Ralph Castain wrote: On Apr 12, 2010, at 8:42 AM, Eugene Loh wrote: Ralph Castain wrote: If someone tells us -bind-to-socket, but there is only one socket, then we really cannot bind them to anything. Any check by their code would reveal that they had not, in fact, been bound - raisi

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Ralph Castain
On Apr 12, 2010, at 8:42 AM, Eugene Loh wrote:
> Ralph Castain wrote:
>
>> If someone tells us -bind-to-socket, but there is only one socket, then we
>> really cannot bind them to anything. Any check by their code would reveal
>> that they had not, in fact, been bound - raising questions as to

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Ralph Castain
By definition, if you bind to all available cpus in the OS, you are bound to nothing (i.e., "unbound") as your process runs on any available cpu. PLPA doesn't care, and I personally don't care. I was just explaining why it generates an error in the odls. A user app would detect its binding by (
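The point that binding to every available CPU amounts to no binding at all can be illustrated with a small check. This is a minimal sketch in Python, not code from Open MPI or PLPA: the helper names are hypothetical, `os.sched_getaffinity` exists only on some Unix platforms such as Linux, and the assumption that usable CPUs are numbered 0 through `os.cpu_count() - 1` may not hold on every system.

```python
import os


def is_effectively_unbound(affinity, available):
    """Return True when the affinity mask covers every available CPU.

    Hypothetical helper: a process allowed to run on all CPUs is,
    for practical purposes, not bound to anything.
    """
    return set(available) <= set(affinity)


def current_process_is_unbound():
    """Check the calling process; returns None where affinity is unsupported."""
    if not hasattr(os, "sched_getaffinity"):
        return None
    affinity = os.sched_getaffinity(0)  # 0 means the calling process
    # Assumption: CPUs are numbered 0..cpu_count-1 and all are usable.
    available = range(os.cpu_count() or 1)
    return is_effectively_unbound(affinity, available)
```

On a single-socket node with four cores, a process "bound" to that socket would have the affinity set {0, 1, 2, 3}, which this check cannot distinguish from an unbound process.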

Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-12 Thread Ralph Castain
In that scenario, you need to set the session directories to point somewhere other than /tmp. I believe you will find that in our FAQs as this has been a recurring problem. The shared memory backing file resides in the session directory tree, so if that is NFS mounted, your performance will sti

Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-12 Thread Oliver Geisler
Quoting Ashley Pittman: On 10 Apr 2010, at 04:51, Eugene Loh wrote: Why is shared-memory performance about four orders of magnitude slower than it should be? The processes are communicating via memory that's shared by having the processes all mmap the same file into their address spac
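The mechanism described in the quoted question, sharing memory by mapping the same file, can be sketched as follows. This is an illustration only and not Open MPI's implementation: the function name is hypothetical, the backing file is a throwaway temporary file, and both mappings live in one process for brevity, whereas in the thread each MPI process would hold its own mapping.

```python
import mmap
import os
import tempfile


def shared_mapping_demo(directory=None):
    """Map one backing file twice and pass a message through it."""
    size = mmap.PAGESIZE
    # The backing file is created in `directory` (system default if None).
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, size)        # give the file a mappable length
        writer = mmap.mmap(fd, size)  # shared, read/write mapping by default
        reader = mmap.mmap(fd, size)  # second mapping of the same file
        try:
            writer[:5] = b"hello"     # store through the first mapping
            return bytes(reader[:5])  # load through the second mapping
        finally:
            writer.close()
            reader.close()
    finally:
        os.close(fd)
        os.unlink(path)
```

The `directory` argument decides which filesystem holds the backing file, which is the variable the thread is concerned with when /tmp is NFS-mounted.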

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Nadia Derbey
On Mon, 2010-04-12 at 07:50 -0600, Ralph Castain wrote:
> Guess I'll jump in here as I finally had a few minutes to look at the code
> and think about your original note. In fact, I believe your original
> statement is the source of contention.
>
> If someone tells us -bind-to-socket, but there

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Eugene Loh
Ralph Castain wrote: If someone tells us -bind-to-socket, but there is only one socket, then we really cannot bind them to anything. Any check by their code would reveal that they had not, in fact, been bound - raising questions as to whether or not OMPI is performing the request. Our operati

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Terry Dontje
Ralph, I guess I am curious why is it that if there is only one socket we cannot bind to it? Does plpa actually error on this or is this a condition we decided was an error at odls? I am somewhat torn on whether this makes sense. On the one hand it is definitely useless as to the result if y

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Ralph Castain
Guess I'll jump in here as I finally had a few minutes to look at the code and think about your original note. In fact, I believe your original statement is the source of contention. If someone tells us -bind-to-socket, but there is only one socket, then we really cannot bind them to anything.

Re: [OMPI devel] problem when binding to socket on a single socket node

2010-04-12 Thread Nadia Derbey
On Fri, 2010-04-09 at 14:23 -0400, Terry Dontje wrote:
> Ralph Castain wrote:
> > Okay, just wanted to ensure everyone was working from the same base
> > code.
> >
> >
> > Terry, Brad: you might want to look this proposed change over.
> > Something doesn't quite look right to me, but I haven't