On 3/1/10 1:51 AM, Ralph Castain wrote:
Which version of OMPI are you using? We know that the 1.2 series was unreliable 
about removing the session directories, but 1.3 and above appear to be quite 
good about it. If you are having problems with the 1.3 or 1.4 series, I would 
definitely like to know about it.

Oops; sorry!  OMPI 1.4.1, compiled with the PGI 10.0 compilers,
running on Scientific Linux 5.4 with OFED 1.4.2.

The session directories are *frequently* left behind.  I have
not really tried to characterize under what circumstances they
are removed. But please confirm:  they *should* be removed by
OMPI.

When I was at LANL, I ran a number of tests in exactly this configuration. 
While the sm btl did provide some performance advantage, it wasn't very much 
(the bandwidth was only about 10% greater, and the latency wasn't all that 
different either). I set the default configuration for users to include sm as 
10% isn't something to sneer at, but you could disable it without an enormous 
impact.

I'd prefer to provide as much performance as possible, also.

Another option would be to run an epilog that hammers the session directory. 
That's what LANL does, even though we didn't see much trouble with cleanup 
starting with the 1.3 series (still have a bunch of users stuck on 1.2). 
Depending on what environment you are running, you might contact folks there 
and get a copy of their epilog script.

Yes, we are already planning our prologues and epilogues, just
haven't implemented them yet.  Even if I can find and fix a
reason why OMPI is currently not doing this, we will probably
do it in an epilogue anyway.
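
For illustration, here is a minimal sketch of such an epilogue step in
Python.  It assumes the session directories sit directly under the
temporary directory and are named openmpi-sessions-<user>@<host>_<n>.
That pattern and the function names are assumptions to check against
what the nodes actually contain; they are not confirmed in this thread.

```python
import glob
import os
import shutil


def find_session_dirs(tmp_base, user):
    """Return leftover session directories under tmp_base for one user."""
    # ASSUMPTION: directories are named openmpi-sessions-<user>@<host>_<n>.
    pattern = os.path.join(
        tmp_base, "openmpi-sessions-" + glob.escape(user) + "@*"
    )
    # Skip symlinks so the cleanup never follows a link out of tmp_base.
    return sorted(
        p for p in glob.glob(pattern)
        if os.path.isdir(p) and not os.path.islink(p)
    )


def remove_session_dirs(tmp_base, user, dry_run=False):
    """Delete the user's session directories; return the paths handled."""
    handled = []
    for path in find_session_dirs(tmp_base, user):
        if not dry_run:
            # ignore_errors keeps the epilogue going if a file vanishes
            # while the tree is being removed.
            shutil.rmtree(path, ignore_errors=True)
        handled.append(path)
    return handled
```

An epilogue could call remove_session_dirs("/tmp", job_owner), where
job_owner is whatever user name the resource manager hands to the
script.  Running it first with dry_run=True lists what would be removed
without deleting anything.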

Thanks for your help!

On Mar 1, 2010, at 1:42 AM, David Turner wrote:

Hi all,

Running on a large cluster of 8-core nodes.  I understand
that the SM BTL is a "good thing".  But I'm curious about
its use of memory-mapped files.  I believe these files will
be in $TMPDIR, which defaults to /tmp.

In our cluster, the compute nodes are stateless, so /tmp
is actually in RAM.  Keeping memory-mapped "files" in
memory seems kind of circular, although I know little
about these things.  A bigger problem is that it appears
OMPI does not remove the files upon completion.

Another option is to redefine $TMPDIR to point to a
"real" file system.  In our cluster, all the available
file systems are accessed over the IB fabric.  So it
seems that there will be IB traffic, even though the
point of the SM BTL is to avoid this traffic.

Given the above two constraints, might it just be
better to disable the SM BTL entirely, and use the
IB BTL even within a node?  Of course, the "self"
BTL should still be used if appropriate.
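
As a rough sketch of what that would look like in practice, the BTL
list can be given on the mpirun command line through the "btl" MCA
parameter.  The helper below only builds the argument list; the
function name, the process count, and ./a.out are placeholders chosen
for illustration.

```python
import shlex


def mpirun_command(nprocs, program, btls=("openib", "self")):
    """Build an mpirun argument list that selects only the given BTLs."""
    # Leaving "sm" out of the list means intra-node traffic also goes
    # through the InfiniBand BTL; "self" handles a process sending to itself.
    return ["mpirun", "-np", str(nprocs),
            "--mca", "btl", ",".join(btls), program]


cmd = mpirun_command(16, "./a.out")
print(" ".join(shlex.quote(arg) for arg in cmd))
# prints: mpirun -np 16 --mca btl openib,self ./a.out
```

Whether this is worth doing depends on how much the sm BTL gains on the
actual hardware, so it is best compared against a run with sm enabled.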

Any thoughts clarifying these issues would be
greatly appreciated.  Thanks!

--
Best regards,

David Turner
User Services Group        email: dptur...@lbl.gov
NERSC Division             phone: (510) 486-4027
Lawrence Berkeley Lab        fax: (510) 486-4316
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Best regards,

David Turner
User Services Group        email: dptur...@lbl.gov
NERSC Division             phone: (510) 486-4027
Lawrence Berkeley Lab        fax: (510) 486-4316
