On Oct 29, 2012, at 11:01 AM, Ralph Castain wrote:

> Wow, that would make no sense at all. If P1 and P2 are on the same node, then 
> we will use shared memory to do the transfer, as Jeff described. However, if 
> you disable shared memory, as you indicated you were doing in a previous 
> message (by adding -mca btl ^sm), then we would use a loopback device if 
> available - i.e., the packet would be handed to the network stack, which 
> would then return it to P2 without it ever leaving the node.
> 
> If there is no loopback device, and you disable shared memory, then we would 
> abort the job with an error as there is no way for P1 to communicate with P2.
> 
> We would never do what you describe.
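
For reference, disabling the shared-memory BTL as described above looks like the following (the application name `ring` is just a placeholder):

```shell
# Disable the shared-memory BTL; on-node traffic then falls back to a
# loopback device, or the job aborts if no loopback device is available.
mpirun -np 2 -mca btl ^sm ./ring
```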

To be clear: it would probably be a good idea to have *some* tmpfs on your 
diskless node.  Some things should simply not be on a network filesystem (e.g., 
/tmp).  Google around; there are good reasons for having a small tmpfs, even on 
a diskless server.
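
For example, a small tmpfs mounted on /tmp can be configured with a line like this in /etc/fstab (the 512 MB size limit is purely illustrative; size it for your node):

```
tmpfs  /tmp  tmpfs  defaults,size=512m  0  0
```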

Indeed, Open MPI will warn you if it ends up putting a shared memory "file" 
(which, as I described, isn't really a file) on a network filesystem -- e.g., 
if /tmp is a network filesystem.  OMPI warns because corner cases can arise 
that degrade performance (e.g., the OS may periodically write out the 
contents of shared memory to the network filesystem).

But as Ralph says: Open MPI primarily uses shared memory when communicating 
with processes on the same server (unless you disable shared memory).  This 
means Open MPI copies message A from P1's address space to shared memory, and 
then P2 copies message A from shared memory to its address space.  Or, if 
you're using the Linux knem kernel module, Open MPI copies message A from P1's 
address space directly to P2's address space.  No network transfer occurs 
unless you hit a corner case such as /tmp being on a network filesystem, 
having no /dev/shm filesystem, or the like.
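
The two-copy pattern can be sketched in a few lines of Python using the standard library's multiprocessing.shared_memory module -- this is not Open MPI itself, just an illustration of the "P1 copies in, P2 copies out" mechanism (on Linux, the backing segment lives under /dev/shm, matching the "file that isn't really a file" described above):

```python
from multiprocessing import shared_memory

message = b"message A"

# "P1": copy the message from its own address space into shared memory.
shm = shared_memory.SharedMemory(create=True, size=len(message))
shm.buf[:len(message)] = message  # first copy: P1 -> shared segment

# "P2": attach to the same segment by name and copy the message out.
shm2 = shared_memory.SharedMemory(name=shm.name)
received = bytes(shm2.buf[:len(message)])  # second copy: segment -> P2

print(received)

shm2.close()
shm.close()
shm.unlink()  # remove the backing "file" (e.g., under /dev/shm on Linux)
```

Note that no network stack is involved at any point; both "processes" only touch local memory.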

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

