Hi,

On 11/7/06, Chevchenkovic Chevchenkovic <chevchenko...@gmail.com> wrote:
> Hi,
> I have the following setup: the rank 0 process on node 1 wants to send
> an array of a particular size to the rank 1 process on the same node.
>
> 1. What optimisations can be done/invoked while running mpirun to
>    perform this memory-to-memory transfer efficiently?
> 2. Is there any performance gain if two processes that are exchanging
>    data arrays are kept on the same node rather than on different nodes
>    connected by InfiniBand?

If your application runs on a single node, sharing data is better than copying it. You can do this with the Unix shared memory API or with the POSIX threads API. If both sides share the same address space and a copy is still necessary, memcpy() is probably the fastest way (provided the data is aligned in memory). By definition, however, this does not work for applications that span multiple computers.

If you can arrange for one application per node with several threads per node, consider using MPI only between applications:

- set up your MPI framework to launch one application (one rank) per node;
- program your application to run several threads per rank (for instance, #threads == NCPUS), using the POSIX threading model for parallel execution within each node;
- use MPI only for communicating between nodes.

The MPI model assumes you don't have a shared memory system. It is therefore message-passing oriented, and not designed to perform optimally on shared memory systems (such as SMPs or ccNUMA machines).

Best regards,
--
Miguel Sousa Filipe
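
To make the "share instead of copy" idea concrete, here is a minimal sketch of the intra-node half of that layout, assuming a plain POSIX threads build. The names slice_t, sum_slice, parallel_sum and MAX_THREADS are hypothetical and chosen only for illustration. Each thread reads its slice of the same array through a pointer, so no memcpy() and no message passing is needed inside the node. In the hybrid setup, each MPI rank would run something like this locally and exchange only the per-node result over MPI; the MPI part is left out here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64  /* illustrative upper bound, not a pthreads limit */

/* Hypothetical work descriptor: every thread sees the SAME array. */
typedef struct {
    const double *data;  /* shared pointer, the array is never copied */
    size_t begin;        /* first index of this thread's slice */
    size_t end;          /* one past the last index of the slice */
    double partial;      /* result written by the worker thread */
} slice_t;

/* Thread entry point: sum one slice of the shared array. */
static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; ++i)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sum n doubles using nthreads POSIX threads over shared memory. */
double parallel_sum(const double *data, size_t n, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];
    int started[MAX_THREADS];
    size_t chunk = n / (size_t)nthreads;
    size_t rem = n % (size_t)nthreads;
    size_t pos = 0;

    for (int t = 0; t < nthreads; ++t) {
        /* Spread the remainder over the first 'rem' threads. */
        size_t len = chunk + ((size_t)t < rem ? 1 : 0);
        slice[t].data = data;
        slice[t].begin = pos;
        slice[t].end = pos + len;
        slice[t].partial = 0.0;
        pos += len;

        started[t] = (pthread_create(&tid[t], NULL, sum_slice, &slice[t]) == 0);
        if (!started[t])
            sum_slice(&slice[t]);  /* fall back to the calling thread */
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; ++t) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += slice[t].partial;
    }
    return total;
}
```

A caller would typically pass the number of CPUs on the node as nthreads, e.g. parallel_sum(array, n, 4) on a 4-way SMP, and build with the compiler's pthread option (for instance -pthread with gcc).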