Hi,

On 11/7/06, Chevchenkovic Chevchenkovic <chevchenko...@gmail.com> wrote:
> Hi,
>   I had the following setup:
>  Rank 0 process on node 1 wants to send an array of particular size to Rank
> 1 process on same node.
> 1. What are the optimisations that can be done/invoked while running mpirun
> to perform this memory to memory transfer efficiently?
> 2. Is there any performance gain  if 2 processes that are exchanging data
> arrays are kept on the same node rather than on different nodes connected by
> infiniband?

If your application runs entirely on one node, sharing data is better
than copying it.
You can do this with the Unix shared memory API (between processes) or
with the POSIX threads API (within one process).
If the code shares a single address space and a copy is still
necessary, memcpy() is probably the fastest way, provided the data is
aligned in memory.
However, by definition none of this works for applications that span
several computers.
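
As a minimal sketch of the "share instead of copy" idea between two
processes on one node: the parent maps an anonymous shared region, a
forked child fills the array in place, and the parent reads the same
pages without any copy. The function name shared_array_sum is
hypothetical, and anonymous mmap() plus fork() is only one of several
Unix shared memory mechanisms (System V shmget() or POSIX shm_open()
would serve unrelated processes). MAP_ANONYMOUS is assumed to be
available, as it is on Linux and the BSDs.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* assumption: glibc, exposes MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child process writes n doubles into a shared
 * mapping and the parent sums them in place. Returns the sum, or -1.0
 * on any error (including n == 0, since a zero-length mmap fails). */
double shared_array_sum(size_t n)
{
    size_t bytes = n * sizeof(double);
    double *data = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (data == MAP_FAILED)
        return -1.0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(data, bytes);
        return -1.0;
    }
    if (pid == 0) {
        /* Child: produce the data directly in the shared pages. */
        for (size_t i = 0; i < n; i++)
            data[i] = (double)i;
        _exit(0);
    }

    /* Parent: waiting for the child is the only synchronisation here. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        munmap(data, bytes);
        return -1.0;
    }

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];          /* read in place, no memcpy() */

    munmap(data, bytes);
    return sum;
}
```

A real producer/consumer pair would need proper synchronisation (a
semaphore or a mutex placed in the shared region) instead of simply
waiting for the child to exit.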

If you can run one application process per node, with several threads
inside it, consider using MPI only between those processes, and set up
your MPI launcher to start one process per node.

Program your application to use several threads per rank (node), with
the POSIX threading model for parallel execution within each node (for
instance, with #threads == NCPUS), and use MPI for communicating
between nodes.
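
The node-local half of that layout could look roughly like the sketch
below: the threads of one rank all read the same array, each summing
its own slice, so nothing is copied inside the node. The names
node_local_sum, slice_t and MAX_THREADS are hypothetical. The
inter-node step is left out so that the sketch builds without an MPI
installation; in a real program the value returned here would be
combined across ranks with an MPI call such as MPI_Allreduce().

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64   /* arbitrary cap chosen for this sketch */

/* One slice of the shared array, plus the slot for its partial sum. */
typedef struct {
    const double *data;
    size_t begin, end;   /* half-open range [begin, end) */
    double partial;
} slice_t;

static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sum data[0..n) using up to nthreads POSIX threads in one process. */
double node_local_sum(const double *data, size_t n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];
    int started[MAX_THREADS];

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    size_t chunk = n / (size_t)nthreads;
    for (int t = 0; t < nthreads; t++) {
        slice[t].data = data;
        slice[t].begin = (size_t)t * chunk;
        /* The last thread also takes the remainder. */
        slice[t].end = (t == nthreads - 1) ? n : slice[t].begin + chunk;
        slice[t].partial = 0.0;
        started[t] = (pthread_create(&tid[t], NULL,
                                     sum_slice, &slice[t]) == 0);
        if (!started[t])
            sum_slice(&slice[t]);   /* fall back to the calling thread */
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += slice[t].partial;
    }
    return total;
}
```

Each thread writes only to its own slice_t, so no locking is needed;
pthread_join() is what makes the partial sums visible to the caller.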

The MPI model assumes you don't have a "shared memory" system.
It is therefore "message passing" oriented, and not designed to
perform optimally on shared memory systems (like SMPs or ccNUMA
machines).


best regards,

--
Miguel Sousa Filipe
