On Jun 9, 2010, at 12:43 AM, Gus Correa wrote:

> btl_self_priority=0 (default value)
> btl_sm_priority=0 (default value)

These are ok.  BTL selection is a combination of priority and reachability.  
The self BTL can *only* reach its own process.  So process A will use the 
"self" BTL to talk to process A.  The sm BTL can only reach *other* processes 
on the same host.  So process A will use the sm BTL to talk to process B, 
provided A != B and both A and B are on the same host.
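To make the priority-plus-reachability idea concrete, here is a toy model in Python. It is not Open MPI's implementation. Each BTL gets a priority and a reachability test, and the highest-priority BTL that can reach the peer wins. The "tcp" entry is an assumption standing in for whatever off-node BTL you have configured; only the self and sm reachability rules come from the text above. Note that all priorities are tied at 0 and the choice is still unambiguous, because the reachability sets do not overlap.

```python
from typing import Callable, Dict, List, NamedTuple


class Btl(NamedTuple):
    name: str
    priority: int
    # reaches(a, b, host_of) -> can process a talk to process b with this BTL?
    reaches: Callable[[int, int, Dict[int, str]], bool]


BTLS: List[Btl] = [
    # self can only reach the calling process itself
    Btl("self", 0, lambda a, b, host: a == b),
    # sm reaches *other* processes on the same host
    Btl("sm", 0, lambda a, b, host: a != b and host[a] == host[b]),
    # hypothetical off-node BTL (assumption, not from the message)
    Btl("tcp", 0, lambda a, b, host: host[a] != host[b]),
]


def choose_btl(a: int, b: int, host: Dict[int, str]) -> str:
    """Return the highest-priority BTL that can reach process b from process a."""
    usable = [btl for btl in BTLS if btl.reaches(a, b, host)]
    if not usable:
        raise RuntimeError(f"no BTL can reach process {b} from process {a}")
    return max(usable, key=lambda btl: btl.priority).name
```

For example, with processes 0 and 1 on one host and process 2 on another, process 0 talks to itself over self, to process 1 over sm, and to process 2 over the (assumed) network BTL.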

> coll_basic_priority=10 (default value)
> coll_hierarch_priority=0 (default value)
> coll_inter_priority=40 (default value)
> coll_self_priority=75 (default value)
> coll_sm_priority=0 (default value)
> coll_sync_priority=50 (default value)
> coll_tuned_priority=30 (default value)
> 
> [Note that 'coll' priorities are *not* tied,
> 'self' is maximum (75), and 'sm' is minimum (0).]

Right.  Coll selection, in essence, is the same as BTL selection, but the 
mechanics are a little different.  Coll modules are selected on a 
per-communicator basis, and will only allow themselves to be selected if they 
can reach all members of a given communicator.  For example, the self coll will 
only allow itself to be selected for MPI_COMM_SELF (and duplicates thereof).  
sm will only allow itself to be selected when all procs in the communicator are 
on the same host.  And so on.
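A toy sketch of that per-communicator query, again not Open MPI's code: each component looks at the communicator and either returns its priority or disqualifies itself. The self, sm, and inter rules follow the text above, and the sm check for a priority of 0 mirrors the "priority too low; disqualifying myself" message quoted below. The rules for basic and tuned, the Comm class, and the omission of sync and hierarch are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Comm:
    name: str
    hosts: Tuple[str, ...]  # host of each member process, in rank order
    is_inter: bool = False

    @property
    def size(self) -> int:
        return len(self.hosts)


# Max priorities: the defaults quoted in the message (sync, hierarch omitted)
MAX_PRIORITY: Dict[str, int] = {"basic": 10, "inter": 40, "self": 75,
                                "sm": 0, "tuned": 30}


def comm_query(component: str, comm: Comm,
               max_priority: Dict[str, int]) -> Optional[int]:
    """Return the component's priority for this communicator, or None
    if the component disqualifies itself."""
    prio = max_priority[component]
    if component == "self":
        ok = comm.size == 1                      # MPI_COMM_SELF and dups
    elif component == "sm":
        ok = (not comm.is_inter
              and len(set(comm.hosts)) == 1      # all procs on one host
              and prio > 0)                      # "priority too low"
    elif component == "inter":
        ok = comm.is_inter
    elif component == "tuned":
        ok = not comm.is_inter                   # assumption: intracomm only
    elif component == "basic":
        ok = True                                # assumption: always usable
    else:
        raise ValueError(f"unknown component {component!r}")
    return prio if ok else None


def available(comm: Comm,
              max_priority: Dict[str, int] = MAX_PRIORITY) -> Dict[str, int]:
    """Components that accepted this communicator, with their priorities."""
    result: Dict[str, int] = {}
    for comp in sorted(max_priority):
        prio = comm_query(comp, comm, max_priority)
        if prio is not None:
            result[comp] = prio
    return result
```

With these rules, a 4-process MPI_COMM_WORLD on one host gets only basic and tuned at the default priorities, MPI_COMM_SELF additionally gets self at 75, and raising the sm priority above 0 lets sm accept MPI_COMM_WORLD.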

> coll:sm:comm_query (0/MPI_COMM_WORLD): priority too low; disqualifying
> myself
> coll:sm:comm_query (3/MPI_COMMUNICATOR 3): priority too low;
> disqualifying myself
> 
> [Therefore, 'sm' seems to give up working in collectives ... :( ]

Correct.

I believe that we simply have the priority for the sm collectives set too low; 
you might want to try raising it (e.g., by adding "--mca coll_sm_priority 80" 
to your mpirun command line).

We are actually working on the shared memory collectives for future releases; 
the current sm coll module that is shipped only has 4 algorithms implemented: 
barrier, bcast, reduce, allreduce for intracommunicators.  :-(

> coll:base:comm_select: Checking all available modules
> coll:base:comm_select: component available: basic, priority: 10
> coll:base:comm_select: component not available: hierarch
> coll:base:comm_select: component not available: inter
> coll:base:comm_select: component not available: self
> coll:base:comm_select: component not available: sm
> coll:base:comm_select: component available: sync, priority: 50
> coll:base:comm_select: component available: tuned, priority: 30
> coll:base:comm_select: Checking all available modules
> coll:base:comm_select: component available: basic, priority: 10
> coll:base:comm_select: component not available: hierarch
> coll:base:comm_select: component not available: inter
> coll:base:comm_select: component available: self, priority: 75
> 
> [Eventually 'sm', 'inter', and 'hierarch' seem to go out of business,
> whereas 'basic', 'sync' and 'tuned' hang in there.
> It is awkward that 'self' claims both to
> be available and not available!]

This must be the selection for 2 different communicators.  Right before the 
"Checking all available modules" message, there should be another one 
identifying which communicator the selection is for.  The tags on the left of 
each message identify which process the selection is occurring in, so at 
least for MPI_THREAD_SINGLE MPI applications, the ordering should be 
deterministic and easy to follow (even though the output from multiple 
processes may be interleaved, the tags on the left let you tell who is who).
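If the interleaving makes the output hard to read, a small script can untangle it. This is a sketch under an assumption: that each line starts with a bracketed per-process tag, written here as a hypothetical "[host:pid]". Adjust TAG_RE to whatever prefix your build actually prints. The script groups lines by tag, then starts a new group at each "Checking all available modules" line, so each group is one communicator's selection within one process.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical tag format "[host:pid] message"; adjust to your real output.
TAG_RE = re.compile(r"^\[([^\]]+)\]\s*(.*)$")
ROUND_START = "coll:base:comm_select: Checking all available modules"


def split_selection_rounds(lines: List[str]) -> Dict[str, List[List[str]]]:
    """Group interleaved verbose output by process tag, then split each
    process's messages into one list per communicator selection."""
    rounds: Dict[str, List[List[str]]] = defaultdict(list)
    for line in lines:
        m = TAG_RE.match(line)
        if not m:
            continue  # untagged line: cannot attribute it to a process
        tag, msg = m.groups()
        # A new selection round begins at ROUND_START (or at a tag's first line)
        if msg.startswith(ROUND_START) or not rounds[tag]:
            rounds[tag].append([])
        rounds[tag][-1].append(msg)
    return dict(rounds)
```

Applied to output like the excerpt quoted above, the "not available: self" and "available: self, priority: 75" lines for one process land in different rounds, i.e., they belong to different communicators.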

> 1) Are the "coll" priorities above (default values) the best choices
> when I run in a single node, or were they chosen for a general
> situation when the job runs across node boundaries?

They're generally good.  We're probably too conservative for the sm coll 
because there was a time when it was buggy.  Those bugs should all be fixed 
now, though.

> 2) Why does "self" have the largest value (75)?

It will, but only for MPI_COMM_SELF (and dups).

The coll priorities might be a bit confusing because they can adjust themselves 
during selection.  It's also a bit more complicated because colls are chosen 
on a per-communicator basis, and the priority is not necessarily uniform for 
every communicator.

Hence, you should probably look at those priorities as the *max* priority a 
given coll will present itself as.  So self's max will be 75, but for 
communicators where it doesn't allow itself to be selected, it's effectively 0.

> 3) Does it mean that all collectives will use the
> self/loopback mechanism for communication?

No.

> How about 'basic' and the rest of the gang with smaller priorities?

The priorities are assessed on a per-communicator basis, and the modules can 
adjust their priorities accordingly (to either 0 or their respective max value).

It's even *more* complicated because colls are allowed to mix and match on a 
single communicator.  For example, I mentioned above that the sm coll only has 
bcast, barrier, reduce, and allreduce.  So sm coll will "win" for communicator 
X, but only for those 4.  The next highest coll will be used to fill in the 
others.  If some operations are still unfilled after that, the next coll is 
used, and so on.  The process is repeated until every MPI collective operation 
has a plugin to use.
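The fill-in process can be sketched as a loop over the available modules in decreasing priority order, where each module claims the operations it implements that nobody higher has claimed yet. This is a toy model, not Open MPI's code: sm's four operations come from the message, while the short operation list and the assumption that basic and tuned implement all of it are simplifications.

```python
from typing import Dict, List, Set

# Short, illustrative list of collective operations (not the full MPI set)
OPS: List[str] = ["allreduce", "alltoall", "barrier", "bcast",
                  "gather", "reduce", "scatter"]

PROVIDES: Dict[str, Set[str]] = {
    "basic": set(OPS),   # assumption: implements everything in OPS
    "tuned": set(OPS),   # assumption: implements everything in OPS
    "sm": {"barrier", "bcast", "reduce", "allreduce"},  # from the message
}


def fill_ops(available: Dict[str, int]) -> Dict[str, str]:
    """Map each operation to the highest-priority available module that
    implements it. `available` maps module name -> priority for one
    communicator (modules that disqualified themselves are absent)."""
    ranked = sorted(available, key=lambda m: available[m], reverse=True)
    table: Dict[str, str] = {}
    for module in ranked:
        for op in OPS:
            if op in PROVIDES[module] and op not in table:
                table[op] = module
    missing = [op for op in OPS if op not in table]
    if missing:
        raise RuntimeError(f"no module provides: {missing}")
    return table
```

With the default priorities sm has disqualified itself, so tuned serves every operation.  With sm raised above tuned, sm takes its 4 operations and tuned fills in the rest, which matches the fallback behavior described in the answer to question 6.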

> 4) Is it a good idea to set the 'sm' priority to a value
> larger than 75 (to beat "self" and take over the collective functions)?

It'll always beat self because self won't allow itself to be selected for 
communicators containing more than 1 process.

> 5) In this case, will the collectives only use "sm"?

If you set the sm priority larger than basic and tuned, yes, at least for the 
4 operations that sm implements.

> 6) Will this improve or degrade performance ?

Depends on your app.  :-)  The idea is that it will improve performance if 
you're using those 4 operations.  The others will generally fall back to tuned.

> 7) Is there any literature where I can learn
> more about these OpenMPI collective priorities?

Unfortunately not...  :-(

> (I couldn't find anything about it on the FAQs.
> Actually, a group of FAQ about collectives would be very helpful.)

Agreed.  You wouldn't have a few cycles to write this stuff up, would you?  

    https://svn.open-mpi.org/trac/ompi/wiki/OMPIFAQEntries

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

