Hi,
I am using the MPI_Reduce operation on a 122880x400 matrix of doubles.
The parallel job runs on 32 machines whose processors differ in speed,
but the architecture and OS are the same on all machines (x86_64). The
task is a typical map-and-reduce, i.e. each of
the processes
er every so many operations to avoid the problem.
This is done by enabling the "sync" collective component, and then
adjusting the number of operations between forced syncs.
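
For illustration only, the same idea can be approximated by hand in the
application: count the collective calls and force a barrier after every
N-th one. The sketch below is not how the "sync" component is implemented
and is not an Open MPI API; the names sync_counter and sync_due are
hypothetical, and the interval value is something you would have to tune
yourself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Open MPI API): tracks how many collective
 * calls have been made since the last forced synchronization. */
typedef struct {
    unsigned long interval; /* force a sync every `interval` calls; 0 disables */
    unsigned long count;    /* calls since the last sync */
} sync_counter;

/* Returns 1 when the caller should synchronize now, 0 otherwise. */
int sync_due(sync_counter *c)
{
    assert(c != NULL);
    if (c->interval == 0)
        return 0;
    if (++c->count >= c->interval) {
        c->count = 0;
        return 1;
    }
    return 0;
}

/* Intended call site in an MPI program (assumed, shown as a comment so
 * this block compiles without mpi.h):
 *
 *     MPI_Reduce(sendbuf, recvbuf, n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
 *     if (sync_due(&sc))
 *         MPI_Barrier(MPI_COMM_WORLD);
 */
```

Every rank must use the same interval so that all of them reach the
barrier on the same iteration. Letting the "sync" component do this is
usually less intrusive, since it needs no source changes.
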
Do an "ompi_info --params coll sync" to see the options. Then set
the coll_sync_priority to something li