Hi,
The gather hangs only with the linear_sync algorithm; it works with the
basic_linear and binomial algorithms.
The gather algorithm is chosen dynamically, depending on block size and
communicator size.
So, in the beginning, the binomial algorithm is chosen (communicator size
is larger than 60).
When increasing
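
To make the selection logic described above concrete, here is a minimal sketch of a size-based chooser. It is not the actual library code. Only the communicator-size threshold of 60 and the three algorithm names come from this message; the function name, the block-size threshold, and the order of the checks are assumptions made purely for illustration.

```python
# Hypothetical sketch of size-based gather algorithm selection.
# Only the communicator-size threshold (60) and the algorithm names are
# taken from the message; everything else is an illustrative assumption.

LARGE_COMM_SIZE = 60       # from the message: binomial is chosen above this
LARGE_BLOCK_BYTES = 6000   # assumed placeholder, not a documented value


def choose_gather_algorithm(comm_size: int, block_bytes: int) -> str:
    """Return the name of the gather algorithm this sketch would pick."""
    # Assumption: large blocks take priority and select linear_sync.
    if block_bytes > LARGE_BLOCK_BYTES:
        return "linear_sync"
    # From the message: large communicators start out on binomial.
    if comm_size > LARGE_COMM_SIZE:
        return "binomial"
    # Otherwise fall back to the simplest algorithm.
    return "basic_linear"
```

Under these assumed thresholds, a 64-rank job with small blocks would start on binomial and switch to linear_sync once the block size grows past the placeholder limit, which is the kind of transition at which a hang limited to one algorithm would first show up.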
Well, then I would suspect the rdmacm vs. oob QP configuration. They are supposed to be
the same, but there is probably a bug there, and somehow the rdmacm QP tuning
is different from oob. That is a potential cause of the performance
differences that you see.
Regards,
Pavel (Pasha) Shamis
---
Application