Hello Pasha,
Since the error was not recurring frequently, I had not looked into the
issue for a long time. But now I have started to diagnose it:
Initially I tested with ibv_rc_pingpong (master node to all compute nodes
using a for loop). It works for each of the nodes.
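For reference, a loop of this kind could be sketched roughly as below. This is only an illustration, not the exact script that was run: it assumes passwordless ssh from the master to each compute node, that ibv_rc_pingpong is on the PATH on both sides, and that the server side is started on the compute node while the master connects as the client. The function names, the startup delay and the timeout are hypothetical.

```python
import subprocess
import time


def pingpong_commands(host):
    """Return (server_cmd, client_cmd) for one master-to-node test.

    Assumption: the server side runs on the compute node via ssh and
    the client side runs locally on the master.
    """
    server_cmd = ["ssh", host, "ibv_rc_pingpong"]  # waits for a connection
    client_cmd = ["ibv_rc_pingpong", host]         # connects to that node
    return server_cmd, client_cmd


def run_all(hosts, startup_delay=2.0, timeout=30):
    """Run the test against each host; return {host: True or False}."""
    results = {}
    for host in hosts:
        server_cmd, client_cmd = pingpong_commands(host)
        server = None
        try:
            server = subprocess.Popen(
                server_cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            # Give the remote server a moment to start listening.
            time.sleep(startup_delay)
            client = subprocess.run(
                client_cmd,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            results[host] = client.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # Timed out, or ssh / ibv_rc_pingpong could not be started.
            results[host] = False
        finally:
            if server is not None:
                server.kill()
                server.wait()
    return results
```

Calling run_all(["node-0-2.local"]) would then report whether the pingpong to that node completed. A passing result only shows that the link worked at that moment, so it would not rule out an intermittent port problem.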
The files generated o
Sangamesh,
The IB tunings that you added to your command line only delay the
problem; they do not resolve it.
The node-0-2.local gets the asynchronous event "IBV_EVENT_PORT_ERROR", and as
a result the processes fail to deliver packets to some remote hosts, which
is why you see a bunch of IB errors.
T