Hello,

Running OpenMPI 1.4.3 on an SGI Altix cluster with 4096 cores, I get this error message right at startup:

mca_oob_tcp_peer_recv_connect_ack: received unexpected process identifier [[13816,0],209]

The whole job then spins indefinitely, without crashing or aborting.
What could be the culprit? Is there a workaround? Which parameter should be tuned?

Thanks in advance for any help.

Best,
G.
