Hi,

I just built openmpi-4.0.1, configured with --with-sge, to work with the Univa 
Grid Engine that we have on our cluster.

I tried the basic hello world C program, where each worker prints its rank and 
the world size on stdout and then quits.

This seems to work fine and I've had it running on 64 nodes with no issues.


I moved on to a more complex test program where each worker calculates its 
share of the sum from 1 to N and then communicates its partial sum to rank 0, 
which collects all the answers using the MPI_Reduce() function.
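
For reference, the arithmetic of that test can be sketched in plain C without 
MPI. This is an illustration only, not the exact code I ran: the function names 
partial_sum and reduce_all are made up for this sketch, and the cyclic split of 
the range across workers is an assumption (a block split would work equally 
well).

```c
#include <assert.h>

/* Share of the sum 1..n owned by one worker. Assumed cyclic split:
   rank r of `size` workers takes r+1, r+1+size, r+1+2*size, ... */
long long partial_sum(int rank, int size, long long n)
{
    assert(size > 0 && rank >= 0 && rank < size);
    long long sum = 0;
    for (long long i = rank + 1; i <= n; i += size)
        sum += i;
    return sum;
}

/* Serial stand-in for the reduction step: adds up every worker's share.
   In the MPI program this total is what rank 0 would receive from
   MPI_Reduce() with the MPI_SUM operation. */
long long reduce_all(int size, long long n)
{
    long long total = 0;
    for (int rank = 0; rank < size; rank++)
        total += partial_sum(rank, size, n);
    return total;
}
```

In the MPI version each process would call partial_sum() with the rank and size 
it gets from MPI_Comm_rank() and MPI_Comm_size(), and pass the result to 
MPI_Reduce() with root 0. The total on rank 0 should then equal N*(N+1)/2.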

Now that the workers have to communicate with each other, the program fails 
to work.

I get errors such as the following...

WARNING: Open MPI accepted a TCP connection from what appears to be a
another Open MPI process but cannot find a corresponding process
entry for that peer.

This attempted connection will be ignored; your MPI job may or may not
continue properly.

  Local host: node-hp0409
  PID:        58849


I've googled for this error and, as far as I can tell, nothing relevant to 
this issue comes up.
Does anyone have any idea what might be going on and what solutions there may 
be?

Our nodes are running Scientific Linux release 7.2.

Regards,

Emyr James

