OpenMPI Users,

I hope this email finds you all well. I am writing about an issue I have
run into while using Open MPI.

I received the following error message while running a job:

"Open MPI detected an inbound MPI TCP connection request from a peer that
appears to be part of this MPI job (i.e., it identified itself as part of
this Open MPI job), but it is from an IP address that is unexpected. This
is highly unusual. The inbound connection has been dropped, and the peer
should simply try again with a different IP interface (i.e., the job should
hopefully be able to continue).

  Local host:          node02
  Local PID:           17805
  Peer hostname:       node01 ([[23078,1],2])
  Source IP of socket: 192.168.0.3
  Known IPs of peer:   192.168.0.225"
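
To make the mismatch in that message concrete, here is a minimal Python
sketch (not part of Open MPI) that compares the two addresses it reports.
The /24 netmask and the helper name summarize are assumptions for
illustration; the message itself does not state the netmask.

```python
import ipaddress

# Addresses copied from the error message above.
source_ip = ipaddress.ip_address("192.168.0.3")
known_ips = [ipaddress.ip_address("192.168.0.225")]

# Assumption: the cluster network is 192.168.0.0/24 (netmask not given in the message).
assumed_net = ipaddress.ip_network("192.168.0.0/24")


def summarize(source, known, net):
    """Return (source is a known peer address, all addresses lie inside net)."""
    is_known = source in known
    same_subnet = source in net and all(ip in net for ip in known)
    return is_known, same_subnet


is_known, same_subnet = summarize(source_ip, known_ips, assumed_net)
print(f"source IP is a known peer address: {is_known}")
print(f"all addresses inside {assumed_net}: {same_subnet}")
```

If the /24 assumption holds, this would suggest that node01 has more than
one address on the same subnet, so comparing the output of "ip addr" on
node01 against the known address may be a reasonable next step.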

I have tried to troubleshoot the issue, but without success. I am new to
this subject and am not sure what could be causing it. I did try forcing
the nodes to talk to each other over eth0 by passing the option "-mca
btl_tcp_if_include eth0", but that did not resolve the error.

I found a GitHub thread <https://github.com/open-mpi/ompi/issues/5818> from
2018 that discussed the issue, but since I am new to this, a lot of the
subject matter went over my head. Could you please advise on what could be
causing this issue and how to resolve it? If you need any additional
information, I would be happy to provide it.

Thank you in advance for your help.

Best regards,

Todd
