Open MPI Users,

I hope this email finds you all well. I am writing about an issue I have encountered while using Open MPI.
I received the following error message while running a job:

  Open MPI detected an inbound MPI TCP connection request from a peer
  that appears to be part of this MPI job (i.e., it identified itself as
  part of this Open MPI job), but it is from an IP address that is
  unexpected. This is highly unusual.

  The inbound connection has been dropped, and the peer should simply
  try again with a different IP interface (i.e., the job should
  hopefully be able to continue).

    Local host:          node02
    Local PID:           17805
    Peer hostname:       node01 ([[23078,1],2])
    Source IP of socket: 192.168.0.3
    Known IPs of peer:   192.168.0.225

I have tried to troubleshoot the issue, but to no avail. As a new user, I am not sure what could be causing it. I tried forcing the nodes to communicate over eth0 by passing the "-mca btl_tcp_if_include eth0" option, but that did not work.

I found a GitHub thread <https://github.com/open-mpi/ompi/issues/5818> from 2018 that discussed the issue, but since I am new to this, much of the discussion went over my head.

Could you please advise on what could be causing this issue and how to resolve it? If you need any additional information, I would be happy to provide it.

Thank you in advance for your help.

Best regards,
Todd