Carlos,

By any chance, could

mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...

work for you?

Which Open MPI version are you running?


IIRC, subnets are internally translated to interface names, so that might
be an issue if the translation is made on the first host and the interface
name is then sent to the other hosts.
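To illustrate why that translation could bite here, below is a minimal POSIX
sh sketch (not Open MPI's actual code) that maps a subnet to the name of the
local interface holding an address in it. The helper names ip_to_int,
in_subnet and iface_for_subnet are hypothetical, and the sample lines only
mimic `ip -o -4 addr show` output, built from the addresses in your table.
On a real host you would pipe the actual command into iface_for_subnet.

```sh
#!/bin/sh
# Sketch only: map an IPv4 subnet to the name of the local interface that
# holds an address in it. Hypothetical helpers, not Open MPI internals.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  (
    IFS=.        # split on dots, only inside this subshell
    set -- $1    # $1..$4 become the four octets
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  )
}

# Succeed if address $1 lies inside CIDR subnet $2 (e.g. 10.0.0.0/24).
in_subnet() {
  local net bits mask
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# Read `ip -o -4 addr show` lines on stdin and print the first interface
# whose address falls inside subnet $1; return 1 if none matches.
iface_for_subnet() {
  local _idx name family addr _rest
  while read -r _idx name family addr _rest; do
    [ "$family" = inet ] || continue
    if in_subnet "${addr%/*}" "$1"; then
      echo "$name"
      return 0
    fi
  done
  return 1
}

# Hypothetical `ip -o -4 addr show` output, built from the table in your message.
compute01_ip='2: ens3    inet 192.168.100.104/24 scope global ens3
3: ens8    inet 10.0.0.228/24 scope global ens8
4: ens9    inet 172.21.1.155/24 scope global ens9'
compute02_ip='2: ens3    inet 10.0.0.227/24 scope global ens3
3: ens8    inet 172.21.1.128/24 scope global ens8'

echo "compute01: 10.0.0.0/24 -> $(printf '%s\n' "$compute01_ip" | iface_for_subnet 10.0.0.0/24)"
echo "compute02: 10.0.0.0/24 -> $(printf '%s\n' "$compute02_ip" | iface_for_subnet 10.0.0.0/24)"

# On a real host: ip -o -4 addr show | iface_for_subnet 10.0.0.0/24
```

With these addresses, 10.0.0.0/24 resolves to ens8 on compute01 but to ens3
on compute02, so an interface name resolved on the first host and forwarded
to the second would select the wrong network there.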

Cheers,

Gilles

On Saturday, June 23, 2018, carlos aguni <aguni...@gmail.com> wrote:

> Hi all,
>
> I'm trying to run code on 2 machines that each have at least 2 network
> interfaces, configured as described below:
>
>             compute01            compute02
> ens3        192.168.100.104/24   10.0.0.227/24
> ens8        10.0.0.228/24        172.21.1.128/24
> ens9        172.21.1.155/24
>
> ---
>
> The issue is that when I execute `mpirun -n 2 -host compute01,compute02 hostname`
> on them, I get the correct output, but only after a very long delay.
>
> From what I've read so far, Open MPI greedily tries each interface, and
> each attempt times out if it doesn't find the desired IP.
> Then I saw here (https://www.open-mpi.org/faq/?category=tcp#tcp-selection)
> that I can run commands like:
> `$ mpirun -n 2 --mca oob_tcp_if_include 10.0.0.0/24 -host
> compute01,compute02 hostname`
> But this configuration doesn't reach the other host(s), and in the end I
> sometimes get the same timeout.
>
> So, is there a way to make it use the system's default route?
>
> Regards,
> Carlos.
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
