Hi Gilles.

Thank you for your reply! :)
I'm now using Open MPI 3.0.2 built from source, and everything seems to
work fine.
Running `mpirun -n 3 -host c01,c02,c03 hostname` I get:
c01
c02
c03

`mpirun -n 2 -host c01,c02 hostname`:
c02
c01

`mpirun -n 2 -host c01,c03 hostname`:
c01
c03

This is the expected output.

Now, when I run a program that calls MPI_Comm_spawn, it prints an error
message showing that it tries to connect to the wrong IP address.
Here is the command and its output, followed by the relevant verbose lines.
`mpirun -n 1 --machinefile con_c03_hostfile --mca oob_base_verbose 10 con_c03`:
Hello world from processor c01, rank 0 out of 2 processors
Im the spawned rank 0
Hello world from processor c03, rank 1 out of 2 processors
[[35996,2],0][btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect]
from c03 to: c01 Unable to connect to the peer 10.0.0.1 on port 1024:
Network is unreachable

[c03:06355] pml_ob1_sendreq.c:235 FATAL

Verbose output below:
[c01:05462] [[36010,0],0] oob:tcp:init adding 10.0.0.1 to our list of V4
connections
[c01:05462] [[36010,0],0] oob:tcp:init adding 172.16.0.1 to our list of V4
connections
[c01:05462] [[36010,0],0] oob:tcp:init adding 172.21.1.136 to our list of
V4 connections
[c03:06225] [[36010,0],1] oob:tcp:init adding 192.168.0.1 to our list of V4
connections
[c03:06225] [[36010,0],1] oob:tcp:init adding 172.16.0.2 to our list of V4
connections

Is there a way to suppress it?

My env is as described below:
*c01*
ens8 10.0.0.1/24
ens9 172.16.0.1/24
eth0 172.21.1.136/24

*c02*
eth0 10.0.0.2/24

*c03*
ens8 192.168.0.1/24
eth1 172.16.0.2/24

*c04*
eth0 192.168.0.2/24
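
To make the layout above easier to reason about, here is a rough Python sketch that lists which subnets each pair of hosts has in common. It is only an illustration built from the addresses listed above; the `HOSTS` dict and the helper names are made up for this example and have nothing to do with how Open MPI selects interfaces internally.

```python
import ipaddress
from itertools import combinations

# Interface addresses copied from the listing above; the dict layout itself
# is just an illustration (hypothetical, not an Open MPI data structure).
HOSTS = {
    "c01": ["10.0.0.1/24", "172.16.0.1/24", "172.21.1.136/24"],
    "c02": ["10.0.0.2/24"],
    "c03": ["192.168.0.1/24", "172.16.0.2/24"],
    "c04": ["192.168.0.2/24"],
}


def networks(addrs):
    """Return the set of IP networks a host is directly attached to."""
    return {ipaddress.ip_interface(a).network for a in addrs}


def shared_networks(hosts):
    """Map each host pair to the sorted list of subnets both are attached to."""
    result = {}
    for a, b in combinations(sorted(hosts), 2):
        common = networks(hosts[a]) & networks(hosts[b])
        result[(a, b)] = sorted(str(n) for n in common)
    return result


if __name__ == "__main__":
    for (a, b), nets in shared_networks(HOSTS).items():
        label = ", ".join(nets) if nets else "no shared subnet"
        print(f"{a} <-> {b}: {label}")
```

With these addresses, c01 and c03 only share 172.16.0.0/24, so c03 has no direct path to 10.0.0.1, which is consistent with the "Network is unreachable" message above.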

Regards,
Carlos.

On Sun, Jul 1, 2018 at 9:01 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Carlos,
>
>
> Open MPI 3.0.2 has been released, and it contains several bug fixes, so I
> do encourage you to upgrade and try again.
>
>
>
> If it still does not work, can you please run
>
> mpirun --mca oob_base_verbose 10 ...
>
> and then compress and post the output?
>
>
> Out of curiosity, would
>
> mpirun --mca routed_radix 1 ...
>
> work in your environment?
>
>
> Once we can analyze the logs, we should be able to figure out what is
> going wrong.
>
>
> Cheers,
>
> Gilles
>
> On 6/29/2018 4:10 AM, carlos aguni wrote:
>
>> Just realized my email wasn't sent to the archive.
>>
>> On Sat, Jun 23, 2018 at 5:34 PM, carlos aguni <aguni...@gmail.com> wrote:
>>
>>     Hi!
>>
>>     Thank you all for your reply Jeff, Gilles and rhc.
>>
>>     Thank you Jeff and rhc for clarifying some of Open MPI's
>>     internals for me.
>>
>>     >> FWIW: we never send interface names to other hosts - just dot
>>     addresses
>>     > Should have clarified - when you specify an interface name for the
>>     MCA param, then it is the interface name that is transferred as
>>     that is the value of the MCA param. However, once we determine our
>>     address, we only transfer dot addresses between ourselves
>>
>>     If only dot addresses are sent to the hosts, then why doesn't
>>     Open MPI use the default route, like `ip route get <other host IP>`,
>>     instead of choosing a random one? Is this the expected behaviour?
>>     Can it be changed?
>>
>>     Sorry. As Gilles pointed out, I forgot to mention which Open MPI
>>     version I was using. I'm using Open MPI 3.0.0 (gcc 7.3.0) from
>>     OpenHPC on CentOS 7.5.
>>
>>     > mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...
>>
>>     I cannot just exclude that interface, because after that I want to
>>     add another computer that's on a different network. And this is
>>     where things get messy :( I cannot simply include and exclude
>>     networks, because I have different machines on different networks.
>>     This is what I want to achieve:
>>
>>              compute01            compute02         compute03
>>     ens3     192.168.100.104/24   10.0.0.227/24     192.168.100.105/24
>>     ens8     10.0.0.228/24        172.21.1.128/24   ---
>>     ens9     172.21.1.155/24      ---               ---
>>
>>     So I'm on compute01, spawning processes on compute02 and compute03,
>>     both with MPI_Comm_spawn and with `mpirun -n 3 -host
>>     compute01,compute02,compute03 hostname`.
>>
>>     Then when I include the MCA parameters I get this:
>>     `mpirun --oversubscribe --allow-run-as-root -n 3 --mca
>>     oob_tcp_if_include 10.0.0.0/24,192.168.100.0/24 -host
>>     compute01,compute02,compute03 hostname`
>>     WARNING: An invalid value was given for oob_tcp_if_include. This
>>     value will be ignored.
>>     ...
>>     Message:    Did not find interface matching this subnet
>>
>>     This would all work if it were to use the system's internals like
>>     `ip route`.
>>
>>     Best regards,
>>     Carlos.
>>
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
>>