Hi Gilles.

Thank you for your reply! :)

I'm now using a compiled version of Open MPI 3.0.2, and everything seems to work fine now.

Running `mpirun -n 3 -host c01,c02,c03 hostname` I get:

    c01
    c02
    c03

`mpirun -n 2 -host c01,c02 hostname`:

    c02
    c01

`mpirun -n 2 -host c01,c03 hostname`:

    c01
    c03

Which is expected.

Now, when I run an MPI_Spawn, it prints a warning message indicating that it is picking the wrong IP address. See the command below; I'll highlight some of the verbose output.

`mpirun -n 1 --machinefile con_c03_hostfile --mca oob_base_verbose 10 con_c03`:

    Hello world from processor c01, rank 0 out of 2 processors
    Im the spawned rank 0
    Hello world from processor c03, rank 1 out of 2 processors
    [[35996,2],0][btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect] from c03 to: c01 Unable to connect to the peer 10.0.0.1 on port 1024: Network is unreachable
    [c03:06355] pml_ob1_sendreq.c:235 FATAL

Verbose below:

    [c01:05462] [[36010,0],0] oob:tcp:init adding 10.0.0.1 to our list of V4 connections
    [c01:05462] [[36010,0],0] oob:tcp:init adding 172.16.0.1 to our list of V4 connections
    [c01:05462] [[36010,0],0] oob:tcp:init adding 172.21.1.136 to our list of V4 connections
    [c03:06225] [[36010,0],1] oob:tcp:init adding 192.168.0.1 to our list of V4 connections
    [c03:06225] [[36010,0],1] oob:tcp:init adding 172.16.0.2 to our list of V4 connections

Is there a way to suppress it?

My env is as described below:

*c01*
    ens8 10.0.0.1/24
    ens9 172.16.0.1/24
    eth0 172.21.1.136/24
*c02*
    eth0 10.0.0.2/24
*c03*
    ens8 192.168.0.1/24
    eth1 172.16.0.2/24
*c04*
    eth0 192.168.0.2/24

Regards,
Carlos.

On Sun, Jul 1, 2018 at 9:01 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> Carlos,
>
> Open MPI 3.0.2 has been released, and it contains several bug fixes, so I
> do encourage you to upgrade and try again.
>
> if it still does not work, can you please run
>
> mpirun --mca oob_base_verbose 10 ...
>
> and then compress and post the output?
>
> out of curiosity, would
>
> mpirun --mca routed_radix 1 ...
>
> work in your environment?
>
> once we can analyze the logs, we should be able to figure out what is
> going wrong.
>
> Cheers,
>
> Gilles
>
> On 6/29/2018 4:10 AM, carlos aguni wrote:
>
>> Just realized my email wasn't sent to the archive.
>>
>> On Sat, Jun 23, 2018 at 5:34 PM, carlos aguni <aguni...@gmail.com> wrote:
>>
>> Hi!
>>
>> Thank you all for your replies, Jeff, Gilles and rhc.
>>
>> Thank you Jeff and rhc for clarifying some of Open MPI's internals
>> for me.
>>
>> FWIW: we never send interface names to other hosts - just dot
>> addresses
>> > Should have clarified - when you specify an interface name for the
>> > MCA param, then it is the interface name that is transferred as
>> > that is the value of the MCA param. However, once we determine our
>> > address, we only transfer dot addresses between ourselves
>>
>> If only dot addresses are sent to the hosts, then why doesn't
>> Open MPI use the default route, like `ip route get <other host IP>`,
>> instead of choosing a random one? Is this expected behaviour? Can
>> it be changed?
>>
>> Sorry. As Gilles pointed out, I forgot to mention which Open MPI
>> version I was using. I'm using Open MPI 3.0.0 with gcc 7.3.0 from
>> OpenHPC, on CentOS 7.5.
>>
>> > mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...
>>
>> I cannot just exclude that interface, because after that I want to
>> add another computer that's on a different network. And this is
>> where things get messy :( I cannot just include and exclude
>> networks, because I have different machines on different networks.
>> This is what I want to achieve:
>>
>>          compute01           compute02        compute03
>> ens3     192.168.100.104/24  10.0.0.227/24    192.168.100.105/24
>> ens8     10.0.0.228/24       172.21.1.128/24  ---
>> ens9     172.21.1.155/24     ---              ---
>>
>> So I'm in compute01 MPI_spawning another process on compute02 and
>> compute03.
>> With both MPI_Spawn and `mpirun -n 3 -host
>> compute01,compute02,compute03 hostname`
>>
>> Then when I include the mca parameters I get this:
>> `mpirun --oversubscribe --allow-run-as-root -n 3 --mca
>> oob_tcp_if_include 10.0.0.0/24,192.168.100.0/24 -host
>> compute01,compute02,compute03 hostname`
>> WARNING: An invalid value was given for oob_tcp_if_include. This
>> value will be ignored.
>> ...
>> Message: Did not find interface matching this subnet
>>
>> This would all work if it were to use the system's internals like
>> `ip route`.
>>
>> Best regards,
>> Carlos.
>>
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
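
For the three-host layout in the quoted table, a small Python check can show which hosts have no address inside each subnet passed to `oob_tcp_if_include`. This is only a sketch built on the standard `ipaddress` module. The names `HOSTS`, `INCLUDE`, and `unmatched_subnets` are made up for illustration, and the code does not reproduce how Open MPI itself matches interfaces. It assumes (not confirmed in this thread) that the warning is raised on a host when a listed subnet has no local interface.

```python
import ipaddress

# Interface addresses copied from the table in the quoted message.
HOSTS = {
    "compute01": {
        "ens3": "192.168.100.104/24",
        "ens8": "10.0.0.228/24",
        "ens9": "172.21.1.155/24",
    },
    "compute02": {
        "ens3": "10.0.0.227/24",
        "ens8": "172.21.1.128/24",
    },
    "compute03": {
        "ens3": "192.168.100.105/24",
    },
}

# Subnets passed to oob_tcp_if_include in the quoted command.
INCLUDE = ["10.0.0.0/24", "192.168.100.0/24"]


def unmatched_subnets(interfaces, include):
    """Return the include-list subnets that contain none of this host's addresses."""
    addrs = [ipaddress.ip_interface(a).ip for a in interfaces.values()]
    missing = []
    for cidr in include:
        net = ipaddress.ip_network(cidr)
        if not any(addr in net for addr in addrs):
            missing.append(cidr)
    return missing


# Hypothetical per-host report: which listed subnets have no local interface.
report = {host: unmatched_subnets(ifs, INCLUDE) for host, ifs in HOSTS.items()}
for host, missing in report.items():
    print(host, "has no interface in:", missing or "none")
```

With these addresses, compute01 has an interface in both subnets, compute02 has none in 192.168.100.0/24, and compute03 has none in 10.0.0.0/24, so a single include list cannot match on every host.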
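
For the c01 to c04 environment listed at the top of this message, the same kind of check can show which host pairs share a subnet and whether 10.0.0.1 lies on a network that c03 is attached to. Again this is a hedged sketch using Python's standard `ipaddress` module. `HOSTS`, `networks`, `shared_networks`, and `on_link` are illustrative names, and the check only considers directly attached subnets. It ignores gateways and routing tables, which were not described in the message.

```python
import ipaddress
from itertools import combinations

# Addresses copied from the environment listing at the top of this message.
HOSTS = {
    "c01": ["10.0.0.1/24", "172.16.0.1/24", "172.21.1.136/24"],
    "c02": ["10.0.0.2/24"],
    "c03": ["192.168.0.1/24", "172.16.0.2/24"],
    "c04": ["192.168.0.2/24"],
}


def networks(host):
    """Set of IP networks the host has an interface on."""
    return {ipaddress.ip_interface(cidr).network for cidr in HOSTS[host]}


def shared_networks(a, b):
    """Networks both hosts are attached to (no router needed between them)."""
    return sorted(str(n) for n in networks(a) & networks(b))


def on_link(src_host, dst_ip):
    """True if dst_ip falls inside a network that src_host is attached to."""
    dst = ipaddress.ip_address(dst_ip)
    return any(dst in net for net in networks(src_host))


shared = {pair: shared_networks(*pair) for pair in combinations(HOSTS, 2)}
for (a, b), nets in shared.items():
    print(f"{a} <-> {b}: {nets or 'no shared subnet'}")

print("c03 on-link to 10.0.0.1:", on_link("c03", "10.0.0.1"))
print("c03 on-link to 172.16.0.1:", on_link("c03", "172.16.0.1"))
```

Under these assumptions, c01 and c03 only share 172.16.0.0/24, so a connection attempt from c03 to 10.0.0.1 has no directly attached network to use, which would be consistent with the "Network is unreachable" error in the log.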