Sorry, I forgot to mention that I did get my MPI app working with:

mpirun --mca oob_tcp_dynamic_ipv4_ports 46100-46117 --mca btl_tcp_port_min_v4 46118 --mca btl_tcp_port_range_v4 17

But it's not safe just to hard-code those port ranges in case someone else uses those ports, or I want to run the app multiple times at once.
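One possible way to avoid a single hard-coded range (not something from the thread, just a sketch) is to derive a per-job block of ports from the scheduler's job ID, so that concurrent runs are steered towards different blocks. The base port, block size and application name below are arbitrary assumptions, and this does not guarantee the block is actually free:

    #!/usr/bin/env bash
    # Hypothetical sketch: pick a 64-port block based on the LSF job ID
    # (falling back to the shell PID outside LSF). The 46100 base and the
    # 60-block window are arbitrary; my_mpi_app is a placeholder.
    block=$(( ${LSB_JOBID:-$$} % 60 ))
    base=$(( 46100 + block * 64 ))

    mpirun \
      --mca oob_tcp_dynamic_ipv4_ports "${base}-$(( base + 17 ))" \
      --mca btl_tcp_port_min_v4 $(( base + 18 )) \
      --mca btl_tcp_port_range_v4 46 \
      ./my_mpi_app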
On 19 Mar 2021, at 13:32, Sendu Bala <s...@sanger.ac.uk> wrote:

Hi,

Thanks for the explanation. I'm trying to restrict the port range because, if I don't, mpiexec doesn't function reliably. With 2 hosts it always works; then as you add hosts it is more and more likely to fail, until by 16 hosts it almost always fails. "Fails" here means that mpiexec terminates itself after just over 5 minutes with no error message, without having created its output files, and without the MPI app really doing anything. I've given full details about this in a previous post to this mailing list ("Failure to do anything under LSF"), but got no response.

Setting the port range seems to be a solution to the problem, but I need something robust to use in production. As it is, I'll have to figure out a good port range to use just-in-time. But understanding why it isn't working and coming up with a better solution would be ideal.

Cheers,
Sendu.

On 19 Mar 2021, at 13:13, Ralph Castain via users <users@lists.open-mpi.org> wrote:

Let me briefly explain how MPI jobs start. mpirun launches a set of daemons, one per node. Each daemon has a "phone home" address passed to it on its cmd line. It opens a port (obtained from its local OS) and connects back to the port provided on its cmd line. This establishes a connection back to mpirun. This set of ports is the "oob" set.

mpirun then sends out a "launch msg" to every daemon telling them what ranks to start. When a daemon starts a rank, it provides (in the environment) its local port so that the rank can connect back to it. Once that connection is established, each rank opens its own ports (obtained independently from the local OS) for use by MPI - these are the "btl" ports. The rank sends that port information to its local daemon, and then the daemons do a global exchange so that every daemon winds up with a complete map of ranks to ports. This map is provided back to each rank.

So at no point does anyone need to know what ports are available on other hosts - they simply receive info on what port each rank is using. The problem here is that one or more of those ranks was unable to get a port in the range you specified because they were all apparently occupied.

If you don't have firewalls, then why are you trying to restrict the port range?

On Mar 19, 2021, at 5:47 AM, Sendu Bala <s...@sanger.ac.uk> wrote:

No firewall between nodes in the cluster. OMPI may be asking localhost for available ports, but is it checking that those ports are also available on all the other hosts it's going to run on?

On 18 Mar 2021, at 15:57, Ralph Castain via users <users@lists.open-mpi.org> wrote:

Hmmm... then you have something else going on. By default, OMPI will ask the local OS for an available port and use it. You only need to specify ports when working through a firewall. Do you have firewalls on this cluster?

On Mar 18, 2021, at 8:55 AM, Sendu Bala <s...@sanger.ac.uk> wrote:

Yes, that's the trick. I'm going to have to check port usage on all hosts and pick suitable ranges just-in-time - and hope I don't hit a race condition with other users of the cluster. Does mpiexec not have this kind of functionality built in? When I use it with no port options set (pure default), it just doesn't function (I'm guessing because it chose "bad" or in-use ports).
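A rough sketch of that just-in-time check (not from the thread): probe every host for ports already bound in a candidate block before handing the block to mpirun. It assumes passwordless ssh, iproute2's ss on each node, and placeholder host names and ranges; it only sees what is in use at the moment of the check, so races with other jobs remain possible:

    #!/usr/bin/env bash
    # Hypothetical sketch: use the first candidate 64-port block that no host
    # in HOSTS is currently using. Host names, blocks and my_mpi_app are
    # placeholders; ssh failures are not handled.
    HOSTS=(node-5-8-2 node-12-6-2)        # e.g. derived from $LSB_HOSTS

    range_free() {                        # range_free <low> <high>
        local low=$1 high=$2 h
        for h in "${HOSTS[@]}"; do
            # list every bound TCP port on $h and look for any overlap
            if ssh "$h" ss -tan \
                 | awk 'NR>1 {split($4,a,":"); print a[length(a)]}' \
                 | awk -v lo="$low" -v hi="$high" '$1>=lo && $1<=hi {found=1} END {exit !found}'
            then
                return 1                  # something in the block is in use
            fi
        done
        return 0
    }

    for base in 46100 46200 46300; do     # candidate blocks, arbitrary
        if range_free "$base" $(( base + 63 )); then
            exec mpirun \
                --mca oob_tcp_dynamic_ipv4_ports "${base}-$(( base + 17 ))" \
                --mca btl_tcp_port_min_v4 $(( base + 18 )) \
                --mca btl_tcp_port_range_v4 46 \
                ./my_mpi_app
        fi
    done
    echo "no free port block found" >&2
    exit 1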
On 18 Mar 2021, at 14:11, Ralph Castain via users <users@lists.open-mpi.org> wrote:

Hard to say - unless there is some reason, why not make it large enough to not be an issue? You may have to experiment a bit, as there is nothing to guarantee that other processes aren't occupying those regions.

On Mar 18, 2021, at 2:13 AM, Sendu Bala <s...@sanger.ac.uk> wrote:

Thanks, it made it work when I was running "true" as a test, but then my real MPI app failed with:

[node-5-8-2][[48139,1],0][btl_tcp_component.c:966:mca_btl_tcp_component_create_listen] bind() failed: no port available in the range [46107..46139]
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL.

Process 1 ([[48139,1],1]) is on host: node-12-6-2
Process 2 ([[48139,1],0]) is on host: node-5-8-2
BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.

This was when running with 16 cores, so I thought a 32-port range would be fine. Is this telling me I have to make it a 33-port range, have different ranges for oob and btl, or that some other unrelated software is using some ports in my range? (I changed my range from my previous post, because using that range resulted in the issue I posted about here before, where mpirun just does nothing for 5 minutes and then terminates itself, without any error messages.)

Cheers,
Sendu.

On 17 Mar 2021, at 13:25, Ralph Castain via users <users@lists.open-mpi.org> wrote:

What you are missing is that there are _two_ messaging layers in the system. You told the btl/tcp layer to use the specified ports, but left the oob/tcp one unspecified. You need to add

oob_tcp_dynamic_ipv4_ports = 46207-46239

or whatever range you want to specify. Note that if you want the btl/tcp layer to use those other settings (e.g., keepalive_time), then you'll need to set those as well. The names of the variables may not match between the layers - you'll need to use ompi_info to find the names and params available for each layer.
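One way to follow that ompi_info suggestion (exact option names can vary between Open MPI versions, so treat this as a sketch):

    # Show the MCA parameters of each TCP layer; --level 9 asks for
    # parameters at every verbosity level.
    ompi_info --param oob tcp --level 9
    ompi_info --param btl tcp --level 9

    # Or search the full parameter dump for anything port- or keepalive-related.
    ompi_info --all | grep -E 'tcp.*(port|keepalive)'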
On Mar 16, 2021, at 2:43 AM, Vincent via users <users@lists.open-mpi.org> wrote:

On 09/03/2021 11:23, Sendu Bala via users wrote:

When using mpirun, how do you pick which ports are used? I've tried:

mpirun --mca btl_tcp_port_min_v4 46207 --mca btl_tcp_port_range_v4 32 --mca oob_tcp_keepalive_time 45 --mca oob_tcp_max_recon_attempts 20 --mca oob_tcp_retry_delay 1 --mca oob_tcp_keepalive_probes 20 --mca oob_tcp_keepalive_intvl 10 true

And also setting similar things in openmpi/etc/openmpi-mca-params.conf:

btl_tcp_port_min_v4 = 46207
btl_tcp_port_range_v4 = 32
oob_tcp_keepalive_time = 45
oob_tcp_max_recon_attempts = 20
oob_tcp_retry_delay = 1
oob_tcp_keepalive_probes = 20
oob_tcp_keepalive_intvl = 10

But when the process is running:

ss -l -p -n | grep "pid=57642,"
tcp LISTEN 0 128 127.0.0.1:58439 0.0.0.0:* users:(("mpirun",pid=57642,fd=14))
tcp LISTEN 0 128 0.0.0.0:36253 0.0.0.0:* users:(("mpirun",pid=57642,fd=17))

What am I doing wrong, and how do I get it to use my desired ports (and the other settings above)?

Hello,

Could this be related to some recently resolved bug? What version are you running? Having a look at https://github.com/open-mpi/ompi/issues/8304 could possibly be useful.

Regards,
Vincent.
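For reference, the same MCA parameters can also be set as environment variables with the OMPI_MCA_ prefix, which can be easier to manage per job than editing openmpi-mca-params.conf. Note that, as explained earlier in the thread, btl_tcp_port_min_v4 and btl_tcp_port_range_v4 only constrain the BTL listeners opened by the MPI ranks; the mpirun sockets shown by ss above belong to the oob layer, which has its own oob_tcp_dynamic_ipv4_ports parameter. The values and application name below are only illustrative:

    # Any MCA parameter can be supplied as OMPI_MCA_<parameter>.
    export OMPI_MCA_oob_tcp_dynamic_ipv4_ports=46100-46117
    export OMPI_MCA_btl_tcp_port_min_v4=46118
    export OMPI_MCA_btl_tcp_port_range_v4=32
    mpirun -np 16 ./my_mpi_app    # my_mpi_app is a placeholder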