William,

On a typical HPC cluster, the internal interface is not protected by the
firewall. If this is eth0, then you can

mpirun --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 ...

If only a small range of ports is available, then you will also need to use
the oob_tcp_dynamic_ipv4_ports, btl_tcp_port_min_v4 and btl_tcp_port_range_v4
MCA params in order to tell MPI which range of ports is open.

Cheers,

Gilles

On Mon, Feb 12, 2018 at 9:23 PM, William Mitchell <wfma...@gmail.com> wrote:
> Thanks, George. My sysadmin now says he is pretty sure it is the firewall,
> but that "isn't going to change" so we need to find a solution.
>
> On 9 February 2018 at 16:58, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> What are the settings of the firewall on your 2 nodes ?
>>
>> George.
>>
>>
>>
>> On Fri, Feb 9, 2018 at 3:08 PM, William Mitchell <wfma...@gmail.com>
>> wrote:
>>>
>>> When I try to run an MPI program on a network with a shared file system
>>> and connected by ethernet, I get the error message "tcp_peer_send_blocking:
>>> send() to socket 9 failed: Broken pipe (32)" followed by some suggestions of
>>> what could cause it, none of which are my problem. I have searched the FAQ,
>>> mailing list archives, and googled the error message, with only a few hits
>>> touching on it, none of which solved the problem.
>>>
>>> This is on a Linux CentOS 7 system with Open MPI 1.10.6 and Intel Fortran
>>> (more detailed system information below).
>>>
>>> Here are details on how I encounter the problem:
>>>
>>> me@host1> cat hellompi.f90
>>> program hello
>>> include 'mpif.h'
>>> integer rank, size, ierror, nl
>>> character(len=MPI_MAX_PROCESSOR_NAME) :: hostname
>>>
>>> call MPI_INIT(ierror)
>>> call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>>> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
>>> call MPI_GET_PROCESSOR_NAME(hostname, nl, ierror)
>>> print*, 'node', rank, ' of', size, ' on ', hostname(1:nl), ': Hello world'
>>> call MPI_FINALIZE(ierror)
>>> end
>>>
>>> me@host1> mpifort --showme
>>> ifort -I/usr/include/openmpi-x86_64 -pthread -m64
>>> -I/usr/lib64/openmpi/lib -Wl,-rpath -Wl,/usr/lib64/openmpi/lib
>>> -Wl,--enable-new-dtags -L/usr/lib64/openmpi/lib -lmpi_usempi -lmpi_mpifh
>>> -lmpi
>>>
>>> me@host1> ifort --version
>>> ifort (IFORT) 18.0.0 20170811
>>> Copyright (C) 1985-2017 Intel Corporation. All rights reserved.
>>>
>>> me@host1> mpifort -o hellompi hellompi.f90
>>>
>>> [Note: it runs on 1 machine, but not on two]
>>>
>>> me@host1> mpirun -np 2 hellompi
>>> node 0 of 2 on host1.domain: Hello world
>>> node 1 of 2 on host1.domain: Hello world
>>>
>>> me@host1> cat hosts
>>> host2.domain
>>> host1.domain
>>>
>>> me@host1> mpirun -np 2 --hostfile hosts hellompi
>>> [host2.domain:250313] [[46562,0],1] tcp_peer_send_blocking: send() to
>>> socket 9 failed: Broken pipe (32)
>>>
>>> --------------------------------------------------------------------------
>>> ORTE was unable to reliably start one or more daemons.
>>> This usually is caused by:
>>> [suggested causes deleted]
>>>
>>> Here is system information:
>>>
>>> me@host2> cat /etc/redhat-release
>>> CentOS Linux release 7.4.1708 (Core)
>>>
>>> me@host1> uname -a
>>> Linux host1.domain 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58
>>> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> me@host1> rpm -qa | grep openmpi
>>> mpitests-openmpi-4.1-1.el7.x86_64
>>> openmpi-1.10.6-2.el7.x86_64
>>> openmpi-devel-1.10.6-2.el7.x86_64
>>>
>>> me@host1> ompi_info --all
>>> [Results of this command for each host are in the attached files.]
>>>
>>> me@host1> ompi_info -v ompi full --parsable
>>> ompi_info: Error: unknown option "-v"
>>> [Is the request to run that command given on the Open MPI "Getting Help"
>>> web page an error?]
>>>
>>> me@host1> printenv | grep OMPI
>>> MPI_COMPILER=openmpi-x86_64
>>> OMPI_F77=ifort
>>> OMPI_FC=ifort
>>> OMPI_MCA_mpi_yield_when_idle=1
>>> OMPI_MCA_btl=tcp,self
>>>
>>> I am using ssh-agent, and I can ssh between the two hosts. In fact, from
>>> host1 I can use ssh to request that host2 ssh back to host1:
>>>
>>> me@host1> ssh -A host2 "ssh host1 hostname"
>>> host1.domain
>>>
>>> Any suggestions on how to solve this problem are appreciated.
>>>
>>> Bill
>>>
>>> _______________________________________________
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/users
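
For reference, here is a minimal sketch of the restricted-port variant that Gilles describes at the top of this thread. It assumes the firewall allows TCP ports 46000-46099 between the two nodes and that eth0 is the interface to use. The port numbers, the split into two sub-ranges, and the min-max range syntax are assumptions for illustration; only the MCA parameter names, the hostfile and the program name come from the thread. The script only prints the command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical site values: replace with the interface and ports your
# sysadmin has actually opened between the nodes.
IFACE=eth0
OOB_PORT_MIN=46000     # first port for the out-of-band (runtime) channel
OOB_PORT_COUNT=50      # number of ports reserved for it
BTL_PORT_COUNT=50      # number of ports reserved for MPI TCP traffic

# Derive the two sub-ranges so that they do not overlap.
OOB_PORT_MAX=$((OOB_PORT_MIN + OOB_PORT_COUNT - 1))
OOB_PORTS="${OOB_PORT_MIN}-${OOB_PORT_MAX}"
BTL_PORT_MIN=$((OOB_PORT_MAX + 1))

# Assemble the mpirun command line from the parameters named in the thread.
CMD="mpirun"
CMD="$CMD --mca oob_tcp_if_include $IFACE"
CMD="$CMD --mca btl_tcp_if_include $IFACE"
CMD="$CMD --mca oob_tcp_dynamic_ipv4_ports $OOB_PORTS"
CMD="$CMD --mca btl_tcp_port_min_v4 $BTL_PORT_MIN"
CMD="$CMD --mca btl_tcp_port_range_v4 $BTL_PORT_COUNT"
CMD="$CMD -np 2 --hostfile hosts hellompi"

# Dry run: print the command instead of executing it.
echo "$CMD"
```

If the printed command looks right for your site, it can be pasted at the prompt on host1. Running ompi_info --all on the installed version should confirm that these parameters exist and show the value format they expect.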