Hmmm...perhaps we can break this out a bit? The stdin will be going to your rank=0 proc. It sounds like you have some subsequent step that calls MPI_Bcast?
Can you first verify that the input is being correctly delivered to rank=0? This will help us isolate whether the problem is in the IO forwarding or in the subsequent Bcast. (A minimal stdin/Bcast test for this check is sketched after the quoted message below.)

> On Aug 22, 2016, at 1:11 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>
> Hi all,
>
> We compiled openmpi/2.0.0 with gcc/6.1.0 and intel/13.1.3. Both of them show odd behavior when trying to read from standard input.
>
> For example, if we start the application lammps across 4 nodes, each node 16 cores, connected by Intel QDR Infiniband, mpirun works fine the 1st time, but always gets stuck within a few seconds thereafter.
> Command:
> mpirun ./lmp_ompi_g++ < in.snr
> in.snr is the Lammps input file. The compiler is gcc/6.1.
>
> Instead, if we use
> mpirun ./lmp_ompi_g++ -in in.snr
> it works 100% of the time.
>
> Some odd behaviors we have gathered so far:
> 1. For a 1-node job, stdin always works.
> 2. For multiple nodes, stdin works, but unreliably, when the number of cores per node is relatively small. For example, for 2/3/4 nodes with 8 cores each, mpirun works most of the time. But with more than 8 cores per node, mpirun works the 1st time, then always gets stuck. There seems to be a magic number at which it stops working.
> 3. We tested Quantum Espresso with compiler intel/13 and had the same issue.
>
> We used gdb to debug and found that when mpirun was stuck, the rest of the processes were all waiting on an MPI broadcast from the master thread. The lammps binary, input file and gdb core files (example.tar.bz2) can be downloaded from this link:
> https://drive.google.com/open?id=0B3Yj4QkZpI-dVWZtWmJ3ZXNVRGc
>
> Extra information:
> 1. The job scheduler is slurm.
> 2. configure setup:
> ./configure --prefix=$PREFIX \
>     --with-hwloc=internal \
>     --enable-mpirun-prefix-by-default \
>     --with-slurm \
>     --with-verbs \
>     --with-psm \
>     --disable-openib-connectx-xrc \
>     --with-knem=/opt/knem-1.1.2.90mlnx1 \
>     --with-cma
> 3. openmpi-mca-params.conf file:
> orte_hetero_nodes=1
> hwloc_base_binding_policy=core
> rmaps_base_mapping_policy=core
> opal_cuda_support=0
> btl_openib_use_eager_rdma=0
> btl_openib_max_eager_rdma=0
> btl_openib_flags=1
>
> Thanks,
> Jingchao
>
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
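
For the verification step above, something like the following minimal test might help (a sketch, not from the thread; the file name stdin_bcast_test.c is just an example, and it does not involve lmp_ompi_g++ at all). Rank 0 reads one line from stdin and then broadcasts it: if the read itself hangs or returns nothing, the problem is in mpirun's stdin forwarding; if the read succeeds but the run hangs afterwards, the problem is in the Bcast path.

    /*
     * Minimal stdin/Bcast check (sketch).
     * Build:  mpicc stdin_bcast_test.c -o stdin_bcast_test
     * Run:    mpirun ./stdin_bcast_test < in.snr
     */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char line[1024] = "";
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* If this blocks or reads nothing, stdin forwarding is the problem. */
            if (fgets(line, sizeof(line), stdin) == NULL)
                strcpy(line, "<no stdin received>");
            printf("rank 0 read: %s\n", line);
            fflush(stdout);
        }

        /* If rank 0 read fine but this hangs, the Bcast path is the problem. */
        MPI_Bcast(line, sizeof(line), MPI_CHAR, 0, MPI_COMM_WORLD);

        printf("rank %d of %d got: %s\n", rank, size, line);
        fflush(stdout);

        MPI_Finalize();
        return 0;
    }

Running it repeatedly across the same 4-node, 16-core layout should show whether the hang reproduces independently of lammps.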