Hmmm...perhaps we can break this out a bit? The stdin will be going to your 
rank=0 proc. It sounds like you have some subsequent step that calls MPI_Bcast?

Can you first verify that the input is being correctly delivered to rank=0? 
That will help us isolate whether the problem is in the IO forwarding or in the 
subsequent Bcast.
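Something along these lines might help — just a minimal sketch on my end, not 
code from your application (the buffer size and the binary name are arbitrary): 
rank=0 reads one line from the forwarded stdin, echoes it, and then does the 
same read-then-Bcast pattern your app uses.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    char buf[256] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Check that the forwarded stdin actually arrives at rank 0 */
        if (fgets(buf, sizeof(buf), stdin) != NULL)
            printf("rank 0 received: %s", buf);
        else
            printf("rank 0: nothing on stdin\n");
        fflush(stdout);
    }

    /* Same rank-0-reads-then-broadcasts pattern the application uses */
    MPI_Bcast(buf, sizeof(buf), MPI_CHAR, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}

Run it the same way you run lammps, e.g. "mpirun ./stdin_check < in.snr" on the 
same 4x16 allocation (stdin_check being whatever you name the compiled sketch). 
If "rank 0 received" prints but the job still hangs, the problem is downstream 
of the IO forwarding; if nothing prints on the runs that hang, the stdin 
forwarding itself is the suspect.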

> On Aug 22, 2016, at 1:11 PM, Jingchao Zhang <zh...@unl.edu> wrote:
> 
> Hi all,
> 
> We compiled openmpi/2.0.0 with gcc/6.1.0 and intel/13.1.3. Both builds show 
> odd behavior when trying to read from standard input.
> 
> For example, if we start the application LAMMPS across 4 nodes, each node 16 
> cores, connected by Intel QDR InfiniBand, mpirun works fine the first time, 
> but always gets stuck within a few seconds on subsequent runs.
> Command:
> mpirun ./lmp_ompi_g++ < in.snr
> in.snr is the LAMMPS input file; the compiler is gcc/6.1.
> 
> Instead, if we use
> mpirun ./lmp_ompi_g++ -in in.snr
> it works 100% of the time.
> 
> Some odd behaviors we have observed so far: 
> 1. For a 1-node job, stdin always works.
> 2. For multiple nodes, stdin works unreliably when the number of cores per node 
> is relatively small. For example, with 2/3/4 nodes and 8 cores per node, mpirun 
> works most of the time. But with more than 8 cores per node, mpirun works the 
> first time, then always gets stuck. There seems to be a magic number beyond 
> which it stops working.
> 3. We tested Quantum Espresso with the intel/13 compiler and had the same issue. 
> 
> We used gdb to debug and found that when mpirun was stuck, the rest of the 
> processes were all waiting on an MPI broadcast from the master process. The 
> LAMMPS binary, input file, and gdb core files (example.tar.bz2) can be 
> downloaded from this link: 
> https://drive.google.com/open?id=0B3Yj4QkZpI-dVWZtWmJ3ZXNVRGc
> 
> Extra information:
> 1. Job scheduler is slurm.
> 2. configure setup:
> ./configure     --prefix=$PREFIX \
>                 --with-hwloc=internal \
>                 --enable-mpirun-prefix-by-default \
>                 --with-slurm \
>                 --with-verbs \
>                 --with-psm \
>                 --disable-openib-connectx-xrc \
>                 --with-knem=/opt/knem-1.1.2.90mlnx1 \
>                 --with-cma
> 3. openmpi-mca-params.conf file: 
> orte_hetero_nodes=1
> hwloc_base_binding_policy=core
> rmaps_base_mapping_policy=core
> opal_cuda_support=0
> btl_openib_use_eager_rdma=0
> btl_openib_max_eager_rdma=0
> btl_openib_flags=1
> 
> Thanks,
> Jingchao 
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
