I’m sorry, but something is simply very wrong here. Are you sure your 
LD_LIBRARY_PATH is pointing at the correct install? Perhaps add a “BOO” or 
something at the front of the output message so we can confirm we are using the 
correct plugin?

This looks to me like you must be picking up a stale library somewhere.
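
For example - just a sketch, assuming the opal_output() diagnostic from the
patch quoted below - the marker could look like this:

/* Hypothetical tweak: prefix the diagnostic with a unique marker ("BOO") so
 * the output proves which iof/hnp plugin mpirun is actually loading. */
opal_output(0,
            "BOO %s iof:hnp pushing fd %d for process %s",
            ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
            fd, ORTE_NAME_PRINT(dst_name));

If “BOO” never shows up in the output, then mpirun is not picking up the
library you just rebuilt.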

> On Aug 29, 2016, at 10:29 AM, Jingchao Zhang <zh...@unl.edu> wrote:
> 
> Hi Ralph,
> 
> I used the tarball from Aug 26 and added the patch. Tested with 2 nodes, 10 
> cores/node. Please see the results below:
> 
> $ mpirun ./a.out < test.in
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 35 for process [[43954,1],0]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 41 for process [[43954,1],0]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 43 for process [[43954,1],0]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 37 for process [[43954,1],1]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 46 for process [[43954,1],1]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 49 for process [[43954,1],1]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 38 for process [[43954,1],2]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 50 for process [[43954,1],2]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 52 for process [[43954,1],2]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 42 for process [[43954,1],3]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 53 for process [[43954,1],3]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 55 for process [[43954,1],3]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 45 for process [[43954,1],4]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 56 for process [[43954,1],4]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 58 for process [[43954,1],4]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 47 for process [[43954,1],5]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 59 for process [[43954,1],5]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 61 for process [[43954,1],5]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 57 for process [[43954,1],6]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 64 for process [[43954,1],6]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 66 for process [[43954,1],6]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 62 for process [[43954,1],7]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 68 for process [[43954,1],7]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 70 for process [[43954,1],7]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 65 for process [[43954,1],8]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 72 for process [[43954,1],8]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 74 for process [[43954,1],8]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 75 for process [[43954,1],9]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 79 for process [[43954,1],9]
> [c1725.crane.hcc.unl.edu:170750] [[43954,0],0] iof:hnp pushing fd 81 for process [[43954,1],9]
> Rank 5 has cleared MPI_Init
> Rank 9 has cleared MPI_Init
> Rank 1 has cleared MPI_Init
> Rank 2 has cleared MPI_Init
> Rank 3 has cleared MPI_Init
> Rank 4 has cleared MPI_Init
> Rank 8 has cleared MPI_Init
> Rank 0 has cleared MPI_Init
> Rank 6 has cleared MPI_Init
> Rank 7 has cleared MPI_Init
> Rank 14 has cleared MPI_Init
> Rank 15 has cleared MPI_Init
> Rank 16 has cleared MPI_Init
> Rank 18 has cleared MPI_Init
> Rank 10 has cleared MPI_Init
> Rank 11 has cleared MPI_Init
> Rank 12 has cleared MPI_Init
> Rank 13 has cleared MPI_Init
> Rank 17 has cleared MPI_Init
> Rank 19 has cleared MPI_Init
> 
> Thanks,
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
> Sent: Saturday, August 27, 2016 12:31:53 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> I am finding this impossible to replicate, so something odd must be going on. 
> Can you please (a) pull down the latest v2.0.1 nightly tarball, and (b) add 
> this patch to it?
> 
> diff --git a/orte/mca/iof/hnp/iof_hnp.c b/orte/mca/iof/hnp/iof_hnp.c
> old mode 100644
> new mode 100755
> index 512fcdb..362ff46
> --- a/orte/mca/iof/hnp/iof_hnp.c
> +++ b/orte/mca/iof/hnp/iof_hnp.c
> @@ -143,16 +143,17 @@ static int hnp_push(const orte_process_name_t* dst_name, orte_iof_tag_t src_tag,
>      int np, numdigs;
>      orte_ns_cmp_bitmask_t mask;
>  
> +    opal_output(0,
> +                         "%s iof:hnp pushing fd %d for process %s",
> +                         ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
> +                         fd, ORTE_NAME_PRINT(dst_name));
> +
>      /* don't do this if the dst vpid is invalid or the fd is negative! */
>      if (ORTE_VPID_INVALID == dst_name->vpid || fd < 0) {
>          return ORTE_SUCCESS;
>      }
>  
> -    OPAL_OUTPUT_VERBOSE((1, orte_iof_base_framework.framework_output,
> -                         "%s iof:hnp pushing fd %d for process %s",
> -                         ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
> -                         fd, ORTE_NAME_PRINT(dst_name)));
> -
>      if (!(src_tag & ORTE_IOF_STDIN)) {
>          /* set the file descriptor to non-blocking - do this before we setup
>           * and activate the read event in case it fires right away
> 
> 
> You can then run the test again without the “--mca iof_base_verbose 100” flag 
> to reduce the chatter - this print statement will tell me what I need to know.
> 
> Thanks!
> Ralph
> 
> 
>> On Aug 25, 2016, at 8:19 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>> 
>> The IOF fix PR for v2.0.1 was literally just merged a few minutes ago; it 
>> wasn't in last night's tarball.
>> 
>> 
>> 
>>> On Aug 25, 2016, at 10:59 AM, r...@open-mpi.org wrote:
>>> 
>>> ??? Weird - can you send me an updated output of that last test we ran?
>>> 
>>>> On Aug 25, 2016, at 7:51 AM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>> 
>>>> Hi Ralph,
>>>> 
>>>> I saw the pull request and did a test with v2.0.1rc1, but the problem 
>>>> persists. Any ideas?
>>>> 
>>>> Thanks,
>>>> 
>>>> Dr. Jingchao Zhang
>>>> Holland Computing Center
>>>> University of Nebraska-Lincoln
>>>> 402-472-6400
>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>> Sent: Wednesday, August 24, 2016 1:27:28 PM
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>> 
>>>> Bingo - found it, fix submitted and hope to get it into 2.0.1
>>>> 
>>>> Thanks for the assist!
>>>> Ralph
>>>> 
>>>> 
>>>>> On Aug 24, 2016, at 12:15 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>> 
>>>>> I configured v2.0.1rc1 with --enable-debug and ran the test with --mca 
>>>>> iof_base_verbose 100. I also added -display-devel-map in case it provides 
>>>>> some useful information.
>>>>> 
>>>>> The test job has 2 nodes, 10 cores per node. Rank 0 and the mpirun command 
>>>>> are on the same node.
>>>>> $ mpirun -display-devel-map --mca iof_base_verbose 100 ./a.out < test.in &> debug_info.txt
>>>>> 
>>>>> The debug_info.txt is attached. 
>>>>> 
>>>>> Dr. Jingchao Zhang
>>>>> Holland Computing Center
>>>>> University of Nebraska-Lincoln
>>>>> 402-472-6400
>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>> Sent: Wednesday, August 24, 2016 12:14:26 PM
>>>>> To: Open MPI Users
>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>> 
>>>>> Afraid I can’t replicate a problem at all, whether rank=0 is local or 
>>>>> not. I’m also using bash, but on CentOS-7, so I suspect the OS is the 
>>>>> difference.
>>>>> 
>>>>> Can you configure OMPI with --enable-debug, and then run the test again 
>>>>> with --mca iof_base_verbose 100? It will hopefully tell us something 
>>>>> about why the IO subsystem is stuck.
>>>>> 
>>>>> 
>>>>>> On Aug 24, 2016, at 8:46 AM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>> 
>>>>>> Hi Ralph,
>>>>>> 
>>>>>> For our tests, rank 0 is always on the same node as mpirun. I just 
>>>>>> tested mpirun with -nolocal and it still hangs.
>>>>>> 
>>>>>> Information on shell and OS
>>>>>> $ echo $0
>>>>>> -bash
>>>>>> 
>>>>>> $ lsb_release -a
>>>>>> LSB Version:    
>>>>>> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>>>>>> Distributor ID: Scientific
>>>>>> Description:    Scientific Linux release 6.8 (Carbon)
>>>>>> Release:        6.8
>>>>>> Codename:       Carbon
>>>>>> 
>>>>>> $ uname -a
>>>>>> Linux login.crane.hcc.unl.edu 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 11:25:51 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>>> 
>>>>>> 
>>>>>> Dr. Jingchao Zhang
>>>>>> Holland Computing Center
>>>>>> University of Nebraska-Lincoln
>>>>>> 402-472-6400
>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>> Sent: Tuesday, August 23, 2016 8:14:48 PM
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>> 
>>>>>> Hmmm...that’s a good point. Rank 0 and mpirun are always on the same 
>>>>>> node on my cluster. I’ll give it a try.
>>>>>> 
>>>>>> Jingchao: is rank 0 on the node with mpirun, or on a remote node?
>>>>>> 
>>>>>> 
>>>>>>> On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>>>>> 
>>>>>>> Ralph,
>>>>>>> 
>>>>>>> did you run task 0 and mpirun on different nodes?
>>>>>>> 
>>>>>>> I observed some random hangs, though I cannot blame openmpi 100% yet.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> Gilles
>>>>>>> 
>>>>>>>> On 8/24/2016 9:41 AM, r...@open-mpi.org wrote:
>>>>>>>> Very strange. I cannot reproduce it, as I’m able to run any number of 
>>>>>>>> nodes and procs, pushing over 100 MBytes through without any problem.
>>>>>>>> 
>>>>>>>> Which leads me to suspect that the issue here is with the tty 
>>>>>>>> interface. Can you tell me what shell and OS you are running?
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Aug 23, 2016, at 3:25 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>>> 
>>>>>>>>> Everything is stuck right after MPI_Init. For a test job with 2 nodes 
>>>>>>>>> and 10 cores per node, I got the following
>>>>>>>>> 
>>>>>>>>> $ mpirun ./a.out < test.in
>>>>>>>>> Rank 2 has cleared MPI_Init
>>>>>>>>> Rank 4 has cleared MPI_Init
>>>>>>>>> Rank 7 has cleared MPI_Init
>>>>>>>>> Rank 8 has cleared MPI_Init
>>>>>>>>> Rank 0 has cleared MPI_Init
>>>>>>>>> Rank 5 has cleared MPI_Init
>>>>>>>>> Rank 6 has cleared MPI_Init
>>>>>>>>> Rank 9 has cleared MPI_Init
>>>>>>>>> Rank 1 has cleared MPI_Init
>>>>>>>>> Rank 16 has cleared MPI_Init
>>>>>>>>> Rank 19 has cleared MPI_Init
>>>>>>>>> Rank 10 has cleared MPI_Init
>>>>>>>>> Rank 11 has cleared MPI_Init
>>>>>>>>> Rank 12 has cleared MPI_Init
>>>>>>>>> Rank 13 has cleared MPI_Init
>>>>>>>>> Rank 14 has cleared MPI_Init
>>>>>>>>> Rank 15 has cleared MPI_Init
>>>>>>>>> Rank 17 has cleared MPI_Init
>>>>>>>>> Rank 18 has cleared MPI_Init
>>>>>>>>> Rank 3 has cleared MPI_Init
>>>>>>>>> 
>>>>>>>>> then it just hung.
>>>>>>>>> 
>>>>>>>>> --Jingchao
>>>>>>>>> 
>>>>>>>>> Dr. Jingchao Zhang
>>>>>>>>> Holland Computing Center
>>>>>>>>> University of Nebraska-Lincoln
>>>>>>>>> 402-472-6400
>>>>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>>>>> Sent: Tuesday, August 23, 2016 4:03:07 PM
>>>>>>>>> To: Open MPI Users
>>>>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>>>> 
>>>>>>>>> The IO forwarding messages all flow over the Ethernet, so the type of 
>>>>>>>>> fabric is irrelevant. The number of procs involved would definitely 
>>>>>>>>> have an impact, but that might not be due to the IO forwarding 
>>>>>>>>> subsystem. We know we have flow control issues with collectives like 
>>>>>>>>> Bcast that don’t have built-in synchronization points. How many reads 
>>>>>>>>> were you able to do before it hung?
>>>>>>>>> 
>>>>>>>>> I was running it on my little test setup (2 nodes, using only a few 
>>>>>>>>> procs), but I’ll try scaling up and see what happens. I’ll also try 
>>>>>>>>> introducing some forced “syncs” on the Bcast and see if that solves 
>>>>>>>>> the issue.
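>>>>>>>>> 
>>>>>>>>> Just to illustrate the kind of “sync” I mean (a sketch only, not the 
>>>>>>>>> actual change): pair every Bcast in the test program quoted further 
>>>>>>>>> down with a Barrier, on both the rank-0 sender side and the receiver 
>>>>>>>>> side, so the root cannot run arbitrarily far ahead of the slower ranks:
>>>>>>>>> 
>>>>>>>>> /* sketch only: the sender loop and the receiver loop must both add the
>>>>>>>>>  * same barrier, or the ranks will mismatch and deadlock */
>>>>>>>>> MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>>>>>>> MPI_Barrier(MPI_COMM_WORLD);   /* forced sync point after each blob */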
>>>>>>>>> 
>>>>>>>>> Ralph
>>>>>>>>> 
>>>>>>>>>> On Aug 23, 2016, at 2:30 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Ralph,
>>>>>>>>>> 
>>>>>>>>>> I tested v2.0.1rc1 with your code but it has the same issue. I also 
>>>>>>>>>> installed v2.0.1rc1 on a different cluster, which has Mellanox QDR 
>>>>>>>>>> InfiniBand, and got the same result. For the tests you have done, how 
>>>>>>>>>> many cores and nodes did you use? I can trigger the problem by using 
>>>>>>>>>> multiple nodes with more than 10 cores per node. 
>>>>>>>>>> 
>>>>>>>>>> Thank you for looking into this.
>>>>>>>>>> 
>>>>>>>>>> Dr. Jingchao Zhang
>>>>>>>>>> Holland Computing Center
>>>>>>>>>> University of Nebraska-Lincoln
>>>>>>>>>> 402-472-6400
>>>>>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>>>>>> Sent: Monday, August 22, 2016 10:23:42 PM
>>>>>>>>>> To: Open MPI Users
>>>>>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>>>>> 
>>>>>>>>>> FWIW: I just tested forwarding up to 100MBytes via stdin using the 
>>>>>>>>>> simple test shown below with OMPI v2.0.1rc1, and it worked fine. So 
>>>>>>>>>> I’d suggest upgrading when the official release comes out, or going 
>>>>>>>>>> ahead and at least testing 2.0.1rc1 on your machine. Or you can test 
>>>>>>>>>> this program with some input file and let me know if it works for 
>>>>>>>>>> you.
>>>>>>>>>> 
>>>>>>>>>> Ralph
>>>>>>>>>> 
>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>> #include <stdio.h>
>>>>>>>>>> #include <string.h>
>>>>>>>>>> #include <stdbool.h>
>>>>>>>>>> #include <unistd.h>
>>>>>>>>>> #include <mpi.h>
>>>>>>>>>> 
>>>>>>>>>> #define ORTE_IOF_BASE_MSG_MAX   2048
>>>>>>>>>> 
>>>>>>>>>> int main(int argc, char *argv[])
>>>>>>>>>> {
>>>>>>>>>>    int i, rank, size, next, prev, tag = 201;
>>>>>>>>>>    int pos, msgsize, nbytes;
>>>>>>>>>>    bool done;
>>>>>>>>>>    char *msg;
>>>>>>>>>> 
>>>>>>>>>>    MPI_Init(&argc, &argv);
>>>>>>>>>>    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>>>>>    MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>>>>>> 
>>>>>>>>>>    fprintf(stderr, "Rank %d has cleared MPI_Init\n", rank);
>>>>>>>>>> 
>>>>>>>>>>    next = (rank + 1) % size;
>>>>>>>>>>    prev = (rank + size - 1) % size;
>>>>>>>>>>    msg = malloc(ORTE_IOF_BASE_MSG_MAX);
>>>>>>>>>>    pos = 0;
>>>>>>>>>>    nbytes = 0;
>>>>>>>>>> 
>>>>>>>>>>    if (0 == rank) {
>>>>>>>>>>        while (0 != (msgsize = read(0, msg, ORTE_IOF_BASE_MSG_MAX))) {
>>>>>>>>>>            fprintf(stderr, "Rank %d: sending blob %d\n", rank, pos);
>>>>>>>>>>            if (msgsize > 0) {
>>>>>>>>>>                MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, 
>>>>>>>>>> MPI_COMM_WORLD);
>>>>>>>>>>            }
>>>>>>>>>>            ++pos;
>>>>>>>>>>            nbytes += msgsize;
>>>>>>>>>>        }
>>>>>>>>>>        fprintf(stderr, "Rank %d: sending termination blob %d\n", 
>>>>>>>>>> rank, pos);
>>>>>>>>>>        memset(msg, 0, ORTE_IOF_BASE_MSG_MAX);
>>>>>>>>>>        MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, 
>>>>>>>>>> MPI_COMM_WORLD);
>>>>>>>>>>        MPI_Barrier(MPI_COMM_WORLD);
>>>>>>>>>>    } else {
>>>>>>>>>>        while (1) {
>>>>>>>>>>            MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, 
>>>>>>>>>> MPI_COMM_WORLD);
>>>>>>>>>>            fprintf(stderr, "Rank %d: recvd blob %d\n", rank, pos);
>>>>>>>>>>            ++pos;
>>>>>>>>>>            done = true;
>>>>>>>>>>            for (i=0; i < ORTE_IOF_BASE_MSG_MAX; i++) {
>>>>>>>>>>                if (0 != msg[i]) {
>>>>>>>>>>                    done = false;
>>>>>>>>>>                    break;
>>>>>>>>>>                }
>>>>>>>>>>            }
>>>>>>>>>>            if (done) {
>>>>>>>>>>                break;
>>>>>>>>>>            }
>>>>>>>>>>        }
>>>>>>>>>>        fprintf(stderr, "Rank %d: recv done\n", rank);
>>>>>>>>>>        MPI_Barrier(MPI_COMM_WORLD);
>>>>>>>>>>    }
>>>>>>>>>> 
>>>>>>>>>>    fprintf(stderr, "Rank %d has completed bcast\n", rank);
>>>>>>>>>>    MPI_Finalize();
>>>>>>>>>>    return 0;
>>>>>>>>>> }
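>>>>>>>>>> 
>>>>>>>>>> (To exercise it, just feed it a reasonably large file on stdin, e.g. 
>>>>>>>>>> “mpirun ./a.out < test.in”. Rank 0 prints “sending blob N” for each 2KB 
>>>>>>>>>> chunk it reads, the other ranks print “recvd blob N”, and the all-zero 
>>>>>>>>>> termination blob ends the run.)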
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Aug 22, 2016, at 3:40 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> This might be a thin argument, but we have had many users running 
>>>>>>>>>>> mpirun this way for years with no problem until this recent upgrade. 
>>>>>>>>>>> And some home-brewed MPI codes do not even have a standard way to 
>>>>>>>>>>> read input files. Last time I checked, the Open MPI manual still 
>>>>>>>>>>> claims it supports stdin 
>>>>>>>>>>> (https://www.open-mpi.org/doc/v2.0/man1/mpirun.1.php#sect14). 
>>>>>>>>>>> Maybe I missed it, but the v2.0 release notes did not mention any 
>>>>>>>>>>> changes to the behavior of stdin either.
>>>>>>>>>>> 
>>>>>>>>>>> We can tell our users to run mpirun in the suggested way, but I do 
>>>>>>>>>>> hope someone can look into the issue and fix it.
>>>>>>>>>>> 
>>>>>>>>>>> Dr. Jingchao Zhang
>>>>>>>>>>> Holland Computing Center
>>>>>>>>>>> University of Nebraska-Lincoln
>>>>>>>>>>> 402-472-6400
>>>>>>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>>>>>>> Sent: Monday, August 22, 2016 3:04:50 PM
>>>>>>>>>>> To: Open MPI Users
>>>>>>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>>>>>> 
>>>>>>>>>>> Well, I can try to find time to take a look. However, I will 
>>>>>>>>>>> reiterate what Jeff H said - it is very unwise to rely on IO 
>>>>>>>>>>> forwarding. Much better to just directly read the file unless that 
>>>>>>>>>>> file is simply unavailable on the node where rank=0 is running.
>>>>>>>>>>> 
>>>>>>>>>>>> On Aug 22, 2016, at 1:55 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Here you can find the source code for the LAMMPS input reader: 
>>>>>>>>>>>> https://github.com/lammps/lammps/blob/r13864/src/input.cpp
>>>>>>>>>>>> 
>>>>>>>>>>>> Based on the gdb output, rank 0 is stuck at line 167
>>>>>>>>>>>> 
>>>>>>>>>>>> if (fgets(&line[m],maxline-m,infile) == NULL)
>>>>>>>>>>>> 
>>>>>>>>>>>> and the rest of the threads are stuck at line 203
>>>>>>>>>>>> 
>>>>>>>>>>>> MPI_Bcast(&n,1,MPI_INT,0,world);
>>>>>>>>>>>> 
>>>>>>>>>>>> So rank 0 possibly hangs in the fgets() function.
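>>>>>>>>>>>> 
>>>>>>>>>>>> In other words, the logic is roughly the following (a paraphrase of 
>>>>>>>>>>>> input.cpp, not the actual code, and meant to sit inside the 
>>>>>>>>>>>> file-reading loop): only rank 0 reads the input stream, and every 
>>>>>>>>>>>> other rank sits in MPI_Bcast until rank 0 hands it the next line.
>>>>>>>>>>>> 
>>>>>>>>>>>> /* rough paraphrase of LAMMPS_NS::Input::file() - see input.cpp for the
>>>>>>>>>>>>  * real code.  If stdin forwarding stalls, rank 0 blocks in fgets()/read()
>>>>>>>>>>>>  * and all the other ranks wait in the broadcast, which is exactly what
>>>>>>>>>>>>  * the backtraces below show. */
>>>>>>>>>>>> if (me == 0) {
>>>>>>>>>>>>     if (fgets(&line[m], maxline - m, infile) == NULL)   /* line 167 */
>>>>>>>>>>>>         n = 0;                         /* end of input */
>>>>>>>>>>>>     else
>>>>>>>>>>>>         n = strlen(line) + 1;
>>>>>>>>>>>> }
>>>>>>>>>>>> MPI_Bcast(&n, 1, MPI_INT, 0, world);                    /* line 203 */
>>>>>>>>>>>> if (n == 0) break;                     /* every rank sees EOF together */
>>>>>>>>>>>> MPI_Bcast(line, n, MPI_CHAR, 0, world);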
>>>>>>>>>>>> 
>>>>>>>>>>>> Here are the whole backtrace information:
>>>>>>>>>>>> $ cat master.backtrace worker.backtrace
>>>>>>>>>>>> #0  0x0000003c37cdb68d in read () from /lib64/libc.so.6
>>>>>>>>>>>> #1  0x0000003c37c71ca8 in _IO_new_file_underflow () from 
>>>>>>>>>>>> /lib64/libc.so.6
>>>>>>>>>>>> #2  0x0000003c37c737ae in _IO_default_uflow_internal () from 
>>>>>>>>>>>> /lib64/libc.so.6
>>>>>>>>>>>> #3  0x0000003c37c67e8a in _IO_getline_info_internal () from 
>>>>>>>>>>>> /lib64/libc.so.6
>>>>>>>>>>>> #4  0x0000003c37c66ce9 in fgets () from /lib64/libc.so.6
>>>>>>>>>>>> #5  0x00000000005c5a43 in LAMMPS_NS::Input::file() () at 
>>>>>>>>>>>> ../input.cpp:167
>>>>>>>>>>>> #6  0x00000000005d4236 in main () at ../main.cpp:31
>>>>>>>>>>>> #0  0x00002b1635d2ace2 in poll_dispatch () from 
>>>>>>>>>>>> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>>>>>>> #1  0x00002b1635d1fa71 in opal_libevent2022_event_base_loop ()
>>>>>>>>>>>>   from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>>>>>>> #2  0x00002b1635ce4634 in opal_progress () from 
>>>>>>>>>>>> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>>>>>>> #3  0x00002b16351b8fad in ompi_request_default_wait () from 
>>>>>>>>>>>> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>>>>>>> #4  0x00002b16351fcb40 in ompi_coll_base_bcast_intra_generic ()
>>>>>>>>>>>>   from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>>>>>>> #5  0x00002b16351fd0c2 in ompi_coll_base_bcast_intra_binomial ()
>>>>>>>>>>>>   from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>>>>>>> #6  0x00002b1644fa6d9b in ompi_coll_tuned_bcast_intra_dec_fixed ()
>>>>>>>>>>>>   from 
>>>>>>>>>>>> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/openmpi/mca_coll_tuned.so
>>>>>>>>>>>> #7  0x00002b16351cb4fb in PMPI_Bcast () from 
>>>>>>>>>>>> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>>>>>>> #8  0x00000000005c5b5d in LAMMPS_NS::Input::file() () at 
>>>>>>>>>>>> ../input.cpp:203
>>>>>>>>>>>> #9  0x00000000005d4236 in main () at ../main.cpp:31
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> 
>>>>>>>>>>>> Dr. Jingchao Zhang
>>>>>>>>>>>> Holland Computing Center
>>>>>>>>>>>> University of Nebraska-Lincoln
>>>>>>>>>>>> 402-472-6400
>>>>>>>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>>>>>>>> Sent: Monday, August 22, 2016 2:17:10 PM
>>>>>>>>>>>> To: Open MPI Users
>>>>>>>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>>>>>>> 
>>>>>>>>>>>> Hmmm...perhaps we can break this out a bit? The stdin will be 
>>>>>>>>>>>> going to your rank=0 proc. It sounds like you have some subsequent 
>>>>>>>>>>>> step that calls MPI_Bcast?
>>>>>>>>>>>> 
>>>>>>>>>>>> Can you first verify that the input is being correctly delivered 
>>>>>>>>>>>> to rank=0? This will help us isolate if the problem is in the IO 
>>>>>>>>>>>> forwarding, or in the subsequent Bcast.
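>>>>>>>>>>>> 
>>>>>>>>>>>> For example, something like this stripped-down reader would separate the 
>>>>>>>>>>>> two (a minimal sketch): rank 0 just drains stdin and reports the byte 
>>>>>>>>>>>> count, with no Bcast at all. If this also hangs, the problem is in the IO 
>>>>>>>>>>>> forwarding; if it completes, the problem is in the subsequent Bcast.
>>>>>>>>>>>> 
>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>> #include <unistd.h>
>>>>>>>>>>>> #include <mpi.h>
>>>>>>>>>>>> 
>>>>>>>>>>>> int main(int argc, char *argv[])
>>>>>>>>>>>> {
>>>>>>>>>>>>     int rank;
>>>>>>>>>>>>     char buf[2048];
>>>>>>>>>>>>     long total = 0;
>>>>>>>>>>>>     ssize_t n;
>>>>>>>>>>>> 
>>>>>>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>>>>>>>     if (0 == rank) {
>>>>>>>>>>>>         /* read stdin in 2KB chunks until EOF and count the bytes */
>>>>>>>>>>>>         while ((n = read(0, buf, sizeof(buf))) > 0) {
>>>>>>>>>>>>             total += n;
>>>>>>>>>>>>         }
>>>>>>>>>>>>         fprintf(stderr, "Rank 0 read %ld bytes from stdin\n", total);
>>>>>>>>>>>>     }
>>>>>>>>>>>>     MPI_Finalize();
>>>>>>>>>>>>     return 0;
>>>>>>>>>>>> }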
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Aug 22, 2016, at 1:11 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We compiled openmpi/2.0.0 with gcc/6.1.0 and intel/13.1.3. Both of 
>>>>>>>>>>>>> them show odd behavior when trying to read from standard input.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For example, if we start the application LAMMPS across 4 nodes, 
>>>>>>>>>>>>> each node 16 cores, connected by Intel QDR InfiniBand, mpirun 
>>>>>>>>>>>>> works fine the 1st time, but always gets stuck within a few 
>>>>>>>>>>>>> seconds thereafter.
>>>>>>>>>>>>> Command:
>>>>>>>>>>>>> mpirun ./lmp_ompi_g++ < in.snr
>>>>>>>>>>>>> in.snr is the LAMMPS input file; the compiler is gcc/6.1.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Instead, if we use
>>>>>>>>>>>>> mpirun ./lmp_ompi_g++ -in in.snr
>>>>>>>>>>>>> it works 100% of the time.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Some odd behaviors we have gathered so far: 
>>>>>>>>>>>>> 1. For 1-node jobs, stdin always works.
>>>>>>>>>>>>> 2. For multiple nodes, stdin works, though unreliably, when the 
>>>>>>>>>>>>> number of cores per node is relatively small. For example, for 
>>>>>>>>>>>>> 2/3/4 nodes with 8 cores per node, mpirun works most of the time. 
>>>>>>>>>>>>> But with more than 8 cores per node, mpirun works the 1st time, 
>>>>>>>>>>>>> then always gets stuck. There seems to be a magic number at which 
>>>>>>>>>>>>> it stops working.
>>>>>>>>>>>>> 3. We tested Quantum ESPRESSO with the intel/13 compiler and had 
>>>>>>>>>>>>> the same issue. 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We used gdb to debug and found that when mpirun was stuck, the rest 
>>>>>>>>>>>>> of the processes were all waiting on an MPI broadcast from the 
>>>>>>>>>>>>> master rank. The LAMMPS binary, input file, and gdb core files 
>>>>>>>>>>>>> (example.tar.bz2) can be downloaded from this link: 
>>>>>>>>>>>>> https://drive.google.com/open?id=0B3Yj4QkZpI-dVWZtWmJ3ZXNVRGc
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Extra information:
>>>>>>>>>>>>> 1. Job scheduler is slurm.
>>>>>>>>>>>>> 2. configure setup:
>>>>>>>>>>>>> ./configure     --prefix=$PREFIX \
>>>>>>>>>>>>>                --with-hwloc=internal \
>>>>>>>>>>>>>                --enable-mpirun-prefix-by-default \
>>>>>>>>>>>>>                --with-slurm \
>>>>>>>>>>>>>                --with-verbs \
>>>>>>>>>>>>>                --with-psm \
>>>>>>>>>>>>>                --disable-openib-connectx-xrc \
>>>>>>>>>>>>>                --with-knem=/opt/knem-1.1.2.90mlnx1 \
>>>>>>>>>>>>>                --with-cma
>>>>>>>>>>>>> 3. openmpi-mca-params.conf file 
>>>>>>>>>>>>> orte_hetero_nodes=1
>>>>>>>>>>>>> hwloc_base_binding_policy=core
>>>>>>>>>>>>> rmaps_base_mapping_policy=core
>>>>>>>>>>>>> opal_cuda_support=0
>>>>>>>>>>>>> btl_openib_use_eager_rdma=0
>>>>>>>>>>>>> btl_openib_max_eager_rdma=0
>>>>>>>>>>>>> btl_openib_flags=1
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Jingchao 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Dr. Jingchao Zhang
>>>>>>>>>>>>> Holland Computing Center
>>>>>>>>>>>>> University of Nebraska-Lincoln
>>>>>>>>>>>>> 402-472-6400
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> <debug_info.txt>
>>>> 
>>> 
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
> 
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
