On Apr 17, 2016, at 8:43 PM, dpchoudh . <dpcho...@gmail.com> wrote:
Hello Gilles and all
I am sorry to be bugging the developers, but this issue seems to
be nagging me, and I am surprised it does not seem to affect
anybody else. But then again, I am using the master branch, and
most users are probably using a released version.
This time I am using a totally different cluster. It has NO
verbs-capable interface: just two Ethernet interfaces (one of which
has no IP address and hence is unusable) plus one proprietary
interface that currently supports only IP traffic. The two IP
interfaces (Ethernet and proprietary) are on different IP subnets.
My test program is as follows:
#include <stdio.h>
#include <string.h>
#include <strings.h>  /* for bzero() */
#include "mpi.h"

int main(int argc, char *argv[])
{
    char host[128];
    int n;

    MPI_Init(&argc, &argv);
    MPI_Get_processor_name(host, &n);
    printf("Hello from %s\n", host);
    MPI_Comm_size(MPI_COMM_WORLD, &n);
    printf("The world has %d nodes\n", n);
    MPI_Comm_rank(MPI_COMM_WORLD, &n);
    printf("My rank is %d\n", n);
//#if 0
    if (n == 0)
    {
        strcpy(host, "ha!");
        MPI_Send(host, strlen(host) + 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
        printf("sent %s\n", host);
    }
    else
    {
        //int len = strlen(host) + 1;
        bzero(host, 128);
        MPI_Recv(host, 4, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Received %s from rank 0\n", host);
    }
//#endif
    MPI_Finalize();
    return 0;
}
This program, when run between two nodes, hangs. The command was:
[durga@b-1 ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl
self,tcp -mca pml ob1 -mca btl_tcp_if_include eno1 ./mpitest
The hang occurs after the following output (eno1 is one of the GigE
interfaces, which also carries the OOB traffic):
Hello from b-1
The world has 2 nodes
My rank is 0
Hello from b-2
The world has 2 nodes
My rank is 1
Note that if I uncomment the #if 0 / #endif pair (i.e., comment out
the MPI_Send()/MPI_Recv() part), the program runs to completion. Also
note that the printfs following MPI_Send()/MPI_Recv() do not show
up on the console.
Upon attaching gdb, the stack trace from the master node is as
follows:
Missing separate debuginfos, use: debuginfo-install
glibc-2.17-78.el7.x86_64 libpciaccess-0.13.4-2.el7.x86_64
(gdb) bt
#0 0x00007f72a533eb7d in poll () from /lib64/libc.so.6
#1 0x00007f72a4cb7146 in poll_dispatch (base=0xee33d0,
tv=0x7fff81057b70)
at poll.c:165
#2 0x00007f72a4caede0 in opal_libevent2022_event_base_loop
(base=0xee33d0,
flags=2) at event.c:1630
#3 0x00007f72a4c4e692 in opal_progress () at
runtime/opal_progress.c:171
#4 0x00007f72a0d07ac1 in opal_condition_wait (
c=0x7f72a5bb1e00 <ompi_request_cond>, m=0x7f72a5bb1d80
<ompi_request_lock>)
at ../../../../opal/threads/condition.h:76
#5 0x00007f72a0d07ca2 in ompi_request_wait_completion
(req=0x113eb80)
at ../../../../ompi/request/request.h:383
#6 0x00007f72a0d09cd5 in mca_pml_ob1_send (buf=0x7fff81057db0,
count=4,
datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1,
sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280
<ompi_mpi_comm_world>)
at pml_ob1_isend.c:251
#7 0x00007f72a58d6be3 in PMPI_Send (buf=0x7fff81057db0, count=4,
type=0x601080 <ompi_mpi_char>, dest=1, tag=1,
comm=0x601280 <ompi_mpi_comm_world>) at psend.c:78
#8 0x0000000000400afa in main (argc=1, argv=0x7fff81057f18) at
mpitest.c:19
(gdb)
And the backtrace on the non-master node is:
(gdb) bt
#0 0x00007ff3b377e48d in nanosleep () from /lib64/libc.so.6
#1 0x00007ff3b37af014 in usleep () from /lib64/libc.so.6
#2 0x00007ff3b0c922de in OPAL_PMIX_PMIX120_PMIx_Fence
(procs=0x0, nprocs=0,
info=0x0, ninfo=0) at src/client/pmix_client_fence.c:100
#3 0x00007ff3b0c5f1a6 in pmix120_fence (procs=0x0, collect_data=0)
at pmix120_client.c:258
#4 0x00007ff3b3cf8f4b in ompi_mpi_finalize ()
at runtime/ompi_mpi_finalize.c:242
#5 0x00007ff3b3d23295 in PMPI_Finalize () at pfinalize.c:47
#6 0x0000000000400958 in main (argc=1, argv=0x7fff785e8788) at
mpitest.c:30
(gdb)
The hostfile is as follows:
[durga@b-1 ~]$ cat hostfile
10.4.70.10 slots=1
10.4.70.11 slots=1
#10.4.70.12 slots=1
And the ifconfig output from the master node is as follows (the
other node is similar; all the IP interfaces are on their
respective subnets):
[durga@b-1 ~]$ ifconfig
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.4.70.10 netmask 255.255.255.0 broadcast 10.4.70.255
inet6 fe80::21e:c9ff:fefe:13df prefixlen 64 scopeid
0x20<link>
ether 00:1e:c9:fe:13:df txqueuelen 1000 (Ethernet)
RX packets 48215 bytes 27842846 (26.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 52746 bytes 7817568 (7.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 16
eno2: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 00:1e:c9:fe:13:e0 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 17
lf0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2016
inet 192.168.1.2 netmask 255.255.255.0 broadcast
192.168.1.255
inet6 fe80::3002:ff:fe33:3333 prefixlen 64 scopeid
0x20<link>
ether 32:02:00:33:33:33 txqueuelen 1000 (Ethernet)
RX packets 10 bytes 512 (512.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 22 bytes 1536 (1.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 0 (Local Loopback)
RX packets 26 bytes 1378 (1.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 26 bytes 1378 (1.3 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Please help me with this. I am stuck with the TCP transport,
which is the most basic of all transports.
Thanks in advance
Durga
1% of the executables have 99% of CPU privilege!
Userspace code! Unite!! Occupy the kernel!!!
On Tue, Apr 12, 2016 at 9:32 PM, Gilles Gouaillardet
<gil...@rist.or.jp> wrote:
This is quite unlikely, and FWIW, your test program works for me.
I suggest you check that your three TCP networks are usable, for example
$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca
pml ob1 --mca btl_tcp_if_include xxx ./mpitest
where xxx is an interface name or a list of interface names:
eth0
eth1
ib0
eth0,eth1
eth0,ib0
...
eth0,eth1,ib0
and see where the problem starts occurring.
By the way, are your three interfaces on three different subnets? Is
routing required between two interfaces of the same type?
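To take Open MPI out of the picture entirely, a bare-bones TCP
client/server like the sketch below can confirm that plain TCP
connections actually go through on each interface. This is only a
sanity-check sketch, not anything shipped with Open MPI; the
"tcpcheck" name, the port, and the addresses are placeholders you
would substitute yourself.

/* tcpcheck.c - minimal raw-TCP sanity check (sketch only).
 * server: ./tcpcheck server <ip-of-local-interface-under-test> <port>
 * client: ./tcpcheck client <ip-of-server-interface-under-test> <port>
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s server|client <ipv4-addr> <port>\n", argv[0]);
        return 1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short) atoi(argv[3]));
    if (inet_pton(AF_INET, argv[2], &addr.sin_addr) != 1) {
        fprintf(stderr, "bad IPv4 address: %s\n", argv[2]);
        return 1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    char buf[16] = "ping";
    if (strcmp(argv[1], "server") == 0) {
        /* Bind to the address of the interface under test and wait for one peer. */
        if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 || listen(fd, 1) < 0) {
            perror("bind/listen");
            return 1;
        }
        int conn = accept(fd, NULL, NULL);
        if (conn < 0) {
            perror("accept");
            return 1;
        }
        ssize_t n = read(conn, buf, sizeof(buf) - 1);
        printf("server got %zd bytes: %s\n", n, buf);
        close(conn);
    } else {
        /* Connect to the server's address on that interface and send a short probe. */
        if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
            perror("connect");
            return 1;
        }
        if (write(fd, buf, strlen(buf) + 1) < 0) {
            perror("write");
            return 1;
        }
        printf("client sent %s\n", buf);
    }
    close(fd);
    return 0;
}

If that exchange stalls on a given pair of addresses, the problem is
below Open MPI (routing, firewall, etc.); if it works on every
interface while the MPI run still hangs, the TCP BTL is the right
place to keep digging.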
Cheers,
Gilles
On 4/13/2016 7:15 AM, dpchoudh . wrote:
Hi all
I have reported this issue before, but at the time I brushed it
off as something caused by my modifications to the source tree. It
looks like that is not the case.
Just now, I did the following:
1. Cloned a fresh copy from master.
2. Configured with the following flags, then built and installed
it on my two-node "cluster":
--enable-debug --enable-debug-symbols --disable-dlopen
3. Compiled the following program, mpitest.c, with these
flags: -g3 -Wall -Wextra
4. Ran it like this:
[durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca
btl self,tcp -mca pml ob1 ./mpitest
With this, the code hangs at MPI_Barrier() on both nodes,
after generating the following output:
Hello world from processor smallMPI, rank 0 out of 2 processors
Hello world from processor bigMPI, rank 1 out of 2 processors
smallMPI sent haha!
bigMPI received haha!
<Hangs until killed by ^C>
Attaching to the hung process on one node gives the
following backtrace:
(gdb) bt
#0 0x00007f55b0f41c3d in poll () from /lib64/libc.so.6
#1 0x00007f55b03ccde6 in poll_dispatch (base=0x70e7b0,
tv=0x7ffd1bb551c0) at poll.c:165
#2 0x00007f55b03c4a90 in opal_libevent2022_event_base_loop
(base=0x70e7b0, flags=2) at event.c:1630
#3 0x00007f55b02f0144 in opal_progress () at
runtime/opal_progress.c:171
#4 0x00007f55b14b4d8b in opal_condition_wait
(c=0x7f55b19fec40 <ompi_request_cond>, m=0x7f55b19febc0
<ompi_request_lock>) at ../opal/threads/condition.h:76
#5 0x00007f55b14b531b in ompi_request_default_wait_all
(count=2, requests=0x7ffd1bb55370, statuses=0x7ffd1bb55340)
at request/req_wait.c:287
#6 0x00007f55b157a225 in ompi_coll_base_sendrecv_zero
(dest=1, stag=-16, source=1, rtag=-16, comm=0x601280
<ompi_mpi_comm_world>)
at base/coll_base_barrier.c:63
#7 0x00007f55b157a92a in
ompi_coll_base_barrier_intra_two_procs (comm=0x601280
<ompi_mpi_comm_world>, module=0x7c2630) at
base/coll_base_barrier.c:308
#8 0x00007f55b15aafec in
ompi_coll_tuned_barrier_intra_dec_fixed (comm=0x601280
<ompi_mpi_comm_world>, module=0x7c2630) at
coll_tuned_decision_fixed.c:196
#9 0x00007f55b14d36fd in PMPI_Barrier (comm=0x601280
<ompi_mpi_comm_world>) at pbarrier.c:63
#10 0x0000000000400b0b in main (argc=1, argv=0x7ffd1bb55658)
at mpitest.c:26
(gdb)
Thinking that this might be a bug in the tuned collectives,
since that is what the stack trace shows, I ran the program like
this (basically adding the ^tuned part):
[durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca
btl self,tcp -mca pml ob1 -mca coll ^tuned ./mpitest
It still hangs, but now with a different stack trace:
(gdb) bt
#0 0x00007f910d38ac3d in poll () from /lib64/libc.so.6
#1 0x00007f910c815de6 in poll_dispatch (base=0x1a317b0,
tv=0x7fff43ee3610) at poll.c:165
#2 0x00007f910c80da90 in opal_libevent2022_event_base_loop
(base=0x1a317b0, flags=2) at event.c:1630
#3 0x00007f910c739144 in opal_progress () at
runtime/opal_progress.c:171
#4 0x00007f910db130f7 in opal_condition_wait
(c=0x7f910de47c40 <ompi_request_cond>, m=0x7f910de47bc0
<ompi_request_lock>)
at ../../../../opal/threads/condition.h:76
#5 0x00007f910db132d8 in ompi_request_wait_completion
(req=0x1b07680) at ../../../../ompi/request/request.h:383
#6 0x00007f910db1533b in mca_pml_ob1_send (buf=0x0,
count=0, datatype=0x7f910de1e340 <ompi_mpi_byte>, dst=1,
tag=-16, sendmode=MCA_PML_BASE_SEND_STANDARD,
comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:259
#7 0x00007f910d9c3b38 in
ompi_coll_base_barrier_intra_basic_linear (comm=0x601280
<ompi_mpi_comm_world>, module=0x1b092c0) at
base/coll_base_barrier.c:368
#8 0x00007f910d91c6fd in PMPI_Barrier (comm=0x601280
<ompi_mpi_comm_world>) at pbarrier.c:63
#9 0x0000000000400b0b in main (argc=1, argv=0x7fff43ee3a58)
at mpitest.c:26
(gdb)
The mpitest.c program is as follows:
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int world_size, world_rank, name_len;
    char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(hostname, &name_len);
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           hostname, world_rank, world_size);
    if (world_rank == 1)
    {
        MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%s received %s\n", hostname, buf);
    }
    else
    {
        strcpy(buf, "haha!");
        MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
        printf("%s sent %s\n", hostname, buf);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
The hostfile is as follows:
10.10.10.10 slots=1
10.10.10.11 slots=1
The two nodes are connected by three physical and three logical
networks:
Physical: Gigabit Ethernet, 10G iWARP, 20G InfiniBand
Logical: IP (all three), PSM (QLogic InfiniBand), verbs (iWARP
and InfiniBand)
Please note again that this is a fresh, brand new clone.
Is this a bug (perhaps a side effect of --disable-dlopen) or
something I am doing wrong?
Thanks
Durga
We learn from history that we never learn from history.