Hello Gilles,

Thank you very much for your feedback. You are right that my original stack trace was from code that was several weeks behind, but updating it just now did not seem to make a difference. I am copying the stack traces from the latest code below.
On the master node:

(gdb) bt
#0  0x00007fc0524cbb7d in poll () from /lib64/libc.so.6
#1  0x00007fc051e53116 in poll_dispatch (base=0x1aabbe0, tv=0x7fff29fcb240) at poll.c:165
#2  0x00007fc051e4adb0 in opal_libevent2022_event_base_loop (base=0x1aabbe0, flags=2) at event.c:1630
#3  0x00007fc051de9a00 in opal_progress () at runtime/opal_progress.c:171
#4  0x00007fc04ce46b0b in opal_condition_wait (c=0x7fc052d3cde0 <ompi_request_cond>, m=0x7fc052d3cd60 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
#5  0x00007fc04ce46cec in ompi_request_wait_completion (req=0x1b7b580) at ../../../../ompi/request/request.h:383
#6  0x00007fc04ce48d4f in mca_pml_ob1_send (buf=0x7fff29fcb480, count=4, datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:259
#7  0x00007fc052a62d73 in PMPI_Send (buf=0x7fff29fcb480, count=4, type=0x601080 <ompi_mpi_char>, dest=1, tag=1, comm=0x601280 <ompi_mpi_comm_world>) at psend.c:78
#8  0x0000000000400afa in main (argc=1, argv=0x7fff29fcb5e8) at mpitest.c:19
(gdb)

And on the non-master node:

(gdb) bt
#0  0x00007fad2c32148d in nanosleep () from /lib64/libc.so.6
#1  0x00007fad2c352014 in usleep () from /lib64/libc.so.6
#2  0x00007fad296412de in OPAL_PMIX_PMIX120_PMIx_Fence (procs=0x0, nprocs=0, info=0x0, ninfo=0) at src/client/pmix_client_fence.c:100
#3  0x00007fad2960e1a6 in pmix120_fence (procs=0x0, collect_data=0) at pmix120_client.c:258
#4  0x00007fad2c89b2da in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:242
#5  0x00007fad2c8c5849 in PMPI_Finalize () at pfinalize.c:47
#6  0x0000000000400958 in main (argc=1, argv=0x7fff163879c8) at mpitest.c:30
(gdb)

And my configuration was done as follows:

$ ./configure --enable-debug --enable-debug-symbols

I double checked to ensure that there is not an older installation of Open MPI getting mixed up with the master branch: sudo yum list installed | grep -i mpi shows nothing on both nodes, and pmap -p <pid> shows that all the libraries are coming from /usr/local/lib, which seems to be correct. I am also quite sure about the firewall issue (that there is none).

I will try out your suggestion on installing from a tarball and see how it goes.
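One more run-time sanity check along the same lines (a minimal sketch, not taken from this thread; the file name whichmpi.c is made up, and it assumes an MPI-3-capable build, which master should be): each rank can report which MPI library it actually loaded via MPI_Get_library_version.

/* whichmpi.c (hypothetical name): print which MPI library each rank loaded.
 * Complements the pmap/yum checks above; on Open MPI the version string
 * should identify the build (version, repo revision). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_library_version(version, &len);   /* MPI-3 call */
    printf("rank %d: %s\n", rank, version);
    MPI_Finalize();
    return 0;
}

Compiled with the freshly installed mpicc and launched with mpirun -np 2 across both nodes, this should print the same master-branch version string on each rank; if it does not, the ranks are picking up different installations.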
Thanks
Durga

1% of the executables have 99% of CPU privilege!
Userspace code! Unite!! Occupy the kernel!!!

On Mon, Apr 18, 2016 at 12:47 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> Here is your stack trace:
>
> #6 0x00007f72a0d09cd5 in mca_pml_ob1_send (buf=0x7fff81057db0, count=4,
>    datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1,
>    sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>)
>    at line 251
>
> That would be line 259 in current master, and this file was updated 21 days ago, which suggests your master is not quite up to date.
>
> Even if the message is sent eagerly, the ob1 pml does use an internal request that it will wait for.
>
> By the way, did you configure with --enable-mpi-thread-multiple?
> Did you configure with --enable-mpirun-prefix-by-default?
> Did you configure with --disable-dlopen?
>
> At first, I'd recommend you download a tarball from https://www.open-mpi.org/nightly/master, then configure && make && make install using a new install dir, and check whether the issue is still there or not.
>
> There could be some side effects if some old modules were not removed and/or if you are not using the modules you expect.
> /* when it hangs, you can pmap <pid> and check that the paths of the Open MPI libraries are the ones you expect */
>
> What if you do not send/recv but invoke MPI_Barrier multiple times?
> What if you send/recv a one-byte message instead?
> Did you double check there is no firewall running on your nodes?
>
> Cheers,
>
> Gilles
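To make the two checks suggested above concrete, here is a minimal sketch (the file name diag.c is made up, and it assumes exactly two ranks): a few MPI_Barrier calls with no user-level send/recv, followed by a one-byte exchange.

/* diag.c (hypothetical name): the two diagnostics suggested above,
 * assuming exactly two ranks. Build with mpicc, run with mpirun -np 2. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, i;
    char byte = 'x';

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Check 1: barriers only, no user-level send/recv */
    for (i = 0; i < 5; i++)
    {
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0)
            printf("barrier %d done\n", i);
    }

    /* Check 2: a one-byte message instead of the 4-byte string */
    if (rank == 0)
        MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    else
        MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank %d: one-byte exchange done\n", rank);

    MPI_Finalize();
    return 0;
}

Roughly speaking, if the barriers alone also hang, the problem is not specific to the user-level send/recv; if only the one-byte exchange hangs, that would point more at the eager data path.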
> On 4/18/2016 1:06 PM, dpchoudh . wrote:
>
> Thank you for your suggestion, Ralph. But it did not make any difference.
>
> Let me say that my code is about a week stale. I just did a git pull and am building it right now. The build takes quite a bit of time, so I avoid doing that unless there is a reason. But what I am trying out is the most basic functionality, so I'd think a week or so of lag would not make a difference.
>
> Does the stack trace suggest something to you? It seems that the send hangs, but a 4-byte send should be sent eagerly.
>
> Best regards
> 'Durga
>
> 1% of the executables have 99% of CPU privilege!
> Userspace code! Unite!! Occupy the kernel!!!
>
> On Sun, Apr 17, 2016 at 11:55 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Try adding -mca oob_tcp_if_include eno1 to your cmd line and see if that makes a difference.
>>
>> On Apr 17, 2016, at 8:43 PM, dpchoudh . <dpcho...@gmail.com> wrote:
>>
>> Hello Gilles and all,
>>
>> I am sorry to be bugging the developers, but this issue seems to be nagging me, and I am surprised it does not seem to affect anybody else. But then again, I am using the master branch, and most users are probably using a released version.
>>
>> This time I am using a totally different cluster. It has no verbs-capable interface; just two Ethernet interfaces (one of which has no IP address and hence is unusable) plus one proprietary interface that currently supports only IP traffic. The two usable IP interfaces (Ethernet and proprietary) are on different IP subnets.
>>
>> My test program is as follows:
>>
>> #include <stdio.h>
>> #include <string.h>
>> #include "mpi.h"
>>
>> int main(int argc, char *argv[])
>> {
>>     char host[128];
>>     int n;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Get_processor_name(host, &n);
>>     printf("Hello from %s\n", host);
>>     MPI_Comm_size(MPI_COMM_WORLD, &n);
>>     printf("The world has %d nodes\n", n);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &n);
>>     printf("My rank is %d\n", n);
>> //#if 0
>>     if (n == 0)
>>     {
>>         strcpy(host, "ha!");
>>         MPI_Send(host, strlen(host) + 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
>>         printf("sent %s\n", host);
>>     }
>>     else
>>     {
>>         //int len = strlen(host) + 1;
>>         bzero(host, 128);
>>         MPI_Recv(host, 4, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>         printf("Received %s from rank 0\n", host);
>>     }
>> //#endif
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> This program, when run between two nodes, hangs. The command was:
>>
>> [durga@b-1 ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca btl_tcp_if_include eno1 ./mpitest
>>
>> And the hang occurs with the following output (eno1 is one of the GigE interfaces; it carries the OOB traffic as well):
>>
>> Hello from b-1
>> The world has 2 nodes
>> My rank is 0
>> Hello from b-2
>> The world has 2 nodes
>> My rank is 1
>>
>> Note that if I uncomment the #if 0 - #endif (i.e., comment out the MPI_Send()/MPI_Recv() part), the program runs to completion. Also note that the printfs following MPI_Send()/MPI_Recv() do not show up on the console.
>>
>> Upon attaching gdb, the stack trace from the master node is as follows:
>>
>> Missing separate debuginfos, use: debuginfo-install glibc-2.17-78.el7.x86_64 libpciaccess-0.13.4-2.el7.x86_64
>> (gdb) bt
>> #0  0x00007f72a533eb7d in poll () from /lib64/libc.so.6
>> #1  0x00007f72a4cb7146 in poll_dispatch (base=0xee33d0, tv=0x7fff81057b70) at poll.c:165
>> #2  0x00007f72a4caede0 in opal_libevent2022_event_base_loop (base=0xee33d0, flags=2) at event.c:1630
>> #3  0x00007f72a4c4e692 in opal_progress () at runtime/opal_progress.c:171
>> #4  0x00007f72a0d07ac1 in opal_condition_wait (c=0x7f72a5bb1e00 <ompi_request_cond>, m=0x7f72a5bb1d80 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
>> #5  0x00007f72a0d07ca2 in ompi_request_wait_completion (req=0x113eb80) at ../../../../ompi/request/request.h:383
>> #6  0x00007f72a0d09cd5 in mca_pml_ob1_send (buf=0x7fff81057db0, count=4, datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:251
>> #7  0x00007f72a58d6be3 in PMPI_Send (buf=0x7fff81057db0, count=4, type=0x601080 <ompi_mpi_char>, dest=1, tag=1, comm=0x601280 <ompi_mpi_comm_world>) at psend.c:78
>> #8  0x0000000000400afa in main (argc=1, argv=0x7fff81057f18) at mpitest.c:19
>> (gdb)
>>
>> And the backtrace on the non-master node is:
>>
>> (gdb) bt
>> #0  0x00007ff3b377e48d in nanosleep () from /lib64/libc.so.6
>> #1  0x00007ff3b37af014 in usleep () from /lib64/libc.so.6
>> #2  0x00007ff3b0c922de in OPAL_PMIX_PMIX120_PMIx_Fence (procs=0x0, nprocs=0, info=0x0, ninfo=0) at src/client/pmix_client_fence.c:100
>> #3  0x00007ff3b0c5f1a6 in pmix120_fence (procs=0x0, collect_data=0) at pmix120_client.c:258
>> #4  0x00007ff3b3cf8f4b in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:242
>> #5  0x00007ff3b3d23295 in PMPI_Finalize () at pfinalize.c:47
>> #6  0x0000000000400958 in main (argc=1, argv=0x7fff785e8788) at mpitest.c:30
>> (gdb)
>>
>> The hostfile is as follows:
>>
>> [durga@b-1 ~]$ cat hostfile
>> 10.4.70.10 slots=1
>> 10.4.70.11 slots=1
>> #10.4.70.12 slots=1
>>
>> And the ifconfig output from the master node is as follows (the other node is similar; all the IP interfaces are in their respective subnets):
>>
>> [durga@b-1 ~]$ ifconfig
>> eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>>         inet 10.4.70.10  netmask 255.255.255.0  broadcast 10.4.70.255
>>         inet6 fe80::21e:c9ff:fefe:13df  prefixlen 64  scopeid 0x20<link>
>>         ether 00:1e:c9:fe:13:df  txqueuelen 1000  (Ethernet)
>>         RX packets 48215  bytes 27842846 (26.5 MiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 52746  bytes 7817568 (7.4 MiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>         device interrupt 16
>>
>> eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>>         ether 00:1e:c9:fe:13:e0  txqueuelen 1000  (Ethernet)
>>         RX packets 0  bytes 0 (0.0 B)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 0  bytes 0 (0.0 B)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>         device interrupt 17
>>
>> lf0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2016
>>         inet 192.168.1.2  netmask 255.255.255.0  broadcast 192.168.1.255
>>         inet6 fe80::3002:ff:fe33:3333  prefixlen 64  scopeid 0x20<link>
>>         ether 32:02:00:33:33:33  txqueuelen 1000  (Ethernet)
>>         RX packets 10  bytes 512 (512.0 B)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 22  bytes 1536 (1.5 KiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
>>         inet 127.0.0.1  netmask 255.0.0.0
>>         inet6 ::1  prefixlen 128  scopeid 0x10<host>
>>         loop  txqueuelen 0  (Local Loopback)
>>         RX packets 26  bytes 1378 (1.3 KiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 26  bytes 1378 (1.3 KiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> Please help me with this. I am stuck with the TCP transport, which is the most basic of all transports.
>>
>> Thanks in advance
>> Durga
>>
>> 1% of the executables have 99% of CPU privilege!
>> Userspace code! Unite!! Occupy the kernel!!!
>>
>> On Tue, Apr 12, 2016 at 9:32 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>
>>> This is quite unlikely, and FWIW, your test program works for me.
>>>
>>> I suggest you check that your 3 TCP networks are usable, for example:
>>>
>>> $ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 --mca btl_tcp_if_include xxx ./mpitest
>>>
>>> in which xxx is a [list of] interface name(s):
>>> eth0
>>> eth1
>>> ib0
>>> eth0,eth1
>>> eth0,ib0
>>> ...
>>> eth0,eth1,ib0
>>>
>>> and see where problems start occurring.
>>>
>>> By the way, are your 3 interfaces in 3 different subnets? Is routing required between two interfaces of the same type?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 4/13/2016 7:15 AM, dpchoudh . wrote:
>>>
>>> Hi all,
>>>
>>> I have reported this issue before, but then had brushed it off as something that was caused by my modifications to the source tree. It looks like that is not the case.
>>>
>>> Just now, I did the following:
>>>
>>> 1. Cloned a fresh copy from master.
>>> 2. Configured with the following flags, built and installed it in my two-node "cluster": --enable-debug --enable-debug-symbols --disable-dlopen
>>> 3. Compiled the following program, mpitest.c, with these flags: -g3 -Wall -Wextra
>>> 4. Ran it like this:
>>>    [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 ./mpitest
>>>
>>> With this, the code hangs at MPI_Barrier() on both nodes, after generating the following output:
>>>
>>> Hello world from processor smallMPI, rank 0 out of 2 processors
>>> Hello world from processor bigMPI, rank 1 out of 2 processors
>>> smallMPI sent haha!
>>> bigMPI received haha!
>>> <Hangs until killed by ^C>
>>>
>>> Attaching to the hung process at one node gives the following backtrace:
>>>
>>> (gdb) bt
>>> #0  0x00007f55b0f41c3d in poll () from /lib64/libc.so.6
>>> #1  0x00007f55b03ccde6 in poll_dispatch (base=0x70e7b0, tv=0x7ffd1bb551c0) at poll.c:165
>>> #2  0x00007f55b03c4a90 in opal_libevent2022_event_base_loop (base=0x70e7b0, flags=2) at event.c:1630
>>> #3  0x00007f55b02f0144 in opal_progress () at runtime/opal_progress.c:171
>>> #4  0x00007f55b14b4d8b in opal_condition_wait (c=0x7f55b19fec40 <ompi_request_cond>, m=0x7f55b19febc0 <ompi_request_lock>) at ../opal/threads/condition.h:76
>>> #5  0x00007f55b14b531b in ompi_request_default_wait_all (count=2, requests=0x7ffd1bb55370, statuses=0x7ffd1bb55340) at request/req_wait.c:287
>>> #6  0x00007f55b157a225 in ompi_coll_base_sendrecv_zero (dest=1, stag=-16, source=1, rtag=-16, comm=0x601280 <ompi_mpi_comm_world>) at base/coll_base_barrier.c:63
>>> #7  0x00007f55b157a92a in ompi_coll_base_barrier_intra_two_procs (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630) at base/coll_base_barrier.c:308
>>> #8  0x00007f55b15aafec in ompi_coll_tuned_barrier_intra_dec_fixed (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630) at coll_tuned_decision_fixed.c:196
>>> #9  0x00007f55b14d36fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>) at pbarrier.c:63
>>> #10 0x0000000000400b0b in main (argc=1, argv=0x7ffd1bb55658) at mpitest.c:26
>>> (gdb)
>>>
>>> Thinking that this might be a bug in tuned collectives, since that is what the stack shows, I ran the program like this (basically adding the ^tuned part):
>>>
>>> [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca coll ^tuned ./mpitest
>>>
>>> It still hangs, but now with a different stack trace:
>>>
>>> (gdb) bt
>>> #0  0x00007f910d38ac3d in poll () from /lib64/libc.so.6
>>> #1  0x00007f910c815de6 in poll_dispatch (base=0x1a317b0, tv=0x7fff43ee3610) at poll.c:165
>>> #2  0x00007f910c80da90 in opal_libevent2022_event_base_loop (base=0x1a317b0, flags=2) at event.c:1630
>>> #3  0x00007f910c739144 in opal_progress () at runtime/opal_progress.c:171
>>> #4  0x00007f910db130f7 in opal_condition_wait (c=0x7f910de47c40 <ompi_request_cond>, m=0x7f910de47bc0 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
>>> #5  0x00007f910db132d8 in ompi_request_wait_completion (req=0x1b07680) at ../../../../ompi/request/request.h:383
>>> #6  0x00007f910db1533b in mca_pml_ob1_send (buf=0x0, count=0, datatype=0x7f910de1e340 <ompi_mpi_byte>, dst=1, tag=-16, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:259
>>> #7  0x00007f910d9c3b38 in ompi_coll_base_barrier_intra_basic_linear (comm=0x601280 <ompi_mpi_comm_world>, module=0x1b092c0) at base/coll_base_barrier.c:368
>>> #8  0x00007f910d91c6fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>) at pbarrier.c:63
>>> #9  0x0000000000400b0b in main (argc=1, argv=0x7fff43ee3a58) at mpitest.c:26
>>> (gdb)
>>>
>>> The mpitest.c program is as follows:
>>>
>>> #include <mpi.h>
>>> #include <stdio.h>
>>> #include <string.h>
>>>
>>> int main(int argc, char** argv)
>>> {
>>>     int world_size, world_rank, name_len;
>>>     char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>>>     MPI_Get_processor_name(hostname, &name_len);
>>>     printf("Hello world from processor %s, rank %d out of %d processors\n", hostname, world_rank, world_size);
>>>     if (world_rank == 1)
>>>     {
>>>         MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>         printf("%s received %s\n", hostname, buf);
>>>     }
>>>     else
>>>     {
>>>         strcpy(buf, "haha!");
>>>         MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
>>>         printf("%s sent %s\n", hostname, buf);
>>>     }
>>>     MPI_Barrier(MPI_COMM_WORLD);
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>> The hostfile is as follows:
>>>
>>> 10.10.10.10 slots=1
>>> 10.10.10.11 slots=1
>>>
>>> The two nodes are connected by three physical and three logical networks:
>>> Physical: Gigabit Ethernet, 10G iWARP, 20G InfiniBand
>>> Logical: IP (all 3), PSM (QLogic InfiniBand), verbs (iWARP and InfiniBand)
>>>
>>> Please note again that this is a fresh, brand new clone.
>>>
>>> Is this a bug (perhaps a side effect of --disable-dlopen) or something I am doing wrong?
>>>
>>> Thanks
>>> Durga
>>>
>>> We learn from history that we never learn from history.