On Apr 17, 2016, at 8:43 PM, dpchoudh . <dpcho...@gmail.com> wrote:
Hello Gilles and all
I am sorry to be bugging the developers, but this issue seems to
be nagging me, and I am surprised it does not seem to affect
anybody else. But then again, I am using the master branch, and
most users are probably using a released version.
This time I am using a totally different cluster. It has NO
verbs-capable interface: just two Ethernet interfaces (one of which
has no IP address and hence is unusable) plus one proprietary
interface that currently supports only IP traffic. The two IP
interfaces (Ethernet and proprietary) are on different IP subnets.
My test program is as follows:
#include <stdio.h>
#include <string.h>
#include <strings.h>  /* for bzero() */
#include "mpi.h"

int main(int argc, char *argv[])
{
    char host[128];
    int n;

    MPI_Init(&argc, &argv);
    MPI_Get_processor_name(host, &n);
    printf("Hello from %s\n", host);
    MPI_Comm_size(MPI_COMM_WORLD, &n);
    printf("The world has %d nodes\n", n);
    MPI_Comm_rank(MPI_COMM_WORLD, &n);
    printf("My rank is %d\n", n);
//#if 0
    if (n == 0)
    {
        strcpy(host, "ha!");
        MPI_Send(host, strlen(host) + 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
        printf("sent %s\n", host);
    }
    else
    {
        //int len = strlen(host) + 1;
        bzero(host, 128);
        MPI_Recv(host, 4, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Received %s from rank 0\n", host);
    }
//#endif
    MPI_Finalize();
    return 0;
}
This program, when run between two nodes, hangs. The command was:
[durga@b-1 ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl
self,tcp -mca pml ob1 -mca btl_tcp_if_include eno1 ./mpitest
The hang occurs after the following output (eno1 is one of the GigE
interfaces, which also carries the OOB traffic):
Hello from b-1
The world has 2 nodes
My rank is 0
Hello from b-2
The world has 2 nodes
My rank is 1
Note that if I uncomment the #if 0 / #endif pair (i.e., comment out
the MPI_Send()/MPI_Recv() part), the program runs to completion. Also
note that the printfs following MPI_Send()/MPI_Recv() do not show
up on the console.
Upon attaching gdb, the stack trace from the master node is as
follows:
Missing separate debuginfos, use: debuginfo-install
glibc-2.17-78.el7.x86_64 libpciaccess-0.13.4-2.el7.x86_64
(gdb) bt
#0 0x00007f72a533eb7d in poll () from /lib64/libc.so.6
#1 0x00007f72a4cb7146 in poll_dispatch (base=0xee33d0,
tv=0x7fff81057b70)
at poll.c:165
#2 0x00007f72a4caede0 in opal_libevent2022_event_base_loop
(base=0xee33d0,
flags=2) at event.c:1630
#3 0x00007f72a4c4e692 in opal_progress () at
runtime/opal_progress.c:171
#4 0x00007f72a0d07ac1 in opal_condition_wait (
c=0x7f72a5bb1e00 <ompi_request_cond>, m=0x7f72a5bb1d80
<ompi_request_lock>)
at ../../../../opal/threads/condition.h:76
#5 0x00007f72a0d07ca2 in ompi_request_wait_completion
(req=0x113eb80)
at ../../../../ompi/request/request.h:383
#6 0x00007f72a0d09cd5 in mca_pml_ob1_send (buf=0x7fff81057db0,
count=4,
datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1,
sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280
<ompi_mpi_comm_world>)
at pml_ob1_isend.c:251
#7 0x00007f72a58d6be3 in PMPI_Send (buf=0x7fff81057db0, count=4,
type=0x601080 <ompi_mpi_char>, dest=1, tag=1,
comm=0x601280 <ompi_mpi_comm_world>) at psend.c:78
#8 0x0000000000400afa in main (argc=1, argv=0x7fff81057f18) at
mpitest.c:19
(gdb)
And the backtrace on the non-master node is:
(gdb) bt
#0 0x00007ff3b377e48d in nanosleep () from /lib64/libc.so.6
#1 0x00007ff3b37af014 in usleep () from /lib64/libc.so.6
#2 0x00007ff3b0c922de in OPAL_PMIX_PMIX120_PMIx_Fence
(procs=0x0, nprocs=0,
info=0x0, ninfo=0) at src/client/pmix_client_fence.c:100
#3 0x00007ff3b0c5f1a6 in pmix120_fence (procs=0x0, collect_data=0)
at pmix120_client.c:258
#4 0x00007ff3b3cf8f4b in ompi_mpi_finalize ()
at runtime/ompi_mpi_finalize.c:242
#5 0x00007ff3b3d23295 in PMPI_Finalize () at pfinalize.c:47
#6 0x0000000000400958 in main (argc=1, argv=0x7fff785e8788) at
mpitest.c:30
(gdb)
The hostfile is as follows:
[durga@b-1 ~]$ cat hostfile
10.4.70.10 slots=1
10.4.70.11 slots=1
#10.4.70.12 slots=1
And the ifconfig output from the master node is as follows (the
other node is similar; all the IP interfaces are on their
respective subnets):
[durga@b-1 ~]$ ifconfig
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.4.70.10 netmask 255.255.255.0 broadcast 10.4.70.255
inet6 fe80::21e:c9ff:fefe:13df prefixlen 64 scopeid
0x20<link>
ether 00:1e:c9:fe:13:df txqueuelen 1000 (Ethernet)
RX packets 48215 bytes 27842846 (26.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 52746 bytes 7817568 (7.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 16
eno2: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 00:1e:c9:fe:13:e0 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 17
lf0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2016
inet 192.168.1.2 netmask 255.255.255.0 broadcast
192.168.1.255
inet6 fe80::3002:ff:fe33:3333 prefixlen 64 scopeid
0x20<link>
ether 32:02:00:33:33:33 txqueuelen 1000 (Ethernet)
RX packets 10 bytes 512 (512.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 22 bytes 1536 (1.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 0 (Local Loopback)
RX packets 26 bytes 1378 (1.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 26 bytes 1378 (1.3 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Please help me with this. I am stuck with the TCP transport,
which is the most basic of all transports.
Thanks in advance
Durga
1% of the executables have 99% of CPU privilege!
Userspace code! Unite!! Occupy the kernel!!!
On Tue, Apr 12, 2016 at 9:32 PM, Gilles Gouaillardet
<gil...@rist.or.jp> wrote:
This is quite unlikely, and FWIW, your test program works for me.
I suggest you check that your three TCP networks are usable, for example
$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca
pml ob1 --mca btl_tcp_if_include xxx ./mpitest
where xxx is an interface name or a list of interface names:
eth0
eth1
ib0
eth0,eth1
eth0,ib0
...
eth0,eth1,ib0
and see where the problem starts occurring.
By the way, are your three interfaces on three different subnets? Is
routing required between two interfaces of the same type?
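To take Open MPI out of the picture entirely, a bare-bones TCP
client/server like the sketch below can confirm that plain TCP
connections actually go through on each interface. This is only a
sanity-check sketch, not anything shipped with Open MPI; the
"tcpcheck" name, the port, and the addresses are placeholders you
would substitute yourself.

/* tcpcheck.c - minimal raw-TCP sanity check (sketch only).
 * server: ./tcpcheck server <ip-of-local-interface-under-test> <port>
 * client: ./tcpcheck client <ip-of-server-interface-under-test> <port>
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s server|client <ipv4-addr> <port>\n", argv[0]);
        return 1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short) atoi(argv[3]));
    if (inet_pton(AF_INET, argv[2], &addr.sin_addr) != 1) {
        fprintf(stderr, "bad IPv4 address: %s\n", argv[2]);
        return 1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    char buf[16] = "ping";
    if (strcmp(argv[1], "server") == 0) {
        /* Bind to the address of the interface under test and wait for one peer. */
        if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 || listen(fd, 1) < 0) {
            perror("bind/listen");
            return 1;
        }
        int conn = accept(fd, NULL, NULL);
        if (conn < 0) {
            perror("accept");
            return 1;
        }
        ssize_t n = read(conn, buf, sizeof(buf) - 1);
        printf("server got %zd bytes: %s\n", n, buf);
        close(conn);
    } else {
        /* Connect to the server's address on that interface and send a short probe. */
        if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
            perror("connect");
            return 1;
        }
        if (write(fd, buf, strlen(buf) + 1) < 0) {
            perror("write");
            return 1;
        }
        printf("client sent %s\n", buf);
    }
    close(fd);
    return 0;
}

If that exchange stalls on a given pair of addresses, the problem is
below Open MPI (routing, firewall, etc.); if it works on every
interface while the MPI run still hangs, the TCP BTL is the right
place to keep digging.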
Cheers,
Gilles
On 4/13/2016 7:15 AM, dpchoudh . wrote:
Hi all
I have reported this issue before, but at the time I brushed it
off as something caused by my modifications to the source tree. It
looks like that is not the case.
Just now, I did the following:
1. Cloned a fresh copy from master.
2. Configured with the following flags, then built and installed
it on my two-node "cluster":
--enable-debug --enable-debug-symbols --disable-dlopen
3. Compiled the following program, mpitest.c, with these
flags: -g3 -Wall -Wextra
4. Ran it like this:
[durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca
btl self,tcp -mca pml ob1 ./mpitest
With this, the code hangs at MPI_Barrier() on both nodes,
after generating the following output:
Hello world from processor smallMPI, rank 0 out of 2 processors
Hello world from processor bigMPI, rank 1 out of 2 processors
smallMPI sent haha!
bigMPI received haha!
<Hangs until killed by ^C>
Attaching to the hung process on one node gives the
following backtrace:
(gdb) bt
#0 0x00007f55b0f41c3d in poll () from /lib64/libc.so.6
#1 0x00007f55b03ccde6 in poll_dispatch (base=0x70e7b0,
tv=0x7ffd1bb551c0) at poll.c:165
#2 0x00007f55b03c4a90 in opal_libevent2022_event_base_loop
(base=0x70e7b0, flags=2) at event.c:1630
#3 0x00007f55b02f0144 in opal_progress () at
runtime/opal_progress.c:171
#4 0x00007f55b14b4d8b in opal_condition_wait
(c=0x7f55b19fec40 <ompi_request_cond>, m=0x7f55b19febc0
<ompi_request_lock>) at ../opal/threads/condition.h:76
#5 0x00007f55b14b531b in ompi_request_default_wait_all
(count=2, requests=0x7ffd1bb55370, statuses=0x7ffd1bb55340)
at request/req_wait.c:287
#6 0x00007f55b157a225 in ompi_coll_base_sendrecv_zero
(dest=1, stag=-16, source=1, rtag=-16, comm=0x601280
<ompi_mpi_comm_world>)
at base/coll_base_barrier.c:63
#7 0x00007f55b157a92a in
ompi_coll_base_barrier_intra_two_procs (comm=0x601280
<ompi_mpi_comm_world>, module=0x7c2630) at
base/coll_base_barrier.c:308
#8 0x00007f55b15aafec in
ompi_coll_tuned_barrier_intra_dec_fixed (comm=0x601280
<ompi_mpi_comm_world>, module=0x7c2630) at
coll_tuned_decision_fixed.c:196
#9 0x00007f55b14d36fd in PMPI_Barrier (comm=0x601280
<ompi_mpi_comm_world>) at pbarrier.c:63
#10 0x0000000000400b0b in main (argc=1, argv=0x7ffd1bb55658)
at mpitest.c:26
(gdb)
Thinking that this might be a bug in the tuned collectives,
since that is what the stack trace shows, I ran the program like
this (basically adding the ^tuned part):
[durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca
btl self,tcp -mca pml ob1 -mca coll ^tuned ./mpitest
It still hangs, but now with a different stack trace:
(gdb) bt
#0 0x00007f910d38ac3d in poll () from /lib64/libc.so.6
#1 0x00007f910c815de6 in poll_dispatch (base=0x1a317b0,
tv=0x7fff43ee3610) at poll.c:165
#2 0x00007f910c80da90 in opal_libevent2022_event_base_loop
(base=0x1a317b0, flags=2) at event.c:1630
#3 0x00007f910c739144 in opal_progress () at
runtime/opal_progress.c:171
#4 0x00007f910db130f7 in opal_condition_wait
(c=0x7f910de47c40 <ompi_request_cond>, m=0x7f910de47bc0
<ompi_request_lock>)
at ../../../../opal/threads/condition.h:76
#5 0x00007f910db132d8 in ompi_request_wait_completion
(req=0x1b07680) at ../../../../ompi/request/request.h:383
#6 0x00007f910db1533b in mca_pml_ob1_send (buf=0x0,
count=0, datatype=0x7f910de1e340 <ompi_mpi_byte>, dst=1,
tag=-16, sendmode=MCA_PML_BASE_SEND_STANDARD,
comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:259
#7 0x00007f910d9c3b38 in
ompi_coll_base_barrier_intra_basic_linear (comm=0x601280
<ompi_mpi_comm_world>, module=0x1b092c0) at
base/coll_base_barrier.c:368
#8 0x00007f910d91c6fd in PMPI_Barrier (comm=0x601280
<ompi_mpi_comm_world>) at pbarrier.c:63
#9 0x0000000000400b0b in main (argc=1, argv=0x7fff43ee3a58)
at mpitest.c:26
(gdb)
The mpitest.c program is as follows:
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int world_size, world_rank, name_len;
    char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(hostname, &name_len);
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           hostname, world_rank, world_size);
    if (world_rank == 1)
    {
        MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%s received %s\n", hostname, buf);
    }
    else
    {
        strcpy(buf, "haha!");
        MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
        printf("%s sent %s\n", hostname, buf);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
The hostfile is as follows:
10.10.10.10 slots=1
10.10.10.11 slots=1
The two nodes are connected by three physical and three logical
networks:
Physical: Gigabit Ethernet, 10G iWARP, 20G InfiniBand
Logical: IP (all three), PSM (QLogic InfiniBand), verbs (iWARP
and InfiniBand)
Please note again that this is a fresh, brand new clone.
Is this a bug (perhaps a side effect of --disable-dlopen) or
something I am doing wrong?
Thanks
Durga
We learn from history that we never learn from history.