Hi, I was wondering if anyone had any comments regarding my earlier posting
of questions.  Am I off base with my questions, or is this the wrong forum
for these types of questions?

> 
> Hi, I hope this is the right forum for my questions.  I am running into a
> problem when scaling beyond 512 cores on an InfiniBand cluster which has
> 14,336 cores.  I am new to Open MPI and am trying to figure out the right
> -mca options to pass to avoid the "mca_oob_tcp_peer_complete_connect:
> connection failed:" errors on a cluster with InfiniBand HCAs and the OFED
> v1.3 GA release.  Other MPI implementations, such as Intel MPI and MVAPICH,
> work fine using the uDAPL or verbs IB layers for MPI communication.
> 
> I find it difficult to understand which network interface or IB layer is
> being used.  When I explicitly state not to use the eth0, lo, ib1, or ib1:0
> interfaces with the command-line option "-mca oob_tcp_exclude", Open MPI
> continues to probe these interfaces.  All MPI traffic should use ib0, which
> is the 10.148 network, but with debugging enabled I see references to the
> 10.149 network, which is ib1.  Below is the ifconfig output for a compute
> node.
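> 
> To double-check which interface-related parameters this build actually
> recognizes (and what their defaults are), I have been running the commands
> below; I am assuming this is the right ompi_info syntax for 1.2.2, so please
> correct me if it is not:
> 
>     ompi_info --param oob tcp
>     ompi_info --param btl tcp
>     ompi_info --param btl openib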
> 
> Questions:
> 
> 1. Is there a way to determine which network device is being used, and to
> keep Open MPI from falling back to another device?  With Intel MPI or HP MPI
> you can state that no fallback device should be used.  I thought "-mca
> oob_tcp_exclude" would be the correct option to pass, but I may be wrong.
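> 
> For example, would something along these lines be the supported way to pin
> both the OOB channel and any TCP BTL traffic to ib0?  I am guessing at the
> parameter names (oob_tcp_include and btl_tcp_if_include) from the FAQ, so
> please correct me if they are wrong:
> 
>     mpiexec -mca oob_tcp_include ib0 \
>             -mca btl_tcp_if_include ib0 \
>             -mca btl openib,sm,self \
>             -machinefile mpd.hosts.$$ -np 1024 ~/bin/test_ompi < input1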
> 
> 2. How can I determine whether the InfiniBand openib device is actually
> being used?  When running an MPI application I continue to see the TCP-level
> in/out packet counters increasing, when all MPI communication should be
> going over the IB RDMA device (ib0 / mthca0).  Open MPI was bundled with
> OFED v1.3, so I am assuming the openib interface should work.  Running
> ompi_info shows btl_open_* references.
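> 
> The only way I have found so far to convince myself the HCA is (or is not)
> carrying the traffic is to sample the port counters before and after a run,
> assuming perfquery from infiniband-diags is a reasonable check for this:
> 
>     perfquery          # PortXmitData/PortRcvData for the local HCA port
>     mpiexec ... ~/bin/test_ompi < input1
>     perfquery          # counters should have jumped if openib is in use
> 
> Would adding "-mca btl_base_verbose 30" show which BTLs are actually
> selected at startup?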
> 
> /usr/mpi/openmpi-1.2-2/intel/bin/mpiexec \
>     -mca btl_openib_warn_default_gid_prefix 0 \
>     -mca oob_tcp_exclude eth0,lo,ib1,ib1:0 \
>     -mca btl openib,sm,self \
>     -machinefile mpd.hosts.$$ -np 1024 ~/bin/test_ompi < input1
> 
> 3. When trying to avoid the "mca_oob_tcp_peer_complete_connect: connection
> failed:" message, I tried using "-mca btl openib,sm,self" and "-mca btl
> ^tcp", but I still get these error messages.  Even when using "-mca btl
> openib,sm,self", Open MPI retries over the IB1 (10.149 network) fabric to
> establish a connection with a node.  What are my options for avoiding these
> connection-failed messages?  I suspect Open MPI is overflowing the TCP
> buffers on the clients because of the large core count of this job, since I
> see lots of TCP buffer errors in the netstat -s output.  I reviewed all of
> the online FAQs and I am not sure what options to pass to get around this
> issue.
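> 
> On the TCP buffer side, these are the kernel settings I have been checking
> on the compute nodes (just standard Linux sysctls; I do not know yet which
> one, if any, is the limiting factor at this scale):
> 
>     sysctl net.core.rmem_max net.core.wmem_max
>     sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
>     sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog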
> 
> By the way, I did check the
> /usr/mpi/openmpi-1.2-2/intel/etc/openmpi-mca-params.conf file, and no
> defaults are specified there.
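> 
> For reference, if the include-style parameters above turn out to be right,
> my plan was to set them cluster-wide in that file rather than on every
> command line, along the lines of:
> 
>     # /usr/mpi/openmpi-1.2-2/intel/etc/openmpi-mca-params.conf
>     oob_tcp_include = ib0
>     btl_tcp_if_include = ib0
>     btl = openib,sm,self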
> 
> ----
> 
> ompi_info output:
>                 Open MPI: 1.2.2
>    Open MPI SVN revision: r14613
>                 Open RTE: 1.2.2
>    Open RTE SVN revision: r14613
>                     OPAL: 1.2.2
>        OPAL SVN revision: r14613
>                   Prefix: /usr/mpi/openmpi-1.2-2/intel
>  Configured architecture: x86_64-suse-linux-gnu
> 
> ------
> 
> Following is the cluster configuration:
> 1792 nodes with 8 cores per node = 14336 cores
> Ofed Rel: OFED-1.3-rc1
> IB Device(s): mthca0 FW=1.2.0 Rate=20 Gb/sec (4X DDR)
>               mthca1 FW=1.2.0 Rate=20 Gb/sec (4X DDR)
> Processors: 2 x 4-core Intel(R) Xeon(R) CPU X5365 @ 3.00GHz, 8192KB cache,
>             FSB 1333MHz
> Total Mem: 16342776 KB
> OS Release: SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10 SP1
> Kernel Ver: 2.6.16.54-0.2.5-smp
> 
> ------
> 
> Ifconfig output:
> eth0      Link encap:Ethernet  HWaddr 00:30:48:7B:A7:AC
>           inet addr:192.168.159.41  Bcast:192.168.159.255  Mask:255.255.255.0
>           inet6 addr: fe80::230:48ff:fe7b:a7ac/64 Scope:Link
>           UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1215826 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1342035 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:787514337 (751.0 Mb)  TX bytes:170968505 (163.0 Mb)
>           Base address:0x2000 Memory:dfa00000-dfa20000
> 
> ib0       Link encap:UNSPEC  HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
>           inet addr:10.148.3.73  Bcast:10.148.255.255  Mask:255.255.0.0
>           inet6 addr: fe80::230:487b:a7ac:1/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>           RX packets:20823896 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:19276836 errors:0 dropped:42 overruns:0 carrier:0
>           collisions:0 txqueuelen:256
>           RX bytes:176581223103 (168400.9 Mb)  TX bytes:182691213682 (174227.9 Mb)
> 
> ib1       Link encap:UNSPEC  HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
>           inet addr:10.149.195.73  Bcast:10.149.255.255  Mask:255.255.192.0
>           inet6 addr: fe80::230:487b:a7ad:1/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>           RX packets:175609 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:31175 errors:0 dropped:6 overruns:0 carrier:0
>           collisions:0 txqueuelen:256
>           RX bytes:139196236 (132.7 Mb)  TX bytes:4515680 (4.3 Mb)
> 
> ib1:0     Link encap:UNSPEC  HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
>           inet addr:10.149.3.73  Bcast:10.149.63.255  Mask:255.255.192.0
>           UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
> 
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:30554 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:30554 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:54170543 (51.6 Mb)  TX bytes:54170543 (51.6 Mb)
> 
> --------
> 
> Ibstatus output:
> Infiniband device 'mthca0' port 1 status:
>         default gid:     fe80:0000:0000:0000:0030:487c:04b4:0001
>         base lid:        0x4fb
>         sm lid:          0x1
>         state:           4: ACTIVE
>         phys state:      5: LinkUp
>         rate:            20 Gb/sec (4X DDR)
> 
> Infiniband device 'mthca1' port 1 status:
>         default gid:     fe80:0000:0000:0000:0030:487c:04b5:0001
>         base lid:        0x50c
>         sm lid:          0x1
>         state:           4: ACTIVE
>         phys state:      5: LinkUp
>         rate:            20 Gb/sec (4X DDR)
> 
> --------
> 
> Thanks in advance,
> Scott

