Dear OMPI pros

This question seems to fall in the no-man's-land between OMPI and hwloc.
However, it showed up as an OMPI error, so I hope it is OK to ask it on this list.

***

A user here got this error (or warning?) message today:

+ mpiexec -np 64 $HOME/echam-aiv_ldeo_6.1.00p1/bin/echam6
****************************************************************************
* Hwloc has encountered what looks like an error from the operating system.
*
* object intersection without inclusion!
* Error occurred in topology.c line 594
*
* Please report this error message to the hwloc user's mailing list,
* along with the output from the hwloc-gather-topology.sh script.
****************************************************************************
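
For readers who have not met this message before: as far as I understand it,
hwloc prints "object intersection without inclusion" when two objects reported
by the OS cover overlapping sets of processing units, but neither set contains
the other, so they cannot be nested into a tree. The Python sketch below only
illustrates that condition. The cpusets and the `relation` helper are made up
for the example and are not hwloc code.

```python
def relation(a, b):
    """Classify how two cpusets (sets of PU indexes) relate to each other."""
    a, b = set(a), set(b)
    if a == b:
        return "equal"
    if a <= b:
        return "a inside b"
    if b <= a:
        return "b inside a"
    if a & b:
        # Overlap, but neither set contains the other: no valid nesting.
        return "intersection without inclusion"
    return "disjoint"


# Hypothetical cpusets, for illustration only.
l2_cache = {0, 1}
numa_node = {0, 1, 2, 3, 4, 5, 6, 7}
odd_group = {1, 2}  # overlaps l2_cache, but neither contains the other

print(relation(l2_cache, numa_node))  # a inside b
print(relation(l2_cache, odd_group))  # intersection without inclusion
```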

Additional info:

1) We have OMPI 1.6.5. This user is using the one built
with Intel compilers 2011.13.367.

2) I set these MCA parameters in $OMPI/etc/openmpi-mca-params.conf (includes binding to core):

btl = ^tcp
orte_tag_output = 1
rmaps_base_schedule_policy = core
orte_process_binding = core
orte_report_bindings = 1
opal_paffinity_alone = 1


3) The machines are dual-socket, with 16-core AMD Opteron 6376 (Abu Dhabi)
processors, which have one FPU for each pair of cores, a hierarchy of caches
serving sub-groups of cores, etc.
The OS is Linux CentOS 6.4 with stock CentOS OFED.
Interconnect is Infiniband QDR (Mellanox HW).

4) We have Torque 4.2.5, built with cpuset support.
OMPI is built with Torque (tm) support.

5) In case it helps, I attach the output of
hwloc-gather-topology, which I ran on the node that threw the error,
although not immediately after the job failure.
I used the hwloc-gather-topology script that comes with
the hwloc (version 1.5) provided by CentOS.
As far as I can tell, the hwloc bits and pieces built into OMPI
do not include the hwloc-gather-topology script (although the embedded hwloc
may be a newer version, 1.8 perhaps?).
Hopefully the mail servers won't chop off the attachments.

6) I am a bit surprised by this error message, because I haven't
seen it before, although we have used OMPI 1.6.5 on
this machine with several other programs without problems.
Alas, it happened now.

**

- Is this a known hwloc problem on this processor architecture?

- Is this a known issue with this combination of HW and SW?

- Would not binding the MPI processes (to core or socket) perhaps help?

- Any workarounds or suggestions?

**

Thank you,
Gus Correa

Machine (P#0 total=134199384KB DMIProductName=H8DGU DMIProductVersion=1234567890 DMIProductSerial=1234567890 DMIProductUUID=534D4349-0002-8390-2500-839025001D97 DMIBoardVendor=Supermicro DMIBoardName=H8DGU DMIBoardVersion=1234567890 DMIBoardSerial=NM29S71392 DMIBoardAssetTag="To Be Filled By O.E.M." DMIChassisVendor=Supermicro DMIChassisType=17 DMIChassisVersion=1234567890 DMIChassisSerial=1234567890 DMIChassisAssetTag="To Be Filled By O.E.M." DMIBIOSVendor="American Megatrends Inc." DMIBIOSVersion="3.0       " DMIBIOSDate=08/31/2012 DMISysVendor=Supermicro Backend=Linux LinuxCgroup=/)
  Socket L#0 (P#0 total=67106904KB CPUModel="AMD Opteron(tm) Processor 6376                 ")
    NUMANode L#0 (P#0 local=33552472KB total=33552472KB)
      L3Cache L#0 (size=6144KB linesize=64 ways=64)
        L2Cache L#0 (size=2048KB linesize=64 ways=16)
          L1iCache L#0 (size=64KB linesize=64 ways=2)
            L1dCache L#0 (size=16KB linesize=64 ways=4)
              Core L#0 (P#0)
                PU L#0 (P#0)
            L1dCache L#1 (size=16KB linesize=64 ways=4)
              Core L#1 (P#1)
                PU L#1 (P#1)
        L2Cache L#1 (size=2048KB linesize=64 ways=16)
          L1iCache L#1 (size=64KB linesize=64 ways=2)
            L1dCache L#2 (size=16KB linesize=64 ways=4)
              Core L#2 (P#2)
                PU L#2 (P#2)
            L1dCache L#3 (size=16KB linesize=64 ways=4)
              Core L#3 (P#3)
                PU L#3 (P#3)
        L2Cache L#2 (size=2048KB linesize=64 ways=16)
          L1iCache L#2 (size=64KB linesize=64 ways=2)
            L1dCache L#4 (size=16KB linesize=64 ways=4)
              Core L#4 (P#4)
                PU L#4 (P#4)
            L1dCache L#5 (size=16KB linesize=64 ways=4)
              Core L#5 (P#5)
                PU L#5 (P#5)
        L2Cache L#3 (size=2048KB linesize=64 ways=16)
          L1iCache L#3 (size=64KB linesize=64 ways=2)
            L1dCache L#6 (size=16KB linesize=64 ways=4)
              Core L#6 (P#6)
                PU L#6 (P#6)
            L1dCache L#7 (size=16KB linesize=64 ways=4)
              Core L#7 (P#7)
                PU L#7 (P#7)
    NUMANode L#1 (P#1 local=33554432KB total=33554432KB)
      L3Cache L#1 (size=6144KB linesize=64 ways=64)
        L2Cache L#4 (size=2048KB linesize=64 ways=16)
          L1iCache L#4 (size=64KB linesize=64 ways=2)
            L1dCache L#8 (size=16KB linesize=64 ways=4)
              Core L#8 (P#0)
                PU L#8 (P#8)
            L1dCache L#9 (size=16KB linesize=64 ways=4)
              Core L#9 (P#1)
                PU L#9 (P#9)
        L2Cache L#5 (size=2048KB linesize=64 ways=16)
          L1iCache L#5 (size=64KB linesize=64 ways=2)
            L1dCache L#10 (size=16KB linesize=64 ways=4)
              Core L#10 (P#2)
                PU L#10 (P#10)
            L1dCache L#11 (size=16KB linesize=64 ways=4)
              Core L#11 (P#3)
                PU L#11 (P#11)
        L2Cache L#6 (size=2048KB linesize=64 ways=16)
          L1iCache L#6 (size=64KB linesize=64 ways=2)
            L1dCache L#12 (size=16KB linesize=64 ways=4)
              Core L#12 (P#4)
                PU L#12 (P#12)
            L1dCache L#13 (size=16KB linesize=64 ways=4)
              Core L#13 (P#5)
                PU L#13 (P#13)
        L2Cache L#7 (size=2048KB linesize=64 ways=16)
          L1iCache L#7 (size=64KB linesize=64 ways=2)
            L1dCache L#14 (size=16KB linesize=64 ways=4)
              Core L#14 (P#6)
                PU L#14 (P#14)
            L1dCache L#15 (size=16KB linesize=64 ways=4)
              Core L#15 (P#7)
                PU L#15 (P#15)
  Socket L#1 (P#1 total=67092480KB CPUModel="AMD Opteron(tm) Processor 6376                 ")
    NUMANode L#2 (P#2 local=33554432KB total=33554432KB)
      L3Cache L#2 (size=6144KB linesize=64 ways=64)
        L2Cache L#8 (size=2048KB linesize=64 ways=16)
          L1iCache L#8 (size=64KB linesize=64 ways=2)
            L1dCache L#16 (size=16KB linesize=64 ways=4)
              Core L#16 (P#0)
                PU L#16 (P#16)
            L1dCache L#17 (size=16KB linesize=64 ways=4)
              Core L#17 (P#1)
                PU L#17 (P#17)
        L2Cache L#9 (size=2048KB linesize=64 ways=16)
          L1iCache L#9 (size=64KB linesize=64 ways=2)
            L1dCache L#18 (size=16KB linesize=64 ways=4)
              Core L#18 (P#2)
                PU L#18 (P#18)
            L1dCache L#19 (size=16KB linesize=64 ways=4)
              Core L#19 (P#3)
                PU L#19 (P#19)
        L2Cache L#10 (size=2048KB linesize=64 ways=16)
          L1iCache L#10 (size=64KB linesize=64 ways=2)
            L1dCache L#20 (size=16KB linesize=64 ways=4)
              Core L#20 (P#4)
                PU L#20 (P#20)
            L1dCache L#21 (size=16KB linesize=64 ways=4)
              Core L#21 (P#5)
                PU L#21 (P#21)
        L2Cache L#11 (size=2048KB linesize=64 ways=16)
          L1iCache L#11 (size=64KB linesize=64 ways=2)
            L1dCache L#22 (size=16KB linesize=64 ways=4)
              Core L#22 (P#6)
                PU L#22 (P#22)
            L1dCache L#23 (size=16KB linesize=64 ways=4)
              Core L#23 (P#7)
                PU L#23 (P#23)
    NUMANode L#3 (P#3 local=33538048KB total=33538048KB)
      L3Cache L#3 (size=6144KB linesize=64 ways=64)
        L2Cache L#12 (size=2048KB linesize=64 ways=16)
          L1iCache L#12 (size=64KB linesize=64 ways=2)
            L1dCache L#24 (size=16KB linesize=64 ways=4)
              Core L#24 (P#0)
                PU L#24 (P#24)
            L1dCache L#25 (size=16KB linesize=64 ways=4)
              Core L#25 (P#1)
                PU L#25 (P#25)
        L2Cache L#13 (size=2048KB linesize=64 ways=16)
          L1iCache L#13 (size=64KB linesize=64 ways=2)
            L1dCache L#26 (size=16KB linesize=64 ways=4)
              Core L#26 (P#2)
                PU L#26 (P#26)
            L1dCache L#27 (size=16KB linesize=64 ways=4)
              Core L#27 (P#3)
                PU L#27 (P#27)
        L2Cache L#14 (size=2048KB linesize=64 ways=16)
          L1iCache L#14 (size=64KB linesize=64 ways=2)
            L1dCache L#28 (size=16KB linesize=64 ways=4)
              Core L#28 (P#4)
                PU L#28 (P#28)
            L1dCache L#29 (size=16KB linesize=64 ways=4)
              Core L#29 (P#5)
                PU L#29 (P#29)
        L2Cache L#15 (size=2048KB linesize=64 ways=16)
          L1iCache L#15 (size=64KB linesize=64 ways=2)
            L1dCache L#30 (size=16KB linesize=64 ways=4)
              Core L#30 (P#6)
                PU L#30 (P#30)
            L1dCache L#31 (size=16KB linesize=64 ways=4)
              Core L#31 (P#7)
                PU L#31 (P#31)
depth 0: 1 Machine (type #1)
 depth 1: 2 Socket (type #3)
  depth 2: 4 NUMANode (type #2)
   depth 3: 4 L3Cache (type #4)
    depth 4: 16 L2Cache (type #4)
     depth 5: 16 L1iCache (type #4)
      depth 6: 32 L1dCache (type #4)
       depth 7: 32 Core (type #5)
        depth 8: 32 PU (type #6)
latency matrix between NUMANodes (depth 2) by logical indexes:
  index     0     1     2     3
      0 1.000 1.600 1.600 1.600
      1 1.600 1.000 1.600 1.600
      2 1.600 1.600 1.000 1.600
      3 1.600 1.600 1.600 1.000
Topology not from this system
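
One way to cross-check a listing like the one above against its per-depth
summary is to count the objects of each type. The following is a minimal
Python sketch, assuming the lstopo text layout shown here. `count_objects`
and the abbreviated `sample` are hypothetical names and data used only for
illustration; they are not part of hwloc.

```python
import re
from collections import Counter

# Matches the object type at the start of an lstopo text line,
# e.g. "Machine (", "  Socket L#0 (", "      PU L#31 (".
OBJ_RE = re.compile(r"^\s*([A-Za-z0-9]+)\s+(?:L#\d+\s+)?\(")


def count_objects(lstopo_text):
    """Return a Counter of object types found in lstopo text output."""
    counts = Counter()
    for line in lstopo_text.splitlines():
        match = OBJ_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Abbreviated excerpt of the listing (attributes shortened for the example).
sample = """\
Machine (P#0 total=134199384KB)
  Socket L#0 (P#0 total=67106904KB)
    NUMANode L#0 (P#0 local=33552472KB total=33552472KB)
      L3Cache L#0 (size=6144KB linesize=64 ways=64)
        L2Cache L#0 (size=2048KB linesize=64 ways=16)
          L1iCache L#0 (size=64KB linesize=64 ways=2)
            L1dCache L#0 (size=16KB linesize=64 ways=4)
              Core L#0 (P#0)
                PU L#0 (P#0)
            L1dCache L#1 (size=16KB linesize=64 ways=4)
              Core L#1 (P#1)
                PU L#1 (P#1)
depth 0: 1 Machine (type #1)
"""

counts = count_objects(sample)
for obj_type, number in counts.items():
    print(obj_type, number)
```

Run over the full listing, the counts can then be compared by eye with the
"depth N:" lines, which the pattern deliberately does not match.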

Attachment: node15.tar.bz2
Description: application/bzip
