Re: [OMPI users] [openib] segfault when using openib btl

Terry Dontje Mon, 27 Sep 2010 11:22:17 -0400

Ok, there were no 0-value tags in your files. Are you running this with no eager RDMA? If not, can you set the following options: "-mca btl_openib_use_eager_rdma 0 -mca btl_openib_max_eager_rdma 0 -mca btl_openib_flags 1".


thanks,


--td

Eloi Gaudry wrote:

Terry,

Please find enclosed the requested check outputs (using -output-filename 
stdout.tag.null option).
I'm displaying frag->hdr->tag here.

Eloi

On Monday 27 September 2010 16:29:12 Terry Dontje wrote:

Eloi, sorry can you print out frag->hdr->tag?

Unfortunately from your last email I think it will still all have
non-zero values.
If that ends up being the case then there must be something odd with the
descriptor pointer to the fragment.

--td

Eloi Gaudry wrote:

Terry,

Please find enclosed the requested check outputs (using -output-filename
stdout.tag.null option).

For information, Nysal, in his first message, referred to
ompi/mca/pml/ob1/pml_ob1_hdr.h and said that the hdr->tag value was wrong on
the receiving side:

#define MCA_PML_OB1_HDR_TYPE_MATCH     (MCA_BTL_TAG_PML + 1)
#define MCA_PML_OB1_HDR_TYPE_RNDV      (MCA_BTL_TAG_PML + 2)
#define MCA_PML_OB1_HDR_TYPE_RGET      (MCA_BTL_TAG_PML + 3)
#define MCA_PML_OB1_HDR_TYPE_ACK       (MCA_BTL_TAG_PML + 4)
#define MCA_PML_OB1_HDR_TYPE_NACK      (MCA_BTL_TAG_PML + 5)
#define MCA_PML_OB1_HDR_TYPE_FRAG      (MCA_BTL_TAG_PML + 6)
#define MCA_PML_OB1_HDR_TYPE_GET       (MCA_BTL_TAG_PML + 7)
#define MCA_PML_OB1_HDR_TYPE_PUT       (MCA_BTL_TAG_PML + 8)
#define MCA_PML_OB1_HDR_TYPE_FIN       (MCA_BTL_TAG_PML + 9)

and in ompi/mca/btl/btl.h:

#define MCA_BTL_TAG_PML             0x40
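With these definitions the ob1 header types occupy tags 0x41 (MATCH) through 0x49 (FIN), so a received tag of 0 cannot be any of them. A range check like the one below could sit next to the frag->hdr->tag printouts to flag suspicious values. It is a hypothetical debugging helper, not Open MPI code, and a tag outside this range is not automatically invalid, since other components use their own tags.

```c
#include <stdint.h>

/* Values as quoted above from ompi/mca/btl/btl.h and pml_ob1_hdr.h. */
#define MCA_BTL_TAG_PML             0x40
#define MCA_PML_OB1_HDR_TYPE_MATCH  (MCA_BTL_TAG_PML + 1)  /* 0x41 */
#define MCA_PML_OB1_HDR_TYPE_FIN    (MCA_BTL_TAG_PML + 9)  /* 0x49 */

/* Hypothetical helper: nonzero when `tag` lies in the ob1 header-type
 * range MATCH..FIN. A tag of 0 is never an ob1 header type. */
static int is_ob1_hdr_tag(uint8_t tag)
{
    return tag >= MCA_PML_OB1_HDR_TYPE_MATCH &&
           tag <= MCA_PML_OB1_HDR_TYPE_FIN;
}
```

For example, is_ob1_hdr_tag(0x46) (FRAG) is true, while is_ob1_hdr_tag(0) is false.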

Eloi

On Monday 27 September 2010 14:36:59 Terry Dontje wrote:

I am thinking of checking the value of *frag->hdr right before the return
in the post_send function in ompi/mca/btl/openib/btl_openib_endpoint.h.
It is line 548 in the trunk:
https://svn.open-mpi.org/source/xref/ompi-trunk/ompi/mca/btl/openib/btl_openib_endpoint.h#548

--td

Eloi Gaudry wrote:

Hi Terry,

Do you have any patch that I could apply to be able to do so? I'm
working remotely on a cluster (with a terminal) and I cannot use any
parallel debugger or sequential debugger (with a call to xterm...). I
can track the frag->hdr->tag value in
ompi/mca/btl/openib/btl_openib_component.c::handle_wc in the
SEND/RDMA_WRITE case, but this is all I can think of on my own.

You'll find a stack trace (receive side) in this thread (10th or 11th
message), but it might be pointless.

Regards,
Eloi

On Monday 27 September 2010 11:43:55 Terry Dontje wrote:

So it sounds like coalescing is not your issue and that the problem
has something to do with the queue sizes.  It would be helpful if we
could detect the hdr->tag == 0 issue on the sending side and get at
least a stack trace.  There is something really odd going on here.
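Since a debugger cannot be attached here, one way to get at least a stack trace on the sending side is to test the tag just before a fragment is posted and, when it is 0, print the current call stack with glibc's backtrace() and backtrace_symbols_fd(). The sketch below assumes a glibc system; demo_btl_hdr_t and check_outgoing_tag are hypothetical stand-ins, not Open MPI code, and function names only appear in the output for symbols exported from the binary (for example when linking with -rdynamic).

```c
#include <execinfo.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the BTL header: only the tag matters here. */
typedef struct {
    uint8_t tag;
} demo_btl_hdr_t;

/* Send-side guard: if the outgoing tag is 0, write the pid and a
 * backtrace of the current call stack to stderr (no debugger needed).
 * Returns 1 when the tag was 0, 0 otherwise. */
static int check_outgoing_tag(const demo_btl_hdr_t *hdr)
{
    if (hdr->tag != 0)
        return 0;

    void *frames[64];
    int depth = backtrace(frames, 64);
    fprintf(stderr, "[pid %ld] about to send hdr->tag == 0\n",
            (long)getpid());
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    return 1;
}
```

In a patched build, such a check could be called right before the return of post_send() in btl_openib_endpoint.h, the spot Terry suggests; calling abort() when it fires would additionally leave a core file to inspect.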

--td

Eloi Gaudry wrote:

Hi Terry,

I'm sorry to say that I might have missed a point here.

I've lately been relaunching all previously failing computations with
the message coalescing feature switched off, and I saw the same
hdr->tag=0 error several times, always during a collective call
(MPI_Comm_create, MPI_Allreduce and MPI_Bcast, so far). As
soon as I switched back to the per-peer queue option I was previously using
(--mca btl_openib_receive_queues P,65536,256,192,128 instead of
--mca btl_openib_use_message_coalescing 0), all computations ran
flawlessly.

As for the reproducer, I've already tried to write something, but I
haven't succeeded so far in reproducing the hdr->tag=0 issue with it.

Eloi

On 24/09/2010 18:37, Terry Dontje wrote:

Eloi Gaudry wrote:

Terry,

You were right, the error indeed seems to come from the message
coalescing feature. If I turn it off using "--mca
btl_openib_use_message_coalescing 0", I'm not able to observe the
"hdr->tag=0" error.

There are some Trac tickets associated with very similar errors
(https://svn.open-mpi.org/trac/ompi/search?q=coalescing), but they
are all closed (except
https://svn.open-mpi.org/trac/ompi/ticket/2352, which might be
related), aren't they? What would you suggest, Terry?

Interesting, though it looks to me like the segv in ticket 2352
would have happened on the send side instead of the receive side
like you have.  As to what to do next, it would be really nice to
have some sort of reproducer that we can use to debug what is
really going on.  The only other thing to do without a reproducer
is to inspect the code on the send side to figure out what might
make it generate a 0 hdr->tag.  Or maybe instrument the send side
to stop when it is about ready to send a 0 hdr->tag and see if we
can see how the code got there.

I might have some cycles to look at this Monday.

--td

Eloi

On Friday 24 September 2010 16:00:26 Terry Dontje wrote:

Eloi Gaudry wrote:

Terry,

No, I haven't tried any values other than P,65536,256,192,128
yet.

The reason why is quite simple. I've been reading this thread
again and again to understand the meaning of btl_openib_receive_queues,
and I can't figure out why the default values seem to
induce the hdr->tag=0 issue
(http://www.open-mpi.org/community/lists/users/2009/01/7808.php).

Yeah, the size of the fragments and number of them really should
not cause this issue.  So I too am a little perplexed about it.

Do you think that the default shared receive queue parameters
are erroneous for this specific Mellanox card? Any help in
finding the proper parameters would be much
appreciated.

I don't necessarily think it is the queue size for a specific card
but more so the handling of the queues by the BTL when using
certain sizes. At least that is one gut feel I have.

In my mind the tag being 0 is either something below OMPI
polluting the data fragment or OMPI's internal protocol somehow
getting messed up.  I can imagine (no empirical data here)
the queue sizes could change how the OMPI protocol sets things
up. Another thing may be the coalescing feature in the openib BTL,
which tries to gang multiple messages into one packet when
resources are running low.  I can see where changing the queue
sizes might affect the coalescing. So, it might be interesting to
turn off the coalescing.  You can do that by setting "--mca
btl_openib_use_message_coalescing 0" in your mpirun line.
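To make the coalescing idea concrete, the toy model below packs several small messages, each with its own tag/length sub-header, into one buffer and walks them back out on the receive side. The wire format (1-byte tag, 2-byte little-endian length, then the payload) and all names are assumptions for illustration, not the openib BTL's actual layout. It also shows one way a bookkeeping mismatch could surface as a tag of 0: if the receiver believes the packet is longer than what was packed, it parses zeroed bytes as a sub-header. This illustrates the mechanism only; it is not a diagnosis of the bug in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy sub-header (an assumption, NOT the openib wire format):
 * 1-byte tag, 2-byte little-endian length, then `len` payload bytes. */
#define TOY_SUBHDR 3u

typedef struct {
    uint8_t tag;
    uint16_t len;
    const uint8_t *data;
} toy_msg_t;

/* Append one message at offset `used`; returns the new fill level,
 * or 0 if the message does not fit in `cap` bytes. */
static size_t toy_pack(uint8_t *buf, size_t cap, size_t used,
                       const toy_msg_t *m)
{
    size_t need = TOY_SUBHDR + m->len;
    if (used > cap || cap - used < need)
        return 0;
    buf[used] = m->tag;
    buf[used + 1] = (uint8_t)(m->len & 0xffu);
    buf[used + 2] = (uint8_t)(m->len >> 8);
    memcpy(buf + used + TOY_SUBHDR, m->data, m->len);
    return used + need;
}

/* Walk a coalesced packet of `total` bytes and count its messages.
 * Returns -1 if a sub-header is truncated, claims more bytes than
 * remain, or carries tag 0 (never a valid tag in this model). */
static int toy_unpack_count(const uint8_t *buf, size_t total)
{
    size_t off = 0;
    int n = 0;
    while (off < total) {
        if (total - off < TOY_SUBHDR)
            return -1;
        uint8_t tag = buf[off];
        uint16_t len = (uint16_t)(buf[off + 1] | (buf[off + 2] << 8));
        if (tag == 0 || len > total - off - TOY_SUBHDR)
            return -1;
        off += TOY_SUBHDR + len;
        n++;
    }
    return n;
}
```

With two messages packed (5 + 4 bytes), walking exactly 9 bytes yields 2 messages, while walking 12 bytes runs into zeroed padding and reports a tag-0 sub-header.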

If that doesn't solve the issue then obviously there must be
something else going on :-).

Note, the reason I am interested in this is that I am seeing a similar
error condition (hdr->tag == 0) on a development system.  Though
my failing case fails with np=8 using the connectivity test
program, which is mainly point to point, and there is not a
significant amount of data transfer going on either.

--td

Eloi

On Friday 24 September 2010 14:27:07 you wrote:

That is interesting.  So does the number of processes affect
your runs at all?  The times I've seen hdr->tag be 0 have usually
been due to protocol issues.  The tag should never be 0.  Have
you tried any receive_queues settings other than the
default and the one you mention?

I wonder whether a combination of the two receive queues
causes a failure or not.  Something like

P,128,256,192,128:P,65536,256,192,128

I am wondering if it is the first queuing definition causing the
issue or possibly the SRQ defined in the default.
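Terry's combined setting uses the same colon-separated, comma-separated syntax as the default reported by ompi_info. The sketch below splits such a string into entries so the two can be compared side by side. parse_receive_queues and rq_entry_t are hypothetical names, not Open MPI's parser; the sketch only handles the 'P' (per-peer) and 'S' (shared receive queue) entry types that appear in this thread and, apart from the first field (buffer size in bytes), keeps the numbers uninterpreted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

#define RQ_MAX_FIELDS 5

/* One entry such as "P,65536,256,192,128": type is 'P' or 'S';
 * fields[0] is the buffer size in bytes, the rest are kept raw. */
typedef struct {
    char type;
    int nfields;
    long fields[RQ_MAX_FIELDS];
} rq_entry_t;

/* Hypothetical helper: splits a btl_openib_receive_queues string on
 * ':' and ','. Returns the number of queues parsed, or -1 if the
 * string is malformed, too long, or has more than `max` queues. */
static int parse_receive_queues(const char *spec, rq_entry_t *out, int max)
{
    char buf[512];
    if (strlen(spec) >= sizeof buf)
        return -1;
    strcpy(buf, spec);

    int n = 0;
    char *save_q = NULL;
    for (char *q = strtok_r(buf, ":", &save_q); q != NULL;
         q = strtok_r(NULL, ":", &save_q)) {
        if (n == max || (q[0] != 'P' && q[0] != 'S') || q[1] != ',')
            return -1;
        rq_entry_t *e = &out[n];
        e->type = q[0];
        e->nfields = 0;
        char *save_f = NULL;
        for (char *f = strtok_r(q + 2, ",", &save_f); f != NULL;
             f = strtok_r(NULL, ",", &save_f)) {
            char *end;
            long v = strtol(f, &end, 10);
            if (*end != '\0' || e->nfields == RQ_MAX_FIELDS)
                return -1;
            e->fields[e->nfields++] = v;
        }
        if (e->nfields == 0)
            return -1;
        n++;
    }
    return n;
}
```

Parsing the default value shown by ompi_info gives one per-peer queue with 128-byte buffers followed by three shared receive queues (2048, 12288 and 65536 bytes), whereas P,65536,256,192,128 defines a single per-peer queue and no SRQ at all, which is the difference Terry is probing.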

--td

Eloi Gaudry wrote:

Hi Terry,

The messages being sent/received can be of any size, but the
error seems to happen more often with small messages (such as an int
being broadcast or allreduced). The failing communication
differs from one run to another, but some spots are more likely
to fail than others. And as far as I know, they are
always located next to a small-message communication (an int
being broadcast, for instance). Other typical
message sizes are 10k, but they can be much larger.

I've been checking the HCA being used: it's from Mellanox (with
vendor_part_id=26428). There are no receive_queues parameters
associated with it; I checked the device parameters file as well:

$ cat share/openmpi/mca-btl-openib-device-params.ini
[...]

  # A.k.a. ConnectX
  [Mellanox Hermon]
  vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
  vendor_part_id = 25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
  use_eager_rdma = 1
  mtu = 2048
  max_inline_data = 128

[...]

$ ompi_info --param btl openib --parsable | grep receive_queues

mca:btl:openib:param:btl_openib_receive_queues:value:P,128,256,192,128:S,2048,256,128,32:S,12288,256,128,32:S,65536,256,128,32
mca:btl:openib:param:btl_openib_receive_queues:data_source:default value
mca:btl:openib:param:btl_openib_receive_queues:status:writable
mca:btl:openib:param:btl_openib_receive_queues:help:Colon-delimited, comma delimited list of receive queues: P,4096,8,6,4:P,32768,8,6,4
mca:btl:openib:param:btl_openib_receive_queues:deprecated:no

I was wondering whether these parameters (automatically computed at
openib btl init, as far as I understood) were incorrect in
some way, so I plugged in some other values:
"P,65536,256,192,128" (someone on the list used these values
when encountering a different issue). Since then, I haven't
been able to observe the segfault (occurring as hdr->tag = 0 in
btl_openib_component.c:2881) yet.

Eloi


/home/pp_fr/st03230/EG/Softs/openmpi-custom-1.4.2/bin/

On Thursday 23 September 2010 23:33:48 Terry Dontje wrote:

Eloi, I am curious about your problem.  Can you tell me what
size of job it is?  Does it always fail on the same bcast,  or
same process?

Eloi Gaudry wrote:

Hi Nysal,

Thanks for your suggestions.

I'm now able to get the checksum computed and redirected to
stdout, thanks (I forgot the "-mca pml_base_verbose 5"
option, you were right). I haven't been able to observe the
segmentation fault (with hdr->tag=0) so far (when using pml
csum), but I'll let you know when I do.

I've got two other questions, which may be related to the
error observed:

1/ Does the maximum number of MPI_Comm objects that can be handled by
OpenMPI somehow depend on the btl being used (i.e. if I'm
using openib, may I use the same number of MPI_Comm objects as
with tcp)? Is there something like MPI_COMM_MAX in OpenMPI?

2/ The segfaults only appear during an MPI collective call,
with very small messages (one int is being broadcast, for
instance); I followed the guidelines given at
http://icl.cs.utk.edu/open-mpi/faq/?category=openfabrics#ib-small-message-rdma
but the debug build of OpenMPI asserts if I use a min-size other
than 255. Anyway, if I deactivate eager_rdma, the segfaults
remain. Does the openib btl handle very small messages
differently (even with eager_rdma
deactivated) than tcp?

Others on the list: does coalescing happen with eager RDMA disabled?
If so, then that would possibly be one difference between the
openib btl and tcp aside from the actual protocol used.

 Is there a way to make sure that large messages and small
 messages are handled the same way?

Do you mean so they all look like eager messages?  How large
of messages are we talking about here 1K, 1M or 10M?
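Whether a message "looks like an eager message" is, to a first approximation, a size test against the BTL's eager limit; larger messages go through a rendezvous protocol instead. The sketch below assumes a plain `bytes <= eager_limit` rule with hypothetical names; 12288 is the btl_eager_limit value gdb printed for this openib module elsewhere in the thread, and the real ob1 decision also accounts for header size and BTL flags, which this ignores.

```c
#include <stddef.h>

/* Simplified, hypothetical model of size-based protocol selection. */
enum demo_protocol { DEMO_EAGER, DEMO_RENDEZVOUS };

/* Messages up to the eager limit are sent eagerly; anything larger
 * needs a rendezvous handshake first. Header overhead is ignored. */
static enum demo_protocol demo_pick_protocol(size_t bytes, size_t eager_limit)
{
    return bytes <= eager_limit ? DEMO_EAGER : DEMO_RENDEZVOUS;
}
```

Under this model, the single int being broadcast or allreduced and a 10k message both fall on the eager side of a 12288-byte limit, while a 1 MB message does not.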

--td

Regards,
Eloi

On Friday 17 September 2010 17:57:17 Nysal Jan wrote:

Hi Eloi,
Create a debug build of OpenMPI (--enable-debug) and while
running with the csum PML add "-mca pml_base_verbose 5" to
the command line. This will print the checksum details for
each fragment sent over the wire. I'm guessing it didn't
catch anything because the BTL failed. The checksum
verification is done in the PML, which the BTL calls via a
callback function. In your case the PML callback is never
called because the hdr->tag is invalid. So enabling
checksum tracing also might not be of much use. Is it the
first Bcast that fails or the nth Bcast and what is the
message size? I'm not sure what could be the problem at
this moment. I'm afraid you will have to debug the BTL to
find out more.

--Nysal

On Fri, Sep 17, 2010 at 4:39 PM, Eloi Gaudry <e...@fft.be> wrote:

Hi Nysal,

thanks for your response.

I've been unable so far to write a test case that could
illustrate the hdr->tag=0 error.
Actually, I'm only observing this issue when running an
internode computation involving InfiniBand hardware from
Mellanox (MT25418, ConnectX IB DDR, PCIe 2.0
2.5GT/s, rev a0) with our time-domain software.

I checked, double-checked, and rechecked again every MPI
use performed during a parallel computation and I couldn't
find any error so far. The fact that the very
same parallel computation runs flawlessly when using tcp
(and disabling openib support) might seem to indicate that
the issue is located somewhere inside the
openib btl or at the hardware/driver level.

I've just used the "-mca pml csum" option and I haven't
seen any related messages (when hdr->tag=0 and the
segfault occurs). Any suggestions?

Regards,
Eloi

On Friday 17 September 2010 16:03:34 Nysal Jan wrote:

Hi Eloi,
Sorry for the delay in responding. I haven't read the entire
email thread, but do you have a test case which can
reproduce this error? Without that it will be difficult to
nail down the cause. Just to clarify, I do not work for an
iWARP vendor. I can certainly try to reproduce it on an IB
system. There is also a PML called csum; you can use it
via "-mca pml csum", which will checksum the MPI messages
and verify them on the receiver side for any data
corruption. You can try using it to see if it is able to
catch anything.
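The csum idea described here (checksum on the sender, recompute and compare on the receiver) boils down to the pattern below. Adler-32 is used only as a familiar example checksum; it is not necessarily what the csum PML computes, and the function names are hypothetical. As Nysal points out in his follow-up, this verification lives in the PML callback, so it never runs when a bad hdr->tag makes the BTL crash before the callback is reached.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32, used here purely as an example checksum. */
static uint32_t demo_adler32(const uint8_t *p, size_t n)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Receiver-side check: recompute over the received bytes and compare
 * with the value the sender shipped alongside the payload. */
static int demo_verify(const uint8_t *p, size_t n, uint32_t sent_sum)
{
    return demo_adler32(p, n) == sent_sum;
}
```

For instance, the Adler-32 of the 9 bytes "Wikipedia" is 0x11E60398; flipping a single byte of the payload makes demo_verify() fail.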

Regards
--Nysal

On Thu, Sep 16, 2010 at 3:48 PM, Eloi Gaudry <e...@fft.be> wrote:

Hi Nysal,

I'm sorry to interrupt, but I was wondering if you had a
chance to look at this error.

Regards,
Eloi



--


Eloi Gaudry

Free Field Technologies
Company Website: http://www.fft.be
Company Phone:   +32 10 487 959


---------- Forwarded message ----------
From: Eloi Gaudry <e...@fft.be>
To: Open MPI Users <us...@open-mpi.org>
Date: Wed, 15 Sep 2010 16:27:43 +0200
Subject: Re: [OMPI users] [openib] segfault when using openib btl

Hi,

I was wondering if anybody got a chance to have a look at
this issue.

Regards,
Eloi

On Wednesday 18 August 2010 09:16:26 Eloi Gaudry wrote:

Hi Jeff,

Please find enclosed the output (valgrind.out.gz) from
/opt/openmpi-debug-1.4.2/bin/orterun -np 2 --host pbn11,pbn10 --mca btl
openib,self --display-map --verbose --mca mpi_warn_on_fork 0
--mca btl_openib_want_fork_support 0 -tag-output
/opt/valgrind-3.5.0/bin/valgrind --tool=memcheck
--suppressions=/opt/openmpi-debug-1.4.2/share/openmpi/openmpi-valgrind.supp
--suppressions=./suppressions.python.supp
/opt/actran/bin/actranpy_mp ...

Thanks,
Eloi

On Tuesday 17 August 2010 09:32:53 Eloi Gaudry wrote:

On Monday 16 August 2010 19:14:47 Jeff Squyres wrote:

On Aug 16, 2010, at 10:05 AM, Eloi Gaudry wrote:

I did run our application through valgrind, but it
couldn't find any "Invalid write": there are a bunch
of "Invalid read" (I'm using 1.4.2
with the suppression file), "Use of uninitialized
bytes" and "Conditional jump depending on
uninitialized bytes" in different ompi
routines. Some of them are located in
btl_openib_component.c. I'll send you an output of
valgrind shortly.

A lot of them in btl_openib_* are to be expected --
OpenFabrics uses OS-bypass methods for some of its
memory, and therefore valgrind is unaware of them (and
therefore incorrectly marks them as
uninitialized).

Would it help if I used the upcoming 1.5 version of
OpenMPI? I read that
a huge effort has been made to clean up the valgrind
output, but maybe this doesn't concern this btl
(for the reasons you mentioned).

Another question: you said that the callback function
pointer should never be 0. But can the tag be null (hdr->tag)?

The tag is not a pointer -- it's just an integer.

My worry was that its value should never be null.

I'll send a valgrind output soon (I need to build
libpython without pymalloc first).

Thanks,
Eloi

Thanks for your help,
Eloi

On 16/08/2010 18:22, Jeff Squyres wrote:

Sorry for the delay in replying.

Odd; the values of the callback function pointer
should never be 0.

This seems to suggest some kind of memory corruption
is occurring.

I don't know if it's possible, because the stack
trace looks like you're calling through python, but
can you run this application through valgrind, or
some other memory-checking debugger?

On Aug 10, 2010, at 7:15 AM, Eloi Gaudry wrote:

Hi,

sorry, I just forgot to add the values of the function
parameters:

(gdb) print reg->cbdata
$1 = (void *) 0x0
(gdb) print openib_btl->super
$2 = {btl_component = 0x2b341edd7380, btl_eager_limit = 12288,
  btl_rndv_eager_limit = 12288, btl_max_send_size = 65536,
  btl_rdma_pipeline_send_length = 1048576, btl_rdma_pipeline_frag_size = 1048576,
  btl_min_rdma_pipeline_size = 1060864, btl_exclusivity = 1024, btl_latency = 10,
  btl_bandwidth = 800, btl_flags = 310,
  btl_add_procs = 0x2b341eb8ee47 <mca_btl_openib_add_procs>,
  btl_del_procs = 0x2b341eb90156 <mca_btl_openib_del_procs>,
  btl_register = 0, btl_finalize = 0x2b341eb93186 <mca_btl_openib_finalize>,
  btl_alloc = 0x2b341eb90a3e <mca_btl_openib_alloc>,
  btl_free = 0x2b341eb91400 <mca_btl_openib_free>,
  btl_prepare_src = 0x2b341eb91813 <mca_btl_openib_prepare_src>,
  btl_prepare_dst = 0x2b341eb91f2e <mca_btl_openib_prepare_dst>,
  btl_send = 0x2b341eb94517 <mca_btl_openib_send>,
  btl_sendi = 0x2b341eb9340d <mca_btl_openib_sendi>,
  btl_put = 0x2b341eb94660 <mca_btl_openib_put>,
  btl_get = 0x2b341eb94c4e <mca_btl_openib_get>,
  btl_dump = 0x2b341acd45cb <mca_btl_base_dump>,
  btl_mpool = 0xf3f4110,
  btl_register_error = 0x2b341eb90565 <mca_btl_openib_register_error_cb>,
  btl_ft_event = 0x2b341eb952e7 <mca_btl_openib_ft_event>}
(gdb) print hdr->tag
$3 = 0 '\0'
(gdb) print des
$4 = (mca_btl_base_descriptor_t *) 0xf4a6700
(gdb) print reg->cbfunc
$5 = (mca_btl_base_module_recv_cb_fn_t) 0
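These values (hdr->tag = 0, reg->cbfunc = 0, reg->cbdata = 0x0) fit the dispatch at btl_openib_component.c:2880-2881: the tag indexes a table of registered callbacks, nothing is registered for tag 0, and the call therefore jumps to address 0, which is frame #0 (0x0000000000000000 in ?? ()) of the backtrace. The following is a simplified model of that table; the types, names and the 256-slot size are hypothetical stand-ins, and the guard shown is an illustration of where a check could go, not a fix.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of:
 *   reg = mca_btl_base_active_message_trigger + hdr->tag;
 *   reg->cbfunc(&openib_btl->super, hdr->tag, des, reg->cbdata); */
typedef void (*demo_cb_fn_t)(uint8_t tag, void *cbdata);

typedef struct {
    demo_cb_fn_t cbfunc;
    void *cbdata;
} demo_am_entry_t;

/* One slot per possible 8-bit tag in this model. Static storage is
 * zero-initialized, so unregistered slots have cbfunc == NULL. */
static demo_am_entry_t demo_trigger[256];

static void demo_register(uint8_t tag, demo_cb_fn_t fn, void *data)
{
    demo_trigger[tag].cbfunc = fn;
    demo_trigger[tag].cbdata = data;
}

/* Example callback: counts how many times it was invoked. */
static void demo_count_cb(uint8_t tag, void *cbdata)
{
    (void)tag;
    ++*(int *)cbdata;
}

/* Guarded dispatch: returns -1 instead of calling through NULL. */
static int demo_dispatch(uint8_t tag)
{
    demo_am_entry_t *reg = demo_trigger + tag;
    if (reg->cbfunc == NULL)
        return -1;
    reg->cbfunc(tag, reg->cbdata);
    return 0;
}
```

Registering a callback for tag 0x41 and dispatching 0x41 invokes it, while dispatching tag 0 finds an empty slot; the unguarded call in the real code would instead jump to address 0.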

Eloi

On Tuesday 10 August 2010 16:04:08 Eloi Gaudry wrote:

Hi,

Here is the output of a core file generated during a
segmentation fault observed during a collective call (using
openib):
#0  0x0000000000000000 in ?? ()
(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x00002aedbc4e05f4 in btl_openib_handle_incoming (openib_btl=0x1902f9b0, ep=0x1908a1c0, frag=0x190d9700, byte_len=18) at btl_openib_component.c:2881
#2  0x00002aedbc4e25e2 in handle_wc (device=0x19024ac0, cq=0, wc=0x7ffff279ce90) at btl_openib_component.c:3178
#3  0x00002aedbc4e2e9d in poll_device (device=0x19024ac0, count=2) at btl_openib_component.c:3318
#4  0x00002aedbc4e34b8 in progress_one_device (device=0x19024ac0) at btl_openib_component.c:3426
#5  0x00002aedbc4e3561 in btl_openib_component_progress () at btl_openib_component.c:3451
#6  0x00002aedb8b22ab8 in opal_progress () at runtime/opal_progress.c:207
#7  0x00002aedb859f497 in opal_condition_wait (c=0x2aedb888ccc0, m=0x2aedb888cd20) at ../opal/threads/condition.h:99
#8  0x00002aedb859fa31 in ompi_request_default_wait_all (count=2, requests=0x7ffff279d0e0, statuses=0x0) at request/req_wait.c:262
#9  0x00002aedbd7559ad in ompi_coll_tuned_allreduce_intra_recursivedoubling (sbuf=0x7ffff279d444, rbuf=0x7ffff279d440, count=1, dtype=0x6788220, op=0x6787a20, comm=0x19d81ff0, module=0x19d82b20) at coll_tuned_allreduce.c:223
#10 0x00002aedbd7514f7 in ompi_coll_tuned_allreduce_intra_dec_fixed (sbuf=0x7ffff279d444, rbuf=0x7ffff279d440, count=1, dtype=0x6788220, op=0x6787a20, comm=0x19d81ff0, module=0x19d82b20) at coll_tuned_decision_fixed.c:63
#11 0x00002aedb85c7792 in PMPI_Allreduce (sendbuf=0x7ffff279d444, recvbuf=0x7ffff279d440, count=1, datatype=0x6788220, op=0x6787a20, comm=0x19d81ff0) at pallreduce.c:102
#12 0x0000000004387dbf in FEMTown::MPI::Allreduce (sendbuf=0x7ffff279d444, recvbuf=0x7ffff279d440, count=1, datatype=0x6788220, op=0x6787a20, comm=0x19d81ff0) at stubs.cpp:626
#13 0x0000000004058be8 in FEMTown::Domain::align (itf={<FEMTown::Boost::shared_base_ptr<FEMTown::Domain::Interface>> = {_vptr.shared_base_ptr = 0x7ffff279d620, ptr_ = {px = 0x199942a4, pn = {pi_ = 0x6}}}, <No data fields>}) at interface.cpp:371
#14 0x00000000040cb858 in FEMTown::Field::detail::align_itfs_and_neighbhors (dim=2, set={px = 0x7ffff279d780, pn = {pi_ = 0x2f279d640}}, check_info=@0x7ffff279d7f0) at check.cpp:63
#15 0x00000000040cbfa8 in FEMTown::Field::align_elements (set={px = 0x7ffff279d950, pn = {pi_ = 0x66e08d0}}, check_info=@0x7ffff279d7f0) at check.cpp:159
#16 0x00000000039acdd4 in PyField_align_elements (self=0x0, args=0x2aaab0765050, kwds=0x19d2e950) at check.cpp:31
#17 0x0000000001fbf76d in FEMTown::Main::ExErrCatch<_object* (*)(_object*, _object*, _object*)>::exec<_object> (this=0x7ffff279dc20, s=0x0, po1=0x2aaab0765050, po2=0x19d2e950) at /home/qa/svntop/femtown/modules/main/py/exception.hpp:463
#18 0x00000000039acc82 in PyField_align_elements_ewrap (self=0x0, args=0x2aaab0765050, kwds=0x19d2e950) at check.cpp:39
#19 0x00000000044093a0 in PyEval_EvalFrameEx (f=0x19b52e90, throwflag=<value optimized out>) at Python/ceval.c:3921
#20 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab754ad50, globals=<value optimized out>, locals=<value optimized out>, args=0x3, argcount=1, kws=0x19ace4a0, kwcount=2, defs=0x2aaab75e4800, defcount=2, closure=0x0) at Python/ceval.c:2968
#21 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19ace2d0, throwflag=<value optimized out>) at Python/ceval.c:3802
#22 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab7550120, globals=<value optimized out>, locals=<value optimized out>, args=0x7, argcount=1, kws=0x19acc418, kwcount=3, defs=0x2aaab759e958, defcount=6, closure=0x0) at Python/ceval.c:2968
#23 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19acc1c0, throwflag=<value optimized out>) at Python/ceval.c:3802
#24 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab8b5e738, globals=<value optimized out>, locals=<value optimized out>, args=0x6, argcount=1, kws=0x19abd328, kwcount=5, defs=0x2aaab891b7e8, defcount=3, closure=0x0) at Python/ceval.c:2968
#25 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19abcea0, throwflag=<value optimized out>) at Python/ceval.c:3802
#26 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab3eb4198, globals=<value optimized out>, locals=<value optimized out>, args=0xb, argcount=1, kws=0x19a89df0, kwcount=10, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#27 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19a89c40, throwflag=<value optimized out>) at Python/ceval.c:3802
#28 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab3eb4288, globals=<value optimized out>, locals=<value optimized out>, args=0x1, argcount=0, kws=0x19a89330, kwcount=0, defs=0x2aaab8b66668, defcount=1, closure=0x0) at Python/ceval.c:2968
#29 0x0000000004408f58 in PyEval_EvalFrameEx (f=0x19a891b0, throwflag=<value optimized out>) at Python/ceval.c:3802
#30 0x000000000440aae9 in PyEval_EvalCodeEx (co=0x2aaab8b6a738, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#31 0x000000000440ac02 in PyEval_EvalCode (co=0x1902f9b0, globals=0x0, locals=0x190d9700) at Python/ceval.c:522
#32 0x000000000442853c in PyRun_StringFlags (str=0x192fd3d8 "DIRECT.Actran.main()", start=<value optimized out>, globals=0x192213d0, locals=0x192213d0, flags=0x0) at Python/pythonrun.c:1335
#33 0x0000000004429690 in PyRun_SimpleStringFlags (command=0x192fd3d8 "DIRECT.Actran.main()", flags=0x0) at Python/pythonrun.c:957
#34 0x0000000001fa1cf9 in FEMTown::Python::FEMPy::run_application (this=0x7ffff279f650) at fempy.cpp:873
#35 0x000000000434ce99 in FEMTown::Main::Batch::run (this=0x7ffff279f650) at batch.cpp:374
#36 0x0000000001f9aa25 in main (argc=8, argv=0x7ffff279fa48) at main.cpp:10
(gdb) f 1
#1  0x00002aedbc4e05f4 in btl_openib_handle_incoming (openib_btl=0x1902f9b0, ep=0x1908a1c0, frag=0x190d9700, byte_len=18) at btl_openib_component.c:2881
2881            reg->cbfunc( &openib_btl->super, hdr->tag, des, reg->cbdata );
Current language: auto; currently c
(gdb)
#1  0x00002aedbc4e05f4 in btl_openib_handle_incoming (openib_btl=0x1902f9b0, ep=0x1908a1c0, frag=0x190d9700, byte_len=18) at btl_openib_component.c:2881
2881            reg->cbfunc( &openib_btl->super, hdr->tag, des, reg->cbdata );

(gdb) l 2876
2877        if(OPAL_LIKELY(!(is_credit_msg = is_credit_message(frag)))) {
2878            /* call registered callback */
2879            mca_btl_active_message_callback_t* reg;
2880            reg = mca_btl_base_active_message_trigger + hdr->tag;
2881            reg->cbfunc( &openib_btl->super, hdr->tag, des, reg->cbdata );
2882            if(MCA_BTL_OPENIB_RDMA_FRAG(frag)) {
2883                cqp = (hdr->credits >> 11) & 0x0f;
2884                hdr->credits &= 0x87ff;
2885            } else {
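Lines 2883-2884 of this listing unpack a field stored inside hdr->credits for fragments that satisfy MCA_BTL_OPENIB_RDMA_FRAG(frag): bits 11 to 14 are extracted into cqp, then cleared with the mask 0x87ff, which keeps bit 15 and bits 0 to 10. The standalone sketch below reproduces only that bit manipulation; the function name is hypothetical, and what cqp represents beyond being a 4-bit value is not stated in the thread.

```c
#include <stdint.h>

/* Mirrors btl_openib_component.c:2883-2884:
 *   cqp = (hdr->credits >> 11) & 0x0f;
 *   hdr->credits &= 0x87ff;
 * Returns the 4-bit value from bits 11..14 and clears those bits,
 * leaving bit 15 and bits 0..10 of *credits untouched. */
static int demo_split_credits(uint16_t *credits)
{
    int cqp = (*credits >> 11) & 0x0f;
    *credits &= 0x87ff;
    return cqp;
}
```

For example, a credits value of (3 << 11) | 5 yields cqp = 3 and leaves credits = 5.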

Regards,
Eloi

On Friday 16 July 2010 16:01:02 Eloi Gaudry wrote:

Hi Edgar,

The only difference I could observe was that the
segmentation fault sometimes appeared later
during the parallel computation.

I'm running out of ideas here. I wish I could use
"--mca coll tuned" with "--mca btl self,sm,tcp" so that I could
check that the issue is not somehow limited to
the tuned collective routines.

Thanks,
Eloi

On Thursday 15 July 2010 17:24:24 Edgar Gabriel wrote:

On 7/15/2010 10:18 AM, Eloi Gaudry wrote:

hi edgar,

thanks for the tips, I'm going to try this option
as well. The segmentation fault I'm observing always
happened during a collective communication
indeed... it basically switches all
collective communication to basic mode, right?

sorry for my ignorance, but what's an NCA?

sorry, I meant to type HCA (InfiniBand
networking card)

Thanks
Edgar

thanks,
éloi

On Thursday 15 July 2010 16:20:54 Edgar Gabriel wrote:

you could try first to use the algorithms in the basic
module, e.g.

mpirun -np x --mca coll basic ./mytest

and see whether this makes a difference. I used to
sometimes observe a (similar?) problem in the openib
btl triggered from the tuned collective
component, in cases where the ofed libraries
were installed but no NCA was found on a node.
It used to work, however, with the basic
component.

Thanks
Edgar

On 7/15/2010 3:08 AM, Eloi Gaudry wrote:

hi Rolf,

unfortunately, I couldn't get rid of that
annoying segmentation fault when selecting
another bcast algorithm. I'm now going to
replace MPI_Bcast with a naive
implementation (using MPI_Send and MPI_Recv)
and see if that helps.

regards,
éloi

On Wednesday 14 July 2010 10:59:53 Eloi Gaudry wrote:

Hi Rolf,

thanks for your input. You're right, I missed
the coll_tuned_use_dynamic_rules option.

I'll check whether the segmentation fault
disappears when using
the basic linear bcast algorithm with the
proper command line you provided.

Regards,
Eloi

On Tuesday 13 July 2010 20:39:59 Rolf vandeVaart wrote:

Hi Eloi:
To select the different bcast algorithms,
you need to add an extra MCA parameter
that tells the library to use dynamic
selection: --mca
coll_tuned_use_dynamic_rules 1

One way to make sure you are typing this in
correctly is to use it with ompi_info. Do the following:
ompi_info -mca coll_tuned_use_dynamic_rules 1 --param coll

You should see lots of output with all the
different algorithms that can be selected
for the various collectives. Therefore,
you need this:

--mca coll_tuned_use_dynamic_rules 1 --mca
coll_tuned_bcast_algorithm 1
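The same two parameters can also be supplied through the environment: Open MPI reads MCA parameters from variables named OMPI_MCA_<param_name>, so exporting OMPI_MCA_coll_tuned_use_dynamic_rules=1 and OMPI_MCA_coll_tuned_bcast_algorithm=1 in the shell that runs mpirun is equivalent to passing the --mca flags. The C helper below is hypothetical (not an Open MPI API) and only builds and sets such a variable; that setting it from inside the application before MPI_Init is picked up for every parameter is an assumption, not something confirmed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: export an MCA parameter using the
 * OMPI_MCA_<name> environment-variable convention.
 * Returns 0 on success, -1 if the name is too long or setenv fails. */
static int demo_set_mca_param(const char *name, const char *value)
{
    char var[256];
    int len = snprintf(var, sizeof var, "OMPI_MCA_%s", name);
    if (len < 0 || (size_t)len >= sizeof var)
        return -1;
    return setenv(var, value, 1) == 0 ? 0 : -1;
}
```

Calling demo_set_mca_param("coll_tuned_use_dynamic_rules", "1") and demo_set_mca_param("coll_tuned_bcast_algorithm", "1") sets the same two values Rolf passes on the command line.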

Rolf

On 07/13/10 11:28, Eloi Gaudry wrote:

Hi,

I've found that "--mca
coll_tuned_bcast_algorithm 1" allowed me to
switch to the basic linear algorithm.
Anyway, whatever the algorithm used, the
segmentation fault remains.

Could anyone give some advice on ways
to diagnose the issue I'm facing?

Regards,
Eloi

On Monday 12 July 2010 10:53:58 Eloi Gaudry wrote:

Hi,

I'm focusing on the MPI_Bcast routine
that seems to randomly segfault when
using the openib btl. I'd like
to know if there is any way to make OpenMPI
switch to a different algorithm than the default one
being selected for MPI_Bcast.

Thanks for your help,
Eloi

On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:

Hi,

I'm observing a random segmentation
fault during an
internode parallel computation involving
the openib btl
and OpenMPI-1.4.2 (the same issue can be
observed with OpenMPI-1.3.3).

   mpirun (Open MPI) 1.4.2
   Report bugs to http://www.open-mpi.org/community/help/
   [pbn08:02624] *** Process received signal ***
   [pbn08:02624] Signal: Segmentation fault (11)
   [pbn08:02624] Signal code: Address not mapped (1)
   [pbn08:02624] Failing at address: (nil)
   [pbn08:02624] [ 0] /lib64/libpthread.so.0 [0x349540e4c0]
   [pbn08:02624] *** End of error message ***
   sh: line 1:  2624 Segmentation fault      \/share\/hpc3\/actran_suite\/Actran_11\.0\.rc2\.41872\/RedHatEL\-5\/x86_64\/bin\/actranpy_mp '--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/Actran_11.0.rc2.41872' '--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.dat' '--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch' '--mem=3200' '--threads=1' '--errorlevel=FATAL' '--t_max=0.1' '--parallel=domain'

If I choose not to use the openib btl
(by using --mca btl self,sm,tcp on the
command line, for instance), I don't
encounter any problem and the parallel
computation runs flawlessly.

I would like to get some help to be able:
- to diagnose the issue I'm facing with the openib btl
- to understand why this issue is observed only when
  using the openib btl and not when using self,sm,tcp

Any help would be very much appreciated.

The outputs of ompi_info and the
configure scripts of OpenMPI are
attached to this email, along with some
information on the InfiniBand drivers.

Here is the command line used when launching a parallel
computation using InfiniBand:
   path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca btl openib,sm,self,tcp --display-map --verbose --version --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]

and the command line used when not using InfiniBand:
   path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca btl self,sm,tcp --display-map --verbose --version --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support [...]

Thanks,
Eloi

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com
