Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-10 Thread Fischer, Greg A.
Monday, June 09, 2014 8:24 PM To: Open MPI Users Subject: Re: [OMPI users] intermittent segfaults with openib on ring_c.c I'm digging out from mail backlog from being at the MPI Forum last week... Yes, from looking at the stack traces, it's segv'ing inside the memory allocator, which

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-09 Thread Jeff Squyres (jsquyres)
---Original Message- >> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain >> Sent: Wednesday, June 04, 2014 4:48 PM >> To: Open MPI Users >> Subject: Re: [OMPI users] intermittent segfaults with openib on ring_c.c >> >> Urggg...

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain > Sent: Wednesday, June 04, 2014 4:48 PM > To: Open MPI Users > Subject: Re: [OMPI users] intermittent segfaults with openib on ring_c.c > > Urggg...unfortunately, the people who know the most about that code

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Fischer, Greg A.
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Wednesday, June 04, 2014 4:48 PM To: Open MPI Users Subject: Re: [OMPI users] intermittent segfaults with openib on ring_c.c Urggg...unfortunately, the people who know the most about that code are all at the MPI

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
> #13 0x in ?? () > > Greg > > -----Original Message- > From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain > Sent: Wednesday, June 04, 2014 3:49 PM > To: Open MPI Users > Subject: Re: [OMPI users] intermittent segfaults with

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Fischer, Greg A.
t;>> (db_restrict_local=32 ' ') at >>>> ../../../../openmpi-1.8.1/orte/mca/ess/base/ess_base_std_app.c:245 >>>> #10 0x2b48f45b069f in rte_init () at >>>> ../../../../../openmpi-1.8.1/orte/mca/ess/env/ess_env_module.c:146 >>>> #11 0

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
>>>> #10 0x2b48f45b069f in rte_init () at >>>> ../../../../../openmpi-1.8.1/orte/mca/ess/env/ess_env_module.c:146 >>>> #11 0x2b48f26935ab in orte_init (pargc=0x2b48f6300020, >>>> pargv=0x2b48f63000b8, flags=8) at >>>> ../../openmp

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
>>> #12 0x2b48f1739d38 in ompi_mpi_init (argc=1, argv=0x7fffebf0d1f8, >>> requested=8, provided=0x0) at >>> ../../openmpi-1.8.1/ompi/runtime/ompi_mpi_init.c:464 >>> #13 0x2b48f1760a37 in PMPI_Init (argc=0x2b48f6300020, >>> argv=0x2b48f63000b8) at pi

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Gus Correa
/../../openmpi-1.8.1/opal/mca/memory/linux/malloc.c:4098 #1 0x in ?? () Is that helpful? Greg From: Fischer, Greg A. Sent: Wednesday, June 04, 2014 10:17 AM To: 'Open MPI Users' Cc: Fischer, Greg A. Subject: RE: [OMPI users] intermittent segfaults with openib on ring_c.c

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
i-1.8.1/ompi/runtime/ompi_mpi_init.c:464 > #13 0x2b48f1760a37 in PMPI_Init (argc=0x2b48f6300020, > argv=0x2b48f63000b8) at pinit.c:84 > #14 0x004024ef in main (argc=1, argv=0x7fffebf0d1f8) at ring_c.c:19 > > From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph C

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Fischer, Greg A.
24382136) at ../../../../../openmpi-1.8.1/opal/mca/memory/linux/malloc.c:4098 #1 0x in ?? () Is that helpful? Greg From: Fischer, Greg A. Sent: Wednesday, June 04, 2014 10:17 AM To: 'Open MPI Users' Cc: Fischer, Greg A. Subject: RE: [OMPI users] intermittent segfaults

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Ralph Castain
-1.8.1/opal/mca/memory/linux/malloc.c:4098 > #1 0x in ?? () > > Is that helpful? > > Greg > > From: Fischer, Greg A. > Sent: Wednesday, June 04, 2014 10:17 AM > To: 'Open MPI Users' > Cc: Fischer, Greg A. > Subject: RE: [OMPI users] int

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Fischer, Greg A.
Greg From: Fischer, Greg A. Sent: Wednesday, June 04, 2014 10:17 AM To: 'Open MPI Users' Cc: Fischer, Greg A. Subject: RE: [OMPI users] intermittent segfaults with openib on ring_c.c I recompiled with "--enable-debug" but it doesn't seem to be providing any more informat

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-04 Thread Fischer, Greg A.
.@open-mpi.org] On Behalf Of Ralph Castain Sent: Tuesday, June 03, 2014 11:54 PM To: Open MPI Users Subject: Re: [OMPI users] intermittent segfaults with openib on ring_c.c Sounds odd - can you configure OMPI --enable-debug and run it again? If it fails and you can get a core dump, could you tell

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-03 Thread Ralph Castain
Sounds odd - can you configure OMPI --enable-debug and run it again? If it fails and you can get a core dump, could you tell us the line number where it is failing? On Jun 3, 2014, at 9:58 AM, Fischer, Greg A. wrote: > Apologies – I forgot to add some of the information requested by the FAQ:

Re: [OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-03 Thread Fischer, Greg A.
Apologies - I forgot to add some of the information requested by the FAQ: 1. OpenFabrics is provided by the Linux distribution: [binf102:fischega] $ rpm -qa | grep ofed ofed-kmp-default-1.5.4.1_3.0.76_0.11-0.11.5 ofed-1.5.4.1-0.11.5 ofed-doc-1.5.4.1-0.11.5 2. Linux Distro / Kernel:

[OMPI users] intermittent segfaults with openib on ring_c.c

2014-06-03 Thread Fischer, Greg A.
Hello openmpi-users, I'm running into a perplexing problem on a new system, whereby I'm experiencing intermittent segmentation faults when I run the ring_c.c example and use the openib BTL. See an example below. Approximately 50% of the time it provides the expected output, but the other 50% of