From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
Sent: Wednesday, June 11, 2014 7:13 PM
To: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
If that could help Greg,
on the compute nodes I normally add this to /etc/security/limits.conf
re any other work around that I might try? Something that
avoids UDCM?
>
> -----Original Message-----
> From: Fischer, Greg A.
> Sent: Tuesday, June 10, 2014 2:59 PM
> To: Nathan Hjelm
> Cc: Open MPI Users; Fischer, Greg A.
> Subject: RE: [
PM, Jeff Squyres (jsquyres)
> >> ><jsquy...@cisco.com> wrote:
> >> >
> >> > Mellanox --
> >> >
> >> > What would cause a CQ to fail to be created?
> >> >
> >> > On Jun 1
PM, "Fischer, Greg A."
>> > <fisch...@westinghouse.com> wrote:
>> >
>> > > Is there any other work around that I might try? Something that
>> > avoids UDCM?
>> > >
>> > > -----Original Message-----
4, at 3:42 PM, "Fischer, Greg A."
> > <fisch...@westinghouse.com> wrote:
> >
> > > Is there any other work around that I might try? Something that
> > avoids UDCM?
> > >
> > > -----Original Message-----
> >
ng that
> > avoids UDCM?
> > >
> > > -----Original Message-----
> > > From: Fischer, Greg A.
> > > Sent: Tuesday, June 10, 2014 2:59 PM
> > > To: Nathan Hjelm
> > > Cc: Open MPI Users; Fischer, Greg A.
eg
> >
> > -----Original Message-----
> > From: Nathan Hjelm [mailto:hje...@lanl.gov]
> > Sent: Tuesday, June 10, 2014 2:58 PM
> > To: Fischer, Greg A.
> > Cc: Open MPI Users
> > Subject: Re: [OMPI users] openib segfa
er work around that I might try? Something that avoids
> UDCM?
> >
> > -----Original Message-----
> > From: Fischer, Greg A.
> > Sent: Tuesday, June 10, 2014 2:59 PM
> > To: Nathan Hjelm
> > Cc: Open MPI Users; Fischer, Greg A.
> > Subject: RE:
her, Greg A.
> Sent: Tuesday, June 10, 2014 2:59 PM
> To: Nathan Hjelm
> Cc: Open MPI Users; Fischer, Greg A.
> Subject: RE: [OMPI users] openib segfaults with Torque
>
> [binf316:fischega] $ ulimit -m
> unlimited
>
> Greg
>
> -----Original Message-----
> From
Is there any other work around that I might try? Something that avoids UDCM?
-----Original Message-----
From: Fischer, Greg A.
Sent: Tuesday, June 10, 2014 2:59 PM
To: Nathan Hjelm
Cc: Open MPI Users; Fischer, Greg A.
Subject: RE: [OMPI users] openib segfaults with Torque
[binf316:fischega
[binf316:fischega] $ ulimit -m
unlimited
Greg
-----Original Message-----
From: Nathan Hjelm [mailto:hje...@lanl.gov]
Sent: Tuesday, June 10, 2014 2:58 PM
To: Fischer, Greg A.
Cc: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
Out of curiosity what is the mlock limit
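For reference, the mlock (locked-memory) limit asked about here is the one bash reports with `ulimit -l`; `ulimit -m` reports the maximum resident set size, which is a different limit. A minimal sketch for printing both the soft and hard values, assuming a shell whose `ulimit` builtin accepts `-l` (bash does). The function name `report_memlock` is made up for illustration:

```shell
# Print the soft and hard locked-memory (mlock) limits of the current shell.
# Each value is either a number (kbytes under bash) or "unlimited".
# report_memlock is a hypothetical helper name, not part of Open MPI or Torque.
report_memlock() {
    echo "soft: $(ulimit -S -l)"
    echo "hard: $(ulimit -H -l)"
}

report_memlock
```

Running it both in an interactive shell and inside a Torque job may show whether the two environments see the same limit.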
> Cc: Open MPI Users
> Subject: Re: [OMPI users] openib segfaults with Torque
>
>
> Well, that's interesting. The output shows that ibv_create_cq is failing.
> Strange since an identical call had just succeeded (udcm creates two
> completion queues). Some questions that might i
: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
Well, that's interesting. The output shows that ibv_create_cq is failing.
Strange since an identical call had just succeeded (udcm creates two completion
queues). Some questions that might indicate where the failure might
m: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
> (jsquyres)
> Sent: Tuesday, June 10, 2014 10:31 AM
> To: Nathan Hjelm
> Cc: Open MPI Users
> Subject: Re: [OMPI users] openib segfaults with Torque
>
> Greg:
>
> Can you run with "--mca btl
me know if I can provide anything else.
Thanks for looking into this,
Greg
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Tuesday, June 10, 2014 10:31 AM
To: Nathan Hjelm
Cc: Open MPI Users
Subject: Re: [OMPI users] openib segf
Greg:
Can you run with "--mca btl_base_verbose 100" on your debug build so that we
can get some additional output to see why UDCM is failing to set up properly?
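A sketch of such a run, combining the `--mca btl_base_verbose 100` option requested above with the `-np 2 -mca btl openib,sm,self` invocation suggested elsewhere in this thread. The log file name `btl_verbose.log` is hypothetical, and `ring_c` is assumed to be reachable the same way it is invoked in the thread:

```shell
# Flags are taken from the thread: the process count and BTL list come from
# the "-np 2 -mca btl openib,sm,self" suggestion, the verbosity level from
# the request above. LOG is a hypothetical file name chosen for this sketch.
LOG=btl_verbose.log
CMD="mpirun -np 2 -mca btl openib,sm,self --mca btl_base_verbose 100 ring_c"

if command -v mpirun >/dev/null 2>&1; then
    # Capture stdout and stderr together so no verbose output is lost.
    status=0
    $CMD > "$LOG" 2>&1 || status=$?
    echo "exit status: $status; output saved to $LOG"
else
    echo "mpirun not found in PATH; would run: $CMD"
fi
```

Saving the output to a file makes it easier to attach the full log to a reply on the list.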
On Jun 10, 2014, at 10:25 AM, Nathan Hjelm wrote:
> On Tue, Jun 10, 2014 at 12:10:28AM +, Jeff Squyres
On Tue, Jun 10, 2014 at 12:10:28AM +, Jeff Squyres (jsquyres) wrote:
> I seem to recall that you have an IB-based cluster, right?
>
> From a *very quick* glance at the code, it looks like this might be a simple
> incorrect-finalization issue. That is:
>
> - you run the job on a single
I seem to recall that you have an IB-based cluster, right?
From a *very quick* glance at the code, it looks like this might be a simple
incorrect-finalization issue. That is:
- you run the job on a single server
- openib disqualifies itself because you're running on a single server
- openib
Process 0 decremented value: 0
> Process 0 exiting
> Process 1 exiting
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Friday, June 06, 2014 10:34 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] openib segfaults with Torque
>
Of Ralph Castain
Sent: Friday, June 06, 2014 10:34 AM
To: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
Huh - how strange. I can't imagine what it has to do with Torque vs rsh - this
is failing when the openib BTL is trying to create the connection, which comes
way after
64/libc.so.6(__libc_start_main+0xe6)[0x7f3b58301c36]
> [binf316:21583] [17] ring_c[0x400889]
> [binf316:21583] *** End of error message ***
> --
> mpirun noticed that process rank 0 with PID 21583 on node 316 exited on
> signal 6 (Aborte
rs-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, June 05, 2014 7:57 PM
To: Open MPI Users
Subject: Re: [OMPI users] openib segfaults with Torque
Hmmm...I'm not sure how that is going to run with only one proc (I don't know
if the program is protected against that scenario). If you
Hmmm...I'm not sure how that is going to run with only one proc (I don't know
if the program is protected against that scenario). If you run with -np 2 -mca
btl openib,sm,self, is it happy?
On Jun 5, 2014, at 2:16 PM, Fischer, Greg A. wrote:
> Here’s the command
Here's the command I'm invoking and the terminal output. (Some of this
information doesn't appear to be captured in the backtrace.)
[binf316:fischega] $ mpirun -np 1 -mca btl openib,self ring_c
ring_c:
../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:734:
OpenMPI Users,
After encountering difficulty with the Intel compilers (see the "intermittent
segfaults with openib on ring_c.c" thread), I installed GCC-4.8.3 and
recompiled OpenMPI. I ran the simple examples (ring, etc.) with the openib BTL
in a typical BASH environment. Everything appeared