Re: [OMPI devel] NP64 _gather_ problem

2010-09-20 Thread Steve Wise
iwarp connection mode. Once I fixed this, the gather operation for > NP60 behaves much much better... Thanks Terry for helping. Steve. On 09/17/2010 03:46 PM, Steve Wise wrote: I'll look into Solaris Studio. I think somehow the connections are getting single threaded or somehow funnel

[OMPI devel] openib btl and cq overflows

2012-07-02 Thread Steve Wise
Hello, I'm debugging an issue with openmpi-1.4.5 and the openib btl over chelsio iwarp devices. I am the iwarp driver developer for this device. I have debug code that detects cq overflows, and I'm seeing rcq overflows during finalize for certain IMB runs with ompi. So as the recv wrs a

Re: [OMPI devel] openib btl and cq overflows

2012-07-02 Thread Steve Wise
someone point Steve to the right place to look in the openib BTL? On Jul 2, 2012, at 11:24 AM, Steve Wise wrote: Hello, I'm debugging an issue with openmpi-1.4.5 and the openib btl over chelsio iwarp devices. I am the iwarp driver developer for this device. I have debug code that

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26707 - in trunk/ompi: config mca/btl/ofud mca/btl/openib mca/common/ofacm mca/common/ofautils mca/dpm

2012-07-02 Thread Steve Wise
On 7/2/2012 4:14 PM, Jeff Squyres wrote: Steve -- Can you extend this new stuff to support RDMACM, including the iWARP-needed connector-sends-first stuff? I have no time right now. I could test something perhaps if someone can do the initial pull of the rdma cpc code into the ofacm... I

[OMPI devel] openib unloaded before last mem dereg

2013-01-25 Thread Steve Wise
Hello, I'm tracking an issue I see in openmpi-1.6.3. Running this command on my chelsio iwarp/rdma setup causes a seg fault every time: /usr/mpi/gcc/openmpi-1.6.3-dbg/bin/mpirun --np 2 --host hpc-hn1,hpc-cn2 --mca btl openib,sm,self --mca btl_openib_ipaddr_include "192.168.170.0/24" /usr/mp

Re: [OMPI devel] openib unloaded before last mem dereg

2013-01-28 Thread Steve Wise
On 1/25/2013 12:19 PM, Steve Wise wrote: Hello, I'm tracking an issue I see in openmpi-1.6.3. Running this command on my chelsio iwarp/rdma setup causes a seg fault every time: /usr/mpi/gcc/openmpi-1.6.3-dbg/bin/mpirun --np 2 --host hpc-hn1,hpc-cn2 --mca btl openib,sm,self

Re: [OMPI devel] openib unloaded before last mem dereg

2013-01-28 Thread Steve Wise
On 1/28/2013 11:48 AM, Ralph Castain wrote: On Jan 28, 2013, at 9:12 AM, Steve Wise wrote: On 1/25/2013 12:19 PM, Steve Wise wrote: Hello, I'm tracking an issue I see in openmpi-1.6.3. Running this command on my chelsio iwarp/rdma setup causes a seg fault every time: /usr/mpi/gcc/op

Re: [OMPI devel] openib unloaded before last mem dereg

2013-01-28 Thread Steve Wise
port 2... Steve. On Jan 28, 2013, at 10:03 AM, Steve Wise wrote: On 1/28/2013 11:48 AM, Ralph Castain wrote: On Jan 28, 2013, at 9:12 AM, Steve Wise wrote: On 1/25/2013 12:19 PM, Steve Wise wrote: Hello, I'm tracking an issue I see in openmpi-1.6.3. Running this command on my che

Re: [OMPI devel] openib unloaded before last mem dereg

2013-01-28 Thread Steve Wise
ing at all, and it ran to completion without problem. I suspect the problem is that the system I can use just isn't configured like yours, and so I can't trigger the problem. Afraid I can't be of help after all... :-( On Jan 28, 2013, at 11:25 AM, Steve Wise wrote: On 1/28/201

Re: [OMPI devel] openib unloaded before last mem dereg

2013-01-28 Thread Steve Wise
On 1/28/2013 2:04 PM, Ralph Castain wrote: On Jan 28, 2013, at 11:55 AM, Steve Wise wrote: Do you know if the rdmacm CPC is really being used for your connection setup (vs other CPCs supported by IB)? Cuz iwarp only supports rdmacm. Maybe that's the difference? Dunno for certain,

Re: [OMPI devel] openib unloaded before last mem dereg

2013-01-28 Thread Steve Wise
On 1/28/2013 7:32 PM, Ralph Castain wrote: Out of curiosity, could you tell us how you configured OMPI? ./configure --enable-debug --enable-mpirun-prefix-by-default --prefix=/usr/mpi/gcc/openmpi-1.6.4rc2-dbg On Jan 28, 2013, at 12:46 PM, Steve Wise wrote: On 1/28/2013 2:04 PM, Ralph

Re: [OMPI devel] openib unloaded before last mem dereg

2013-01-29 Thread Steve Wise
b component init, destroy it. -Original Message- From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Monday, January 28, 2013 8:35 PM To: Steve Wise Cc: Open MPI Developers Subject: Re: [OMPI devel] openib unloaded before last mem

Re: [OMPI devel] openib unloaded before last mem dereg

2013-01-29 Thread Steve Wise
nal Message- From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres) Sent: Tuesday, January 29, 2013 10:05 AM To: Steve Wise Cc: Open MPI Developers Subject: Re: [OMPI devel] openib unloaded before last mem dereg It's on the ticket tha

Re: [OMPI devel] openib unloaded before last mem dereg

2013-01-29 Thread Steve Wise
Tests good on 1.6.3 too. Thanks Josh! On 1/29/2013 9:17 AM, Steve Wise wrote: I applied it to 1.6.4rc2 and it fixed the seg fault issue. Lemme try 1.6.3 too. On 1/29/2013 9:11 AM, Joshua Ladd wrote: It should apply cleanly to 1.6.3 branch, I tested it this morning. From top level OMPI

Re: [OMPI devel] Annual OMPI membership review: SVN accounts

2013-07-15 Thread Steve Wise
Yeoh bbenton: Brad Benton tonyb:Tony Breeds **NO COMMITS IN LAST YEAR** swise:Steve Wise On Jul 8, 2013, at 6:32 PM, Jeff Squyres (jsquyres) wrote: According to https://svn.open-mpi.org/trac/ompi/wiki/Admistrative%20rules, it is time for our annual review of Open MPI SVN accoun

[OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-19 Thread Steve Wise
Hello, I just tried to run openmpi-1.7.2 over chelsio's IWARP device, and it no longer works. It appears that 1.7.2 fails to use the RDMACM CPC. I guess it is trying to use OOB, which is IB-specific. If I explicitly specify the RDMACM CPC via '--mca btl_openib_cpc_include rdmacm' then it wor

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-19 Thread Steve Wise
> -Original Message- > From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] > Sent: Monday, August 19, 2013 12:06 PM > To: Steve Wise > Cc: > Subject: Re: openmpi-1.7.2 fails to use the RDMACM CPC > > Not offhand. > > Given the lack of iWARP test

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-19 Thread Steve Wise
I confirmed that this is a regression from 1.7.1... I'll see if I can figure out what's going on... > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise > Sent: Monday, August 19, 2013 12:15 PM > To: 'Jeff Squyres (

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-19 Thread Steve Wise
> -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise > Sent: Monday, August 19, 2013 2:42 PM > To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)' > Cc: 'Indranil Choudhury' > Subject: Re: [OMPI

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-19 Thread Steve Wise
> -Original Message- > From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] > Sent: Monday, August 19, 2013 3:23 PM > To: Steve Wise > Cc: Open MPI Developers; Indranil Choudhury > Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > No

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-19 Thread Steve Wise
> -Original Message- > From: Steve Wise [mailto:sw...@opengridcomputing.com] > Sent: Monday, August 19, 2013 3:25 PM > To: 'Jeff Squyres (jsquyres)' > Cc: 'Open MPI Developers'; 'Indranil Choudhury' > Subject: RE: [OMPI

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-19 Thread Steve Wise
't use "#ifdef OMPI_HAVE_RDMAOE", use "#if defined(HAVE_IBV_LINK_LAYER_ETHERNET)" * Update the following to include/link against common/verbs * bcol/iboffload * sbgp/ibnet * btl/openib > > > > > > > On Aug 19, 2013, at 4:17 PM, Steve Wise &

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-19 Thread Steve Wise
This patch fixes iwarp. dunno if it breaks RoCE though :) [root@r9 ompi-trunk]# svn diff Index: ompi/mca/btl/openib/connect/btl_openib_connect_oob.c === --- ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(revision 290

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-19 Thread Steve Wise
field in the ibv_port_attr structure. > -Original Message- > From: Steve Wise [mailto:sw...@opengridcomputing.com] > Sent: Monday, August 19, 2013 3:53 PM > To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)' > Cc: 'Indranil Choudhury' > Sub

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-19 Thread Steve Wise
> -Original Message- > From: Steve Wise [mailto:sw...@opengridcomputing.com] > Sent: Monday, August 19, 2013 4:02 PM > To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)' > Cc: 'Indranil Choudhury' > Subject: RE: [OMPI devel] openmp

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-20 Thread Steve Wise
> Thanks for finding r27212. It was about a year ago, and had clearly fallen > out of my cache (I have very > little to do with the openib BTL these days). > > Your solution isn't correct, because HAVE_IBV_LINK_LAYER_ETHERNET is defined > (or not) via this m4 > macro in config/ompi_check_ope

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-20 Thread Steve Wise
> -Original Message- > From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] > Sent: Tuesday, August 20, 2013 8:59 AM > To: Steve Wise > Cc: Open MPI Developers; Indranil Choudhury > Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > On

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-20 Thread Steve Wise
RNET 1 Note the #define is HAVE_DECL_IBV_LINK_LAYER_ETHERNET but the code is checking for HAVE_IBV_LINK_LAYER_ETHERNET! No _DECL_... > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise > Sent: Tuesday, August 20, 2013 9:07 AM > To: 'Jeff Sq

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-20 Thread Steve Wise
THERNET) +#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) else if (flags & OMPI_COMMON_VERBS_FLAGS_LINK_LAYER_IB) { if (IBV_LINK_LAYER_INFINIBAND == port_attr.link_layer) { want = true; > -Original Message- > From: devel [mailto:devel-boun...@

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-20 Thread Steve Wise
> -Original Message- > From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] > Sent: Tuesday, August 20, 2013 11:07 AM > To: Steve Wise > Cc: Open MPI Developers; Indranil Choudhury > Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC > > I t

Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC

2013-08-20 Thread Steve Wise
> > Don't forget that Chelsio is still on the hook for adding iWARP support into ompi/mca/common/ofacm, > however. :-) > You won't let me forget. ;) I will do it. > Specifically: At some point iWARP support will break because we'll be removing > ompi/mca/btl/openib/cpc and exclusively using om

Re: [OMPI devel] [OMPI svn] svn:open-mpi r29060 - in branches/v1.7: . ompi/mca/btl ompi/mca/btl/openib ompi/mca/btl/openib/connect ompi/mca/common/verbs

2013-08-23 Thread Steve Wise
Why is the 1.7 changeset different from the trunk changeset? Specifically, #if defined(HAVE_IBV_LINK_LAYER_ETHERENET) Is changed to #if HAVE_DECL_IBV_LINK_LAYER_ETHERNET Instead of #if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET) > -Original Message- > From: svn [mailto:svn-boun...

Re: [OMPI devel] [OMPI svn] svn:open-mpi r29060 - in branches/v1.7: . ompi/mca/btl ompi/mca/btl/openib ompi/mca/btl/openib/connect ompi/mca/common/verbs

2013-08-26 Thread Steve Wise
to be 0 or 1 (vs. #define'ing or > >> #undef'ing it). So don't check for "#if defined(..."; just check for > >> "#if ...". > > > On Aug 23, 2013, at 8:10 AM, "Steve Wise" wrote: > > > Why is the 1.7 c

Re: [OMPI devel] What to do about openib/ofacm/cpc

2013-11-15 Thread Steve Wise
On 11/14/2013 12:16 PM, Jeff Squyres (jsquyres) wrote: On Nov 14, 2013, at 1:03 PM, Ralph Castain wrote: 1) What the status of UDCM is (does it work reliably, does it support XRC, etc.) Seems to be working okay on the IB systems at LANL and IU. Don't know about XRC - I seem to recall the ans

Re: [OMPI devel] What to do about openib/ofacm/cpc

2013-11-15 Thread Steve Wise
On 11/15/2013 5:12 PM, Ralph Castain wrote: Perhaps if Pasha or somebody else proficient in the OMPI code could help out, then the iWARP CPC could be moved. W/O help from OMPI developers, it's going to take me a very long time... I believe we would all be willing to provide advice - we just ha

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-21 Thread Steve Wise
On 11/14/2013 3:12 PM, Shamis, Pavel wrote: Comments inline. 3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm in that move. Never changed openib to use ofacm/common. Pasha: This is not entirely true. I changed openib btl ~3 year ago before my departure from Mellano

[OMPI devel] Seg fault running OpenMPI-1.3.1rc4

2009-03-29 Thread Steve Wise
Hey Jeff, Have you seen this? I'm hitting this regularly running on ofed-1.4.1-rc2. Test: [ompi@vic12 ~]$ cat doit-ompi #!/bin/sh while : ; do mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g --mca btl openib,self,sm --mca btl_openib_max_btls 1 /usr/mpi/gcc/openmpi-1.3.1rc4

Re: [OMPI devel] [ewg] Seg fault running OpenMPI-1.3.1rc4

2009-03-29 Thread Steve Wise
When this happens, that node logs this type of message also in /var/log/messages: IMB-MPI1[8859]: segfault at 0018 rip 2b7bfc880800 rsp 7fffb1021330 error 4 Steve Wise wrote: Hey Jeff, Have you seen this? I'm hitting this regularly running on ofed-1.4.1-rc2.

Re: [OMPI devel] ***SPAM*** Re: [ewg] Seg fault running OpenMPI-1.3.1rc4

2009-03-30 Thread Steve Wise
the ompi-trunk. Pasha. Steve Wise wrote: When this happens, that node logs this type of message also in /var/log/messages: IMB-MPI1[8859]: segfault at 0018 rip 2b7bfc880800 rsp 7fffb1021330 error 4 Steve Wise wrote: Hey Jeff, Have you seen this? I'm hitting

Re: [OMPI devel] ***SPAM*** Re: [ewg] Seg fault running OpenMPI-1.3.1rc4

2009-03-30 Thread Steve Wise
igure_options CFLAGS=-g' Steve. Pavel Shamis (Pasha) wrote: Steve, If you will compile OMPI code with CFLAGS="-g", generate segfault core_file and send the core + IMB-MPI1 to me I will be able to understand the problem better. Regards, Pasha Steve Wise wrote: Hey Pasha,

[OMPI devel] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Steve Wise
Hey Open MPI wizards, I'm trying to debug something in my library that gets loaded into my mpi processes when they are started via mpirun. With other MPIs, I've been able to deliver SIGUSR2 to the process and trigger some debug code I have in my library that sets up a handler for SIGUSR2. Ho

Re: [OMPI devel] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Steve Wise
On 08/25/2010 11:33 AM, Ralph Castain wrote: We don't use it - mpirun traps it and then propagates it by default to all remote procs. So I should send the signal to the mpirun process? What OMPI version is this? 1.4.1 On Aug 25, 2010, at 10:23 AM, Steve Wise wrote:

Re: [OMPI devel] delivering SIGUSR2 to an ompi process

2010-08-25 Thread Steve Wise
On 08/25/2010 12:43 PM, Ralph Castain wrote: On Aug 25, 2010, at 11:26 AM, Steve Wise wrote: On 08/25/2010 11:33 AM, Ralph Castain wrote: We don't use it - mpirun traps it and then propagates it by default to all remote procs. So I should send the signal to the m

[OMPI devel] openib btl - fatal errors don't abort the job

2010-09-01 Thread Steve Wise
I was wondering what the logic is behind allowing an MPI job to continue in the presence of a fatal qp error? Note the "will try to continue" sentence: -- The OpenFabrics stack has reported a network error event. Open MPI

[OMPI devel] NP64 barrier problem

2010-09-16 Thread Steve Wise
Hi, I'm debugging a performance problem with running IMB-MPI1/barrier in an NP64 cluster (8 nodes, 8 cores each). I'm using openmpi-1.4.1 from the OFED-1.5.1 distribution. The BTL is openib/iWARP via Chelsio's T3 RNIC. In short, an NP60 and smaller run completes in a timely manner as expect

Re: [OMPI devel] NP64 barrier problem

2010-09-16 Thread Steve Wise
Oops. One key typo here: This is the IMB-MPI1 gather test, not barrier. :( On 9/16/2010 12:05 PM, Steve Wise wrote: Hi, I'm debugging a performance problem with running IMB-MPI1/barrier in an NP64 cluster (8 nodes, 8 cores each). I'm using openmpi-1.4.1 from the OFED-1.5.1 di

Re: [OMPI devel] NP64 _gather_ problem

2010-09-16 Thread Steve Wise
t delays... On 9/16/2010 1:01 PM, Steve Wise wrote: Oops. One key typo here: This is the IMB-MPI1 gather test, not barrier. :( On 9/16/2010 12:05 PM, Steve Wise wrote: Hi, I'm debugging a performance problem with running IMB-MPI1/barrier in an NP64 cluster (8 nodes, 8 cores each

Re: [OMPI devel] NP64 _gather_ problem

2010-09-17 Thread Steve Wise
t all. This might be able to help determine if it is the actual connection setup between processes that is out of sync as opposed to something in the actual gather algorithm. --td Steve Wise wrote: Here's a clue: ompi_coll_tuned_gather_intra_dec_fixed() changes its algorithm for job

Re: [OMPI devel] NP64 _gather_ problem

2010-09-17 Thread Steve Wise
Does anyone have an NP64 IB cluster handy? I'd be interested if IB behaves this way when running with the rdmacm connect method, i.e., with: --mca btl_openib_cpc_include rdmacm --mca btl openib,sm,self Steve. On 9/17/2010 10:41 AM, Steve Wise wrote: Yes it does. With mpi_preconnect_m

Re: [OMPI devel] NP64 _gather_ problem

2010-09-17 Thread Steve Wise
actually has a Linux version). --td Steve Wise wrote: Yes it does. With mpi_preconnect_mpi set to 1, NP64 doesn't stall. So it's not the algorithm in and of itself, but rather some interplay between the algorithm and connection setup I guess. On 9/17/2010 5:24 AM, Terry Dontje wrote

Re: [OMPI devel] [ofa-general] OpenMPI and RDMA-CM

2007-05-07 Thread Steve Wise
pich, not ompi, so I need to go do some homework. But any pointers on the connection setup design for ompi would be great. I'm CCing de...@openmpi.org in case anyone else is interested in helping. Chelsio can provide rnic HW... Thanks, Steve. > > > On Apr 28, 2007, at 4

Re: [OMPI devel] [ofa-general] OpenMPI and RDMA-CM

2007-05-07 Thread Steve Wise
Also, there appears to be a DAPL BTL in OMPI. Is this BTL complete and enabled for the ofed-1.2 udapl library? Steve. On Mon, 2007-05-07 at 17:09 -0500, Steve Wise wrote: > On Sat, 2007-04-28 at 16:20 -0400, Jeff Squyres wrote: > > You'd probably be better asking this questi

Re: [OMPI devel] [ofa-general] OpenMPI and RDMA-CM

2007-05-08 Thread Steve Wise
On Mon, 2007-05-07 at 20:39 -0400, Jeff Squyres wrote: > On May 7, 2007, at 6:52 PM, Steve Wise wrote: > > > Also, there appears to be a DAPL BTL in OMPI. Is this BTL complete > > and > > enabled for the ofed-1.2 udapl library? > > Yes, it is complete and is well

Re: [OMPI devel] [ofa-general] OpenMPI and RDMA-CM

2007-05-08 Thread Steve Wise
so, I don't know what > their timeframe will be (and it may depend on the severity of the > problem). > Well I've tried OMPI on ofed-1.2 udapl today and it doesn't work. I'm debugging now. > > On May 8, 2007, at 9:47 AM, Steve Wise wrote: > >

[OMPI devel] OMPI over OFA udapl (was Re: [ofa-general] OpenMPI and RDMA-CM)

2007-05-08 Thread Steve Wise
> > Well I've tried OMPI on ofed-1.2 udapl today and it doesn't work. I'm > debugging now. > Here's part of the problem (from ompi/btl/udapl/btl_udapl.c): /* TODO - big bad evil hack! */ /* uDAPL doesn't ever seem to keep track of ports with addresses. This becomes a problem w

Re: [OMPI devel] [ofa-general] OpenMPI and RDMA-CM

2007-05-08 Thread Steve Wise
> >> Chelsio's gonna pony up the resources to get this work done asap. Do > >> you have any thoughts on how we can collaborate on this project? I'm > >> familiar with mvapich, not ompi, so I need to go do some homework. > >> But > >> any pointers on the connection setup design for ompi would b

Re: [OMPI devel] OMPI over OFA udapl (was Re: [ofa-general] OpenMPI and RDMA-CM)

2007-05-08 Thread Steve Wise
On Tue, 2007-05-08 at 13:57 -0400, Andrew Friedley wrote: > Steve Wise wrote: > >> Well I've tried OMPI on ofed-1.2 udapl today and it doesn't work. I'm > >> debugging now. > >> > > > > Here's part of the problem (from ompi/bt

Re: [OMPI devel] OMPI over OFA udapl (was Re: [ofa-general] OpenMPI and RDMA-CM)

2007-05-08 Thread Steve Wise
On Tue, 2007-05-08 at 12:55 -0700, Arlin Davis wrote: > Steve Wise wrote: > > >1) OMPI shouldn't be stepping on the ia_address. > > > > > strongly agree > > >2) OFA udapl should probably be explicitly binding local cm_ids to port > >zero. >

[OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
, Arlin Davis wrote: > Steve Wise wrote: > > >I would like the group to consider including changes needed to OMPI > >and/or ofa udapl to get OMPI working again on udapl for ofed-1.2. > > > >This will provide OMPI support over iwarp devices via udapl until we can > &

Re: [OMPI devel] [ofa-general] OpenMPI and RDMA-CM

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 08:37 +0300, Or Gerlitz wrote: > Andrew Friedley wrote: > > Jeff Squyres wrote: > FWIW, yes, adding RDMA CM support has actually been on my to-do list > for a while, but it keeps getting bumped by higher priority items. > It would be *much* better if some iWARP

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
Although as Boris pointed out, perhaps the hack in OMPI is no longer needed at all... On Wed, 2007-05-09 at 08:41 -0500, Steve Wise wrote: > 606 opened to track the udapl change. > > 607 opened to track the ompi change to remove the port number stashing > hack. > > Status: I

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 11:42 -0400, Donald Kerr wrote: > I agree OMPI trac ticket #890 should cover this. I will test the > suggested fix, just removing that one line from btl_udapl.c, on Solaris. > I am still not set up on Linux so hopefully Steve can confirm there. > All, First, I haven't tes

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 16:20 -0400, Donald Kerr wrote: > I'm missing some context here. Where are you plugging iwarp and OMPI > together? ofed-1.2 supports iwarp and the chelsio rnic. It can be accessed directly via the ofa verbs and ofa rdma-cm _as well as_ via udapl. I'm attempting to run OMP

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 16:27 -0400, Donald Kerr wrote: > So then I agree with Andrew, I think you are trying to impose > restrictions on uDAPL which are not part of the Spec. > true, but if you want a single btl for IB and IW, then you'll need to address this issue in some way...

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote: > > Steve Wise wrote: > > There have been a series of discussions on the ofa general list about > > this issue, and the conclusion to date is that it cannot be resolved in > > the rdma-cm or iwarp-cm code of the l

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 17:46 -0700, Andrew Friedley wrote: > > Therefore, the only truly safe thing for an iWARP btl to do (or a > > udapl btl since that is also an iWARP btl) is to have the active > > layer send an MPI Layer "nop" of some kind immediately after > > establishing the connection if t

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 17:55 -0700, Andrew Friedley wrote: > > Steve Wise wrote: > > On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote: > >> Steve Wise wrote: > >>> There have been a series of discussions on the ofa general list about > >>> this

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 15:01 -0700, Sean Hefty wrote: > > The reason it is hard or impossible to solve this in the DAPL layer is > > that any rdma operation on the QP affects the state of that QP and the > > associated CQs. In addition, if you use an RDMA send to enforce this you > > impact the othe

Re: [OMPI devel] OMPI over ofed udapl over iwarp

2007-05-10 Thread Steve Wise
> > There are two new issues so far: > > 1) this has uncovered a connection migration issue in the Chelsio > driver/firmware. We are developing and testing a fix for this now. > Should be ready tomorrow hopefully. > I have a fix for the above issue and I can continue with OMPI testing. To wo

Re: [OMPI devel] OMPI over ofed udapl over iwarp

2007-05-11 Thread Steve Wise
On Thu, 2007-05-10 at 23:10 -0400, Donald Kerr wrote: > > Caitlin Bestler wrote: > > >devel-boun...@open-mpi.org wrote: > > > > > >>>There are two new issues so far: > >>> > >>>1) this has uncovered a connection migration issue in the Chelsio > >>>driver/firmware. We are developing and testing

Re: [OMPI devel] OMPI over ofed udapl over iwarp

2007-05-14 Thread Steve Wise
On Sun, 2007-05-13 at 21:26 -0400, Donald Kerr wrote: > > Caitlin Bestler wrote: > > >Donald Kerr wrote: > > > > > > > order of business after connection establishment > (mba_btl_udapl_sendrecv(). The RECV buffer post for this exchange, > however, should really be done _before_ the

Re: [OMPI devel] IB/OpenFabrics pow wow

2007-11-19 Thread Steve Wise
Jeff Squyres wrote: Friday -- right, duh. My bad. So with everyone's replies so far, I think we're down to: 2. Mon, 26 Nov, 11am US East, 8am US Pacific, 6pm Israel 3. Thu, 29 Nov, 10am US East, 7am US Pacific, 5pm Israel 4. Thu, 29 Nov, 11am US East, 8am US Pacific, 6pm Israel 3 or 4 will

Re: [OMPI devel] IB/OpenFabrics pow wow

2007-11-20 Thread Steve Wise
Jeff Squyres wrote: We seem to have 2 possible times: - #2: only OCG can't make it - #3/#4: only Pasha/Mellanox can't make it To be blunt, I'm inclined to go with #2 because it's more important for this conversation to have the already-existing players involved (because they're familiar wit

Re: [OMPI devel] [ofa-general] uDAPL EVD queue length issue

2007-12-05 Thread Steve Wise
Jon Mason wrote: On Tue, Dec 04, 2007 at 11:40:17AM -0800, Arlin Davis wrote: Jon Mason wrote: While working on OMPI udapl btl, I have noticed some "interesting" behavior. OFA udapl wants the evd queues to be a power of 2 and then will subtract 1 for bookkeeping (i.e., so that internal head and

Re: [OMPI devel] [ofa-general] uDAPL EVD queue length issue

2007-12-06 Thread Steve Wise
Arlin Davis wrote: I'm running OFED 1.2.5 and using Chelsio. From the linux rdma verbs perspective, ibv_create_cq() will create a cq that is >= the requested depth. The fact that mthca always bumps the size up to the next power of 2 isn't something udapl can rely on. It doesn't. uDAPL pa

Re: [OMPI devel] Open IB BTL development question

2008-01-16 Thread Steve Wise
Don Kerr wrote: Looking at the list of new features for OFED 1.3 and seeing that support for XRC went into the trunk I am curious if support for additional OFED 1.3 features will be included, or plan to be included in Open MPI? I am looking at the list of features here: http://64.233.167.104/

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Steve Wise
Gleb Natapov wrote: On Sun, Mar 09, 2008 at 02:48:09PM -0500, Jon Mason wrote: Issue (as described by Steve Wise): Currently OMPI uses qp 0 for all credit updates (by design). This breaks when running over the chelsio rnic due to a race condition between advertising the availability of a

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Steve Wise
Jeff Squyres wrote: On Mar 9, 2008, at 3:39 PM, Gleb Natapov wrote: 1. There was a discussion about this on openfabrics mailing list and the conclusion was that what Open MPI does is correct according to IB/ iWarp spec. 2. Is it possible to fix your FW to follow iWarp spec? Perhaps it is

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Steve Wise
Gleb Natapov wrote: On Mon, Mar 10, 2008 at 09:50:13AM -0500, Steve Wise wrote: I personally don't like the idea to add another layer of complexity to openib BTL code just to work around HW that doesn't follow spec. If work around is simple that is OK, but in this case it is not

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Steve Wise
Jeff Squyres wrote: On Mar 10, 2008, at 9:57 AM, Steve Wise wrote: A single PP QP might be fine for now, and chelsio's next-gen part will support SRQs and not have this funky issue. Good! But why use such a large buffer size for a single PP QP? Why no

Re: [OMPI devel] undefined references for rdma_get_peer_addr & rdma_get_local_addr

2008-05-04 Thread Steve Wise
This probably has to do with the fact that rdma_get_peer_addr() is a static inline in /usr/include/rdma/rdma_cma.h. So if you don't include that file in the test program, then you won't get rdma_get_peer_addr() even if you link with librdmacm.so Steve. Jeff Squyres wrote: Jon / Steve -- c

Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown

2008-05-05 Thread Steve Wise
Jon Mason wrote: I am seeing some unusual behavior during the shutdown phase of ompi at the end of my testcase. While running an IMB pingpong test over the rdmacm on openib, I get cq flush errors on my iWARP adapters. This error is happening because the remote node is still polling the endpoin

Re: [OMPI devel] Flush CQ error on iWARP/Out-of-sync shutdown

2008-05-06 Thread Steve Wise
Jeff Squyres wrote: On May 5, 2008, at 6:27 PM, Steve Wise wrote: I am seeing some unusual behavior during the shutdown phase of ompi at the end of my testcase. While running an IMB pingpong test over the rdmacm on openib, I get cq flush errors on my iWARP adapters. This error is

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Steve Wise
Jeff Squyres wrote: On May 19, 2008, at 3:40 PM, Jon Mason wrote: iWARP needs preposted recv buffers (or it will drop the connection). So this isn't a good option. I was talking about SRQ only. You said above that iwarp does retransmit for SRQ. openib BTL relies on HW retransmi

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Steve Wise
Jeff Squyres wrote: On May 19, 2008, at 4:44 PM, Steve Wise wrote: 1. Posting more at low watermark can lead to DoS-like behavior when you have a fast sender and a slow receiver. This is exactly the resource-exhaustion kind of behavior that a high quality MPI implementation is supposed to

[OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
Hi, I'm running ompi top-of-tree from github and seeing an openib btl issue where the qp/srq configuration is incorrect for the given device id. This works fine in 1.8.4rc1, but I see the problem in top-of-tree. A simple 2-node IMB-MPI1 pingpong fails to get the ranks set up. I see this logg

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
On 11/4/2014 2:09 PM, Steve Wise wrote: Hi, I'm running ompi top-of-tree from github and seeing an openib btl issue where the qp/srq configuration is incorrect for the given device id. This works fine in 1.8.4rc1, but I see the problem in top-of-tree. A simple 2-node IMB-MPI1 pingpong

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
wrote: I have run into the issue as well. I will open a pull request for 1.8.4 as part of a patch fixing the coalescing issues. -Nathan On Tue, Nov 04, 2014 at 02:50:30PM -0600, Steve Wise wrote: On 11/4/2014 2:09 PM, Steve Wise wrote: Hi, I'm running ompi top-of-tree from github and seei

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
I'll issue a pull request for this and the other change I'm making. On 11/4/2014 3:27 PM, Steve Wise wrote: I found the bug. Here is the fix: [root@stevo1 openib]# git diff diff --git a/opal/mca/btl/openib/btl_openib_component.c b/opal/mca/btl/openib/btl_openib_component.c ind

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
://github.com/hjelmn/ompi/commit/66fa429e306beb9fca59da0a4554e9b98d788316 -Nathan On Tue, Nov 04, 2014 at 03:27:23PM -0600, Steve Wise wrote: I found the bug. Here is the fix: [root@stevo1 openib]# git diff diff --git a/opal/mca/btl/openib/btl_openib_component.c b/opal/mca/btl/openib

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
r Open MPI developers face-to-face at SC14. If the RFC fails I will still bring that and a couple of other fixes into the master. -Nathan On Tue, Nov 04, 2014 at 04:06:45PM -0600, Steve Wise wrote: Ok, sounds like I should let you continue the good work! :) When do you plan to merge this into o

Re: [OMPI devel] the bug in btl_openib_connect_sl.c

2015-06-29 Thread Steve Wise
> To: Open MPI Developers List > Cc: Nathan Hjelm; Steve Wise > Subject: Re: [OMPI devel] the bug in btl_openib_connect_sl.c > > Nathan / Steve -- > > Can you comment? > > > > On Jun 26, 2015, at 5:13 AM, Алексей Рыжих wrote: > > > > Hi everybody, >