iwarp
connection mode. Once I fixed this, the gather operation for > NP60
behaves much much better...
Thanks Terry for helping.
Steve.
On 09/17/2010 03:46 PM, Steve Wise wrote:
I'll look into Solaris Studio. I think somehow the connections are
getting single threaded or somehow funnel
Hello,
I'm debugging an issue with openmpi-1.4.5 and the openib btl over
chelsio iwarp devices. I am the iwarp driver developer for this
device. I have debug code that detects cq overflows, and I'm seeing rcq
overflows during finalize for certain IMB runs with ompi. So as the
recv wrs a
someone point Steve to the right place to look in the openib BTL?
On Jul 2, 2012, at 11:24 AM, Steve Wise wrote:
Hello,
I'm debugging an issue with openmpi-1.4.5 and the openib btl over chelsio iwarp
devices. I am the iwarp driver developer for this device. I have debug code
that
On 7/2/2012 4:14 PM, Jeff Squyres wrote:
Steve --
Can you extend this new stuff to support RDMACM, including the iWARP-needed
connector-sends-first stuff?
I have no time right now. I could test something perhaps if someone can
do the initial pull of the rdma cpc code into the ofacm...
I
Hello,
I'm tracking an issue I see in openmpi-1.6.3. Running this command on
my chelsio iwarp/rdma setup causes a seg fault every time:
/usr/mpi/gcc/openmpi-1.6.3-dbg/bin/mpirun --np 2 --host hpc-hn1,hpc-cn2
--mca btl openib,sm,self --mca btl_openib_ipaddr_include
"192.168.170.0/24" /usr/mp
On 1/25/2013 12:19 PM, Steve Wise wrote:
Hello,
I'm tracking an issue I see in openmpi-1.6.3. Running this command on
my chelsio iwarp/rdma setup causes a seg fault every time:
/usr/mpi/gcc/openmpi-1.6.3-dbg/bin/mpirun --np 2 --host
hpc-hn1,hpc-cn2 --mca btl openib,sm,self
On 1/28/2013 11:48 AM, Ralph Castain wrote:
On Jan 28, 2013, at 9:12 AM, Steve Wise wrote:
On 1/25/2013 12:19 PM, Steve Wise wrote:
Hello,
I'm tracking an issue I see in openmpi-1.6.3. Running this command on my
chelsio iwarp/rdma setup causes a seg fault every time:
/usr/mpi/gcc/op
port 2...
Steve.
On Jan 28, 2013, at 10:03 AM, Steve Wise wrote:
On 1/28/2013 11:48 AM, Ralph Castain wrote:
On Jan 28, 2013, at 9:12 AM, Steve Wise wrote:
On 1/25/2013 12:19 PM, Steve Wise wrote:
Hello,
I'm tracking an issue I see in openmpi-1.6.3. Running this command on my
che
ing at all, and it ran
to completion without problem.
I suspect the problem is that the system I can use just isn't configured like
yours, and so I can't trigger the problem. Afraid I can't be of help after
all... :-(
On Jan 28, 2013, at 11:25 AM, Steve Wise wrote:
On 1/28/201
On 1/28/2013 2:04 PM, Ralph Castain wrote:
On Jan 28, 2013, at 11:55 AM, Steve Wise wrote:
Do you know if the rdmacm CPC is really being used for your connection setup
(vs other CPCs supported by IB)? Cuz iwarp only supports rdmacm. Maybe that's
the difference?
Dunno for certain,
On 1/28/2013 7:32 PM, Ralph Castain wrote:
Out of curiosity, could you tell us how you configured OMPI?
./configure --enable-debug --enable-mpirun-prefix-by-default
--prefix=/usr/mpi/gcc/openmpi-1.6.4rc2-dbg
On Jan 28, 2013, at 12:46 PM, Steve Wise wrote:
On 1/28/2013 2:04 PM, Ralph
b component
init, destroy it.
-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf
Of Ralph Castain
Sent: Monday, January 28, 2013 8:35 PM
To: Steve Wise
Cc: Open MPI Developers
Subject: Re: [OMPI devel] openib unloaded before last mem
nal Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf
Of Jeff Squyres (jsquyres)
Sent: Tuesday, January 29, 2013 10:05 AM
To: Steve Wise
Cc: Open MPI Developers
Subject: Re: [OMPI devel] openib unloaded before last mem dereg
It's on the ticket tha
Tests good on 1.6.3 too.
Thanks Josh!
On 1/29/2013 9:17 AM, Steve Wise wrote:
I applied it to 1.6.4rc2 and it fixed the seg fault issue. Lemme try
1.6.3 too.
On 1/29/2013 9:11 AM, Joshua Ladd wrote:
It should apply cleanly to 1.6.3 branch, I tested it this morning.
From top level OMPI
Yeoh
bbenton: Brad Benton
tonyb: Tony Breeds **NO COMMITS IN LAST YEAR**
swise: Steve Wise
On Jul 8, 2013, at 6:32 PM, Jeff Squyres (jsquyres) wrote:
According to https://svn.open-mpi.org/trac/ompi/wiki/Admistrative%20rules, it
is time for our annual review of Open MPI SVN accoun
Hello,
I just tried to run openmpi-1.7.2 over chelsio's IWARP device, and it no longer
works. It appears
that 1.7.2 fails to use the RDMACM CPC. I guess it is trying to use OOB, which
is IB-specific. If
I explicitly specify the RDMACM CPC via '--mca btl_openib_cpc_include rdmacm'
then it wor
> -Original Message-
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> Sent: Monday, August 19, 2013 12:06 PM
> To: Steve Wise
> Cc:
> Subject: Re: openmpi-1.7.2 fails to use the RDMACM CPC
>
> Not offhand.
>
> Given the lack of iWARP test
I confirmed that this is a regression from 1.7.1...
I'll see if I can figure out what's going on...
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise
> Sent: Monday, August 19, 2013 12:15 PM
> To: 'Jeff Squyres (
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise
> Sent: Monday, August 19, 2013 2:42 PM
> To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)'
> Cc: 'Indranil Choudhury'
> Subject: Re: [OMPI
> -Original Message-
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> Sent: Monday, August 19, 2013 3:23 PM
> To: Steve Wise
> Cc: Open MPI Developers; Indranil Choudhury
> Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
>
> No
> -Original Message-
> From: Steve Wise [mailto:sw...@opengridcomputing.com]
> Sent: Monday, August 19, 2013 3:25 PM
> To: 'Jeff Squyres (jsquyres)'
> Cc: 'Open MPI Developers'; 'Indranil Choudhury'
> Subject: RE: [OMPI
't use "#ifdef OMPI_HAVE_RDMAOE", use
"#if defined(HAVE_IBV_LINK_LAYER_ETHERNET)"
* Update the following to include/link against common/verbs
* bcol/iboffload
* sbgp/ibnet
* btl/openib
>
> > >
> > > On Aug 19, 2013, at 4:17 PM, Steve Wise
This patch fixes iwarp. dunno if it breaks RoCE though :)
[root@r9 ompi-trunk]# svn diff
Index: ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
===
--- ompi/mca/btl/openib/connect/btl_openib_connect_oob.c (revision 290
field in the ibv_port_attr structure.
> -Original Message-
> From: Steve Wise [mailto:sw...@opengridcomputing.com]
> Sent: Monday, August 19, 2013 3:53 PM
> To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)'
> Cc: 'Indranil Choudhury'
> Sub
> -Original Message-
> From: Steve Wise [mailto:sw...@opengridcomputing.com]
> Sent: Monday, August 19, 2013 4:02 PM
> To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)'
> Cc: 'Indranil Choudhury'
> Subject: RE: [OMPI devel] openmp
> Thanks for finding r27212. It was about a year ago, and had clearly fallen
> out of my cache (I
have very
> little to do with the openib BTL these days).
>
> Your solution isn't correct, because HAVE_IBV_LINK_LAYER_ETHERNET is defined
> (or not) via this m4
> macro in config/ompi_check_ope
> -Original Message-
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> Sent: Tuesday, August 20, 2013 8:59 AM
> To: Steve Wise
> Cc: Open MPI Developers; Indranil Choudhury
> Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
>
> On
RNET 1
Note the #define is HAVE_DECL_IBV_LINK_LAYER_ETHERNET but the code is checking
for
HAVE_IBV_LINK_LAYER_ETHERNET!
No _DECL_...
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise
> Sent: Tuesday, August 20, 2013 9:07 AM
> To: 'Jeff Sq
THERNET)
+#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET)
else if (flags & OMPI_COMMON_VERBS_FLAGS_LINK_LAYER_IB) {
if (IBV_LINK_LAYER_INFINIBAND == port_attr.link_layer) {
want = true;
> -Original Message-
> From: devel [mailto:devel-boun...@
> -Original Message-
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> Sent: Tuesday, August 20, 2013 11:07 AM
> To: Steve Wise
> Cc: Open MPI Developers; Indranil Choudhury
> Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
>
> I t
>
> Don't forget that Chelsio is still on the hook for adding iWARP support into
ompi/mca/common/ofacm,
> however. :-)
>
You won't let me forget. ;) I will do it.
> Specifically: At some point iWARP support will break because we'll be removing
> ompi/mca/btl/openib/cpc and exclusively using om
Why is the 1.7 changeset different from the trunk changeset? Specifically,
#if defined(HAVE_IBV_LINK_LAYER_ETHERENET)
Is changed to
#if HAVE_DECL_IBV_LINK_LAYER_ETHERNET
Instead of
#if defined(HAVE_DECL_IBV_LINK_LAYER_ETHERNET)
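
A minimal C sketch of why the two spellings behave differently, assuming the macro is always #define'd to either 0 or 1 (which is what Autoconf's AC_CHECK_DECLS does for HAVE_DECL_* symbols). The DEMO_* macro names and the function names are hypothetical stand-ins, not anything from the Open MPI tree:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for configure output: the macro is ALWAYS
 * defined, to 1 when the declaration was found and to 0 when it was not. */
#define DEMO_HAVE_DECL_PRESENT 1
#define DEMO_HAVE_DECL_ABSENT  0

/* "#if defined(X)" only asks whether X exists, so a value of 0 still passes. */
bool absent_passes_if_defined(void)
{
#if defined(DEMO_HAVE_DECL_ABSENT)
    return true;    /* taken: the macro exists even though its value is 0 */
#else
    return false;
#endif
}

/* "#if X" tests the value, so a macro defined to 0 correctly fails. */
bool absent_passes_if_value(void)
{
#if DEMO_HAVE_DECL_ABSENT
    return true;
#else
    return false;   /* taken */
#endif
}

/* A macro defined to 1 passes the value test. */
bool present_passes_if_value(void)
{
#if DEMO_HAVE_DECL_PRESENT
    return true;    /* taken */
#else
    return false;
#endif
}

/* A name that configure never defines (for example, one spelled without
 * the _DECL_ part) fails the defined() test, so the guarded code is
 * silently compiled out. */
bool unknown_name_passes_if_defined(void)
{
#if defined(DEMO_HAVE_PRESENT)
    return true;
#else
    return false;   /* taken: nothing defines DEMO_HAVE_PRESENT */
#endif
}
```

Under that assumption, only the plain "#if X" form tracks what configure actually detected; "#if defined(X)" is true in both the found and not-found cases.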
> -Original Message-
> From: svn [mailto:svn-boun...
to be 0 or 1 (vs. #define'ing or
> >> #undef'ing it). So don't check for "#if defined(..."; just check for
> >> "#if ...".
>
>
> On Aug 23, 2013, at 8:10 AM, "Steve Wise" wrote:
>
> > Why is the 1.7 c
On 11/14/2013 12:16 PM, Jeff Squyres (jsquyres) wrote:
On Nov 14, 2013, at 1:03 PM, Ralph Castain wrote:
1) What the status of UDCM is (does it work reliably, does it support
XRC, etc.)
Seems to be working okay on the IB systems at LANL and IU. Don't know about XRC - I seem
to recall the ans
On 11/15/2013 5:12 PM, Ralph Castain wrote:
Perhaps if Pasha or somebody else proficient in the OMPI code could help out,
then the iWARP CPC could be moved. W/O help from OMPI developers, it's going to
take me a very long time...
I believe we would all be willing to provide advice - we just ha
On 11/14/2013 3:12 PM, Shamis, Pavel wrote:
Comments inline.
3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm in
that move. Never changed openib to use ofacm/common.
Pasha: This is not entirely true. I changed openib btl ~3 years ago before my
departure from Mellano
Hey Jeff,
Have you seen this? I'm hitting this regularly running on ofed-1.4.1-rc2.
Test:
[ompi@vic12 ~]$ cat doit-ompi
#!/bin/sh
while : ; do
mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g
--mca btl openib,self,sm --mca btl_openib_max_btls 1
/usr/mpi/gcc/openmpi-1.3.1rc4
When this happens, that node logs this type of message also in
/var/log/messages:
IMB-MPI1[8859]: segfault at 0018 rip 2b7bfc880800 rsp
7fffb1021330 error 4
Steve Wise wrote:
Hey Jeff,
Have you seen this? I'm hitting this regularly running on
ofed-1.4.1-rc2.
the ompi-trunk.
Pasha.
Steve Wise wrote:
When this happens, that node logs this type of message also in
/var/log/messages:
IMB-MPI1[8859]: segfault at 0018 rip 2b7bfc880800 rsp
7fffb1021330 error 4
Steve Wise wrote:
Hey Jeff,
Have you seen this? I'm hitting
igure_options CFLAGS=-g'
Steve.
Pavel Shamis (Pasha) wrote:
Steve,
If you compile the OMPI code with CFLAGS="-g", generate a segfault
core file, and send the core + IMB-MPI1 to me, I will be able to
understand the problem better.
Regards,
Pasha
Steve Wise wrote:
Hey Pasha,
Hey Open MPI wizards,
I'm trying to debug something in my library that gets loaded into my mpi
processes when they are started via mpirun. With other MPIs, I've been
able to deliver SIGUSR2 to the process and trigger some debug code I
have in my library that sets up a handler for SIGUSR2. Ho
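
For readers unfamiliar with the technique described in this message, here is a minimal sketch of a library-side SIGUSR2 debug hook using POSIX sigaction(). The function and variable names are hypothetical; this is not the poster's library code, and it says nothing about how mpirun itself treats the signal:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set from the handler; sig_atomic_t is the only object type that is
 * safe to write from an asynchronous signal handler. */
static volatile sig_atomic_t debug_requested = 0;

/* Keep the handler tiny: just record that the signal arrived. */
static void on_sigusr2(int signo)
{
    (void)signo;
    debug_requested = 1;
}

/* Install the SIGUSR2 handler; returns 0 on success, -1 on failure. */
int install_debug_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Poll from normal (non-signal) context: returns 1 once per delivered
 * request and clears the flag, so the real debug dump can run safely. */
int debug_was_requested(void)
{
    int seen = debug_requested;
    debug_requested = 0;
    return seen;
}
```

The handler only sets a flag, and the actual debug work runs later from ordinary code that polls debug_was_requested(), which avoids calling non-async-signal-safe functions inside the handler.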
On 08/25/2010 11:33 AM, Ralph Castain wrote:
We don't use it - mpirun traps it and then propagates it by default to all
remote procs.
So I should send the signal to the mpirun process?
What OMPI version is this?
1.4.1
On Aug 25, 2010, at 10:23 AM, Steve Wise wrote:
On 08/25/2010 12:43 PM, Ralph Castain wrote:
On Aug 25, 2010, at 11:26 AM, Steve Wise wrote:
On 08/25/2010 11:33 AM, Ralph Castain wrote:
We don't use it - mpirun traps it and then propagates it by default to all
remote procs.
So I should send the signal to the m
I was wondering what the logic is behind allowing an MPI job to continue
in the presence of a fatal qp error?
Note the "will try to continue" sentence:
--
The OpenFabrics stack has reported a network error event. Open MPI
Hi,
I'm debugging a performance problem with running IMB-MPI1/barrier in an
NP64 cluster (8 nodes, 8 cores each). I'm using openmpi-1.4.1 from the
OFED-1.5.1 distribution. The BTL is openib/iWARP via Chelsio's T3
RNIC. In short, a NP60 and smaller run completes in a timely manner as
expect
Oops. One key typo here: This is the IMB-MPI1 gather test, not
barrier. :(
On 9/16/2010 12:05 PM, Steve Wise wrote:
Hi,
I'm debugging a performance problem with running IMB-MPI1/barrier in an
NP64 cluster (8 nodes, 8 cores each). I'm using openmpi-1.4.1 from
the OFED-1.5.1 di
t delays...
On 9/16/2010 1:01 PM, Steve Wise wrote:
Oops. One key typo here: This is the IMB-MPI1 gather test, not
barrier. :(
On 9/16/2010 12:05 PM, Steve Wise wrote:
Hi,
I'm debugging a performance problem with running IMB-MPI1/barrier in
an NP64 cluster (8 nodes, 8 cores each
t all. This
might be able to help determine if it is the actual connection setup
between processes that is out of sync as opposed to something in
the actual gather algorithm.
--td
Steve Wise wrote:
Here's a clue: ompi_coll_tuned_gather_intra_dec_fixed() changes its
algorithm for job
Does anyone have a NP64 IB cluster handy? I'd be interested if IB
behaves this way when running with the rdmacm connect method. IE with:
--mca btl_openib_cpc_include rdmacm --mca btl openib,sm,self
Steve.
On 9/17/2010 10:41 AM, Steve Wise wrote:
Yes it does. With mpi_preconnect_m
actually
has a Linux version).
--td
Steve Wise wrote:
Yes it does. With mpi_preconnect_mpi to 1, NP64 doesn't stall. So
it's not the algorithm in and of itself, but rather some interplay
between the algorithm and connection setup I guess.
On 9/17/2010 5:24 AM, Terry Dontje wrote
pich, not ompi, so I need to go do some homework. But
any pointers on the connection setup design for ompi would be great.
I'm CCing de...@openmpi.org in case anyone else is interested in
helping. Chelsio can provide rnic HW...
Thanks,
Steve.
>
>
> On Apr 28, 2007, at 4
Also, there appears to be a DAPL BTL in OMPI. Is this BTL complete and
enabled for the ofed-1.2 udapl library?
Steve.
On Mon, 2007-05-07 at 17:09 -0500, Steve Wise wrote:
> On Sat, 2007-04-28 at 16:20 -0400, Jeff Squyres wrote:
> > You'd probably be better asking this questi
On Mon, 2007-05-07 at 20:39 -0400, Jeff Squyres wrote:
> On May 7, 2007, at 6:52 PM, Steve Wise wrote:
>
> > Also, there appears to be a DAPL BTL in OMPI. Is this BTL complete
> > and
> > enabled for the ofed-1.2 udapl library?
>
> Yes, it is complete and is well
so, I don't know what
> their timeframe will be (and it may depend on the severity of the
> problem).
>
Well I've tried OMPI on ofed-1.2 udapl today and it doesn't work. I'm
debugging now.
>
> On May 8, 2007, at 9:47 AM, Steve Wise wrote:
>
>
>
> Well I've tried OMPI on ofed-1.2 udapl today and it doesn't work. I'm
> debugging now.
>
Here's part of the problem (from ompi/btl/udapl/btl_udapl.c):
/* TODO - big bad evil hack! */
/* uDAPL doesn't ever seem to keep track of ports with addresses. This
becomes a problem w
> >> Chelsio's gonna pony up the resources to get this work done asap. Do
> >> you have any thoughts on how we can collaborate on this project? I'm
> >> familiar with mvapich, not ompi, so I need to go do some homework.
> >> But
> >> any pointers on the connection setup design for ompi would b
On Tue, 2007-05-08 at 13:57 -0400, Andrew Friedley wrote:
> Steve Wise wrote:
> >> Well I've tried OMPI on ofed-1.2 udapl today and it doesn't work. I'm
> >> debugging now.
> >>
> >
> > Here's part of the problem (from ompi/bt
On Tue, 2007-05-08 at 12:55 -0700, Arlin Davis wrote:
> Steve Wise wrote:
>
> >1) OMPI shouldn't be stepping on the ia_address.
> >
> >
> strongly agree
>
> >2) OFA udapl should probably be explicitly binding local cm_ids to port
> >zero.
>
, Arlin Davis wrote:
> Steve Wise wrote:
>
> >I would like the group to consider including changes needed to OMPI
> >and/or ofa udapl to get OMPI working again on udapl for ofed-1.2.
> >
> >This will provide OMPI support over iwarp devices via udapl until we can
>
On Wed, 2007-05-09 at 08:37 +0300, Or Gerlitz wrote:
> Andrew Friedley wrote:
> > Jeff Squyres wrote:
> FWIW, yes, adding RDMA CM support has actually been on my to-do list
> for a while, but it keeps getting bumped by higher priority items.
> It would be *much* better if some iWARP
Although as Boris pointed out, perhaps the hack in OMPI is no longer
needed at all...
On Wed, 2007-05-09 at 08:41 -0500, Steve Wise wrote:
> 606 opened to track the udapl change.
>
> 607 opened to track the ompi change to remove the port number stashing
> hack.
>
> Status: I
On Wed, 2007-05-09 at 11:42 -0400, Donald Kerr wrote:
> I agree OMPI trac ticket #890 should cover this. I will test the
> suggested fix, just removing that one line from btl_udapl.c, on Solaris.
> I am still not set up on Linux so hopefully Steve can confirm there.
>
All,
First, I haven't tes
On Wed, 2007-05-09 at 16:20 -0400, Donald Kerr wrote:
> I am missing some context here. Where are you plugging iwarp and OMPI
> together?
ofed-1.2 supports iwarp and the chelsio rnic. It can be accessed
directly via the ofa verbs and ofa rdma-cm _as well as_ via udapl.
I'm attempting to run OMP
On Wed, 2007-05-09 at 16:27 -0400, Donald Kerr wrote:
> So then I agree with Andrew, I think you are trying to impose
> restrictions on uDAPL which are not part of the Spec.
>
true, but if you want a single btl for IB and IW, then you'll need to
address this issue in some way...
On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote:
>
> Steve Wise wrote:
> > There have been a series of discussions on the ofa general list about
> > this issue, and the conclusion to date is that it cannot be resolved in
> > the rdma-cm or iwarp-cm code of the l
On Wed, 2007-05-09 at 17:46 -0700, Andrew Friedley wrote:
> > Therefore, the only truly safe thing for an iWARP btl to do (or a
> > udapl btl since that is also an iWARP btl) is to have the active
> > layer send an MPI Layer "nop" of some kind immediately after
> > establishing the connection if t
On Wed, 2007-05-09 at 17:55 -0700, Andrew Friedley wrote:
>
> Steve Wise wrote:
> > On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote:
> >> Steve Wise wrote:
> >>> There have been a series of discussions on the ofa general list about
> >>> this
On Wed, 2007-05-09 at 15:01 -0700, Sean Hefty wrote:
> > The reason it is hard or impossible to solve this in the DAPL layer is
> > that any rdma operation on the QP affects the state of that QP and the
> > associated CQs. In addition, if you use an RDMA send to enforce this you
> > impact the othe
>
> There are two new issues so far:
>
> 1) this has uncovered a connection migration issue in the Chelsio
> driver/firmware. We are developing and testing a fix for this now.
> Should be ready tomorrow hopefully.
>
I have a fix for the above issue and I can continue with OMPI testing.
To wo
On Thu, 2007-05-10 at 23:10 -0400, Donald Kerr wrote:
>
> Caitlin Bestler wrote:
>
> >devel-boun...@open-mpi.org wrote:
> >
> >
> >>>There are two new issues so far:
> >>>
> >>>1) this has uncovered a connection migration issue in the Chelsio
> >>>driver/firmware. We are developing and testing
On Sun, 2007-05-13 at 21:26 -0400, Donald Kerr wrote:
>
> Caitlin Bestler wrote:
>
> >Donald Kerr wrote:
> >
> >
> >
> order of business after connection establishment
> (mba_btl_udapl_sendrecv(). The RECV buffer post for this exchange,
> however, should really be done _before_ the
Jeff Squyres wrote:
Friday -- right, duh. My bad. So with everyone's replies so far, I
think we're down to:
2. Mon, 26 Nov, 11am US East, 8am US Pacific, 6pm Israel
3. Thu, 29 Nov, 10am US East, 7am US Pacific, 5pm Israel
4. Thu, 29 Nov, 11am US East, 8am US Pacific, 6pm Israel
3 or 4 will
Jeff Squyres wrote:
We seem to have 2 possible times:
- #2: only OCG can't make it
- #3/#4: only Pasha/Mellanox can't make it
To be blunt, I'm inclined to go with #2 because it's more important
for this conversation to have the already-existing players involved
(because they're familiar wit
Jon Mason wrote:
On Tue, Dec 04, 2007 at 11:40:17AM -0800, Arlin Davis wrote:
Jon Mason wrote:
While working on OMPI udapl btl, I have noticed some "interesting"
behavior. OFA udapl wants the evd queues to be a power of 2 and
then will subtract 1 for book keeping (ie, so that internal head and
Arlin Davis wrote:
I'm running OFED 1.2.5 and using Chelsio.
From the linux rdma verbs perspective, ibv_create_cq() will create a
cq that is >= the requested depth. The fact that mthca always bumps
the size up to the next power of 2 isn't something udapl can rely on.
It doesn't.
uDAPL pa
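
To make the sizing arithmetic in this thread concrete, here is a small C sketch of a power-of-two queue that gives up one slot for bookkeeping. It is a generic ring-buffer illustration with hypothetical function names, not the actual uDAPL or verbs code:

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next power of two (n itself if it already is one).
 * Valid for 1 <= n <= 2^31. */
uint32_t round_up_pow2(uint32_t n)
{
    assert(n >= 1 && n <= (UINT32_C(1) << 31));
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/* Entries a caller can actually use when one slot of the ring is held
 * back for bookkeeping (so that head == tail can only mean "empty"). */
uint32_t usable_entries(uint32_t ring_size)
{
    assert(ring_size >= 1);
    return ring_size - 1;
}

/* Smallest power-of-two ring that still provides `wanted` usable entries. */
uint32_t ring_size_for(uint32_t wanted)
{
    assert(wanted < (UINT32_C(1) << 31));
    return round_up_pow2(wanted + 1);
}
```

In this sketch, asking for 8 usable entries needs a 16-slot ring, because an 8-slot ring only exposes 7. A layer that requests a depth and assumes the provider will round it up to a power of two, rather than merely returning something at least as large, can end up one entry short.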
Don Kerr wrote:
Looking at the list of new features for OFED 1.3 and seeing that support
for XRC went into the trunk I am curious if support for additional OFED
1.3 features will be included, or plan to be included in Open MPI?
I am looking at the list of features here:
http://64.233.167.104/
Gleb Natapov wrote:
On Sun, Mar 09, 2008 at 02:48:09PM -0500, Jon Mason wrote:
Issue (as described by Steve Wise):
Currently OMPI uses qp 0 for all credit updates (by design). This breaks
when running over the chelsio rnic due to a race condition between
advertising the availability of a
Jeff Squyres wrote:
On Mar 9, 2008, at 3:39 PM, Gleb Natapov wrote:
1. There was a discussion about this on openfabrics mailing list and
the
conclusion was that what Open MPI does is correct according to IB/
iWarp
spec.
2. Is it possible to fix your FW to follow iWarp spec? Perhaps it is
Gleb Natapov wrote:
On Mon, Mar 10, 2008 at 09:50:13AM -0500, Steve Wise wrote:
I personally don't like the idea to add another layer of complexity to openib
BTL code just to work around HW that doesn't follow spec. If work around
is simple that is OK, but in this case it is not
Jeff Squyres wrote:
On Mar 10, 2008, at 9:57 AM, Steve Wise wrote:
A single PP QP might be fine for now, and chelsio's next-gen part will
support SRQs and not have this funky issue.
Good!
But why use such a large buffer size for a single PP QP? Why no
This probably has to do with the fact that rdma_get_peer_addr() is a
static inline in /usr/include/rdma/rdma_cma.h. So if you don't include
that file in the test program, then you won't get rdma_get_peer_addr()
even if you link with librdmacm.so
Steve.
Jeff Squyres wrote:
Jon / Steve -- c
Jon Mason wrote:
I am seeing some unusual behavior during the shutdown phase of ompi at the end
of my testcase. While running an IMB pingpong test over the rdmacm on openib, I
get cq flush errors on my iWARP adapters.
This error is happening because the remote node is still polling the endpoin
Jeff Squyres wrote:
On May 5, 2008, at 6:27 PM, Steve Wise wrote:
I am seeing some unusual behavior during the shutdown phase of ompi
at the end of my testcase. While running an IMB pingpong test over
the rdmacm on openib, I get cq flush errors on my iWARP adapters.
This error is
Jeff Squyres wrote:
On May 19, 2008, at 3:40 PM, Jon Mason wrote:
iWARP needs preposted recv buffers (or it will drop the
connection). So
this isn't a good option.
I was talking about SRQ only. You said above that iwarp does
retransmit for SRQ.
openib BTL relies on HW retransmi
Jeff Squyres wrote:
On May 19, 2008, at 4:44 PM, Steve Wise wrote:
1. Posting more at low watermark can lead to DoS-like behavior when
you have a fast sender and a slow receiver. This is exactly the
resource-exhaustion kind of behavior that a high quality MPI
implementation is supposed to
Hi,
I'm running ompi top-o-tree from github and seeing an openib btl issue
where the qp/srq configuration is incorrect for the given device id.
This works fine in 1.8.4rc1, but I see the problem in top-of-tree. A
simple 2 node IMB-MPI1 pingpong fails to get the ranks setup. I see
this logg
On 11/4/2014 2:09 PM, Steve Wise wrote:
Hi,
I'm running ompi top-o-tree from github and seeing an openib btl issue
where the qp/srq configuration is incorrect for the given device id.
This works fine in 1.8.4rc1, but I see the problem in top-of-tree. A
simple 2 node IMB-MPI1 pingpong
wrote:
I have run into the issue as well. I will open a pull request for 1.8.4
as part of a patch fixing the coalescing issues.
-Nathan
On Tue, Nov 04, 2014 at 02:50:30PM -0600, Steve Wise wrote:
On 11/4/2014 2:09 PM, Steve Wise wrote:
Hi,
I'm running ompi top-o-tree from github and seei
I'll issue a pull request for this and the other change I'm making.
On 11/4/2014 3:27 PM, Steve Wise wrote:
I found the bug. Here is the fix:
[root@stevo1 openib]# git diff
diff --git a/opal/mca/btl/openib/btl_openib_component.c
b/opal/mca/btl/openib/btl_openib_component.c
ind
://github.com/hjelmn/ompi/commit/66fa429e306beb9fca59da0a4554e9b98d788316
-Nathan
On Tue, Nov 04, 2014 at 03:27:23PM -0600, Steve Wise wrote:
I found the bug. Here is the fix:
[root@stevo1 openib]# git diff
diff --git a/opal/mca/btl/openib/btl_openib_component.c
b/opal/mca/btl/openib
r Open MPI developers
face-to-face at SC14.
If the RFC fails I will still bring that and a couple of other fixes
into the master.
-Nathan
On Tue, Nov 04, 2014 at 04:06:45PM -0600, Steve Wise wrote:
Ok, sounds like I should let you continue the good work! :) When do you
plan to merge this into o
> To: Open MPI Developers List
> Cc: Nathan Hjelm; Steve Wise
> Subject: Re: [OMPI devel] the bug in btl_openib_connect_sl.c
>
> Nathan / Steve --
>
> Can you comment?
>
>
> > On Jun 26, 2015, at 5:13 AM, Алексей Рыжих wrote:
> >
> > Hi everybody,
>
92 matches