Re: [ewg] RE: [ofa-general] OFED Jan 14 meeting summary on RC2 readiness

2008-01-17 Thread Roland Dreier
> Well, I can't speak for everyone, but in my opinion if someone wants to run > MPI job so huge that XRC absolutely has to be used to be able to actually > finish it then he should seriously rethink his application design. But where do you think the crossover is where XRC starts to help MPI? In
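
For a rough sense of where the crossover lies, the sketch below counts send-side QPs per node for an all-to-all MPI job. It assumes N nodes with P processes each, one RC QP per process pair, and one XRC send QP per (local process, remote node) pair. Receive-side XRC QPs, SRQs and lazy connection setup are ignored, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model (assumption): every process talks to every remote process. */

/* RC: each local process needs one QP per remote process. */
static uint64_t rc_qps_per_node(uint64_t nodes, uint64_t ppn)
{
    assert(nodes >= 1 && ppn >= 1);
    return ppn * ppn * (nodes - 1);
}

/* XRC: each local process needs one send QP per remote node, because one
 * XRC send QP can reach every SRQ in the shared XRC domain on that node. */
static uint64_t xrc_send_qps_per_node(uint64_t nodes, uint64_t ppn)
{
    assert(nodes >= 1 && ppn >= 1);
    return ppn * (nodes - 1);
}
```

Under this model the saving factor is exactly P: with 1024 nodes and 8 processes per node, RC needs 65472 QPs per node versus 8184 XRC send QPs.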

[ewg] [PATCH] IPoIB/CM Enable SRQ support for HCAs with less than 16 s/g entries (in OFED 1.3)

2008-01-17 Thread Pradeep Satyanarayana
Some HCAs like ehca2 support fewer than 16 SG entries. Currently IPoIB/CM implicitly assumes all HCAs will support 16 SG entries of 4K pages for 64K MTUs. This patch removes that restriction. This patch continues to use order 0 allocations and enables implementation of connected mode on such HCA
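
The figure of 16 follows from 64 KiB / 4 KiB. A minimal sketch of the arithmetic, assuming one page per scatter/gather entry (order-0 allocations, as the patch says) and ignoring any header buffer or encapsulation overhead the driver may reserve; the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Number of single-page (order-0) s/g entries needed for buf_size bytes. */
static size_t sg_entries_needed(size_t buf_size, size_t page_size)
{
    assert(page_size > 0);
    return (buf_size + page_size - 1) / page_size;   /* round up */
}

/* Largest receive buffer that fits in max_sge single-page entries. */
static size_t max_buf_for_sge(size_t max_sge, size_t page_size)
{
    assert(page_size > 0);
    return max_sge * page_size;
}
```

For example, an HCA limited to 8 entries could post at most 8 * 4096 = 32768 bytes per receive under these assumptions, so the connected-mode buffer would have to shrink accordingly.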

Re: [ewg] RDS problematic on RC2

2008-01-17 Thread Ralph Campbell
Attached is the patch I sent to Olaf. It basically replaces calls like dma_map_sg() with ib_dma_map_sg() so that the InfiniPath driver can intercept the DMA mapping calls and use kernel virtual addresses instead of physical addresses. The InfiniPath driver uses the host CPU to copy data in most case
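
The point of the wrapper is indirection: the generic DMA API always produces bus addresses, whereas the IB wrapper gives a device driver the chance to supply its own mapping routine. The user-space model below mimics the shape of ib_dma_map_sg() in kernels of that era (call the device's dma_ops if set, else fall back to the generic path); all types, names and the fake address translation are simplified stand-ins, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a scatter/gather list entry (hypothetical). */
struct sg_entry {
    void     *vaddr;     /* kernel virtual address of the buffer */
    uintptr_t dma_addr;  /* address handed to the HCA */
    size_t    len;
};

/* Per-device override table, in the spirit of struct ib_dma_mapping_ops. */
struct dma_mapping_ops {
    int (*map_sg)(struct sg_entry *sg, int nents);
};

struct ib_device_model {
    const struct dma_mapping_ops *dma_ops;  /* NULL: use the generic path */
};

/* Generic path: pretend to translate to a bus address (fake offset). */
static int generic_map_sg(struct sg_entry *sg, int nents)
{
    for (int i = 0; i < nents; i++)
        sg[i].dma_addr = (uintptr_t)sg[i].vaddr + 0x1000u;
    return nents;
}

/* Wrapper: give the device driver a chance to intercept the mapping. */
static int model_ib_dma_map_sg(struct ib_device_model *dev,
                               struct sg_entry *sg, int nents)
{
    assert(dev != NULL);
    if (dev->dma_ops)
        return dev->dma_ops->map_sg(sg, nents);
    return generic_map_sg(sg, nents);
}

/* A CPU-copy driver can keep kernel virtual addresses as the "DMA" address. */
static int kva_map_sg(struct sg_entry *sg, int nents)
{
    for (int i = 0; i < nents; i++)
        sg[i].dma_addr = (uintptr_t)sg[i].vaddr;
    return nents;
}

static const struct dma_mapping_ops kva_ops = { .map_sg = kva_map_sg };
```

Code that calls dma_map_sg() directly bypasses the override entirely, which is why the calls had to be switched to the ib_ wrappers.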

RE: [ewg] Re: Can you send explanation how to work with bonding and the standard bonding setting

2008-01-17 Thread Scott Weitzenkamp (sweitzen)
[EMAIL PROTECTED] ~]# ofed_info | grep OFED OFED-1.3-20080116-0600 [EMAIL PROTECTED] ~]# ib-bond --version ib-bonding-0.9.0-21 [EMAIL PROTECTED] ~]# rpm -qli ib-bonding Name: ib-bonding Relocations: (not relocatable) Version : 0.9.0 Vendor:

Re: [ewg] RDS - Recovering from RDMA errors

2008-01-17 Thread Olaf Kirch
On Thursday 17 January 2008 16:51, Dotan Barak wrote: > Moving the QP to error state flushes all of the outstanding WRs and > creates a completion for each WR. > If you want to delete all of the outstanding WRs, you should move the QP > state to reset. > > (Is this what you asked?) My questio

Re: [ewg] RDS - Recovering from RDMA errors

2008-01-17 Thread Dotan Barak
Olaf Kirch wrote: When I hit an RDMA error (which happens quite frequently now at rds-stress exit, thanks to the fixed mr pool flushing :) I often see the RDS shutdown_worker getting stuck (and rmmod hangs). It's waiting for allocated WRs to disappear. This usually works, as all WQ entries are fl

[ewg] RDS - Recovering from RDMA errors

2008-01-17 Thread Olaf Kirch
When I hit an RDMA error (which happens quite frequently now at rds-stress exit, thanks to the fixed mr pool flushing :) I often see the RDS shutdown_worker getting stuck (and rmmod hangs). It's waiting for allocated WRs to disappear. This usually works, as all WQ entries are flushed out. This doe

Re: [ewg] RE: [ofa-general] OFED Jan 14 meeting summary on RC2 readiness

2008-01-17 Thread Gleb Natapov
On Wed, Jan 16, 2008 at 09:35:39PM -0800, Roland Dreier wrote: > > Roland, you said that XRC API is ugly, are you going to push it upstream > > in its present form? > > That's a good question. Since there is no 'present form' for XRC as > far as I can tell, it's hard to make a definitive answer

Re: [ewg] Re: Can you send explanation how to work with bonding and the standard bonding setting

2008-01-17 Thread Moni Shoua
Scott Weitzenkamp (sweitzen) wrote: > Or, > > I don't see /sbin/call_ifenslave in my OFED-1.3-20080115-0600 ib-bonding > package. > Also, please run ib-bond --version. /sbin/call_ifenslave should be there only from release 16 and higher.
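
A sketch of the version check implied by Moni's reply, assuming that "release" means the trailing number of the ib-bond --version string (ib-bonding-0.9.0-21 in Scott's later output). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Extract the number after the last '-' in "ib-bonding-<version>-<release>";
 * returns -1 if there is no purely numeric release suffix. */
static long ib_bonding_release(const char *ver)
{
    assert(ver != NULL);
    const char *dash = strrchr(ver, '-');
    if (dash == NULL || dash[1] == '\0')
        return -1;
    char *end;
    long rel = strtol(dash + 1, &end, 10);
    return (*end == '\0') ? rel : -1;
}

/* Per the thread, /sbin/call_ifenslave ships from release 16 onward. */
static bool has_call_ifenslave(const char *ver)
{
    return ib_bonding_release(ver) >= 16;
}
```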

Re: [ewg] Re: Can you send explanation how to work with bonding and the standard bonding setting

2008-01-17 Thread Moni Shoua
Scott Weitzenkamp (sweitzen) wrote: > Or, > > I don't see /sbin/call_ifenslave in my OFED-1.3-20080115-0600 ib-bonding > package. > > [EMAIL PROTECTED] ~]# uname -a > Linux svbu-qa1850-1 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 > x86_64 x86_64 x86_64 GNU/Linux > [EMAIL PROTECTED] ~]#

Re: [ewg] RDS problematic on RC2

2008-01-17 Thread Richard Frank
<< If the TCP part is entirely non-working, it might be better to disable it for now rather than have it crash the machine. So far, I have never gotten it to function correctly and it crashes some machines almost immediately. My vote is to disable TCP support (return not supported) - at least

[ewg] [PATCH 4/4] IB/ehca: Prevent RDMA-related connection failures

2008-01-17 Thread Joachim Fenkes
Some HW revisions of eHCA2 may cause an RC connection to break if they have received RDMA Reads over that connection before. This can be prevented by ensuring that, after the first RDMA Read, the QP receives a new RDMA Read every few million link packets. Include code in the driver that inserts an em
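
The workaround amounts to a keep-alive measured in traffic rather than time. The tracker below is a hypothetical sketch, not the ehca code: the packet counter is an abstraction (a driver would likely need some proxy for link packets), the threshold is a placeholder for "every few million", and modeling the inserted operation as a zero-length RDMA Read is an assumption since the description is truncated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker for the eHCA2 RDMA Read workaround. */
struct rdma_read_keepalive {
    bool     seen_first_read;   /* nothing to do before the first RDMA Read */
    uint64_t pkts_since_read;   /* link packets since the last RDMA Read */
    uint64_t threshold;         /* placeholder for "every few million" */
};

/* Record a real RDMA Read posted by the consumer. */
static void note_rdma_read(struct rdma_read_keepalive *k)
{
    k->seen_first_read = true;
    k->pkts_since_read = 0;
}

/* Account for npkts link packets; return true if the driver should
 * insert a dummy RDMA Read now. */
static bool note_packets(struct rdma_read_keepalive *k, uint64_t npkts)
{
    assert(k->threshold > 0);
    if (!k->seen_first_read)
        return false;
    k->pkts_since_read += npkts;
    if (k->pkts_since_read >= k->threshold) {
        k->pkts_since_read = 0;   /* the inserted read restarts the window */
        return true;
    }
    return false;
}
```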

[ewg] [PATCH 3/4] IB/ehca: Add "port connection autodetect mode"

2008-01-17 Thread Joachim Fenkes
From: Hoang-Nam Nguyen <[EMAIL PROTECTED]> This patch enhances ehca with a capability to "autodetect" the ports being connected physically. In order to utilize that function the module option nr_ports must be set to -1 (default is 2 - two ports). This feature is experimental and will be made the defa
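
A hypothetical sketch of how an nr_ports-style option could be interpreted. The mail only states that -1 selects autodetection and that the default is 2; returning a bitmask, and treating 1 as "first port only", are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_PORTS 2

/* Returns a bitmask: bit i set means port i+1 should be activated. */
static unsigned ports_to_activate(int nr_ports, const bool link_up[MODEL_MAX_PORTS])
{
    unsigned mask = 0;
    if (nr_ports == -1) {
        /* Autodetect: only ports that are physically connected. */
        for (int i = 0; i < MODEL_MAX_PORTS; i++)
            if (link_up[i])
                mask |= 1u << i;
        return mask;
    }
    /* Fixed count: activate the first nr_ports ports unconditionally. */
    assert(nr_ports >= 1 && nr_ports <= MODEL_MAX_PORTS);
    for (int i = 0; i < nr_ports; i++)
        mask |= 1u << i;
    return mask;
}
```

With autodetection, a machine cabled only on port 2 would bring up just that port, whereas the fixed default activates both.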

[ewg] [PATCH 2/4] IB/ehca: Define array to store SMI/GSI QPs

2008-01-17 Thread Joachim Fenkes
From: Hoang-Nam Nguyen Signed-off-by: Hoang-Nam Nguyen <[EMAIL PROTECTED]> --- drivers/infiniband/hw/ehca/ehca_classes.h |2 +- drivers/infiniband/hw/ehca/ehca_main.c|6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b
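
The diff is cut off, so the following is only a generic sketch of the data-structure idea in the subject: storing a port's special QPs in an array indexed by QP number (QP0 is the SMI QP, QP1 the GSI QP) instead of a single pointer. Struct and function names are made up.

```c
#include <assert.h>
#include <stddef.h>

enum { MODEL_QP_SMI = 0, MODEL_QP_GSI = 1, MODEL_NUM_SPECIAL_QPS = 2 };

struct model_qp {
    int qp_num;   /* 0 for SMI, 1 for GSI */
};

/* Per-port storage: one slot per special QP, indexed by QP number. */
struct model_port {
    struct model_qp *special_qp[MODEL_NUM_SPECIAL_QPS];
};

static void port_set_special_qp(struct model_port *port, struct model_qp *qp)
{
    assert(qp->qp_num >= 0 && qp->qp_num < MODEL_NUM_SPECIAL_QPS);
    port->special_qp[qp->qp_num] = qp;
}

static struct model_qp *port_get_special_qp(const struct model_port *port, int qp_num)
{
    if (qp_num < 0 || qp_num >= MODEL_NUM_SPECIAL_QPS)
        return NULL;
    return port->special_qp[qp_num];
}
```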

[ewg] [PATCH 1/4] IB/ehca: Remove CQ-QP-link before destroying QP in error path of create_qp()

2008-01-17 Thread Joachim Fenkes
From: Hoang-Nam Nguyen Signed-off-by: Hoang-Nam Nguyen <[EMAIL PROTECTED]> --- drivers/infiniband/hw/ehca/ehca_qp.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c index f116eb7..26c6a94 100
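
The bug class named in the subject is an error path that frees an object while another structure still links to it. A hypothetical sketch of the correct unwind order using the usual goto-based error handling; the CQ list and all names are illustrative, not the ehca data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct qp_node {
    struct qp_node *next;   /* link in the owning CQ's QP list */
};

struct cq_model {
    struct qp_node *qps;    /* QPs currently linked to this CQ */
};

static void cq_link_qp(struct cq_model *cq, struct qp_node *qp)
{
    assert(cq != NULL && qp != NULL);
    qp->next = cq->qps;
    cq->qps = qp;
}

static void cq_unlink_qp(struct cq_model *cq, struct qp_node *qp)
{
    for (struct qp_node **pp = &cq->qps; *pp; pp = &(*pp)->next) {
        if (*pp == qp) {
            *pp = qp->next;
            return;
        }
    }
}

/* create_qp-style constructor: link to the CQ, then a later step may fail.
 * The error path removes the link BEFORE freeing, so the CQ never holds
 * a dangling pointer. */
static struct qp_node *create_qp_model(struct cq_model *cq, bool later_step_fails)
{
    struct qp_node *qp = malloc(sizeof(*qp));
    if (!qp)
        return NULL;

    cq_link_qp(cq, qp);

    if (later_step_fails)
        goto err_unlink;

    return qp;

err_unlink:
    cq_unlink_qp(cq, qp);   /* remove the CQ-QP link first ... */
    free(qp);               /* ... then destroy the QP */
    return NULL;
}
```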

[ewg] [PATCH 0/4] IB/ehca: fixes, port connectivity autodetection, problem workaround

2008-01-17 Thread Joachim Fenkes
This patchset will fix a minor issue, introduce port connectivity autodetection and work around an RDMA-related problem in eHCA2. [1/4] fixes an error path in create_qp() [2/4] stores the SMI/GSI QPs in a per-port array [3/4] adds port connectivity autodetection [4/4] adds the aforementioned work

Re: [ewg] RDS problematic on RC2

2008-01-17 Thread Or Gerlitz
Johann George wrote: Oh, and if you're using RDMA - does this happen to be with qlogic HCAs? If so, I just received a patch from Ralph Campbell with some fixes to the way we set up our DMA mapping. RDS in OFED 1.3 does not currently work on the QLogic HCAs due to the way you are setting up DMA

Re: [ewg] RDS problematic on RC2

2008-01-17 Thread Olaf Kirch
On Thursday 17 January 2008 11:57, Johann George wrote: > > That's a remote invalid request error. Were you testing > > with RDMA or without? > > We were using the version that runs over IB. Well, yes. But you can do that with ordinary SENDs, or you can enable RDMA for large data blobs as well. B

Re: [ewg] RDS problematic on RC2

2008-01-17 Thread Johann George
> Oh, and if you're using RDMA - does this happen to be with > qlogic HCAs? If so, I just received a patch from Ralph > Campbell with some fixes to the way we set up our DMA > mapping. RDS in OFED 1.3 does not currently work on the QLogic HCAs due to the way you are setting up DMA mapping. We al

Re: [ewg] RDS problematic on RC2

2008-01-17 Thread Johann George
> That's a remote invalid request error. Were you testing > with RDMA or without? We were using the version that runs over IB. > What user application were you using for testing? qperf. Unfortunately the version that is included in OFED 1.3 RC2 is old due to a problem that Vlad just discovered