Re: [ewg] RDS problematic on RC2

2008-01-17 Thread Olaf Kirch
On Thursday 17 January 2008 11:57, Johann George wrote: That's a remote invalid request error. Were you testing with RDMA or without? We were using the version that runs over IB. Well, yes. But you can do that with ordinary SENDs, or you can enable RDMA for large data blobs as well. But

Re: [ewg] RDS problematic on RC2

2008-01-17 Thread Johann George
Oh, and if you're using RDMA - does this happen to be with qlogic HCAs? If so, I just received a patch from Ralph Campbell with some fixes to the way we set up our DMA mapping. RDS in OFED 1.3 does not currently work on the QLogic HCAs due to the way you are setting up DMA mapping. We

Re: [ewg] RDS problematic on RC2

2008-01-17 Thread Or Gerlitz
Johann George wrote: Oh, and if you're using RDMA - does this happen to be with qlogic HCAs? If so, I just received a patch from Ralph Campbell with some fixes to the way we set up our DMA mapping. RDS in OFED 1.3 does not currently work on the QLogic HCAs due to the way you are setting up

[ewg] [PATCH 4/4] IB/ehca: Prevent RDMA-related connection failures

2008-01-17 Thread Joachim Fenkes
Some HW revisions of eHCA2 may cause an RC connection to break if they received RDMA Reads over that connection before. This can be prevented by assuring that, after the first RDMA Read, the QP receives a new RDMA Read every few million link packets. Include code into the driver that inserts an

[ewg] [PATCH 2/4] IB/ehca: Define array to store SMI/GSI QPs

2008-01-17 Thread Joachim Fenkes
From: Hoang-Nam Nguyen hnguyen at de.ibm.com Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED] --- drivers/infiniband/hw/ehca/ehca_classes.h | 2 +- drivers/infiniband/hw/ehca/ehca_main.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git

[ewg] [PATCH 1/4] IB/ehca: Remove CQ-QP-link before destroying QP in error path of create_qp()

2008-01-17 Thread Joachim Fenkes
From: Hoang-Nam Nguyen hnguyen at de.ibm.com Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED] --- drivers/infiniband/hw/ehca/ehca_qp.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c index

Re: [ewg] Re: Can you send explanation how to work with bonding and the standard bonding setting

2008-01-17 Thread Moni Shoua
Scott Weitzenkamp (sweitzen) wrote: Or, I don't see /sbin/call_ifenslave in my OFED-1.3-20080115-0600 ib-bonding package. Also, please run ib-bond --version. /sbin/call_ifenslave should be there only from release 16 and higher.

[ewg] RDS - Recovering from RDMA errors

2008-01-17 Thread Olaf Kirch
When I hit an RDMA error (which happens quite frequently now at rds-stress exit, thanks to the fixed mr pool flushing :) I often see the RDS shutdown_worker getting stuck (and rmmod hangs). It's waiting for allocated WRs to disappear. This usually works, as all WQ entries are flushed out. This

Re: [ewg] RE: [ofa-general] OFED Jan 14 meeting summary on RC2 readiness

2008-01-17 Thread Gleb Natapov
On Wed, Jan 16, 2008 at 09:35:39PM -0800, Roland Dreier wrote: Roland, you said that XRC API is ugly, are you going to push it upstream in its present form? That's a good question. Since there is no 'present form' for XRC as far as I can tell, it's hard to make a definitive answer.

Re: [ewg] RDS - Recovering from RDMA errors

2008-01-17 Thread Olaf Kirch
On Thursday 17 January 2008 16:51, Dotan Barak wrote: Moving the QP to error state flushes all of the outstanding WRs and creates a completion for each WR. If you want to delete all of the outstanding WRs, you should move the QP state to reset. (Is this what you asked?) My question was

Re: [ewg] Re: Can you send explanation how to work with bonding and the standard bonding setting

2008-01-17 Thread Moni Shoua
Scott Weitzenkamp (sweitzen) wrote: Or, I don't see /sbin/call_ifenslave in my OFED-1.3-20080115-0600 ib-bonding package. [EMAIL PROTECTED] ~]# uname -a Linux svbu-qa1850-1 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux [EMAIL PROTECTED] ~]# rpm -ql

Re: [ewg] RDS problematic on RC2

2008-01-17 Thread Ralph Campbell
Attached is the patch I sent to Olaf. It basically exchanges calls like dma_map_sg() to ib_dma_map_sg() so that the InfiniPath driver can intercept the DMA mapping calls and use kernel virtual addresses instead of physical addresses. The InfiniPath driver uses the host CPU to copy data in most

RE: [ewg] Re: Can you send explanation how to work with bonding and the standard bonding setting

2008-01-17 Thread Scott Weitzenkamp (sweitzen)
[EMAIL PROTECTED] ~]# ofed_info | grep OFED OFED-1.3-20080116-0600 [EMAIL PROTECTED] ~]# ib-bond --version ib-bonding-0.9.0-21 [EMAIL PROTECTED] ~]# rpm -qli ib-bonding Name: ib-bonding Relocations: (not relocatable) Version : 0.9.0 Vendor:

Re: [ewg] RE: [ofa-general] OFED Jan 14 meeting summary on RC2 readiness

2008-01-17 Thread Roland Dreier
Well, I can't speak for everyone, but in my opinion if someone wants to run an MPI job so huge that XRC absolutely has to be used to be able to actually finish it, then he should seriously rethink his application design. But where do you think the crossover is where XRC starts to help MPI? In