RE: ibv_post_send/recv kernel path optimizations

2011-01-24 Thread Walukiewicz, Miroslaw
+ qp = idr_read_qp(cmd.qp_handle, file->ucontext);
+ if (!qp)
+         goto out_raw_qp;
+
+ if (qp->qp_type == IB_QPT_RAW_ETH) { ...
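
A minimal sketch of the fast path this diff adds to ib_uverbs_post_send(), reconstructed around the quoted lines; the surrounding declarations and the done label are assumptions, not part of the patch:

    /* Reconstructed context for the RAW_ETH pre-lookup; only the
     * idr_read_qp(...) through post_send(qp, NULL, NULL) lines are
     * from the actual patch. */
    struct ib_uverbs_post_send      cmd;
    struct ib_uverbs_post_send_resp resp;
    struct ib_qp *qp;
    int ret;

    if (copy_from_user(&cmd, buf, sizeof cmd))
            return -EFAULT;

    qp = idr_read_qp(cmd.qp_handle, file->ucontext);
    if (!qp)
            goto out_raw_qp;

    if (qp->qp_type == IB_QPT_RAW_ETH) {
            /* Driver-specific path: no ib_send_wr list is passed in;
             * the driver is expected to fetch the work requests on
             * its own (see the shared-page discussion below). */
            resp.bad_wr = 0;
            ret = qp->device->post_send(qp, NULL, NULL);
            put_qp_read(qp);
            goto done;              /* hypothetical: skip the generic path */
    }

    out_raw_qp:
    /* ... fall through to the ordinary copy/kmalloc-based uverbs path ... */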

RE: ibv_post_send/recv kernel path optimizations

2011-01-21 Thread Walukiewicz, Miroslaw
return ret ? ret : in_len; }

(In reply to Roland Dreier's message of Monday, January 10, 2011, Re: ibv_post_send/recv kernel path optimizations.)

RE: ibv_post_send/recv kernel path optimizations

2011-01-21 Thread Hefty, Sean
+ qp = idr_read_qp(cmd.qp_handle, file->ucontext);
+ if (!qp)
+         goto out_raw_qp;
+
+ if (qp->qp_type == IB_QPT_RAW_ETH) {
+         resp.bad_wr = 0;
+         ret = qp->device->post_send(qp, NULL, NULL);

This looks odd to me and can definitely confuse someone.
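
The NULL/NULL call Sean flags only makes sense if the driver treats a missing work-request list as "fetch the WQEs yourself". A hedged sketch of that interpretation on the driver side; every xxx_* name is illustrative, nothing here is from the actual driver:

    /* Hypothetical driver post_send: wr == NULL signals that userspace
     * already wrote the WQEs into a page shared with the kernel, so
     * there is nothing to copy in from the caller. */
    static int xxx_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
                             struct ib_send_wr **bad_wr)
    {
            struct xxx_qp *qp = to_xqp(ibqp);   /* illustrative container_of */

            if (!wr)
                    /* shared-page fast path (sketched further below) */
                    return xxx_kick_sq_from_shared_page(qp);

            return xxx_post_wr_list(qp, wr, bad_wr);  /* normal kernel path */
    }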

RE: ibv_post_send/recv kernel path optimizations

2011-01-10 Thread Walukiewicz, Miroslaw
...@vger.kernel.org] On Behalf Of Roland Dreier Sent: Wednesday, January 05, 2011 7:17 PM To: Walukiewicz, Miroslaw Cc: Or Gerlitz; Jason Gunthorpe; Hefty, Sean; linux-rdma@vger.kernel.org Subject: Re: ibv_post_send/recv kernel path optimizations The patch for ofed-1.5.3 looks like below. I will try

Re: ibv_post_send/recv kernel path optimizations

2011-01-10 Thread Roland Dreier
You are right that most of the speed-up comes from avoiding semaphores, but not only that. From the oprofile traces, the semaphores made up half of the difference. The next one was copy_from_user and kmalloc/kfree usage (in my proposal a shared-page method is used instead). OK, but in any ...
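
A sketch of the shared-page method referred to here, assuming one mmapped page per QP holding a producer/consumer pair and fixed-size WQE slots; the layout and all xxx_* names are assumptions for illustration:

    /* Illustrative shared-page layout: userspace fills slots and bumps
     * the producer index; the kernel consumes them with no
     * copy_from_user() and no per-call kmalloc()/kfree(). */
    struct sq_shared_page {
            u32 producer;                    /* written by userspace  */
            u32 consumer;                    /* written by the kernel */
            struct xxx_wqe slots[SQ_DEPTH];  /* fixed-size WQE slots  */
    };

    static int xxx_kick_sq_from_shared_page(struct xxx_qp *qp)
    {
            struct sq_shared_page *sp = qp->sq_page; /* mapped at QP create */
            u32 prod = sp->producer;

            rmb();          /* pairs with userspace's barrier after filling */

            while (sp->consumer != prod) {
                    xxx_hw_post(qp, &sp->slots[sp->consumer % SQ_DEPTH]);
                    sp->consumer++;
            }
            return 0;
    }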

Re: ibv_post_send/recv kernel path optimizations

2011-01-05 Thread Roland Dreier
The patch for ofed-1.5.3 looks like below. I will try to push it to kernel.org after porting. Now the uverbs post_send/post_recv path is modified to do a pre-lookup for RAW_ETH QPs. When a RAW_ETH QP is found, the driver-specific path is used for posting buffers. ...

Re: ibv_post_send/recv kernel path optimizations

2010-12-27 Thread Or Gerlitz
Jason Gunthorpe wrote: Walukiewicz, Miroslaw wrote: ... called for many QPs, there is a single entry point to ib_uverbs_post_send using write() to /dev/infiniband/uverbsX. In that case a lookup in the QP store (idr_read_qp) is necessary to find the correct ibv_qp structure, which is a big time ...
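
For reference, the per-call lookup being criticized is roughly the following (a simplified rendering of the uverbs code of that era, with the nesting/liveness checks omitted); the down_read() here is what put_uobj_read() later releases:

    static struct ib_qp *idr_read_qp(int qp_handle, struct ib_ucontext *context)
    {
            struct ib_uobject *uobj;

            spin_lock(&ib_uverbs_idr_lock);
            uobj = idr_find(&ib_uverbs_qp_idr, qp_handle);
            if (uobj)
                    kref_get(&uobj->ref);
            spin_unlock(&ib_uverbs_idr_lock);

            if (!uobj)
                    return NULL;

            down_read(&uobj->mutex);   /* the up_read() in put_uobj_read()
                                        * shows up as __up_read in profiles */
            return uobj->object;
    }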

RE: ibv_post_send/recv kernel path optimizations

2010-12-27 Thread Walukiewicz, Miroslaw
Jason Gunthorpe wrote: Walukiewicz, Miroslaw wrote: ... called for many QPs, there is a single entry point to ib_uverbs_post_send using write() to /dev/infiniband/uverbsX. In that case there is a lookup ...

Re: ibv_post_send/recv kernel path optimizations

2010-12-27 Thread Or Gerlitz
On 12/27/2010 5:18 PM, Walukiewicz, Miroslaw wrote: I implemented a very small hash table and achieved performance comparable to the previous solution. Just to clarify, when saying "achieved performance comparable to the previous solution", you refer to the approach which bypasses uverbs on the ...
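
A sketch of what such a small per-file hash table could look like: handles hashed into a few buckets, looked up locklessly on the datapath, with inserts and removals done only at QP create/destroy time. The qp_hash field and all sizes are illustrative assumptions:

    #define QP_HASH_BITS 6
    #define QP_HASH_SIZE (1 << QP_HASH_BITS)

    struct qp_hash_entry {
            struct hlist_node node;
            u32               handle;
            struct ib_qp     *qp;
    };

    /* Caller holds rcu_read_lock(); entries are added/removed under a
     * per-file lock at create/destroy, so no rwsem on the hot path. */
    static struct ib_qp *qp_cache_lookup(struct ib_uverbs_file *file, u32 handle)
    {
            struct qp_hash_entry *e;
            struct hlist_head *head =
                    &file->qp_hash[hash_32(handle, QP_HASH_BITS)];

            hlist_for_each_entry_rcu(e, head, node)
                    if (e->handle == handle)
                            return e->qp;
            return NULL;
    }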

RE: ibv_post_send/recv kernel path optimizations

2010-12-27 Thread Walukiewicz, Miroslaw
On 12/27/2010 5:18 PM, Walukiewicz, Miroslaw wrote: I implemented a very small hash table and achieved performance comparable ...

RE: ibv_post_send/recv kernel path optimizations

2010-12-14 Thread Walukiewicz, Miroslaw
On 11/26/2010 1:56 PM, Walukiewicz, Miroslaw wrote: From the trace it looks like __up_read() - 11% - wastes most of the time. It is called from idr_read_qp when a put_uobj_read is called.

Re: ibv_post_send/recv kernel path optimizations

2010-12-14 Thread Jason Gunthorpe
On Tue, Dec 14, 2010 at 02:12:56PM +0000, Walukiewicz, Miroslaw wrote: Or, I looked into the shared-page approach of passing post_send/post_recv info. I still have some concerns. The shared page must be allocated per QP, and there should be a common way to allocate such a page for each driver.
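
One common way to address the "common way to allocate such a page" concern, sketched under assumptions: allocate the page at QP create time and hand userspace an mmap offset that encodes what it maps. All xxx_* names and the offset encoding are illustrative:

    /* Hypothetical per-QP shared-page allocation at create time; the
     * offset returned in the create response is later demultiplexed by
     * the driver's ->mmap hook (see the sketch further below). */
    static int xxx_alloc_sq_page(struct xxx_qp *qp,
                                 struct xxx_create_qp_resp *resp)
    {
            qp->sq_page = (void *)get_zeroed_page(GFP_KERNEL);
            if (!qp->sq_page)
                    return -ENOMEM;

            /* page offset = <type:8><qpn:24>, shifted to a byte offset */
            resp->sq_page_offset =
                    (((u64)XXX_MMAP_SQ_PAGE << 24) | qp->qpn) << PAGE_SHIFT;
            return 0;
    }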

RE: ibv_post_send/recv kernel path optimizations

2010-12-08 Thread Walukiewicz, Miroslaw
On 11/26/2010 1:56 PM, Walukiewicz, Miroslaw wrote: From the trace it looks like __up_read() - 11% - wastes most of the time. It is called from idr_read_qp when a put_uobj_read is called. if (copy_from_user(&cmd, buf, sizeof cmd)) - 5% - it is called twice from ...

Re: ibv_post_send/recv kernel path optimizations

2010-12-08 Thread Jason Gunthorpe
On Wed, Dec 08, 2010 at 12:14:51PM +0000, Walukiewicz, Miroslaw wrote: Or, I don't see why the ib uverbs flow (BTW - the data path has nothing to do with the rdma_cm, you're working with /dev/infiniband/uverbsX) can't be enhanced, e.g. to support a shared page which is allocated and mmapped ...

Re: ibv_post_send/recv kernel path optimizations

2010-12-08 Thread Roland Dreier
The problem that I see is that mmap is currently used for mapping the doorbell page in different drivers. The driver can use different offsets into the file to map different things; for example, I believe ehca, ipath and qib already do this. I cannot find specific code for ...
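
The offset trick Roland describes usually looks something like this in a driver's ->mmap handler; a simplified sketch of the pattern, not code from ehca, ipath or qib:

    /* The page offset selects what is being mapped: doorbell page,
     * per-QP shared SQ page, etc.; encoding matches the sketch above. */
    static int xxx_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
    {
            unsigned long pgoff = vma->vm_pgoff;
            u32 type = pgoff >> 24;
            u32 qpn  = pgoff & 0xffffff;

            switch (type) {
            case XXX_MMAP_DOORBELL:
                    return xxx_mmap_doorbell(context, vma);
            case XXX_MMAP_SQ_PAGE:
                    return xxx_mmap_sq_page(context, vma, qpn);
            default:
                    return -EINVAL;
            }
    }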

RE: ibv_post_send/recv kernel path optimizations (was: uverbs: handle large number of entries)

2010-11-26 Thread Walukiewicz, Miroslaw
(In reply to Or Gerlitz's message of November 25, 2010.) Jason Gunthorpe wrote: Hmm, considering your ...

Re: ibv_post_send/recv kernel path optimizations (was: uverbs: handle large number of entries)

2010-11-25 Thread Or Gerlitz
Jason Gunthorpe wrote: Hmm, considering your list is everything but Mellanox, maybe it makes much more sense to push the copy_to_user down into the driver - i.e. an ibv_poll_cq_user - then the driver can construct each CQ entry on the stack and copy it to userspace, avoiding the double copy ...
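
A sketch of that single-copy poll: the driver formats each completion into an ib_uverbs_wc on the stack and copies it straight into the user buffer, instead of filling a kmalloc'ed ib_wc array that uverbs then copies out a second time. The entry point and the xxx_* helpers are illustrative assumptions:

    /* Hypothetical driver-level poll that copies each completion
     * directly to userspace, avoiding the intermediate kernel array. */
    static int xxx_poll_cq_user(struct ib_cq *ibcq, int nwc,
                                struct ib_uverbs_wc __user *buf)
    {
            struct xxx_cq *cq = to_xcq(ibcq);   /* illustrative container_of */
            struct ib_uverbs_wc wc;             /* built on the stack */
            int npolled;

            for (npolled = 0; npolled < nwc; npolled++) {
                    if (!xxx_next_cqe_to_wc(cq, &wc))
                            break;              /* CQ empty */
                    if (copy_to_user(&buf[npolled], &wc, sizeof wc))
                            return -EFAULT;
            }
            return npolled;
    }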