Re: [openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-20 Thread Grant Grundler
Hi Shirley, On Wed, Apr 19, 2006 at 11:31:32AM -0700, Shirley Ma wrote: ... By moving netperf RX traffic off the CPU handling interrupts, the 1.5GHz ia64 box goes from 2.8 Gb/s to around 3.5 Gb/s. But the service demand (CPU time per KB payload) goes up from ~2.3 usec/KB to ~3.1 usec/KB

Re: [openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-20 Thread Shirley Ma
Hello Grant, Grant Grundler [EMAIL PROTECTED] wrote on 04/20/2006 08:16:27 AM: Was this measured using ehca? If so, the result implies at least two interrupt vectors are used. And it seems reasonable for IPoIB to tune for that even if it costs mthca a slight amount of overhead. Roland

Re: [openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-19 Thread Bernard King-Smith
Shirley After completion handler receives the notification, don't Shirley poll the CQ right away, and wait for more WCs in Shirley CQ. That way can reduce the CQ lock overhead. Roland That's interesting... it makes sense, and it argues in Roland favor of deferring CQ

Re: [openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-19 Thread Roland Dreier
Bernard The assumption you have here is that one CPU is capable Bernard of handling the completions without impacting Bernard bandwidth. We have seen the opposite in that we end up Bernard with one CPU pegged at high throughput. The benefit you Bernard are working on is latency

Re: [openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-19 Thread Shirley Ma
Roland, I still don't understand why splitting the CQ allows you to use more than one CPU to handle completions. Both CQ events get handled on the same CPU -- you just have more overhead in getting to the CQ event handlers if there are two of them. The send WC handler is different with recv

Re: [openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-19 Thread Roland Dreier
Shirley The send WC handler is different with recv WC Shirley handler. Even with some overhead we do see big Shirley improvement in bidirectional throughput. But how? There's only one CQ interrupt handler, which can only run on one CPU at a time. So the send WC handler and recv WC

Re: [openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-19 Thread Grant Grundler
On Wed, Apr 19, 2006 at 10:10:36AM -0400, Bernard King-Smith wrote: The benefit you are working on is latency will be faster if we handle both send and receive processing off the same thread/interrupt, but you have to balance that with bandwidth limitations. You think 4X has a bandwidth

Re: [openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-19 Thread Shirley Ma
Hello Grant, [EMAIL PROTECTED] wrote on 04/19/2006 09:42:26 AM: I've looked at this tradeoff pretty closely with ia64 (1.5GHz) by pinning netperf to a different CPU than the one handling interrupts. By moving netperf RX traffic off the CPU handling interrupts, the 1.5GHz ia64 box goes from

[openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Bernard King-Smith
Shirley Some tests have been done over mthca and Shirley ehca. Unidirectional stream test, gains up to 15% Shirley throughput with this patch on systems over 4 cpus. Shirley Bidirectional could gain more. People might get different Shirley performance improvement number under

Re: [openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Roland Dreier
Bernard On a multiple CPU system looking at TOP you see one Bernard process consuming a full CPU. This happens to be the Bernard thread handling completion queue entries. I suggested Bernard that we look at separate threads handling send completions Bernard vs. receive

[openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Shirley Ma
Bernie, Bernard King-Smith/Poughkeepsie/IBM wrote on 04/18/2006 01:48:28 PM: When we ran with the split completion queue patch, we no longer see one process pegging the CPU at 100% and we get a speedup of 65% going from STREAM to Duplex. Without the split completion queue, we only saw a

[openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Roland Dreier
Shirley This is another patch to gain huge performance on ehca Shirley driver. I haven't submitted yet. :-) What does the patch do? - R. ___ openib-general mailing list openib-general@openib.org

[openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Shirley Ma
Roland Dreier [EMAIL PROTECTED] wrote on 04/18/2006 02:33:55 PM: Shirley This is another patch to gain huge performance on ehca Shirley driver. I haven't submitted yet. :-) What does the patch do? - R. The patch allows you tuning send/recv NUM_WC per poll and add some cycles before

[openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Roland Dreier
Shirley The patch allows you tuning send/recv NUM_WC per poll and Shirley add some cycles before polling to sync with the hardware. I have no problem increasing NUM_WC to something much bigger. What do you mean by add some cycles before polling? - R.

[openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Shirley Ma
Roland Dreier [EMAIL PROTECTED] wrote on 04/18/2006 02:49:34 PM: Shirley The patch allows you tuning send/recv NUM_WC per poll and Shirley add some cycles before polling to sync with the hardware. I have no problem increasing NUM_WC to something much bigger. What do you mean by add

[openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Roland Dreier
Shirley After completion handler receives the notification, don't Shirley poll the CQ right away, and wait for more WCs in Shirley CQ. That way can reduce the CQ lock overhead. That's interesting... it makes sense, and it argues in favor of deferring CQ polling to a kernel thread.

[openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Shirley Ma
Roland Dreier [EMAIL PROTECTED] wrote on 04/18/2006 03:01:57 PM: Shirley After completion handler receives the notification, don't Shirley poll the CQ right away, and wait for more WCs in Shirley CQ. That way can reduce the CQ lock overhead. That's interesting... it makes sense,

Re: [openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Roland Dreier
Shirley After completion handler receives the notification, don't Shirley poll the CQ right away, and wait for more WCs in Shirley CQ. That way can reduce the CQ lock overhead. Roland That's interesting... it makes sense, and it argues in Roland favor of deferring CQ polling

[openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Roland Dreier
Shirley It's on mthca. If you are interested. I can submit a test Shirley patch for your experimental. Sure, that would be useful. - R.

Re: [openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Shirley Ma
Roland Dreier [EMAIL PROTECTED] wrote on 04/18/2006 03:06:33 PM: And actually it argues against splitting the CQ, because having one CQ increases the number of CQ entries that we have a chance to poll at any one time, by lumping send and receive completions together... - R. The send needs

[openib-general] Re: openib-general Digest, Vol 22, Issue 114

2006-04-18 Thread Shirley Ma
Roland Dreier [EMAIL PROTECTED] wrote on 04/18/2006 03:07:06 PM: Shirley It's on mthca. If you are interested. I can submit a test Shirley patch for your experimental. Sure, that would be useful. - R. It is built on top of splitting CQ patch. I will send you the patch tomorrow.