Roland,
Thanks. You are right. After a clean build, it works. I had too many build
trees on each node, different kernels, different patches, and different SVN
trees, which confused me. :(
I will clean my patches and start to
submit them for you to review.
Thanks
Shirley Ma
IBM Linux Technology Center
> Retried and rebuilt the SMP (32) kernel. Got the error below. There is no
> problem with the 64 kernel on the other node. The mthca driver is from the
> SVN 69XX tree.
>
> ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
> ib_mthca: Initializing :05:00.0
> ACPI: PCI Interrupt :05:00.0[A
Roland,
Retried and rebuilt the SMP (32) kernel. Got the error below. There is no
problem with the 64 kernel on the other node. The mthca driver is from the
SVN 69XX tree.
ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
ib_mthca: Initializing :05:00.0
ACPI: PCI Interrupt :05:00.0[A] -> GSI
> May 4 13:42:09 elm3b100 kernel: ib_mthca: Initializing ?<8e>,?
This is coming from

    printk(KERN_INFO PFX "Initializing %s\n", pci_name(pdev));

so something is screwed up. Perhaps your ib_mthca module needs to be
recompiled to match the SMP kernel?
Shirley> I
Hello Roland,
Finally I finished some of my tests over UP<->UP and UP<->SMP. One of the
mthca drivers couldn't come up on SMP after I updated the firmware to 3.4.0.
I got the error below while loading the ib_mthca module:
May 4 13:42:09 elm3b100 kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
On 4/27/06, Shirley Ma <[EMAIL PROTECTED]> wrote:
What percentage throughput improvement did you get from your NAPI implementation?
Shirley,
I couldn't find our exact test results; we did them quite a long time ago.
As far as I remember, we got only a few percent, up to 3-4%.
No surprise that you get bette
Leonid Arsh <[EMAIL PROTECTED]> wrote on
04/27/2006 01:24:49 AM:
> Shirley Ma wrote:
> > Without seeing your patch, I couldn't say anything. I guess your
> > implementation
> > didn't handle multiple threads simultaneously. If you only have one
> > interrupt handler,
> > I couldn't see any reason you ca
Shirley Ma wrote:
Without seeing your patch, I couldn't say anything. I guess your
implementation
didn't handle multiple threads simultaneously. If you only have one
interrupt handler,
I couldn't see any reason you could get better performance numbers by
splitting CQs.
Shirley, you are right.
I just wan
Leonid,
> As I mentioned, I will test my patch to see how the performance is.
I have tested the prototype patch with split CQs + a work queue in the
IPoIB layer; with 2-4 CPUs, netperf throughput improved by more than 10%
on mthca without MSI-X enabled.
I hit a slab cache bug on ehca. I need
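For illustration, a rough sketch of the split-CQ-plus-work-queue idea described above, written against the OpenIB verbs and workqueue APIs of that era. The names priv->rcq, priv->scq, priv->rx_poll_task, ipoib_ib_rx_completion, ipoib_ib_tx_completion and ipoib_ib_handle_rx_wc are made up for the sketch; this is not the actual patch.

    /* in the transport init path: create two CQs instead of one shared CQ,
     * and set up a work item for recv polling (old 3-argument INIT_WORK) */
        priv->rcq = ib_create_cq(ca, ipoib_ib_rx_completion, NULL, dev,
                                 ipoib_recvq_size);
        priv->scq = ib_create_cq(ca, ipoib_ib_tx_completion, NULL, dev,
                                 ipoib_sendq_size + 1);
        INIT_WORK(&priv->rx_poll_task, ipoib_rx_poll_work, dev);

    /* recv CQ event handler: runs in interrupt context, only queues work */
    static void ipoib_ib_rx_completion(struct ib_cq *cq, void *dev_ptr)
    {
        struct ipoib_dev_priv *priv = netdev_priv((struct net_device *) dev_ptr);

        queue_work(ipoib_workqueue, &priv->rx_poll_task);
    }

    /* work function: drains the recv CQ in process context */
    static void ipoib_rx_poll_work(void *dev_ptr)
    {
        struct net_device *dev = dev_ptr;
        struct ipoib_dev_priv *priv = netdev_priv(dev);
        struct ib_wc wc;

        while (ib_poll_cq(priv->rcq, 1, &wc) > 0)
            ipoib_ib_handle_rx_wc(dev, &wc);

        ib_req_notify_cq(priv->rcq, IB_CQ_NEXT_COMP);

        /* pick up anything that completed while we were re-arming */
        while (ib_poll_cq(priv->rcq, 1, &wc) > 0)
            ipoib_ib_handle_rx_wc(dev, &wc);
    }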
Leonid,
Leonid Arsh <[EMAIL PROTECTED]> wrote on
04/26/2006 04:33:57 AM:
> Shirley Ma wrote:
> >
> > I am working on a patch to use a multithreaded work queue for IPoIB
> > completion polling. Have you tried this on your driver?
> No, we made some experiments with NAPI, and also tried to split
Shirley Ma wrote:
I am working on a patch to use a multithreaded work queue for IPoIB
completion polling. Have you tried this on your driver?
No, we made some experiments with NAPI, and also tried to split the CQ
(as I already wrote, this didn't help with tasklet completion handling).
We also tri
Leonid,
There is no doubt NAPI helps with interrupt mitigation, throughput,
out-of-order packets, and the balance between latency and throughput. But
NAPI might not help all devices, since different drivers have different
implementations of the CQ completion handler.
I am working on a patch to use
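For context, a minimal sketch of how receive polling is wired through the dev->poll NAPI interface of that era (pre-2.6.24). The names ipoib_napi_poll and priv->cq, and the weight value, are illustrative only, not taken from any of the patches discussed here.

    /* in device setup: */
        dev->poll   = ipoib_napi_poll;  /* hypothetical poll routine */
        dev->weight = 64;

    /* CQ completion handler: just schedule NET_RX_SOFTIRQ polling */
    static void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr)
    {
        netif_rx_schedule((struct net_device *) dev_ptr);
    }

    /* runs in NET_RX_SOFTIRQ context, shared with the rest of the net stack */
    static int ipoib_napi_poll(struct net_device *dev, int *budget)
    {
        struct ipoib_dev_priv *priv = netdev_priv(dev);
        int max = min(*budget, dev->quota);
        int done = 0;
        struct ib_wc wc;

        while (done < max && ib_poll_cq(priv->cq, 1, &wc) > 0) {
            ipoib_ib_handle_wc(dev, &wc);
            done++;
        }

        *budget    -= done;
        dev->quota -= done;

        if (done < max) {
            /* no more work: leave the poll list and re-arm the CQ */
            netif_rx_complete(dev);
            ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP);
            return 0;
        }
        return 1;   /* still more completions, stay on the poll list */
    }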
Please correct me if I'm mistaken.
I think that a softirq is used by the kernel for handling received
packets - NET_RX_SOFTIRQ.
For the non-NAPI case, we poll the CQ for completions and call
netif_rx() in the completion notification context -
the HW interrupt context in the case of mthca.
neti
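Concretely, the non-NAPI path being described looks roughly like this. This is a simplified sketch, not the actual driver code; build_rx_skb() is a placeholder for the real unmap/copy/refill logic.

    /* Simplified sketch: the CQ completion handler runs in hardware
     * interrupt context (on mthca), polls the CQ and feeds each received
     * skb to the stack with netif_rx(), which raises NET_RX_SOFTIRQ. */
    static void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr)
    {
        struct net_device *dev = dev_ptr;
        struct ipoib_dev_priv *priv = netdev_priv(dev);
        struct ib_wc wc;

        ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);  /* re-arm for the next event */

        while (ib_poll_cq(cq, 1, &wc) > 0) {
            if (wc.wr_id & IPOIB_OP_RECV) {
                /* placeholder for the real skb handling */
                struct sk_buff *skb = build_rx_skb(priv, &wc);

                if (skb)
                    netif_rx(skb);
            } else {
                /* send completion: free the skb and DMA mapping */
            }
        }
    }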
You are right - different HCAs and adapters may need specific tuning.
The Mellanox VAPI adapter, handling completions in a tasklet, will
definitely suffer from CQ splitting,
since there may be only one tasklet running across all the CPUs.
The mthca adapter is a completely different case - the co
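To make the tasklet point concrete: the kernel never runs a given tasklet on two CPUs at once, so even if completions come from two CQs they still funnel through one CPU. A minimal sketch (not the Voltaire code; all names are made up):

    #include <linux/interrupt.h>
    #include <rdma/ib_verbs.h>

    static void my_cq_drain(unsigned long data);
    static DECLARE_TASKLET(my_cq_tasklet, my_cq_drain, 0);

    /* CQ event handler, interrupt context: just schedule the tasklet */
    static void my_comp_handler(struct ib_cq *cq, void *context)
    {
        tasklet_schedule(&my_cq_tasklet);
    }

    static void my_cq_drain(unsigned long data)
    {
        /* Runs in TASKLET_SOFTIRQ context.  The same tasklet is never run
         * concurrently on two CPUs, so splitting the CQ adds events but no
         * extra parallelism unless each CQ gets its own tasklet (or some
         * other mechanism). */
    }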
Leonid,
> Have you tried to use another softirq instead of TASKLET_SOFTIRQ?
I have looked at the softirq list in interrupt.h:
    HI_SOFTIRQ=0,
    TIMER_SOFTIRQ,
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    BLOCK_SOFTIRQ,
    TASKLET_SOFTIRQ
I couldn't see any softirq we could use for IB.
Hello Leonid,
Leonid Arsh <[EMAIL PROTECTED]> wrote on
04/23/2006 06:38:00 AM:
> Shirley,
>
> some additional information you may be interested in:
>
> According to our experience with the Voltaire IPoIB driver,
> splitting the CQ harmed the throughput (we checked with the iperf
> application,
Leonid Arsh wrote:
Leonid> Shirley,
Leonid> some additional information you may be interested in:
Leonid> According to our experience with the Voltaire IPoIB driver,
Leonid> splitting the CQ harmed the throughput (we checked with the iperf
Leonid> application, UDP mode.) Splitting the CQ caus
Shirley,
some additional information you may be interested in:
According to our experience with the Voltaire IPoIB driver,
splitting the CQ harmed the throughput (we checked with the iperf
application, UDP mode). Splitting the CQ caused more interrupts,
context switches, and CQ polls.
Note,
Shirley> In this case, post_send() should return ENOMEM. I didn't
Shirley> see any error returns.
OK, it's just a guess without seeing the patch ;)
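For reference, the kind of check being talked about would look roughly like this in the send path. This is only a sketch: priv->tx_wr, skb and the error handling are assumed from the usual IPoIB send path, not quoted from the patch.

        struct ib_send_wr *bad_wr;
        int ret;

        ret = ib_post_send(priv->qp, &priv->tx_wr, &bad_wr);
        if (ret) {
            /* e.g. -ENOMEM when the send queue is full */
            ipoib_warn(priv, "post_send failed, error %d\n", ret);
            dev_kfree_skb_any(skb);
        }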
Roland Dreier <[EMAIL PROTECTED]> wrote on 04/20/2006
09:05:41 AM:
> Shirley> It helps the performance about 10% for the touch
> Shirley> netperf/netserver test, then hit driver errors. I notice
> Shirley> that the send is faster than before. Let me send you the
> Shirley> patch t
Shirley> It helps the performance about 10% for the touch
Shirley> netperf/netserver test, then I hit driver errors. I notice
Shirley> that the send is faster than before. Let me send you the
Shirley> patch tomorrow; maybe you have hints to identify the
Shirley> problem.
With
Roland Dreier <[EMAIL PROTECTED]> wrote on 04/19/2006
04:36:50 PM:
> Shirley> Since I haven't found any kernel use 128 bit address, I
> Shirley> use wr_id to save skb address, DMA mapping and other
> Shirley> stuffs are saved in skb->cb, which is the private data
> Shirley> for ea
Shirley> Since I haven't found any kernel using 128-bit addresses, I
Shirley> use wr_id to save the skb address; the DMA mapping and other
Shirley> data are saved in skb->cb, which is the private data area
Shirley> for each protocol layer in the skb. Same for rx_ring, so
Shirley> rx_buff and tx_bu
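In other words, the scheme is roughly the following. This is a sketch of the idea described above, not the actual patch; struct ipoib_skb_cb is a made-up name.

    /* The skb pointer travels in the 64-bit wr_id, and the DMA mapping is
     * stashed in skb->cb, the per-layer private area of the skb. */
    struct ipoib_skb_cb {
        dma_addr_t mapping;
    };

    /* posting a send: */
        ((struct ipoib_skb_cb *) skb->cb)->mapping = addr;
        priv->tx_wr.wr_id = (u64) (unsigned long) skb;
        ib_post_send(priv->qp, &priv->tx_wr, &bad_wr);

    /* handling a send completion: everything is recovered from the WC,
     * no tx_ring lookup needed */
        struct sk_buff *skb = (struct sk_buff *) (unsigned long) wc->wr_id;
        struct ipoib_skb_cb *cb = (struct ipoib_skb_cb *) skb->cb;

        dma_unmap_single(priv->ca->dma_device, cb->mapping,
                         skb->len, DMA_TO_DEVICE);
        dev_kfree_skb_any(skb);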
Shirley> But I was a little bit confused about ah->last_send. Why
Shirley> not use a reference count instead?
The reasons may be lost in the mists of time, but I think using
last_send saves us from having to decrement a reference count when
sends complete. Since last_send is only set in the s
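The pattern being referred to is roughly the following (paraphrased from memory, so treat it as a sketch rather than an exact quote of the driver):

    /* In the send path, under tx_lock: remember which send was the last
     * one posted to this address handle. */
        address->last_send = priv->tx_head;
        ++priv->tx_head;

    /* When the AH is dropped, it may only be destroyed once every send
     * posted to it has completed, i.e. once tx_tail has caught up: */
        if ((int) priv->tx_tail - (int) ah->last_send >= 0) {
            ib_destroy_ah(ah->ah);          /* all its sends are done */
            kfree(ah);
        } else {
            list_add_tail(&ah->list, &priv->dead_ahs);  /* reap later */
        }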
Roland,
I can send half of the patch, from which I have removed the tx_ring,
for pre-review.
But I was a little bit confused about
ah->last_send. Why not use a reference count instead?
Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
Roland Dreier <[EMAIL PROTECTED]> wrote on 04/19/2006
03:57:52 PM:
> Shirley> Also I am working on removing the tx_ring, which requires the CQ
> Shirley> to be split to remove the recv WC flag IPOIB_OP_RECV.
>
> How are you removing the TX ring? Where do you store the skbs and DMA
> mappings t
Shirley> Also I am working on removing the tx_ring, which requires the CQ
Shirley> to be split to remove the recv WC flag IPOIB_OP_RECV.
How are you removing the TX ring? Where do you store the skbs and DMA
mappings to be freed when a send completes?
- R.
Roland Dreier <[EMAIL PROTECTED]> wrote on 04/19/2006
02:34:24 PM:
> Shirley> OK. I am going to split the patch without splitting CQ
> Shirley> first. WC handler is called in the interrupt context, it
> Shirley> is a myth to have bidirectional performance improvement
> Shirley> w
Shirley> OK. I am going to split the patch without splitting the CQ
Shirley> first. The WC handler is called in interrupt context; it
Shirley> is a myth that splitting the CQ gives a bidirectional
Shirley> performance improvement. More investigation is needed.
But you did see performance
OK. I am going to split the patch without splitting the CQ first.
The WC handler is called in interrupt context; it is a myth that
splitting the CQ gives a bidirectional performance improvement.
More investigation is needed.
If the WC handler can be moved out of interrupt context, splitting the CQ
is still an app
>         struct ipoib_rx_buf *rx_ring ____cacheline_aligned_in_smp;
>         struct ib_wc         recv_ibwc[IPOIB_NUM_RECV_WC];
>
>         spinlock_t           tx_lock ____cacheline_aligned_in_smp;
>         struct ipoib_tx_buf *tx_ring;
>         unsigned             tx_head;
>         unsigned             tx_tail;
>         struct ib_sge
Oops. You mean change priv to:
        struct ipoib_rx_buf *rx_ring ____cacheline_aligned_in_smp;
        struct ib_wc         recv_ibwc[IPOIB_NUM_RECV_WC];

        spinlock_t           tx_lock ____cacheline_aligned_in_smp;
        struct ipoib_tx_buf *tx_ring;
        unsigned             tx_head;
        unsigned             tx_tail;
        struc
> -     struct ipoib_rx_buf *rx_ring;
> +     struct ipoib_rx_buf *rx_ring ____cacheline_aligned_in_smp;
>
>       spinlock_t tx_lock;
> -     struct ipoib_tx_buf *tx_ring;
> +     struct ipoib_tx_buf *tx_ring ____cacheline_aligned_in_smp;
>       unsigned
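For anyone following the thread: ____cacheline_aligned_in_smp comes from <linux/cache.h>, and the point of the change is to start the RX-side and TX-side hot fields on separate cache lines, so the CPUs handling receive and send completions don't bounce the same line between them. A cleaned-up sketch of the intended layout (field list abbreviated, so only a sketch of the idea, not the full struct):

    #include <linux/cache.h>

    struct ipoib_dev_priv {
        /* ... unchanged fields ... */

        /* RX hot fields start on their own cache line */
        struct ipoib_rx_buf *rx_ring ____cacheline_aligned_in_smp;
        struct ib_wc         recv_ibwc[IPOIB_NUM_RECV_WC];

        /* TX hot fields start on another cache line */
        spinlock_t           tx_lock ____cacheline_aligned_in_smp;
        struct ipoib_tx_buf *tx_ring;
        unsigned             tx_head;
        unsigned             tx_tail;

        /* ... */
    };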