Shirley After the completion handler receives the notification, don't
Shirley poll the CQ right away; wait for more WCs to accumulate in the
Shirley CQ. That way we can reduce the CQ lock overhead.
Roland That's interesting... it makes sense, and it argues in
Roland favor of deferring CQ
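The deferral idea being discussed can be sketched as a toy simulation (hypothetical names, not the actual IPoIB code): count completions as events arrive and only take the CQ lock to drain once a batch has built up.

```c
#include <assert.h>

/* Hypothetical sketch: after a completion event, defer polling until
 * several work completions (WCs) have accumulated, so the CQ lock is
 * taken once per batch instead of once per WC. */
struct fake_cq {
    int pending;            /* WCs waiting in the queue */
    int lock_acquisitions;  /* how often we took the CQ lock */
};

/* Called from the completion event handler: just note the arrival. */
static void cq_event(struct fake_cq *cq)
{
    cq->pending++;
}

/* Drain the CQ only once `batch` WCs are waiting; returns WCs drained. */
static int poll_if_batched(struct fake_cq *cq, int batch)
{
    if (cq->pending < batch)
        return 0;
    cq->lock_acquisitions++;   /* one lock round-trip per batch */
    int drained = cq->pending;
    cq->pending = 0;
    return drained;
}
```

With a batch of 4, eight events cost two lock round-trips instead of eight; the price, as the thread notes, is added latency for the WCs that sit waiting.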
And what if you comment out the line
.eh_device_reset_handler= srp_reset_device,
does that fix it?
No
Now I'm really confused.
It seems we lose the connection to the target (BTW -- do you know why
the connection is getting killed?)
So the SCSI midlayer times out commands
Bernard The assumption you have here is that one CPU is capable
Bernard of handling the completions without impacting
Bernard bandwidth. We have seen the opposite in that we end up
Bernard with one CPU pegged at high throughput. The benefit you
Bernard are working on is latency
- struct ipoib_rx_buf *rx_ring;
+ struct ipoib_rx_buf *rx_ring ____cacheline_aligned_in_smp;
  spinlock_t tx_lock;
- struct ipoib_tx_buf *tx_ring;
+ struct ipoib_tx_buf *tx_ring ____cacheline_aligned_in_smp;
unsigned
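What the diff above is after can be shown with a small illustrative struct: keep the RX and TX members on separate cache lines so CPUs handling receives and sends don't false-share. The 64-byte line size and the member names here are assumptions; the kernel uses the `____cacheline_aligned_in_smp` macro rather than a hard-coded attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size for illustration only. */
#define CACHE_LINE 64

/* Illustrative stand-in for the IPoIB private data: aligning the TX
 * members to their own cache line keeps RX completions from bouncing
 * the line that holds the TX state. */
struct ipoib_like_priv {
    void *rx_ring __attribute__((aligned(CACHE_LINE)));
    void *tx_lock_placeholder __attribute__((aligned(CACHE_LINE)));
    void *tx_ring;
};
```

The alignment attribute on a member starts it on a fresh cache line, so `rx_ring` and the TX fields can never share one.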
Roland,
I still don't understand why splitting the CQ allows you to use more
than one CPU to handle completions. Both CQ events get handled on the
same CPU -- you just have more overhead in getting to the CQ event
handlers if there are two of them.
The send WC handler is different from the recv
Roland Dreier [EMAIL PROTECTED] wrote on 04/18/2006
01:45:17 PM:
Actually, do you have some explanation for why this
helps performance?
My intuition would be that it just generates more
interrupts for the same workload.
The only lock contention I saw in IPoIB is tx_lock. When
Shirley The send WC handler is different from the recv WC
Shirley handler. Even with some overhead we do see a big
Shirley improvement in bidirectional throughput.
But how? There's only one CQ interrupt handler, which can only run on
one CPU at a time. So the send WC handler and recv WC
Some application level protocols - require higher QoS levels than others
- for various communication and I/O operations.
For example, cluster inter-node health msgs have fixed latency
requirements that if exceeded may result in unexpected node removals
from the cluster.
Are there any
On Wed, 5 Apr 2006, Steve Wise wrote:
James,
Running a 4 thread, 8 ep/thread dapltest (the last test in regress.sh),
I was intermittently seeing a seg fault in dapltest. This is running
over the chelsio rnic using the iwarp branch. After debugging I found
out that dapltest was freeing
On Wed, Apr 19, 2006 at 10:10:36AM -0400, Bernard King-Smith wrote:
The benefit you are
working on is latency. Latency will be lower if we handle both send and receive
processing off the same thread/interrupt, but you have to balance that with
bandwidth limitations. You think 4X has a bandwidth
This discussion assumes a single fabric (e.g. IB, iWARP, etc.) for
network and file I/O between a set of nodes sharing storage.
On Wed, 2006-04-19 at 12:38 -0400, Richard Frank wrote:
Some application level protocols - require higher QoS levels than others
- for various communication and I/O
[EMAIL PROTECTED] wrote:
Some application level protocols - require higher QoS levels than
others - for various communication and I/O operations.
For example, cluster inter-node health msgs have fixed
latency requirements that if exceeded may result in
unexpected node removals from the
Richard Frank [EMAIL PROTECTED] wrote:
Richard Are there any mechanisms available to the client process to manage
Richard the QoS level for the various supported ULPs
Richard (SDP, TCP, UDP, RDS, SRP, iSER, etc.)
Richard either at the ULP level, or some combination of process and ULP,
Richard or perhaps even at
Hi Rick,
On 4/19/06, Richard Frank [EMAIL PROTECTED] wrote:
Some application level protocols - require higher QoS levels than others
- for various communication and I/O operations.
For example, cluster inter-node health msgs have fixed latency
requirements that if exceeded may result in
[EMAIL PROTECTED] wrote:
Hi Rick,
On 4/19/06, Richard Frank [EMAIL PROTECTED] wrote:
Some application level protocols - require higher QoS levels than
others - for various communication and I/O operations.
For example, cluster inter-node health msgs have fixed latency
requirements that
Roland Dreier [EMAIL PROTECTED] wrote on 04/19/2006
09:36:29 AM:
But if the send CQ handler is running, the recv CQ handler can't run
anyway, since there's only one interrupt, which is serialized to run
on one CPU at a time.
I thought the CQ handler is called in non-interrupt
context. Why in
Oops. You mean change priv to:
struct ipoib_rx_buf *rx_ring ____cacheline_aligned_in_smp;
struct ib_wc recv_ibwc[IPOIB_NUM_RECV_WC];
spinlock_t tx_lock ____cacheline_aligned_in_smp;
struct ipoib_tx_buf *tx_ring;
unsigned tx_head;
unsigned tx_tail;
struct ib_sge tx_sge;
Hello Grant,
[EMAIL PROTECTED] wrote on 04/19/2006
09:42:26 AM:
I've looked at this tradeoff pretty closely with ia64 (1.5Ghz)
by pinning netperf to a different CPU than the one handling interrupts.
By moving netperf RX traffic off the CPU handling interrupts,
the 1.5Ghz ia64 box goes from
hello,
i'm writing a sample kernel ulp driver to get acquainted
with the openib stack on linux kernel 2.6.16.2 (fedora 5) with
the openib gen 2 stack checked out from openib.org.
the setup is two nodes with a point-to-point connection, viz.
primary and secondary node. the secondary node starts in a
Shirley I thought the CQ handler is called in non-interrupt
Shirley context. Why does the recv CQ use netif_rx_ni() anyway?
With the mthca driver, the CQ handler is definitely called directly from the
CQ event interrupt. ehca may be different -- I need to look at the code.
The _ni variant is used to
struct ipoib_rx_buf *rx_ring ____cacheline_aligned_in_smp;
struct ib_wc recv_ibwc[IPOIB_NUM_RECV_WC];
spinlock_t tx_lock ____cacheline_aligned_in_smp;
struct ipoib_tx_buf *tx_ring;
unsigned tx_head;
unsigned tx_tail;
struct ib_sge
http://openib.org/bugzilla/show_bug.cgi?id=42
Summary: OFED 1.0 rc3: infinibandeventfs warning on RHEL4 U2
Product: OpenFabrics Linux
Version: 1.0rc2
Platform: All
OS/Version: Other
Status: NEW
Severity: normal
Linus, please pull from
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus
This tree is also available from kernel.org mirrors at:
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
for-linus
This is mostly (by total lines of patch) cleanups
http://openib.org/bugzilla/show_bug.cgi?id=42
--- Additional Comments From [EMAIL PROTECTED] 2006-04-19 11:56 ---
Created an attachment (id=11)
-- (http://openib.org/bugzilla/attachment.cgi?id=11&action=view)
Debug log console asks for
Roland Dreier wrote:
And what if you comment out the line
.eh_device_reset_handler= srp_reset_device,
does that fix it?
No
Now I'm really confused.
Me too.
It seems we lose the connection to the target (BTW -- do you know why
the connection is getting killed?)
I
I'm using the RC2 binary RPMs for SuSE 10 from red-bean.
I've tried the 4 MVAPICH source and binary RPMs from the OpenIB wiki,
and the source RPM from the openib.org downloads; all have symbol
conflicts with the verbs header files.
Is there an MVAPICH RPM that matches the RC2 SuSE 10 RPMs?
I'd like to get some feedback regarding the following approach to supporting
multicast groups in userspace, and in particular for MPI. Based on side
conversations, I need to know if this approach would meet the needs of MPI
developers.
To join / leave a multicast group, my proposal is to add the
[sorry if this forum is the wrong place to take this up]
Grant Grundler [EMAIL PROTECTED] wrote :
Grant [ I've probably posted some of these results before...here's another
Grant take on this problem. ]
Hopefully not rehashing too much old information.
Grant I expect splitting the RX/TX
James Lentini wrote:
On Tue, 18 Apr 2006, Dotan Barak wrote:
On Monday 17 April 2006 23:46, James Lentini wrote:
On Sun, 16 Apr 2006, Dotan Barak wrote:
On Wednesday 12 April 2006 17:50, James Lentini wrote:
OpenIB-cma u1.2 nonthreadsafe default
susan wrote:
the problem that i am running into is that the ib_send_cm_req
api fails with errno 22. i'm using the local id to make the
connection on port 1. ib_send_cm_req() api calls function
cm_init_av_by_path(), which calls ib_find_cached_gid().
function ib_find_cached_gid() fails because it
Roland Dreier [EMAIL PROTECTED] wrote on 04/19/2006
11:35:16 AM:
With the mthca driver, the CQ handler is definitely called directly from the
CQ event interrupt. ehca may be different -- I need to look at
the code.
- R.
Is it possible to move the CQ handler out of interrupt
context in mthca?
thanks
OK. I am going to split the patch without
splitting the CQ first.
Since the WC handler is called in interrupt
context, it is a myth
to have a bidirectional performance improvement
with splitting the CQ.
More investigation is needed.
If the WC handler can be moved out of interrupt
context, splitting the CQ
is still an
http://openib.org/bugzilla/show_bug.cgi?id=42
[EMAIL PROTECTED] changed:
What       |Removed           |Added
AssignedTo |[EMAIL PROTECTED] |[EMAIL PROTECTED]
Priority   |P3                |
Shirley Is it possible to move the CQ handler out of interrupt
Shirley context in mthca?
Yes, but that seems like the wrong thing to do. I think it would be
better to let consumers that want the increased latency defer things.
- R.
Shirley OK. I am going to split the patch without splitting the CQ
Shirley first. Since the WC handler is called in interrupt context, it
Shirley is a myth to have a bidirectional performance improvement
Shirley with splitting the CQ. More investigation is needed.
But you did see performance
To manage QoS the question is who knows about all the traffic
traversing a specific adapter. For most kernel traversing protocols
(IP, iSER, iSCSI, etc.) you can sometimes do this in the device
driver, where you can examine the headers as a packet is expedited
and manage it there.
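The driver-level idea above can be sketched as a header peek that bins outgoing packets into priority classes. The port number, the class names, and the heartbeat example are all invented for illustration.

```c
#include <assert.h>

/* Hypothetical priority classes for a driver-level QoS sketch. */
enum qos_class { QOS_LATENCY_CRITICAL, QOS_BULK };

/* Minimal stand-in for a packet header the driver can examine. */
struct fake_pkt { unsigned short dst_port; };

/* Treat an assumed cluster-heartbeat port (9000 here) as
 * latency-critical, everything else as bulk traffic. */
static enum qos_class classify(const struct fake_pkt *p)
{
    return p->dst_port == 9000 ? QOS_LATENCY_CRITICAL : QOS_BULK;
}
```

A real driver would map the class onto a send queue or service level; the point is just that the classification can happen where the headers are visible.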
Chris wrote,
Is there an MVAPICH RPM that matches the RC2 SuSE 10 RPMs?
I think that the IBED rc3 release has RPMs for Mvapich and OpenMPI that
you might try.
https://openib.org/svn/gen2/branches/1.0/ibed/releases/
woody
Dotan Barak wrote:
Can you attach to the server process with gdb and get me a
back trace from each of the threads?
What does driver IBED-1.0-rc3 consist of?
Thanks,
-arlin
Here is a back trace of the hanged process:
(gdb) bt
#0 0x2b31c86a in
Roland Dreier [EMAIL PROTECTED] wrote on 04/19/2006
02:34:24 PM:
Shirley OK. I am going to split the patch without splitting the CQ
Shirley first. Since the WC handler is called in interrupt context, it
Shirley is a myth to have a bidirectional performance improvement
Shirley with splitting the CQ.
On Wed, Apr 19, 2006 at 03:10:29PM -0400, Bernard King-Smith wrote:
Grant I expect splitting the RX/TX completions would achieve something
Grant similar since we are just slicing the same problem from a different
Grant angle. Apps typically do both RX and TX and will be running on one
Roland Dreier [EMAIL PROTECTED] wrote on 04/19/2006
03:57:52 PM:
Shirley Also I am working on removing the tx_ring, which requires
Shirley the CQ to be split to remove the recv WC wr_id flag
Shirley IPOIB_OP_RECV.
How are you removing the TX ring? Where do you store the skbs
and DMA mappings to be freed
Roland,
I can send half of the patch, in which I have
removed the tx_ring, for pre-review.
But I was a little bit confused about
ah->last_send. Why not use a reference count instead?
Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
Shirley But I was a little bit confused about ah->last_send. Why
Shirley not use a reference count instead?
The reasons may be lost in the mists of time, but I think using
last_send saves us from having to decrement a reference count when
sends complete. Since last_send is only set in the
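The last_send scheme described here can be sketched in plain C (assumed semantics, invented names): record the TX queue index of the most recent send posted with an address handle, and consider the AH reclaimable once the completed tail has advanced past that index. Unlike a reference count, nothing has to be atomically decremented on each send completion.

```c
#include <assert.h>

/* Illustrative address-handle stand-in carrying only last_send. */
struct fake_ah { unsigned last_send; };

/* Post path: remember the queue index of this send. */
static void post_send(struct fake_ah *ah, unsigned tx_head)
{
    ah->last_send = tx_head;   /* only ever written on the post path */
}

/* The AH is safe to free once the completed tail has passed
 * last_send; the signed difference copes with index wraparound. */
static int ah_may_free(const struct fake_ah *ah, unsigned tx_tail)
{
    return (int)(tx_tail - ah->last_send) > 0;
}
```

The trade-off is that an AH can linger until the queue drains past its last send, but the completion path stays free of per-AH bookkeeping.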
Shirley Since I haven't found any kernel use of 128-bit addresses, I
Shirley use wr_id to save the skb address; the DMA mapping and other
Shirley stuff are saved in skb->cb, which is the private data
Shirley for each protocol layer in the skb. Same for rx_ring, so
Shirley rx_buff and tx_buff is
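The scheme in the quote above can be sketched as a round-trip (names invented, not the kernel's types): the 64-bit wr_id carries the skb pointer itself, and the DMA mapping rides in the skb's cb[] private area, so no ring entry is needed to find either on completion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct sk_buff: cb[] mirrors skb->cb, the per-layer
 * private scratch area. */
struct fake_skb { char cb[48]; };

/* Post path: stash the DMA mapping in cb[] and encode the skb
 * pointer itself as the 64-bit wr_id. */
static uint64_t encode_wr_id(struct fake_skb *skb, uint64_t dma_addr)
{
    memcpy(skb->cb, &dma_addr, sizeof(dma_addr));
    return (uint64_t)(uintptr_t)skb;
}

/* Completion path: recover the skb from wr_id and the mapping from
 * cb[], with no ring lookup. */
static struct fake_skb *decode_wr_id(uint64_t wr_id, uint64_t *dma_addr)
{
    struct fake_skb *skb = (struct fake_skb *)(uintptr_t)wr_id;
    memcpy(dma_addr, skb->cb, sizeof(*dma_addr));
    return skb;
}
```

This works because kernel pointers fit in 64 bits, which is exactly the observation Shirley makes about not needing 128-bit addresses.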
Hi Sean,
I have a few basic questions:
1. Does the API which waits for join to complete
ensure that the multicast forwarding tables in the switches have been
updated. This is one of the main problems that we had studied:
(Please refer to the following EURO PVM/MPI paper for details)
Roland Dreier [EMAIL PROTECTED] wrote on 04/19/2006
04:36:50 PM:
Shirley Since I haven't found any kernel use of 128-bit addresses, I
Shirley use wr_id to save the skb address; the DMA mapping and other
Shirley stuff are saved in skb->cb, which is the private data
Shirley for each protocol layer
amith rajith mamidala wrote:
1. Does the API which waits for join to complete
ensure that the multicast forwarding tables in the switches have been
updated. This is one of the main problems that we had studied:
The join is asynchronous. Completion of the join would not be reported until
the