Re: [openib-general] Re: IBM eHCA testing..

2005-10-13 Thread Hal Rosenstock
On Thu, 2005-10-13 at 18:46, Troy Benjegerdes wrote: > I'm also attaching part of an opensm log file. > > (the full copy is at http://scl.ameslab.gov/~troy/osm-ehca.log ) > > The IBM galaxy adapters are at: > Initial path: [0][1][16] > Initial path: [0][1][13] > The OpenSM is just s

Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Roland Dreier
Helen> Not in realtime. My observations were made after the fact. Helen> I suppose I can launch another test and watch the counter in Helen> realtime if you believe that is necessary? That might be interesting. Assuming the HCA continues to work fine, and IPoIB recovers, the only theor

Re: [openib-general] Re: [PATCH] [SA Query] Change sa_query MAD allocation

2005-10-13 Thread Sean Hefty
Roland Dreier wrote: Thanks, I'll read this over. What's the motivation here? To shift over to ib_create_send_mad() so that all the MAD-related DMA mapping stuff is in one place, to make it easier to fix? Yes - the motivation is to fix the DMA mapping issue that you pointed out by changing i

Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Helen Chen
Roland, >From [EMAIL PROTECTED] Thu Oct 13 16:19:30 2005 > >Helen> BTW, the state of the IPoIB network seemed fine after the >Helen> failed test, and the mthca counters are moving up nicely. > >Even on the server on3-ib? Yes, even on the server on3-ib. > >Helen> Do you still think thi

[openib-general] Re: [PATCH] [SA Query] Change sa_query MAD allocation

2005-10-13 Thread Roland Dreier
Thanks, I'll read this over. What's the motivation here? To shift over to ib_create_send_mad() so that all the MAD-related DMA mapping stuff is in one place, to make it easier to fix? - R. ___ openib-general mailing list openib-general@openib.org http

[openib-general] [PATCH] [SA Query] Change sa_query MAD allocation

2005-10-13 Thread Sean Hefty
This patch changes sa_query to allocate MADs using the ib_create_send_mad() routine. The intent behind this change was to eventually change ib_post_send_mad() to take an ib_send_mad_buf as input, but see the "DMA mapping abuses in MAD layer" thread. We may want to go with an alternate solution.

Re: [openib-general] DMA mapping abuses in MAD layer

2005-10-13 Thread Roland Dreier
Sean> Any preference to pursuing this change or modifying Sean> ib_post_send_mad to take an ib_mad_send_buf? I think it's going to be confusing to cast a virtual address to a long and then ignore the lkey field. So I would go with a new interface not built on ib_sge. On the other hand, m

Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Roland Dreier
Helen> BTW, the state of the IPoIB network seemed fine after the Helen> failed test, and the mthca counters are moving up nicely. Even on the server on3-ib? Helen> Do you still think this is a crash of the HCA firmware? Helen> Should I call Mellanox? Not if IPoIB is working on the

Re: [openib-general] Re: IBM eHCA testing..

2005-10-13 Thread Shirley Ma
Thanks. It's strange the copy-paste gave an extra 1. Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638

Re: [openib-general] DMA mapping abuses in MAD layer

2005-10-13 Thread Sean Hefty
Sean Hefty wrote: Does anyone else have any other ideas on how to fix this issue? The current MAD interface requires the user to have code similar to this: send_buf->sge.addr = dma_map_single(mad_agent->device->dma_device, buf, buf_size, DMA_

Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Helen Chen
Roland, So you are right, it is not a moving target. After repeating the IOZONE tests several times, I narrowed the culprit down to server on3-ib. Parallel I/O had made it a bit difficult to chase it down :-( BTW, the state of the IPoIB network seemed fine after the failed test, and the mth

[PATCH, please test] IPoIB: recycle RX bufs (was: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer)

2005-10-13 Thread Roland Dreier
Roland> My plan is to change the receive handling of IPoIB Roland> slightly, so that if it can't allocate a new receive Roland> buffer, it reposts the old buffer and drops the packet it Roland> just received. Here's a patch that changes IPoIB to use this scheme. This should be muc
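The scheme described above (repost the old buffer and drop the packet when no replacement buffer can be allocated) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the patch itself: rx_slot, rx_complete, alloc_fn and failing_alloc are hypothetical names, and the real driver deals with skbs and posted work requests rather than raw pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the IPoIB driver. */
struct rx_slot {
    void *buf;              /* buffer currently posted for this ring slot */
};

enum rx_result { RX_DELIVERED, RX_DROPPED };

/* Allocator hook so a caller can simulate memory pressure. */
typedef void *(*alloc_fn)(size_t size);

/*
 * Handle a completed receive on `slot`.
 * If a replacement buffer can be allocated, hand the filled buffer to the
 * caller through *deliver and post the new one in its place. If allocation
 * fails, leave the old buffer in the slot (repost it) and drop the packet.
 */
enum rx_result rx_complete(struct rx_slot *slot, size_t size,
                           alloc_fn alloc, void **deliver)
{
    assert(slot != NULL && slot->buf != NULL && deliver != NULL);

    void *fresh = alloc(size);
    if (fresh == NULL) {
        *deliver = NULL;        /* packet is dropped */
        return RX_DROPPED;      /* slot->buf stays posted; ring never shrinks */
    }

    *deliver = slot->buf;       /* pass the received data up the stack */
    slot->buf = fresh;          /* post the replacement buffer */
    return RX_DELIVERED;
}

/* Helper that always fails, used to simulate an out-of-memory condition. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

The point of the design is that every ring slot always holds a posted buffer, so a failed allocation costs one packet instead of permanently losing a receive slot.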

Re: [openib-general] Re: IBM eHCA testing..

2005-10-13 Thread Roland Dreier
> http://ozlabs.org/pipermail/linuxppc64-dev/2005-July/004662.html1 delete the '1' from the end of the URL... - R.

Re: [openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
Robert> Since the rest of the patch needed to get this working Robert> isn't applied to either the trunk or the ipath branch yet Robert> (and since the branch will be going away shortly), can you Robert> just apply this patch to the trunk when you do the merge? Sure, no problem.

Re: [openib-general] Re: IBM eHCA testing..

2005-10-13 Thread Shirley Ma
I am not sure whether this is something related to dma_addr_t. Could you please try the patch below? > http://ozlabs.org/pipermail/linuxppc64-dev/2005-July/004662.html1 Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638

Re: [openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Robert Walsh
> And here's a patch to ipath to make it work with the uverbs command mask... Roland, Since the rest of the patch needed to get this working isn't applied to either the trunk or the ipath branch yet (and since the branch will be going away shortly), can you just apply this patch to the trunk when

Re: [openib-general] Re: IBM eHCA testing..

2005-10-13 Thread Troy Benjegerdes
On Wed, Oct 12, 2005 at 01:04:37PM +0200, IBMEHCA DD wrote: > I just released the ehca2_0028 which uses svn 3615 on > https://sourceforge.net/projects/ibmehcad/ > As you might notice the license already has changed to the openib.org > license. > > With 2.6.13 we had the non-issue that our maun f

Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Roland Dreier
Helen> It doesn't seem like shrinking the TCP window had helped. Helen> I captured the Dmesg log from Lustre server and associated Helen> client reporting IOZONE error. What is the state of the system after you start seeing the ib0 transmit time out messages? Does IPoIB work at all?

Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Helen Chen
Roland, It doesn't seem like shrinking the TCP window had helped. I captured the Dmesg log from Lustre server and associated client reporting IOZONE error. BTW, this problem is a moving target so it is hard to believe that it is hardware related(?) BTW, I am using the mellanox DDR switch and HCA

Re: [openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
And here's a patch to ipath to make it work with the uverbs command mask... Index: infiniband/hw/ipath/ib_ipath/ipath_openib.c === --- infiniband/hw/ipath/ib_ipath/ipath_openib.c (revision 3758) +++ infiniband/hw/ipath/ib_ipath/ipath_

Re: [openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
OK, here's a new patch that adds a mask of allowed userspace commands set by the kernel low-level driver. Thanks, good catch Michael... - R. --- include/rdma/ib_user_verbs.h(revision 3707) +++ include/rdma/ib_user_verbs.h(working copy) @@ -1,6 +1,7 @@ /* * Copyright (c) 2005

[openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
Michael> What prevents the user from passing e.g. poll cq command Michael> on mthca device? If that happens, it seems that Michael> ib_poll_cq will then crash. Michael> Is there a mask somewhere that lets the device specify Michael> which uverbs commands are allowed for it? Hm

Re: [openib-general] [RFC] libibverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
Robert> Since qp_type is now in ibv_qp, it probably no longer Robert> needs to be in mthca_qp. This is just a minor Robert> optimization. Yep, I'll make that change too. - R.

Re: [openib-general] [RFC] libibverbs changes for PathScale merge

2005-10-13 Thread Robert Walsh
> @@ -488,6 +489,7 @@ struct ibv_qp { > uint32_t handle; > uint32_t qp_num; > enum ibv_qp_state state; > + enum ibv_qp_type qp_type; > > pthread_mutex_t mutex; > pthread_cond_t cond; Since qp_type is no

[openib-general] Re: [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>: > Subject: [RFC] Kernel uverbs changes for PathScale merge > > Here are the changes to the kernel part of userspace verbs required to > support PathScale's driver. I'm now happy with them and ready to > commit them to the svn trunk and queue them for

[openib-general] [RFC] libibverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
Here are the changes to libibverbs required to support PathScale's driver. Again, I'm happy with them and would just like to get comments on them before I commit them to svn. Thanks, Roland --- libibverbs/include/infiniband/driver.h (revision 3774) +++ libibverbs/include/infiniband/driver

[openib-general] [RFC] Kernel uverbs changes for PathScale merge

2005-10-13 Thread Roland Dreier
Here are the changes to the kernel part of userspace verbs required to support PathScale's driver. I'm now happy with them and ready to commit them to the svn trunk and queue them for 2.6.15. This will allow the PathScale hardware-specific driver to be moved to the trunk as well, although quite a

Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Helen Chen
Roland, >From [EMAIL PROTECTED] Thu Oct 13 13:53:05 2005 > >Helen> Roland, Thank you for your response. That fixed my initial >Helen> buffer allocation failure. After we tuned the Lustre and >Helen> reran same IOZONE tests again, we got the following >Helen> problem. Was there a

Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Roland Dreier
Helen> Roland, Thank you for your response. That fixed my initial Helen> buffer allocation failure. After we tuned the Lustre and Helen> reran same IOZONE tests again, we got the following Helen> problem. Was there an actual network interrupt? If so, the Helen> problem is not

Re: [openib-general] Re: [PATCH] perftest/rdma_bw; add support for RDMA read and starting PSN

2005-10-13 Thread Arlin Davis
Michael S. Tsirkin wrote: Quoting r. Arlin Davis <[EMAIL PROTECTED]>: Subject: [PATCH] perftest/rdma_bw; add support for RDMA read and starting PSN Michael, The patch adds command line options for RDMA reads and starting PSN. I used these modifications to help isolate the RDMA read perform

Re: [openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Helen Chen
Roland, Thank you for your response. That fixed my initial buffer allocation failure. After we tuned the Lustre and reran same IOZONE tests again, we got the following problem. Was there an actual network interrupt? If so, the problem is not obvious now; the two nodes are pinging over IPoIB. Pl

[openib-general] Re: [PATCH] uDAPL async QP/CQ error handling fixed

2005-10-13 Thread James Lentini
On Thu, 13 Oct 2005, Arlin Davis wrote: > James, > > Patch will fix the async error handling and callback mappings. QP/CQ > error mappings were totally screwed up. Updated TODO list. > > -arlin Committed in revision 3774.

[openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>: > Subject: Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to > allocate receive buffer > > Michael> Yes, it seems that if such an allocation fails IPoIB may > Michael> never repost the receive buffer. Is that right? > > I think

[openib-general] Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Roland Dreier
Michael> Yes, it seems that if such an allocation fails IPoIB may Michael> never repost the receive buffer. Is that right? I think so. My plan is to change the receive handling of IPoIB slightly, so that if it can't allocate a new receive buffer, it reposts the old buffer and drops the pa

[openib-general] Re: [PATCH] perftest/rdma_bw; add support for RDMA read and starting PSN

2005-10-13 Thread Michael S. Tsirkin
Quoting r. Arlin Davis <[EMAIL PROTECTED]>: > Subject: [PATCH] perftest/rdma_bw; add support for RDMA read and starting PSN > > Michael, > > The patch adds command line options for RDMA reads and starting PSN. I > used these modifications to > help isolate the RDMA read performance degradation wi

[openib-general] Re: Re: ib0: ipoib_ib_post_receive failed for buf 111 ib0: failed to allocate receive buffer

2005-10-13 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>: > IPoIB's handling of these allocation errors can definitely be improved Yes, it seems that if such an allocation fails IPoIB may never repost the receive buffer. Is that right? -- MST

[openib-general] [PATCH] uDAPL async QP/CQ error handling fixed

2005-10-13 Thread Arlin Davis
James, Patch will fix the async error handling and callback mappings. QP/CQ error mappings were totally screwed up. Updated TODO list. -arlin Signed-off by: Arlin Davis <[EMAIL PROTECTED]> Index: dapl/openib/TODO === --- dapl/op

RE: [openib-general] [RFC] IB address translation using ARP

2005-10-13 Thread Caitlin Bestler
I agree with Mike's analysis. But I'd also like to point out that even when source compatibility is not a requirement, source familiarity is. That is, even when recoding is feasible the API should only introduce new concepts as required to improve efficiency. The shift from socket model to Q

RE: [openib-general] [RFC] IB address translation using ARP

2005-10-13 Thread Michael Krause
At 03:14 PM 10/12/2005, Caitlin Bestler wrote:   > -Original Message- > From: [EMAIL PROTECTED] > [ mailto:[EMAIL PROTECTED]] On Behalf Of Sean Hefty > Sent: Wednesday, October 12, 2005 2:36 PM > To: Michael Krause > Cc: openib-general@openib.org > Subject: Re: [openib-general] [RFC] IB

[openib-general] [PATCH] perftest/rdma_bw; add support for RDMA read and starting PSN

2005-10-13 Thread Arlin Davis
Michael, The patch adds command line options for RDMA reads and starting PSN. I used these modifications to help isolate the RDMA read performance degradation with 4.6.2 firmware. -arlin Signed-off by: Arlin Davis <[EMAIL PROTECTED]> Index: rdma_bw.c =

RE: [openib-general] QP with large starting sequence adds latencyto RDMA READ???

2005-10-13 Thread Fab Tillier
> From: Arlin Davis [mailto:[EMAIL PROTECTED] > Sent: Thursday, October 13, 2005 9:42 AM > > Sean Hefty wrote: > > > Arlin Davis wrote: > > > >> I just noticed some RDMA read performance issues that seem to be > >> related to the QP starting sequence number. If I set the starting > >> sequence to

Re: [openib-general] QP with large starting sequence adds latency to RDMA READ???

2005-10-13 Thread Arlin Davis
Sean Hefty wrote: Arlin Davis wrote: I just noticed some RDMA read performance issues that seem to be related to the QP starting sequence number. If I set the starting sequence to 1 then all is fine but if I set it to 0x1 then it seems to add ~40us to my 32KB RDMA read operation (polling

Re: [openib-general] mvapich-gen2 IA64 compile problem

2005-10-13 Thread John Partridge
Sayantan, Thanks for the reply. I was just using make in the mvapich-gen2 directory; that may call the script, I don't know. I'll take a look at the doc you suggested and go through the troubleshooting in there. John Sayantan Sur wrote: Hi John, * On Oct,6 John Partridge<[EMAIL PROTECTED]> wro

Re: [openib-general] Migration Solution

2005-10-13 Thread Hal Rosenstock
On Thu, 2005-10-13 at 03:10, Mohit Katiyar, Noida wrote: > Hi all, > If anyone can suggest some good possible solution for migrating from > Clients FC Switch -> SAN connection > To > Clients---> IB network---> SAN Connection It depends on your storage. There are two c

[openib-general] Re: [PATCH] [ADDR] return gateway GID for non-local IP addresses

2005-10-13 Thread Hal Rosenstock
On Wed, 2005-10-12 at 19:39, Sean Hefty wrote: > The following patch returns the GID of the IP gateway for non-local > subnet IP addresses. > > Hal, does this change look correct to you? I don't have an easy way > to test this fully. Yes, this looks right. I think the address resolution part c

[openib-general] Migration Solution

2005-10-13 Thread Mohit Katiyar, Noida
Hi all, If anyone can suggest some good possible solution for migrating from Clients FC Switch -> SAN connection To Clients---> IB network---> SAN Connection The most economical I can think of is Clients -> IB Switch > IB FC gateway---> FC Switch--