RE: question on the timewait event of the rdma-cm

2011-11-14 Thread Hefty, Sean
> Does going through timewait always hold? e.g. no matter what's the > return status of rdma_disconnect and/or the status of the rdma_cm > disconnected event? It usually holds. It will fail if rdma_disconnect() is called from a bogus state. But otherwise, I believe that it will enter timewait o

RE: question on the timewait event of the rdma-cm

2011-11-14 Thread Hefty, Sean
> I'm debugging some disconnect related race in iser - and wanted to check > with you something re the CM/RDMA-CM state machine: I see that when a > disconnect is initiated by the passive side (iser target) of a > connection, such that the active side (iser initiator) gets > RDMA_CM_EVENT_DISCONN

RE: [PATCH 00/11] New Caching mechanism for ib_core

2011-11-09 Thread Hefty, Sean
> > Yes, but the use of the cache is hidden from the user. > > user who? ib_cm, rdma_cm, ib_mad, ib_ipoib, etc. You would need to trace the use up to all ULPs. > Since the drivers would be users of cache calls, the behavior would be > as assumed for the query calls for the driver. Refer mthca

RE: [PATCH V2 6/7] IB/qib: memcpy optimizations

2011-11-09 Thread Hefty, Sean
> This fix adds an x86_64 specific routine that 1) probes for > X86_FEATURE_REP_GOOD > and 2) uses an inline asm routine built on rep movsq that testing has shown is > better than the builtin memcpy for all cases up to 4K. The probing routine is > now called when the qib module is loaded to enable

RE: [PATCH 00/11] New Caching mechanism for ib_core

2011-11-09 Thread Hefty, Sean
> > > 1. Greater degree of control by individual drivers. Drivers have a > > >choice to use it or not. > > > > I believe that some callers need to know that specific query calls will not > sleep. That capability should either be required or exposed through the API. > > The new cache access fu

RE: [PATCH 00/11] New Caching mechanism for ib_core

2011-11-08 Thread Hefty, Sean
> The main motivations are: > > 1. Greater degree of control by individual drivers. Drivers have a >choice to use it or not. I believe that some callers need to know that specific query calls will not sleep. That capability should either be required or exposed through the API. > 2. The lib

[PATCH 5/5] rdma/cm: Allow user to specify AF_IB when binding

2011-11-04 Thread Hefty, Sean
Modify rdma_bind_addr to allow the user to specify AF_IB when binding to a device. AF_IB indicates that the user is not mapping an IP address to the native IB addressing. (The mapping may have already been done, or is not needed.) Signed-off-by: Sean Hefty --- drivers/infiniband/core/cma.c |

[PATCH 4/5] rdma/cm: Update port reservation to support AF_IB

2011-11-04 Thread Hefty, Sean
The AF_IB uses a 64-bit service id (SID), which the user can control through the use of a mask. The rdma_cm will assign values to the unmasked portions of the SID based on the selected port space and port number. Because the IB spec divides the SID range into several regions, a SID/mask combinati

[PATCH 3/5] ib/addr: Add AF_IB support to ip_addr_size

2011-11-04 Thread Hefty, Sean
Add support for AF_IB to ip_addr_size, and rename the function to account for the change. Give the compiler more control over whether the call should be inline or not by moving the definition into the .c file, removing the static inline, and exporting it. Signed-off-by: Sean Hefty --- drivers/i

[PATCH 2/5] rdma/cm: Include AF_IB in loopback and any address checks

2011-11-04 Thread Hefty, Sean
Enhance checks for loopback and any address to support AF_IB in addition to AF_INET and AF_INET6. This will allow future patches to use AF_IB when binding and resolving addresses. Signed-off-by: Sean Hefty --- drivers/infiniband/core/cma.c | 40 1 files

[PATCH 1/5] rdma/cm: Define native IB address

2011-11-04 Thread Hefty, Sean
Define AF_IB and sockaddr_ib to allow the rdma_cm to use native IB addressing. Signed-off-by: Sean Hefty --- The format of sockaddr_ib was a result after a lengthy discussion on the linux-rdma list, mostly between myself and Jason. include/linux/socket.h |2 + include/rdma/ib.h | 89

[PATCH 0/5] rdma/cm: Add support for native Infiniband addressing

2011-11-03 Thread Hefty, Sean
The following patches are the first 5 in a series of 25 total that adds the ability to handle native Infiniband addressing to the rdma_cm. I'm hoping by submitting only a small subset of the patches at a time, they will be easier to review. Adding support for native IB addressing allows us to offl

[PATCH] [libibverbs] Allow 3rd party extensions to verb routines

2011-11-03 Thread Hefty, Sean
In order to support OFED, vendor specific calls, or new ibverbs operations, define a generic extension mechanism. This allows OFED, an RDMA vendor, or another registered 3rd party (for example, the librdmacm) to define RDMA extensions, plus provides a backwards compatible way to add new features t

RE: rdma_cm UD sendto without connect

2011-11-03 Thread Hefty, Sean
> Are there calls in rdma_cm that allow me to do this? For the life of > me I can't figure out how, for instance, I would write an app that has > a single QP that receives UD datagrams from anyone. How do potential > clients find the QP number, qkey and ah without using listen/accept > which will

RE: [PATCH] RDMA/cm: check that idr_pre_get didn't fail and clean it in error flow

2011-11-03 Thread Hefty, Sean
> Do you want me to resend a new patch only with the cleanup code? If you have time, please do. Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: rdma_cm UD sendto without connect

2011-11-03 Thread Hefty, Sean
> Is it possible to send/recv datagrams with no previous connection > similar to UDP sendto/recvfrom? UD examples I've seen are connection > oriented and listen/connect/accept, establishing a connection prior to > conversing. To send a datagram, you need the QP number, qkey, and addressing inform

RE: [PATCH] RDMA/cm: check that idr_pre_get didn't fail and clean it in error flow

2011-11-03 Thread Hefty, Sean
> The function idr_pre_get may fail, so there is a need to check the status of > it. Since this function allocates resources, we need to clean them during the > resource cleaning in case of error. We should add the cleanup code below, but I don't think we care if idr_pre_get fails here. We'll jus

RE: [PATCH] DAPL v2.0: common: increase default IB ack timer from 16 to 20

2011-11-02 Thread Hefty, Sean
> Yes, a user can modify path lifetime via rdma_create_ep() but there > is no way for the user to know how much that will be manipulated and > increased in the IB CM driver. Sure there is. It's an open source driver. :) The ib_cm calculates the "correct" timeout based on the packet lifetime pr

RE: [PATCH] DAPL v2.0: common: increase default IB ack timer from 16 to 20

2011-11-02 Thread Hefty, Sean
> For larger, more congested fabrics, a larger ACK timer is needed. > Consumers can still change default with environment variable > DAPL_ACK_TIMER if they need to increase or decrease. > > This applies to SCM and UCM providers only. The CMA provider, which > uses rdma_cm, has no way to control ac

RE: [PATCH 4/5] ib/core: add support for extended performance counters in sysfs

2011-11-01 Thread Hefty, Sean
> Let's not get into fairness here... I'm trying to make progress on my backlog > but there are patches that for better or worse have been around for a year > or more. Along these lines, is there any news on when patchwork might be available again? I've been trying to help review some of the bac

RE: Question about a corner case in the CMA

2011-11-01 Thread Hefty, Sean
> My question is: will the mc->next contain the right value? > (since the compiler may cache the value of mc->next within that function and > not read the updated value from memory, > so if the mc_list will be changed by another thread, we may get bad results > ...) Access to the list is protected

RE: Question about a corner case in the CMA

2011-10-31 Thread Hefty, Sean
> If a user tried to join a multicast group and the variable "mc" is being added > to the list of the mc_list: > 1) If the write() fails, the value of mc->next may be changed by another > thread (for example, removing the mcast which is pointed by mc->next) > will the "mc->next" still point to

RE: [PATCH] rdma/cma: minor code refactoring when saving a string content

2011-10-31 Thread Hefty, Sean
thanks - applied

RE: UDP -> IPoIB -> Verbs

2011-10-31 Thread Hefty, Sean
> Sean, wait, the rdmacm IPoIB port space allows librdmacm consumers to > subscribe to multicast groups created by IPoIB and vice versa, so for > udp/multicast the answer is yes, agree? Good point - for multicast it should work.

RE: cq error timeout issue

2011-10-30 Thread Hefty, Sean
> I try to use ibv_poll_cq to identify connectivity problems. The > scenario is following, based on modified rping example: > > 1) preliminary steps done and rdma connection established between > Client and Server, retry_count in rdma_conn_param is set 1; > 2) Server lost its link (corresponding s

RE: UDP -> IPoIB -> Verbs

2011-10-30 Thread Hefty, Sean
> Is it possible using verbs to send and/or recv UDP data traveling over > IPoIB? Or to associate a socket with a completion queue? I didn't see a response to this, but the answers are no. - Sean

RE: [PATCH] rdma/cma: fixed resource leak in case of an error in udaddy

2011-10-26 Thread Hefty, Sean
thanks - applied

RE: Are there are problems using the CMA with IPv6?

2011-10-26 Thread Hefty, Sean
> Subject: Are there are problems using the CMA with IPv6? IPv6 should work fine. > I noticed the following code in the cmatose/rping examples: but, I can't say whether all of the examples support ipv6. I believe that they do. > > static int get_addr(char *dst, struct sockaddr *addr) > { >

RE: [PATCH] rdma/core: Really export ib_open_qp() to user space

2011-10-17 Thread Hefty, Sean
> again, and just to make sure I got it - for basic XRC testing which doesn't go > to MPI nor to > the OFED compatibility APIs, what env/test would you recommend - is that the > xrc branch on the three libraries and rdma_x{client, server}? Yes - please make sure you have pulled those branches rece

RE: [PATCH] rdma/core: Really export ib_open_qp() to user space

2011-10-17 Thread Hefty, Sean
> So what else would you suggest for further testing, is that pulling > the xrc branch of your ofa hosted librdmacm/libibverbs/libmlx4 trees > and run librdmacm's rdma_{xclient,xserver} example? I was a bit confused > since I see this example both in the master and the xrc branch. I wanted to keep

RE: rdma: 3.1.0-rc9 breaks UD

2011-10-14 Thread Hefty, Sean
> I wonder how important wire compatibilty of the ibv_xx_pingpong > examples is... should I worry about this? I know that for some of the larger clusters, they may not be able to upgrade the software on all systems during a scheduled downtime. This is about the only reason I can think of why yo

RE: [PATCH] rdma/core: Really export ib_open_qp() to user space

2011-10-14 Thread Hefty, Sean
> I pulled the xrc patches in your for-next branch and ran some simple > tests against it. Between the last time I tested XRC and now, I'm now > seeing mvapich2 hang during MPI finalize, which I'm debugging. Just an update: the issues that I was seeing were caused by missing patches in my librar

RE: rdma: 3.1.0-rc9 breaks UD

2011-10-14 Thread Hefty, Sean
> Our 3.1-rc9 included Roland's for-next branch. Actually, mine did too. I wonder if this OFED patch to libibverbs is causing the issue: http://git.openfabrics.org/git?p=ofed_1_5/libibverbs.git;a=blob;f=fixes/rocee_examples.patch;h=eda5a401a3424a104e8100848b5b6bf4e5b63bee;hb=HEAD

RE: rdma: 3.1.0-rc9 breaks UD

2011-10-14 Thread Hefty, Sean
> Running ibv_ud_pingpong and ibv_uc_pingpong between two hosts. One with > OFED 1.5.3.1 (Ubuntu LTS 10.04) and another on linux 3.1.0-rc9 (Same > ubuntu version underlying) with the upstream libraries. FWIW, I was able to run 3.1-rc9 in loopback and between 3.0 and 3.1-rc9 systems. I don't have

[PATCH] rdma/core: Really export ib_open_qp() to user space

2011-10-13 Thread Hefty, Sean
We need to add an entry into the uverbs and device command tables to allow user space to actually call ib_open_qp. Signed-off-by: Sean Hefty --- If possible, this should just be merged with the last patch in the XRC series. In my previous tests, this was not getting called. I either did not hav

RE: Building 3.1-rc9 in kernel infiniband support with OFED libraries

2011-10-13 Thread Hefty, Sean
> Wait, now I'm baffled by the patch (ie > http://git.openfabrics.org/git?p=~shefty/rdma-dev.git;a=commitdiff;h=1ec4e62a6e967ddc258e7c4e674168debb727d39) > > I don't see anything that calls ib_uverbs_open_qp(). Am I missing something?? > > Does the OFED API compatibility actually call this fu

RE: Building 3.1-rc9 in kernel infiniband support with OFED libraries

2011-10-13 Thread Hefty, Sean
> The result is pushed out to my github for-next branch, with the > expectation that I'll ask Linus to pull for 3.2. Thanks - I'll take a look and test again. > However I do have one question: the last patch > ("RDMA/uverbs: Export ib_open_qp() capability to user space" in > my tree) adds IB_US

RE: Building 3.1-rc9 in kernel infiniband support with OFED libraries

2011-10-12 Thread Hefty, Sean
> Did we ever resolve the controversy about the rcv QPs with > MPI users? The design seemed sane to me, but Yes - I believe so. > Also (I'm sure you already posted this once, but...) Sean, > do you have a git tree with all the kernel patches included? My latest patches are at: git

RE: Building 3.1-rc9 in kernel infiniband support with OFED libraries

2011-10-12 Thread Hefty, Sean
> Why is the OFED libibverbs library binary incompatible with the > non-OFED libibverbs library ? Why hasn't XRC support been implemented > in the OFED libibverbs library such that applications built against > the upstream libibverbs headers also work with the latest OFED version > of that library

RE: [PATCH for-3.2 v2 1/5] ib/core: Add extended link speeds

2011-10-05 Thread Hefty, Sean
> The following extended speeds are introduced: > FDR-10 - is a proprietary link speed which is 10.3125 Gbps at 64/66 > encoding rather than 8b10b encoding. > FDR - represents the IBA extended speed: 14.0625 Gbps. > EDR - represents the IBA extended speed: 25.78125 Gbps. > > Signe
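The gap between the signaling rates quoted above and the usable data rate comes from the line encoding. As a rough sketch (the function name and the kbit/s unit are illustrative choices, not taken from the patch), the payload rate is the signaling rate scaled by 64/66 for the new speeds, or by 8/10 for speeds that use 8b/10b encoding:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch under review: scale a
 * signaling rate (in kbit/s) by the encoding ratio to get the payload
 * rate. Integer division truncates any fractional kbit/s.
 */
uint64_t sketch_payload_kbps(uint64_t signal_kbps,
                             unsigned int data_bits,
                             unsigned int coded_bits)
{
    return signal_kbps * data_bits / coded_bits;
}
```

With these inputs, sketch_payload_kbps(10312500, 64, 66) yields 10000000, i.e. FDR-10 carries 10 Gbps of data, and sketch_payload_kbps(25781250, 64, 66) yields 25000000 for EDR.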

RE: [PATCH for-3.2 v1 1/5] ib/core: Add extended link speeds

2011-10-04 Thread Hefty, Sean
> @@ -186,16 +187,30 @@ static ssize_t rate_show(struct ib_port > return ret; > > switch (attr.active_speed) { > - case 2: speed = " DDR"; break; > - case 4: speed = " QDR"; break; > + case 2: > + speed = " DDR"; > + break; > + case 4: >

RE: [PATCH for-3.2 v1 1/5] ib/core: Add extended link speeds

2011-10-03 Thread Hefty, Sean
> Why not just define this function to return the rate in units where > it's a whole number? > ie just have the function return the value in Mbps instead of Gbps, so > it returns > 2500, 5000, etc. instead of 2+(*rounded=5), 5+(*rounded=0), etc. even simpler! > > Or if ib_rate_to_int() is only us
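The suggestion quoted above, returning the rate in Mbps so that every value is a whole number, could look roughly like the sketch below. The enum and function names are made up for illustration and do not mirror the kernel's enum ib_rate; only a handful of rates are listed.

```c
/* Hypothetical rate identifiers for illustration; not the kernel's enum ib_rate. */
enum sketch_rate {
    SKETCH_RATE_2_5_GBPS,
    SKETCH_RATE_5_GBPS,
    SKETCH_RATE_10_GBPS,
    SKETCH_RATE_20_GBPS,
    SKETCH_RATE_40_GBPS
};

/*
 * Return the rate in Mbps, so a fractional Gbps value such as 2.5 needs no
 * separate "rounded" output parameter. Returns -1 for an unknown rate.
 */
int sketch_rate_to_mbps(enum sketch_rate rate)
{
    switch (rate) {
    case SKETCH_RATE_2_5_GBPS: return 2500;
    case SKETCH_RATE_5_GBPS:   return 5000;
    case SKETCH_RATE_10_GBPS:  return 10000;
    case SKETCH_RATE_20_GBPS:  return 20000;
    case SKETCH_RATE_40_GBPS:  return 40000;
    default:                   return -1;
    }
}
```

A caller that still wants to print Gbps can divide by 1000 and use the remainder for the fractional part, which keeps the conversion function itself free of rounding logic.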

RE: [PATCH for-3.2 v1 1/5] ib/core: Add extended link speeds

2011-10-03 Thread Hefty, Sean
> --- a/drivers/infiniband/core/verbs.c > +++ b/drivers/infiniband/core/verbs.c > @@ -77,6 +77,35 @@ enum ib_rate mult_to_ib_rate(int mult) > } > EXPORT_SYMBOL(mult_to_ib_rate); > > +int ib_rate_to_int(enum ib_rate rate, int *rounded) > +{ I like this approach better, but I wonder if it still c

RE: [PATCH] rdma/cm: Fix crash in cma_req_handler

2011-09-30 Thread Hefty, Sean
> The rdma_cm uses the local qp_type to determine how to > process an incoming request. This can result in an > incoming REQ being treated as a SIDR REQ and vice versa. > Fix this by switching off the event type instead, and for > good measure verify that the listener supports the incoming > conne

RE: IBV_WC_LOC_QP_OP_ERR for IBV_WR_RDMA_READ

2011-09-29 Thread Hefty, Sean
> I'm using RDMA CM. I am making the ibv_modify_qp() call immediately after > rdma_create_qp(). Should I move this code after the rdma_connect() / > rdma_accept()? > > Sample code I am looking at is from the OFA coding course. If you're using the rdma cm, the QP transitions are handled for you

RE: IBV_WC_LOC_QP_OP_ERR for IBV_WR_RDMA_READ

2011-09-29 Thread Hefty, Sean
> Thanks for responding Roland. I wasn't setting these up. I've added a > call to ibv_modify_qp to set these but it doesn't help. I even explicitly > set the access rights on the QP > > qp_attr.max_rd_atomic = 32; > qp_attr.max_dest_rd_atomic = 32; > qp_attr.qp_access_fl

RE: Issue with ibv_post_send

2011-09-29 Thread Hefty, Sean
> * When I reduce the sge length of the first work request to 20 ibv_post_send > works > * When I remove IBV_SEND_INLINE from the send_flags, ibv_post_send works. > > Either case I am unable to achieve the desired functionality. > > Is there a maximum sge size limit in the case of IBV_SEND_INLINE

RE: Issue with ibv_post_send

2011-09-26 Thread Hefty, Sean
> code snips > start > -- > int rc; > struct ibv_send_wr bad_send_wr[3]; This should just be: struct ibv_send_wr *bad_send_wr; On a post error, the pointer will be set to the work request that failed. >
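To illustrate the out-parameter convention described in the reply above without depending on libibverbs, here is a self-contained mock. The struct, the size limit, and sketch_post_send() are invented stand-ins; only the shape of the call (a chain of work requests in, a single pointer to the failing request out) follows the reply.

```c
#include <stddef.h>

/* Mock work request; a stand-in for struct ibv_send_wr, not the real type. */
struct sketch_wr {
    struct sketch_wr *next;   /* next request in the chain, or NULL */
    unsigned int length;      /* payload length in bytes */
};

#define SKETCH_MAX_LEN 64u    /* arbitrary limit used by the mock */

/*
 * Walk the chain and "post" each request. On the first request that fails,
 * stop, point *bad_wr at it, and return nonzero. The caller passes the
 * address of one pointer, not an array of requests.
 */
int sketch_post_send(struct sketch_wr *wr, struct sketch_wr **bad_wr)
{
    for (; wr != NULL; wr = wr->next) {
        if (wr->length > SKETCH_MAX_LEN) {
            *bad_wr = wr;
            return -1;
        }
    }
    return 0;
}
```

The caller declares a single `struct sketch_wr *bad_wr;` and passes `&bad_wr`. After a failure, the pointer identifies the request that was rejected, and the requests ahead of it in the chain were accepted.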

RE: [ofw] ib-diags: compatability issue with ibstat

2011-09-23 Thread Hefty, Sean
> 1. Remove ibstat and use ibv_devinfo instead > 2. Change ibstat to obtain the fdr10 information using ibverbs > 3. Move the is_fdr10 functionality into OS specific files or code > sections of ibstat > 4. Change ibstat to obtain fdr10 data using MADs or some other OS > inde

RE: [ofw] ib-diags: compatability issue with ibstat

2011-09-23 Thread Hefty, Sean
So far, it seems that the choices are: 1. Remove ibstat and use ibv_devinfo instead 2. Change ibstat to obtain the fdr10 information using ibverbs 3. Move the is_fdr10 functionality into OS specific files or code sections of ibstat 4. Change ibstat to obtain fdr10 data using MADs or some other OS

RE: [ofw] ib-diags: compatability issue with ibstat

2011-09-23 Thread Hefty, Sean
> > The only way to determine whether fdr10 is active or not is via the > > vendor proprietary MAD. That info may be reflected in some other API > > (and/or file) so that MAD does not need to be reissued. In a separate > > thread on linux-rdma, there was discussion on a couple of different > > ways

RE: [ofw] ib-diags: compatability issue with ibstat

2011-09-23 Thread Hefty, Sean
> Even if ibv_devinfo is updated to include the additional information, > do we want to require libibverbs, etc. on any IB management machine > just for this ? That's not the case today on IB management nodes. IMO, it's acceptable for ibverbs to be the basic requirement for any IB userspace appli

RE: [PATCH] ib-diags: Add cast to fix build on windows

2011-09-22 Thread Hefty, Sean
> It suppresses a warning, warnings are not build failures. Actually, on windows the build fails. No executables are built. mad_[get|set]_field[8|16|32] should work fine. But those are exported from ibmad, which would require ib-diags to either check for them as a requirement or implement them

RE: [PATCH] ib-diags: Add cast to fix build on windows

2011-09-22 Thread Hefty, Sean
> Adding a cast is just pointless noise, doesn't fix or prove anything. It fixes the build on windows. You can call that pointless, but I do not.

RE: [PATCH v2] ib-diags: Add cast to fix build on windows

2011-09-22 Thread Hefty, Sean
Signed-off-by: Sean Hefty --- changes from v1: reset mode back to 644 libibnetdisc/src/ibnetdisc.c |2 +- src/ibportstate.c|4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/libibnetdisc/src/ibnetdisc.c b/libibnetdisc/src/ibnetdisc.c index 86210eb..c93e7a

RE: [PATCH] ib-diags: Add cast to fix build on windows

2011-09-22 Thread Hefty, Sean
> >  libibnetdisc/src/ibnetdisc.c |    2 +- > >  src/ibportstate.c            |    4 ++-- > >  2 files changed, 3 insertions(+), 3 deletions(-) > >  mode change 100644 => 100755 libibnetdisc/src/ibnetdisc.c > >  mode change 100644 => 100755 src/ibportstate.c > > A minor nit: is this mode change re

[PATCH] ib-diags: Add cast to fix build on windows

2011-09-22 Thread Hefty, Sean
Signed-off-by: Sean Hefty --- libibnetdisc/src/ibnetdisc.c |2 +- src/ibportstate.c|4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) mode change 100644 => 100755 libibnetdisc/src/ibnetdisc.c mode change 100644 => 100755 src/ibportstate.c diff --git a/libibnetdisc/sr

RE: ib-diags: compatability issue with ibstat

2011-09-21 Thread Hefty, Sean
> Does this mean "ibstatus" does not work on Windows? We do not support any of the scripts on windows. As far as I could tell, the scripts look like they just do post-processing of available output. > How are you proposing the addition to ibverbs? It seems this would break ABI > there. On wi

ib-diags: compatability issue with ibstat

2011-09-21 Thread Hefty, Sean
commit 1344cb3feacafc462440dabfa5997c5205486d83 added support for FDR10 in a way that is not compatible with Windows support. Windows does not use files to read attribute information. I will probably need to obtain the necessary information using ibverbs on windows by reading port attributes.

RE: creating common ib_types.h for linux and windows

2011-09-21 Thread Hefty, Sean
> What is your end goal? To have one code base for OpenSM that would be able to > be compiled on both Linux and Windows based on __WIN__ definition? My end goal is to decrease the maintenance cost of porting opensm to Windows. Ideally, I'd like to have a common code base for opensm, similar to what

RE: creating common ib_types.h for linux and windows

2011-09-20 Thread Hefty, Sean
> Why to test for __WIN__ instead of _WIN32 (defined both when building > 32-bit and 64-bit code -- see also > http://msdn.microsoft.com/en-us/library/b0084kay%28v=vs.80%29.aspx) ? I have no idea. This is just what's currently in the code. I can change this portion of the code if we want to use

creating common ib_types.h for linux and windows

2011-09-19 Thread Hefty, Sean
It would be easier to maintain opensm on windows if it truly shared the same code base. For now, I'd just like to start with a common ib_types.h file. (There are currently hundreds, if not thousands, of lines that differ.) ib_types.h uses #if defined(__WIN__) to separate linux from windows co

RE: [PATCH RFC 1/4] ib/core: handle EDR/FDR extended speeds

2011-09-19 Thread Hefty, Sean
> It maps to what was done in the PortInfo attribute to add the new > extended speeds. There was no room for expansion in the existing > original link speed fields so a "parallel" set of fields had to be > added there.. That was an issue with the wire protocol format, correct? Why carry that s

RE: [PATCH RFC 1/4] ib/core: handle EDR/FDR extended speeds

2011-09-19 Thread Hefty, Sean
> Index: b/drivers/infiniband/core/verbs.c > === > --- a/drivers/infiniband/core/verbs.c 2011-09-13 13:34:19.660539000 +0300 > +++ b/drivers/infiniband/core/verbs.c 2011-09-13 16:42:39.713754400 +0300 > @@ -77,6 +77,23 @@ enum ib_rate

RE: [PATCH RFC 2/4] ib/core: handle FDR-10 link encoding

2011-09-19 Thread Hefty, Sean
> Index: b/drivers/infiniband/core/sysfs.c > === > --- a/drivers/infiniband/core/sysfs.c 2011-09-14 13:49:58.0 +0300 > +++ b/drivers/infiniband/core/sysfs.c 2011-09-14 13:50:43.731775900 +0300 > @@ -209,7 +209,7 @@ static ssize

RE: [PATCH] An argument for allowing applications to manually send RMPP packets if desired

2011-09-18 Thread Hefty, Sean
> Ultimately I think the scalable/compatible answer is to move these > RMPP work loads to a verbs QP and we need to have a user space RMPP > implementation for that anyhow. Many to one is never scalable. The applications simply cannot rely on every node querying the SA at the same time, especial

RE: identify the race condition in this code and win the respect of linux-rdma developers!

2011-09-16 Thread Hefty, Sean
> As I see it the problem flow would be this: > > poll > ibv_req_notify_cq > // HCA Write CQ Entry > // Trigger EVENT Since the trigger event must be separate from writing the cq entry, if it doesn't happen until... > poll -> return CQ > > poll > ibv_req_notify_cq // does nothing, event is alr

RE: identify the race condition in this code and win the respect of linux-rdma developers!

2011-09-16 Thread Hefty, Sean
Maybe someone at Mellanox can provide some guidance here. Here are the arm/get_event calls from the libmlx4. int mlx4_arm_cq(struct ibv_cq *ibvcq, int solicited) { ... sn = cq->arm_sn & 3; ci = cq->cons_index & 0xff; cmd = solicited ? MLX4_CQ_DB_REQ_NOT_SOL

RE: identify the race condition in this code and win the respect of linux-rdma developers!

2011-09-16 Thread Hefty, Sean
> poll and cq_events are totally independent, I believe the implementation > of CQ events is like: > > enum {DISABLED,ARMED,TRIGGERED} flag; I was suggesting that there were only 2 states, not 3, with ibv_get_cq_event() simply retrieving a queued event from the kernel without touching the CQ stat

RE: identify the race condition in this code and win the respect of linux-rdma developers!

2011-09-16 Thread Hefty, Sean
> > Case 1: > > ret = ibv_poll_cq(id->recv_cq, 1, wc); > > ret = ibv_req_notify_cq(id->recv_cq, 0); > > > > while (!(ret = ibv_poll_cq(id->recv_cq, 1, wc))) { > > ret = ibv_get_cq_event(id->recv_cq_channel, &cq, &context); > > ibv_ack_cq_events(id->recv_cq, 1); >

identify the race condition in this code and win the respect of linux-rdma developers!

2011-09-15 Thread Hefty, Sean
I have a ping-pong test application that loops doing: send, wait for send completion, wait for receive completion. The test occasionally hangs in the following code at ibv_get_cq_event() (error handling removed): Case 1: ret = ibv_poll_cq(id->recv_cq, 1, wc); ret = ibv_req_notif

RE: [PATCH 0/20 v2] rdma: Add XRC support

2011-09-14 Thread Hefty, Sean
Roland, Can I get a quick status update regarding this series? Thanks, Sean

RE: [PATCH] An argument for allowing applications to manually send RMPP packets if desired

2011-09-12 Thread Hefty, Sean
> Couldn't the kernel just copy data as needed, and say that > userspace needs to keep the buffer stable until the send > completes? (With an opt-in from userspace maybe?) I agree, and I'll add that the IBTA could also take up the task of coming up with a far more efficient way of transferring e

[PATCH 4/4] [for-3.2] rdma/ucm: Removed checks for unsigned value < 0

2011-09-07 Thread Hefty, Sean
cmd is unsigned, no need to check for < 0. Found by code inspection. Signed-off-by: Sean Hefty --- drivers/infiniband/core/ucm.c |2 +- drivers/infiniband/core/ucma.c|2 +- drivers/infiniband/core/user_mad.c|5 ++--- drivers/infiniband/core/uverbs_main.c |3 +
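A small sketch of why the removed check was dead code: once a value is held in an unsigned type, a "negative" input has already wrapped to a large positive number, so the upper-bound test alone rejects it. The table size and function name are assumptions for illustration, not identifiers from the patched files.

```c
#define SKETCH_NUM_CMDS 3u   /* hypothetical size of a command table */

/*
 * Validate a command index. Because cmd is unsigned, "cmd < 0" can never
 * be true; a caller passing -1 arrives here as UINT_MAX and fails the
 * upper-bound test.
 */
int sketch_cmd_valid(unsigned int cmd)
{
    return cmd < SKETCH_NUM_CMDS;
}
```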

[PATCH 3/4] [for-3.2] ib/mad: Verify mgmt class in received MADs

2011-09-07 Thread Hefty, Sean
If a received MAD contains an invalid or reserved mgmt class, we will attempt to access method_table outside of its range. Add a check to ensure that mgmt class falls within the handled range. Found by code inspection. Signed-off-by: Sean Hefty --- drivers/infiniband/core/mad.c |3 +++ 1 fi
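The kind of range check described above can be sketched as follows. The table size, names, and handler type are placeholders, not the definitions in drivers/infiniband/core/mad.c; the point is only that a class taken from a received packet is validated before it is used as an index.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MGMT_CLASS 8   /* hypothetical table size */

typedef int (*sketch_handler)(void);

/* Hypothetical per-class handler table; unset entries are NULL. */
sketch_handler sketch_method_table[SKETCH_MAX_MGMT_CLASS];

/* Trivial handler so the table can hold a non-NULL entry. */
int sketch_sample_handler(void)
{
    return 0;
}

/*
 * Look up the handler for a management class read from an untrusted MAD.
 * Out-of-range classes return NULL instead of indexing past the table.
 */
sketch_handler sketch_find_handler(uint8_t mgmt_class)
{
    if (mgmt_class >= SKETCH_MAX_MGMT_CLASS)
        return NULL;
    return sketch_method_table[mgmt_class];
}
```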

[PATCH 2/4] [for-3.2] rdma/cm: Check for NULL conn_param in rdma_accept

2011-09-07 Thread Hefty, Sean
Check that conn_param is not null before dereferencing it when processing rdma_accept(). This is necessary to prevent a possible system crash, which can be invoked by user space. Problem found by code inspection. Signed-off-by: Sean Hefty --- drivers/infiniband/core/cma.c | 38 ++

[PATCH 1/4] [for-3.2] rdma/addr: Do not call neigh_event_send() with NULL neighbour

2011-09-07 Thread Hefty, Sean
The following code in addr6_resolve can result in calling neigh_event_send() with a NULL neighbour: neigh = dst->neighbour; if (!neigh || !(neigh->nud_state & NUD_VALID)) { neigh_event_send(dst->neighbour, NULL); ret = -ENODATA; goto put; } Fix this. Found by code i
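One way to avoid the NULL dereference, sketched with mock types so that it compiles on its own. The struct, flag value, error constant, and sketch_event_send() are stand-ins rather than the kernel definitions, and the actual patch may differ:

```c
#include <stddef.h>

#define SKETCH_NUD_VALID 0x1u   /* stand-in for the kernel's NUD_VALID mask */
#define SKETCH_ENODATA   61     /* stand-in error number */

/* Mock neighbour entry; not the kernel's struct neighbour. */
struct sketch_neigh {
    unsigned int nud_state;
    int probes_sent;
};

/* Stand-in for neigh_event_send(): it dereferences its argument. */
void sketch_event_send(struct sketch_neigh *neigh)
{
    neigh->probes_sent++;
}

/*
 * Return 0 if the neighbour is resolved, -SKETCH_ENODATA otherwise.
 * The probe is only sent when a neighbour entry actually exists.
 */
int sketch_resolve(struct sketch_neigh *neigh)
{
    if (neigh == NULL || !(neigh->nud_state & SKETCH_NUD_VALID)) {
        if (neigh != NULL)
            sketch_event_send(neigh);
        return -SKETCH_ENODATA;
    }
    return 0;
}
```

The outer condition is unchanged from the snippet quoted above; the only difference is the inner guard, so the "no entry" and "entry not yet valid" cases still share one error path.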

RE: Resolving IP addr to GID with librdmacm

2011-09-06 Thread Hefty, Sean
> Does IPoIB module depend on Subnet manager/Subnet Administration components > for > any service while resolving IP address to GID. Yes - it relies on multicast groups having been set up and SA queries.

RE: Does the CMA support connecting UC QPs?

2011-09-06 Thread Hefty, Sean
> Can one connect UC QPs using the CMA? No, but the xrc patches should enable this. An app may be able to create UC QPs and connect them, but the rdma_cm will treat the connection as RC.

RE: [PATCH 0/20 v2] rdma: Add XRC support

2011-08-25 Thread Hefty, Sean
> The only issue from my viewpoint is whether the change referenced by patch 11 > (ib/cm: Update xrc support based on xrc annex errata) has been approved by the > IBTA. I'm waiting on a response on that. I received confirmation that the errata to the XRC Annex was approved. Both patch 10 (ib/cm

RE: Resolving IP addr to GID with librdmacm

2011-08-24 Thread Hefty, Sean
> In the code, it looks like during resolving IP addr to GID, one of the kernel > module (ib_addr) is using neigh_lookup(..) to get the hardware address of the > given IP. I did not get the dependency on the module ib_ipoib to resolve > remote > IP addr to GID. Am I missing anything here? Please

RE: [PATCH 1/2] librdmacm: Fix resource leak in error flow

2011-08-23 Thread Hefty, Sean
> Prevent resource leak by destroying the event channel before returning from > function in an error flow. Thanks - I applied this patch after my patch "Fix resource leak when CMA_CREATE_MSG_CMD_RESP fails" - Sean

[PATCH] librdmacm: Fix resource leak when CMA_CREATE_MSG_CMD_RESP fails

2011-08-23 Thread Hefty, Sean
If resources are allocated before CMA_CREATE_MSG_CMD_RESP or CMA_CREATE_MSG_CMD are called, and those calls fail, we need to cleanup the resources before returning. Fix this by changing the CMA_CREATE_MSG macros to remove the alloca and calling return. The request and response structures are now

RE: [PATCH 2/2] librdmacm: fix resource leaks when CMA_CREATE_MSG_CMD_RESP fails

2011-08-22 Thread Hefty, Sean
> -#define CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, type, size) \ > +#define CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, type, size, clean_cmd) \ This starts to get ugly, especially with usage that ends up looking like this: > - CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_DESTROY_ID, size); >

[PATCH] rdma/cm: Fix crash in cma_req_handler

2011-08-22 Thread Hefty, Sean
The rdma_cm uses the local qp_type to determine how to process an incoming request. This can result in an incoming REQ being treated as a SIDR REQ and vice versa. Fix this by switching off the event type instead, and for good measure verify that the listener supports the incoming connection reques

RE: [PATCH 1/2] librdmacm: Fix resource leak in error flow

2011-08-22 Thread Hefty, Sean
> Did you have a chance for reviewing those patches? I did see and flagged them for follow up, but I have not reviewed them yet. I should get to them sometime this week.

RE: [RFC] XRC upstream merge reboot

2011-08-22 Thread Hefty, Sean
> I am a bit concerned here. In the current usage model, target QPs are > destroyed when their reference > count goes to zero > (ib_reg_xrc_recv_qp and ibv_xrc_create_qp increment the reference count, > while ib_unreg_xrc_recv_qp > decrements it). > In this model, the TGT QP user/consumer does n
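
The reference-counting model described in the quoted text can be sketched roughly as follows. This is only an illustration: `struct tgt_qp` and the three helpers are hypothetical and do not correspond to the real ib_reg_xrc_recv_qp / ib_unreg_xrc_recv_qp implementations. The sketch assumes creation takes the first reference, each registration adds one, and the QP is destroyed when the last reference is dropped.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a shared XRC target QP. */
struct tgt_qp {
    int refcnt;
    bool destroyed;
};

/* Creation holds the first reference (assumption of this sketch). */
static void tgt_qp_create(struct tgt_qp *qp)
{
    qp->refcnt = 1;
    qp->destroyed = false;
}

/* Another process registers interest in the QP. */
static void tgt_qp_reg(struct tgt_qp *qp)
{
    qp->refcnt++;
}

/* Dropping the last reference destroys the QP. */
static void tgt_qp_unreg(struct tgt_qp *qp)
{
    if (--qp->refcnt == 0)
        qp->destroyed = true;
}
```

In this model no single process owns the QP: it stays alive for as long as any process still holds a reference.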

RE: [PATCH 0/20 v2] rdma: Add XRC support

2011-08-19 Thread Hefty, Sean
> Great stuff! I'd really like to get this into kernel 3.2... do you have any > open issues or is it all good from your POV? The only issue from my viewpoint is whether the change referenced by patch 11 (ib/cm: Update xrc support based on xrc annex errata) has been approved by the IBTA. I'm wa

[PATCH 2/2 v2] libmlx4: Add support for XRC extension

2011-08-18 Thread Hefty, Sean
Implement the libibverbs xrc support using the defined xrc extension. This patch is based on a patch by Jack Morgenstein . Signed-off-by: Sean Hefty --- Changes from v1: Add support for open_qp(). Avoid allocating unnecessary resources for XRC TGT QPs. src/buf.c |6 + src/cq.c |

[PATCH 1/2] libmlx4: Add support for verbs extensions

2011-08-18 Thread Hefty, Sean
Update the libmlx4 library to register extensions with libibverbs, if it supports extensions. By registering extension support, this indicates to ibverbs that extended data structures are available. Signed-off-by: Sean Hefty --- Makefile.am|2 +- configure.in |3 ++ src/mlx4-ext.c

[PATCH 0/6 v2] libibverbs: Add support for XRC

2011-08-18 Thread Hefty, Sean
The following patches add support for XRC to ibverbs. Support for XRC requires additional function calls into the verbs driver library (e.g. libmlx4): open_xrcd, close_xrcd, create xrc srqs, and open_qp. Since no mechanism is currently defined to allow for these calls, we first define a way to add

[PATCH 7/6 v2] [OFED] libibverbs: Support both OFED verbs and ibverbs

2011-08-18 Thread Hefty, Sean
This patch allows libibverbs to support both libibverbs API that shipped with OFED 1.5 and the upstream libibverbs API. This supports existing apps that are compiled against the upstream libibverbs (ibverbs). And in ideal cases, an application coded to the OFED version of libibverbs (ofverbs) wou

[PATCH 6/6 v2] libibverbs: Add man page for ib_open_qp

2011-08-18 Thread Hefty, Sean
Signed-off-by: Sean Hefty --- man/ibv_open_qp.3 | 48 1 files changed, 48 insertions(+), 0 deletions(-) create mode 100644 man/ibv_open_qp.3 diff --git a/man/ibv_open_qp.3 b/man/ibv_open_qp.3 new file mode 100644 index 000..577a28d --- /dev

[PATCH 5/6 v2] libibverbs: Add ibv_open_qp()

2011-08-18 Thread Hefty, Sean
XRC receive QPs are shareable across multiple processes. Allow any process with access to the xrc domain to open an existing QP. After opening the QP, the process will receive events related to the QP and be able to modify the QP. Signed-off-by: Sean Hefty --- Submitting this change separate fr

[PATCH 4/6 v2] libibverbs: Add xrc pingpong example

2011-08-18 Thread Hefty, Sean
From: Jay Sternberg Signed-off-by: Jay Sternberg Signed-off-by: Sean Hefty --- Makefile.am |4 examples/xsrq_pingpong.c | 873 ++ 2 files changed, 876 insertions(+), 1 deletions(-) create mode 100644 examples/xsrq_pingpong.c diff

[PATCH 2/6 v2] libibverbs: Using extensions to define XRC support

2011-08-18 Thread Hefty, Sean
Define a common libibverbs driver side extension to support XRC. XRC introduces several new concepts and structures: XRC domains: xrcd's are a type of protection domain used to associate shared receive queues with xrc queue pairs. Since xrcd are meant to be shared among multiple processes, we in

[PATCH 3/6 v2] libibverbs: Add/update man pages for xrc verbs

2011-08-18 Thread Hefty, Sean
Signed-off-by: Sean Hefty --- Makefile.am |3 +- man/ibv_create_qp.3 | 13 +++- man/ibv_create_xsrq.3 | 80 + man/ibv_open_xrcd.3 | 65 man/ibv_post_send.3 | 11 ++- 5 file

[PATCH 1/6 v2] libibverbs: Allow 3rd party extensions to verb routines

2011-08-18 Thread Hefty, Sean
In order to support OFED, vendor specific calls, or new ibverbs operations, define a generic extension mechanism. This allows OFED, an RDMA vendor, or another registered 3rd party (for example, the librdmacm) to define RDMA extensions, plus provides a backwards compatible way to add new features t

[PATCH 0/20 v2] rdma: Add XRC support

2011-08-18 Thread Hefty, Sean
XRC provides a scalability enhancement when a process on one node must communicate with multiple processes on another node. This is commonly the case when running MPI applications on multi-core systems in a cluster. An XRC connection consists of an initiator (XRC INI) qp and a target (XRC TGT) qp

[PATCH 19/20 v2] rdma/core: Export ib_open_qp to share XRC TGT QPs

2011-08-18 Thread Hefty, Sean
XRC TGT QPs are shared resources among multiple processes. Since the creating process may exit, allow other processes which share the same XRC domain to open the existing QP. This allows us to transfer ownership of an xrc tgt qp to another process. Conceptually, verbs treats an xrc tgt qp as a '
