RE: [RFC] XRC upstream merge reboot

2011-08-22 Thread Hefty, Sean
> I am a bit concerned here. In the current usage model, target QPs are > destroyed when their reference > count goes to zero > (ib_reg_xrc_recv_qp and ibv_xrc_create_qp increment the reference count, > while ib_unreg_xrc_recv_qp > decrements it). > In this model, the TGT QP user/consumer does n
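
For reference, a minimal sketch of the reference-counting model described here, using the user-space OFED names (ibv_reg_xrc_rcv_qp/ibv_unreg_xrc_rcv_qp, the counterparts of the kernel calls quoted above); the prototypes are approximations of the OFED XRC API, not verbatim copies:

    /* Sketch only: prototypes approximate the OFED XRC verbs.  Every consumer
     * of a shared TGT QP holds one reference; the kernel destroys the QP when
     * the count drops to zero. */
    #include <stdint.h>

    struct ibv_xrc_domain;                                  /* opaque OFED XRC domain */

    int ibv_reg_xrc_rcv_qp(struct ibv_xrc_domain *d, uint32_t qpn);
    int ibv_unreg_xrc_rcv_qp(struct ibv_xrc_domain *d, uint32_t qpn);

    /* A consumer that did not create the TGT QP takes its own reference before
     * depending on it, and drops that reference when finished. */
    static int use_shared_tgt_qp(struct ibv_xrc_domain *domain, uint32_t tgt_qpn)
    {
            int err = ibv_reg_xrc_rcv_qp(domain, tgt_qpn);  /* refcount + 1 */
            if (err)
                    return err;

            /* ... receive traffic through SRQs attached to the same domain ... */

            return ibv_unreg_xrc_rcv_qp(domain, tgt_qpn);   /* refcount - 1 */
    }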

Re: [RFC] XRC upstream merge reboot

2011-08-21 Thread Jack Morgenstein
On Thursday 11 August 2011 01:20, Hefty, Sean wrote: > To help with OFED feature level compatibility, I'm in the process of adding a > new call to ibverbs: > > struct ib_qp_open_attr { > void (*event_handler)(struct ib_event *, void *); > void  *qp_context; > u32    qp_num
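
For comparison, a user-space sketch of what such an "open an existing QP" verb looks like, following the ibv_open_qp()/ibv_qp_open_attr interface that later libibverbs exposes; the mask bits and field names below come from that later API, not from the structure quoted above:

    /* Sketch: a second process in the same XRC domain obtains its own handle to
     * an existing TGT QP by QP number, so it can receive its async events,
     * modify it, or destroy it.  Details are illustrative. */
    #include <stddef.h>
    #include <infiniband/verbs.h>

    static struct ibv_qp *open_shared_tgt_qp(struct ibv_context *ctx,
                                             struct ibv_xrcd *xrcd,
                                             uint32_t tgt_qpn)
    {
            struct ibv_qp_open_attr attr = {
                    .comp_mask  = IBV_QP_OPEN_ATTR_NUM | IBV_QP_OPEN_ATTR_XRCD |
                                  IBV_QP_OPEN_ATTR_CONTEXT | IBV_QP_OPEN_ATTR_TYPE,
                    .qp_context = NULL,             /* per-process cookie for events */
                    .qp_num     = tgt_qpn,          /* QP number learned out of band */
                    .xrcd       = xrcd,             /* shared XRC domain */
                    .qp_type    = IBV_QPT_XRC_RECV,
            };

            return ibv_open_qp(ctx, &attr);
    }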

Re: [RFC] XRC upstream merge reboot

2011-08-11 Thread Shamis, Pavel
I think it's a good idea to support both usage models. Regards, Pasha. >> Things only get complicated when the domain-allocator process allocates a >> single domain and simply >> uses that single domain for all jobs (i.e., the domain is never de-allocated >> for the lifetime of the >> allocating

RE: [RFC] XRC upstream merge reboot

2011-08-10 Thread Hefty, Sean
> Things only get complicated when the domain-allocator process allocates a > single domain and simply > uses that single domain for all jobs (i.e., the domain is never de-allocated > for the lifetime of the > allocating process, and the allocating process is the server for all jobs). To help with

RE: [RFC] XRC upstream merge reboot

2011-08-03 Thread Hefty, Sean
> ?? How do you register for an event? There is only > ibv_get_async_event(3) - I thought it returned all events relevant to > the associated verbs context. The OFED APIs for managing XRC receive QPs are: int (*create_xrc_rcv_qp)(struct ibv_qp_init_attr *init_attr, uint32
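
The quoted list is truncated; for reference, the full set of OFED-era XRC receive-QP calls being discussed, with prototypes reconstructed approximately from the OFED libibverbs XRC patches. Registration (reg/unreg) is what ties a process to a TGT QP number, and is also how async events for that QP reach the process:

    /* Approximate prototypes of the OFED XRC receive-QP management verbs;
     * names and argument lists may differ slightly from the actual patches. */
    int ibv_create_xrc_rcv_qp(struct ibv_qp_init_attr *init_attr,
                              uint32_t *xrc_rcv_qpn);
    int ibv_modify_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num,
                              struct ibv_qp_attr *attr, int attr_mask);
    int ibv_query_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num,
                             struct ibv_qp_attr *attr, int attr_mask,
                             struct ibv_qp_init_attr *init_attr);
    int ibv_reg_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num);
    int ibv_unreg_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num);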

Re: [RFC] XRC upstream merge reboot

2011-08-03 Thread Jason Gunthorpe
On Thu, Aug 04, 2011 at 12:06:24AM +, Hefty, Sean wrote: > > Where does the ib_verbs async event for APM state change get routed for > > XRC? > > The OFED APIs route QP events to all processes which register for > that qp number. ?? How do you register for an event? There is only ibv_get_asy

RE: [RFC] XRC upstream merge reboot

2011-08-03 Thread Hefty, Sean
> Where does the ib_verbs async event for APM state change get routed for > XRC? The OFED APIs route QP events to all processes which register for that qp number. > Does the event have enough info to identify all the necessary > parts? The event carries the qp number only. > Can the process th

Re: [RFC] XRC upstream merge reboot

2011-08-03 Thread Jason Gunthorpe
On Wed, Aug 03, 2011 at 05:16:17PM -0400, Shamis, Pavel wrote: > > > > > Well, in Open MPI we have XRC code that uses APM. > > > If Mellanox cares about the feature, they would have to rework this part > > > of > > > code in Open MPI. > > > I don't know about other apps. > > > > But does the APM

RE: [RFC] XRC upstream merge reboot

2011-08-03 Thread Shamis, Pavel
> > > Well, in Open MPI we have XRC code that uses APM. > > If Mellanox cares about the feature, they would have to rework this part of > > code in Open MPI. > > I don't know about other apps. > > But does the APM implementation expect some process other than > the creator to be able to mod

RE: [RFC] XRC upstream merge reboot

2011-08-03 Thread Hefty, Sean
> Well, in Open MPI we have XRC code that uses APM. > If Mellanox cares about the feature, they would have to rework this part of > code in Open MPI. > I don't know about other apps. But does the APM implementation expect some process other than the creator to be able to modify the QP? APM

RE: [RFC] XRC upstream merge reboot

2011-08-03 Thread Shamis, Pavel
> > Well, actually I was thinking about APM. If the "creator" exits, we do not > > have a way to > > upload an alternative path. > > Correct - that would be a limitation. You would need to move to a new tgt > qp. Well, in Open MPI we have XRC code that uses APM. If Mellanox cares about the featur

RE: [RFC] XRC upstream merge reboot

2011-08-03 Thread Hefty, Sean
> Well, actually I was thinking about APM. If the "creator" exits, we do not > have a way to > upload an alternative path. Correct - that would be a limitation. You would need to move to a new tgt qp. In a general solution, this involves not only allowing other processes to modify the QP, but als

RE: [RFC] XRC upstream merge reboot

2011-08-03 Thread Shamis, Pavel
> > > BTW, did we have the same limitation/feature (only the creating process is > allowed > > to modify) in the original XRC driver? > > I'm not certain about the implementation, but the OFED APIs would allow > any process within the xrc domain to modify the qp. > > > Hmm, is there a way to destroy the QP

Re: [RFC] XRC upstream merge reboot

2011-08-03 Thread Jack Morgenstein
On Tuesday 02 August 2011 19:29, Shamis, Pavel wrote: > The XRC domain is created by the process that starts first.  All the other processes > that belong > to the same MPI session and reside on the same node join the domain. > The TGT QP is created by the process that receives an inbound connection first and it i

RE: [RFC] XRC upstream merge reboot

2011-08-02 Thread Hefty, Sean
> BTW, did we have the same limitation/feature (only the creating process is allowed > to modify) in the original XRC driver? I'm not certain about the implementation, but the OFED APIs would allow any process within the xrc domain to modify the qp. > Hmm, is there a way to destroy the QP, when the origina

Re: [RFC] XRC upstream merge reboot

2011-08-02 Thread Shamis, Pavel
On Aug 2, 2011, at 5:25 PM, Hefty, Sean wrote: >> If the target QP is opened in the low-level driver, then it's owned by the group of >> processes that share the same XRC domain. > > Can you define what you mean by 'owned'? > > With the latest patches, the target qp is created in the kernel. Data > re

RE: [RFC] XRC upstream merge reboot

2011-08-02 Thread Hefty, Sean
> If the target QP is opened in the low-level driver, then it's owned by the group of > processes that share the same XRC domain. Can you define what you mean by 'owned'? With the latest patches, the target qp is created in the kernel. Data received on the target qp can go to any process sharing the as
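
To make "data received on the target qp can go to any process sharing the xrcd" concrete, here is a sketch of a process attaching its own XRC SRQ to the shared domain, written against the ibv_create_srq_ex() interface from later libibverbs (the OFED-era call differed); the queue sizes are arbitrary:

    /* Sketch: each process that wants to receive XRC traffic creates an SRQ of
     * type IBV_SRQT_XRC tied to the shared XRC domain.  Remote senders then
     * address that SRQ (by number) through any TGT QP in the domain.
     * Interface names follow later libibverbs; treat details as illustrative. */
    #include <infiniband/verbs.h>

    static struct ibv_srq *create_xrc_srq(struct ibv_context *ctx, struct ibv_pd *pd,
                                          struct ibv_xrcd *xrcd, struct ibv_cq *cq)
    {
            struct ibv_srq_init_attr_ex attr = {
                    .attr      = { .max_wr = 256, .max_sge = 1 },
                    .comp_mask = IBV_SRQ_INIT_ATTR_TYPE | IBV_SRQ_INIT_ATTR_XRCD |
                                 IBV_SRQ_INIT_ATTR_CQ | IBV_SRQ_INIT_ATTR_PD,
                    .srq_type  = IBV_SRQT_XRC,
                    .xrcd      = xrcd,   /* domain shared by the process group */
                    .cq        = cq,     /* completions for this process's receives */
                    .pd        = pd,
            };

            return ibv_create_srq_ex(ctx, &attr);
    }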

Re: [RFC] XRC upstream merge reboot

2011-08-02 Thread Shamis, Pavel
>> We do have unregister on finalization. But this code doesn't introduce any >> synchronization across processes on the same node, since kernel manages the >> receive qp. If the reference counter will be moved to app responsibility, it >> will enforce the app to manage the reference counter on app

Re: [RFC] XRC upstream merge reboot

2011-08-02 Thread Shamis, Pavel
Hi Jack, Please see my comments below > From Pavel Shamis: >>> We do have unregister on finalization. But this code doesn't introduce any >>> synchronization across processes on the same node, since kernel manages the >>> receive qp. If the reference counter will be moved to app responsibility, it

Re: [RFC] XRC upstream merge reboot

2011-08-02 Thread Jack Morgenstein
On Monday 01 August 2011 21:28, Hefty, Sean wrote: From Pavel Shamis: > > We do have unregister on finalization. But this code doesn't introduce any > > synchronization across processes on the same node, since kernel manages the > > receive qp. If the reference counter will be moved to app respon

RE: [RFC] XRC upstream merge reboot

2011-08-01 Thread Hefty, Sean
> We do have unregister on finalization. But this code doesn't introduce any > synchronization across processes on the same node, since kernel manages the > receive qp. If the reference counter will be moved to app responsibility, it > will enforce the app to manage the reference counter on app leve

Re: [RFC] XRC upstream merge reboot

2011-08-01 Thread Shamis, Pavel
>> Actually I think it is really not such a good idea to manage the reference counter >> across OOB communication. > > But this is exactly what the current API *requires* that users of XRC do!!! > And I agree, it's not a good idea. :) We do have unregister on finalization. But this code doesn't introduce

RE: [RFC] XRC upstream merge reboot

2011-08-01 Thread Hefty, Sean
> Actually I think it is really not such a good idea to manage the reference counter > across OOB communication. But this is exactly what the current API *requires* that users of XRC do!!! And I agree, it's not a good idea. :) > IMHO, I don't see a good reason to redefine the existing API. > I'm afraid that su

Re: [RFC] XRC upstream merge reboot

2011-07-26 Thread Shamis, Pavel
Please see my notes below. >>> I've tried to come up with a clean way to determine the lifetime of an xrc >>> tgt qp, >>> and I think the best approach is still: >>> >>> 1. Allow the creating process to destroy it at any time, and >>> >>> 2a. If not explicitly destroyed, the tgt qp is bound to t

RE: [RFC] XRC upstream merge reboot

2011-07-21 Thread Hefty, Sean
> > I've tried to come up with a clean way to determine the lifetime of an xrc > > tgt qp, > > and I think the best approach is still: > > > > 1. Allow the creating process to destroy it at any time, and > > > > 2a. If not explicitly destroyed, the tgt qp is bound to the lifetime of the > xrc domain

RE: [RFC] XRC upstream merge reboot

2011-07-21 Thread Hefty, Sean
> If you use file descriptors for the XRC domain, then when the last user of the > domain exits, the domain > gets destroyed (at least this is how it works in OFED; Sean's code looks the same). > > In this case, the kernel cleanup code for the process should close the XRC > domains opened by that > process, s
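
As an illustration of the file-descriptor lifetime model, a sketch using the ibv_open_xrcd() interface that later landed in libibverbs (the OFED-era domain call differed); the backing file path is hypothetical, e.g. one file per job:

    /* Sketch: every process that wants to share the domain opens the same file;
     * when the last fd goes away -- including implicitly when a process crashes --
     * the kernel can tear the domain (and, under option 2a, its TGT QPs) down.
     * Interface names follow later libibverbs; treat details as illustrative. */
    #include <fcntl.h>
    #include <infiniband/verbs.h>

    static struct ibv_xrcd *open_shared_xrcd(struct ibv_context *ctx,
                                             const char *path)  /* e.g. a per-job file */
    {
            int fd = open(path, O_RDONLY | O_CREAT, 0600);
            if (fd < 0)
                    return NULL;

            struct ibv_xrcd_init_attr attr = {
                    .comp_mask = IBV_XRCD_INIT_ATTR_FD | IBV_XRCD_INIT_ATTR_OFLAGS,
                    .fd        = fd,
                    .oflags    = O_CREAT,   /* create the domain if it doesn't exist yet */
            };

            return ibv_open_xrcd(ctx, &attr);
    }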

Re: [RFC] XRC upstream merge reboot

2011-07-21 Thread Jeff Squyres
On Jul 21, 2011, at 8:47 AM, Jack Morgenstein wrote: > [snip] > When the last user of an XRC domain exits cleanly (or crashes), the domain > should be destroyed. > In this case, with Sean's design, the tgt qp's for the XRC domain should also > be destroyed. Sounds perfect. -- Jeff Squyres jsq

Re: [RFC] XRC upstream merge reboot

2011-07-21 Thread Jack Morgenstein
On Thursday 21 July 2011 14:58, Jeff Squyres wrote: > > If MPI can use a different XRC domain per job (and deallocate the domain > > at the job's end), this would solve the tgt qp lifetime problem (-- by > > destroying all the tgt qp's when the xrc domain is deallocated). > > What happens if the M

Re: [RFC] XRC upstream merge reboot

2011-07-21 Thread Jeff Squyres
On Jul 21, 2011, at 3:38 AM, Jack Morgenstein wrote: > If MPI can use a different XRC domain per job (and deallocate the domain > at the job's end), this would solve the tgt qp lifetime problem (-- by > destroying all the tgt qp's when the xrc domain is deallocated). What happens if the MPI job c

Re: [RFC] XRC upstream merge reboot

2011-07-21 Thread Jack Morgenstein
On Thursday 21 July 2011 10:38, Jack Morgenstein wrote: > Having a new OFED support BOTH interfaces is a nightmare I don't even want to > think about! I over-reacted here, sorry about that. I know that it will be difficult to support both the old and the new interface. However, to support the c

Re: [RFC] XRC upstream merge reboot

2011-07-21 Thread Jack Morgenstein
On Wednesday 20 July 2011 21:51, Hefty, Sean wrote: > I've tried to come up with a clean way to determine the lifetime of an xrc > tgt qp, > and I think the best approach is still: > > 1. Allow the creating process to destroy it at any time, and > > 2a. If not explicitly destroyed, the tgt qp

RE: [RFC] XRC upstream merge reboot

2011-07-20 Thread Hefty, Sean
I've tried to come up with a clean way to determine the lifetime of an xrc tgt qp, and I think the best approach is still: 1. Allow the creating process to destroy it at any time, and 2a. If not explicitly destroyed, the tgt qp is bound to the lifetime of the xrc domain or 2b. The creating proc

RE: [RFC] XRC upstream merge reboot

2011-06-23 Thread Hefty, Sean
> If you are thinking of using this for tracking, there is no way that I can > see except reference counting to know when all users of the TGT have > finished with it. Indicating usage by "tapping" the TGT QP whenever an > SRQ receives a packet via the TGT is not a good idea (this is data > path!)

Re: [RFC] XRC upstream merge reboot

2011-06-22 Thread Jack Morgenstein
On Wednesday 22 June 2011 22:57, Hefty, Sean wrote: > > > We can report the creation of a tgt qp on an xrcd as an async event. > > To whom? > > to all users of the xrcd.  IMO, if we require undefined, out of band > communication to use > XRC, then we have an incomplete solution.  It's just too ba

Re: [RFC] XRC upstream merge reboot

2011-06-22 Thread Jack Morgenstein
On Wednesday 22 June 2011 22:57, Hefty, Sean wrote: > Is there *any* way for a tgt qp to know if the remote ini qp is still active? > Not that I am aware of. -Jack

RE: [RFC] XRC upstream merge reboot

2011-06-22 Thread Hefty, Sean
> > For MPI, I would expect an xrcd to be associated with a single job > instance. > So did I, but they said that this was not the case, and they were very > pleased > with the final (more complicated implementation-wise) interface. > We need to get them involved in this discussion ASAP. I agree.

RE: [RFC] XRC upstream merge reboot

2011-06-22 Thread Tziporet Koren
> > For MPI, I would expect an xrcd to be associated with a single job instance. > > So did I, but they said that this was not the case, and they were very > > pleased > > with the final (more complicated implementation-wise) interface. > > We need to get them involved in this discussion ASAP.

Re: [RFC] XRC upstream merge reboot

2011-06-22 Thread Jack Morgenstein
> I read over the threads that you referenced. I do understand what the > reg/unreg calls were trying to do. In short, I agree with your original > approach > of letting the tgt qp hang around while the xrcd exists, > and I'm not convinced what HP MPI was trying to do should drive a > more com

RE: [RFC] XRC upstream merge reboot

2011-06-22 Thread Hefty, Sean
> > After looking at the implementation more, what I didn't like about the > reg/unreg > > calls is that it is independent of receiving data on an SRQ. That is, a > user can > > receive data on an SRQ through a TGT QP before they have registered and > after unregistering. > That is correct, but th

Re: [RFC] XRC upstream merge reboot

2011-06-22 Thread Jack Morgenstein
On Wednesday 22 June 2011 19:14, Hefty, Sean wrote: > This is partly true, and I haven't come up with a better way to handle this. > Note that the patches allow the original creator of the TGT QP to destroy it > by simply calling ibv_destroy_qp(). > This doesn't handle the process dying, but may

RE: [RFC] XRC upstream merge reboot

2011-06-22 Thread Hefty, Sean
> I noticed (from the code in your git, xrc branch) that the XRC target QPs > stick around until the XRC domain is de-allocated. > There was a long thread about this in December, 2007, where the MPI > community > found this approach unacceptable (leading to accumulation of "dead" XRC > TGT qp's). >

Re: [RFC] XRC upstream merge reboot

2011-06-22 Thread Jack Morgenstein
Hi Sean, Some initial feature feedback. I noticed (from the code in your git, xrc branch) that the XRC target QPs stick around until the XRC domain is de-allocated. There was a long thread about this in December, 2007, where the MPI community found this approach unacceptable (leading to accumulat

RE: [RFC] XRC upstream merge reboot

2011-05-18 Thread Hefty, Sean
What about something along the lines of the following? This is two incomplete patches squashed together, lacking any serious documentation. I believe this will support existing apps and driver libraries, either as existing binaries or by recompiling unmodified source code. A driver library calls ibv_registe

Re: [RFC] XRC upstream merge reboot

2011-05-18 Thread Roland Dreier
On Wed, May 18, 2011 at 10:30 AM, Hefty, Sean wrote: >> As long as the version number in the ibv_context is increasing and not >> branching then I think it is OK. 0 = what we have now. 1 = + XRC, 2 = >> +XRC+ummunotify, etc. Drivers 0 out the function pointers they do not >> support. > > I was thi

Re: [RFC] XRC upstream merge reboot

2011-05-18 Thread Jason Gunthorpe
On Wed, May 18, 2011 at 06:13:54PM +, Hefty, Sean wrote: > You need it in the normal send case as well, either outside of the > union, or part of a new struct within the union. Works for me.. union { [..] struct { uint64_t reserved1[3]; uint32_t reserved2; uint32_t remote_q

RE: [RFC] XRC upstream merge reboot

2011-05-18 Thread Hefty, Sean
> The size is 3*64 + 1*32 so there is a 32-bit pad, thus we can rewrite > it as: > > union { > struct { > uint64_t remote_addr; > uint32_t rkey; > uint32_t xrc_remote_qpn; >
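
A cleaned-up reconstruction of the layout being proposed: only the first three members come from the quoted fragment; the leg tags (rdma/atomic/ud) and the remaining legs are assumed to match the existing struct ibv_send_wr wr union. (The libibverbs API that eventually shipped instead carries the XRC target in a separate qp_type union, as wr.qp_type.xrc.remote_srqn.)

    /* Proposed wr union with the XRC remote QPN folded into the 32-bit pad
     * after rkey (the atomic leg is 3*64 + 1*32 bits, leaving a 32-bit hole).
     * Members beyond the quoted fragment are assumptions. */
    #include <stdint.h>

    struct ibv_ah;                            /* opaque address handle */

    union {
            struct {
                    uint64_t remote_addr;
                    uint32_t rkey;
                    uint32_t xrc_remote_qpn;  /* reuses the existing pad */
            } rdma;
            struct {
                    uint64_t remote_addr;
                    uint64_t compare_add;
                    uint64_t swap;
                    uint32_t rkey;
            } atomic;
            struct {
                    struct ibv_ah *ah;
                    uint32_t remote_qpn;
                    uint32_t remote_qkey;
            } ud;
    } wr;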

Re: [RFC] XRC upstream merge reboot

2011-05-18 Thread Jason Gunthorpe
On Wed, May 18, 2011 at 05:30:30PM +, Hefty, Sean wrote: > > As long as the version number in the ibv_context is increasing and not > > branching then I think it is OK. 0 = what we have now. 1 = + XRC, 2 = > > +XRC+ummunotify, etc. Drivers 0 out the function pointers they do not > > support. >

RE: [RFC] XRC upstream merge reboot

2011-05-18 Thread Hefty, Sean
> As long as the version number in the ibv_context is increasing and not > branching then I think it is OK. 0 = what we have now. 1 = + XRC, 2 = > +XRC+ummunotify, etc. Drivers 0 out the function pointers they do not > support. I was thinking more along this line, but I can see how using a named e

Re: [RFC] XRC upstream merge reboot

2011-05-18 Thread Jason Gunthorpe
On Wed, May 18, 2011 at 09:44:01AM -0700, Roland Dreier wrote: > and have support for named extensions, I think that would be even > better. ie we could define a bunch of new XRC related stuff and > then have some interface to the driver where we ask for the "XRC" > extension (by name with a stri

Re: [RFC] XRC upstream merge reboot

2011-05-18 Thread Roland Dreier
On Mon, May 16, 2011 at 2:13 PM, Hefty, Sean wrote: > libibverbs > -- > We define a new device capability flag IBV_DEVICE_EXT_OPS, indicating that > the library supports extended operations.  If set, the provider library > returns an extended structure from ibv_open_device(): > >        
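
A purely hypothetical sketch of the shape being described here (the structure in the quoted proposal is truncated, and none of the member names below are from the patches): the legacy ibv_context is embedded so existing binaries keep working, while consumers that see IBV_DEVICE_EXT_OPS can reach extended entry points behind it. Whether the real patches place the legacy context first or use another scheme is not visible in the snippet.

    /* Hypothetical illustration only -- not the structure from the patches.
     * Idea: embed the classic ibv_context for binary compatibility and hang
     * the extended (e.g. XRC) operations off the containing structure. */
    #include <infiniband/verbs.h>

    struct xrc_domain_sketch;                 /* placeholder for an XRC domain type */

    struct ibv_context_ext_sketch {
            struct ibv_context context;       /* legacy context, kept intact */
            int                ext_version;   /* extension level the provider supports */
            struct {
                    struct xrc_domain_sketch *(*open_xrc_domain)(struct ibv_context *ctx,
                                                                 int fd, int oflag);
                    int (*close_xrc_domain)(struct xrc_domain_sketch *d);
                    /* ... further extended entry points ... */
            } ext_ops;
    };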

RE: [RFC] XRC upstream merge reboot

2011-05-18 Thread Hefty, Sean
> Great that you are taking this on! I will review this next week. Hopefully I'll have some early patches sometime next week. See below for my current thoughts based on how the implementation is progressing. My thoughts change hourly. > > From an architecture viewpoint, XRC adds 4 new XRC sp

Re: [RFC] XRC upstream merge reboot

2011-05-18 Thread Jack Morgenstein
Sean, Great that you are taking this on! I will review this next week. -Jack On Tuesday 17 May 2011 00:13, Hefty, Sean wrote: > I've been working on a set of XRC patches aimed at upstream inclusion to the > kernel, libibverbs, and librdmacm. I'm using existing patches as the major > starting