> I am a bit concerned here. In the current usage model, target QPs are
> destroyed when their reference
> count goes to zero
> (ib_reg_xrc_recv_qp and ibv_xrc_create_qp increment the reference count,
> while ib_unreg_xrc_recv_qp
> decrements it).
> In this model, the TGT QP user/consumer does n
On Thursday 11 August 2011 01:20, Hefty, Sean wrote:
> To help with OFED feature level compatibility, I'm in the process of adding a
> new call to ibverbs:
>
> struct ib_qp_open_attr {
> void (*event_handler)(struct ib_event *, void *);
> void *qp_context;
> u32 qp_num
I think it's a good idea to support both usage models.
Regards,
Pasha.
>> Things only get complicated when the domain-allocator process allocates a
>> single domain and simply
>> uses that single domain for all jobs (i.e., the domain is never de-allocated
>> for the lifetime of the
>> allocating
> Things only get complicated when the domain-allocator process allocates a
> single domain and simply
> uses that single domain for all jobs (i.e., the domain is never de-allocated
> for the lifetime of the
> allocating process, and the allocating process is the server for all jobs).
To help with
> ?? How do you register for an event? There is only
> ibv_get_async_event(3) - I thought it returned all events relevant to
> the associated verbs context.
The OFED APIs for managing XRC receive QPs are:
int (*create_xrc_rcv_qp)(struct ibv_qp_init_attr *init_attr,
uint32
On Thu, Aug 04, 2011 at 12:06:24AM +, Hefty, Sean wrote:
> > Where does the ib_verbs async event for APM state change get routed for
> > XRC?
>
> The OFED APIs route QP events to all processes which register for
> that qp number.
?? How do you register for an event? There is only
ibv_get_asy
> Where does the ib_verbs async event for APM state change get routed for
> XRC?
The OFED APIs route QP events to all processes which register for that qp
number.
> Does the event have enough info to identify all the necessary
> parts?
The event carries the qp number only.
> Can the process th
On Wed, Aug 03, 2011 at 05:16:17PM -0400, Shamis, Pavel wrote:
> >
> > > Well, in Open MPI we have XRC code that uses APM.
> > > If Mellanox cares about the feature, they would have to rework this part
> > > of
> > > code in Open MPI.
> > > I don't know about other apps.
> >
> > But does the APM
>
> > Well, in Open MPI we have XRC code that uses APM.
> > If Mellanox cares about the feature, they would have to rework this part of
> > code in Open MPI.
> > I don't know about other apps.
>
> But does the APM implementation expect some other process other than
> the creator to be able to mod
> Well, in Open MPI we have XRC code that uses APM.
> If Mellanox cares about the feature, they would have to rework this part of
> code in Open MPI.
> I don't know about other apps.
But does the APM implementation expect some other process other than the
creator to be able to modify the QP?
APM
> > Well, actually I was thinking about APM. If the "creator" exits, we do not
> > have a way to
> > upload alternative path.
>
> Correct - that would be a limitation. You would need to move to a new tgt
> qp.
Well, in Open MPI we have XRC code that uses APM.
If Mellanox cares about the featur
> Well, actually I was thinking about APM. If the "creator" exits, we do not
> have a way to
> upload alternative path.
Correct - that would be a limitation. You would need to move to a new tgt qp.
In a general solution, this involves not only allowing other processes to
modify the QP, but als
>
> > BTW, did we have the same limitation/feature (only the creating process is
> > allowed to modify) in the original XRC driver?
>
> I'm not certain about the implementation, but the OFED APIs would allow
> any process within the xrc domain to modify the qp.
>
> > Hmm, is there a way to destroy the QP
On Tuesday 02 August 2011 19:29, Shamis, Pavel wrote:
> The XRC domain is created by the process that starts first. All the rest of the
> processes that belong to the same MPI session and reside on the same node join
> the domain.
> The TGT QP is created by the process that receives an inbound connection first and it i
> BTW, did we have the same limitation/feature (only the creating process is allowed
> to modify) in the original XRC driver?
I'm not certain about the implementation, but the OFED APIs would allow any
process within the xrc domain to modify the qp.
> Hmm, is there a way to destroy the QP, when the origina
On Aug 2, 2011, at 5:25 PM, Hefty, Sean wrote:
>> If the target QP is opened in the low level driver, then it's owned by a group of
>> processes that share the same XRC domain.
>
> Can you define what you mean by 'owned'?
>
> With the latest patches, the target qp is created in the kernel. Data
> re
> If the target QP is opened in the low level driver, then it's owned by a group of
> processes that share the same XRC domain.
Can you define what you mean by 'owned'?
With the latest patches, the target qp is created in the kernel. Data received
on the target qp can go to any process sharing the as
>> We do have unregister on finalization. But this code doesn't introduce any
>> synchronization across processes on the same node, since kernel manages the
>> receive qp. If the reference counter is moved to the app's responsibility, it
>> will force the app to manage the reference counter at the app
Hi Jack,
Please see my comments below
> From Pavel Shamis:
>>> We do have unregister on finalization. But this code doesn't introduce any
>>> synchronization across processes on the same node, since kernel manages the
>>> receive qp. If the reference counter is moved to the app's responsibility, it
On Monday 01 August 2011 21:28, Hefty, Sean wrote:
>From Pavel Shamis:
> > We do have unregister on finalization. But this code doesn't introduce any
> > synchronization across processes on the same node, since kernel manages the
> > receive qp. If the reference counter is moved to the app's respon
> We do have unregister on finalization. But this code doesn't introduce any
> synchronization across processes on the same node, since kernel manages the
> receive qp. If the reference counter is moved to the app's responsibility, it
> will force the app to manage the reference counter at the app leve
>> Actually I think it is really not such a good idea to manage the reference counter
>> across OOB communication.
>
> But this is exactly what the current API *requires* that users of XRC do!!!
> And I agree, it's not a good idea. :)
We do have unregister on finalization. But this code doesn't introduce
> Actually I think it is really not such a good idea to manage the reference counter
> across OOB communication.
But this is exactly what the current API *requires* that users of XRC do!!!
And I agree, it's not a good idea. :)
> IMHO, I don't see a good reason to redefine the existing API.
> I'm afraid that su
Please see my notes below.
>>> I've tried to come up with a clean way to determine the lifetime of an xrc
>>> tgt qp,
>>> and I think the best approach is still:
>>>
>>> 1. Allow the creating process to destroy it at any time, and
>>>
>>> 2a. If not explicitly destroyed, the tgt qp is bound to t
> > I've tried to come up with a clean way to determine the lifetime of an xrc
> > tgt qp,
> > and I think the best approach is still:
> >
> > 1. Allow the creating process to destroy it at any time, and
> >
> > 2a. If not explicitly destroyed, the tgt qp is bound to the lifetime of the
> > xrc domain
> If you use file descriptors for the XRC domain, then when the last user of the
> domain exits, the domain
> gets destroyed (at least this is in OFED. Sean's code looks the same).
>
> In this case, the kernel cleanup code for the process should close the XRC
> domains opened by that
> process, s
On Jul 21, 2011, at 8:47 AM, Jack Morgenstein wrote:
> [snip]
> When the last user of an XRC domain exits cleanly (or crashes), the domain
> should be destroyed.
> In this case, with Sean's design, the tgt qp's for the XRC domain should also
> be destroyed.
Sounds perfect.
--
Jeff Squyres
jsq
On Thursday 21 July 2011 14:58, Jeff Squyres wrote:
> > If MPI can use a different XRC domain per job (and deallocate the domain
> > at the job's end), this would solve the tgt qp lifetime problem (-- by
> > destroying all the tgt qp's when the xrc domain is deallocated).
>
> What happens if the M
On Jul 21, 2011, at 3:38 AM, Jack Morgenstein wrote:
> If MPI can use a different XRC domain per job (and deallocate the domain
> at the job's end), this would solve the tgt qp lifetime problem (-- by
> destroying all the tgt qp's when the xrc domain is deallocated).
What happens if the MPI job c
On Thursday 21 July 2011 10:38, Jack Morgenstein wrote:
> Having a new OFED support BOTH interfaces is a nightmare I don't even want to
> think about!
I over-reacted here, sorry about that. I know that it will be difficult
to support both the old and the new interface. However, to support the
c
On Wednesday 20 July 2011 21:51, Hefty, Sean wrote:
> I've tried to come up with a clean way to determine the lifetime of an xrc
> tgt qp,
> and I think the best approach is still:
>
> 1. Allow the creating process to destroy it at any time, and
>
> 2a. If not explicitly destroyed, the tgt qp
I've tried to come up with a clean way to determine the lifetime of an xrc tgt
qp, and I think the best approach is still:
1. Allow the creating process to destroy it at any time, and
2a. If not explicitly destroyed, the tgt qp is bound to the lifetime of the xrc
domain
or
2b. The creating proc
> If you are thinking of using this for tracking, there is no way that I can
> see except reference counting to know when all users of the TGT have
> finished with it. Indicating usage by "tapping" the TGT QP whenever an
> SRQ receives a packet via the TGT is not a good idea (this is data
> path!)
On Wednesday 22 June 2011 22:57, Hefty, Sean wrote:
> > > We can report the creation of a tgt qp on an xrcd as an async event.
> > To whom?
>
> to all users of the xrcd. IMO, if we require undefined, out of band
> communication to use
> XRC, then we have an incomplete solution. It's just too ba
On Wednesday 22 June 2011 22:57, Hefty, Sean wrote:
> Is there *any* way for a tgt qp to know if the remote ini qp is still active?
>
Not that I am aware of.
-Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
> > For MPI, I would expect an xrcd to be associated with a single job
> > instance.
> So did I, but they said that this was not the case, and they were very
> pleased
> with the final (more complicated implementation-wise) interface.
> We need to get them involved in this discussion ASAP.
I agree.
> > For MPI, I would expect an xrcd to be associated with a single job instance.
> > So did I, but they said that this was not the case, and they were very
> > pleased
> > with the final (more complicated implementation-wise) interface.
> > We need to get them involved in this discussion ASAP.
> I read over the threads that you referenced. I do understand what the
> reg/unreg calls were trying to do. In short, I agree with your original
> approach
> of letting the tgt qp hang around while the xrcd exists,
> and I'm not convinced what HP MPI was trying to do should drive a
> more com
> > After looking at the implementation more, what I didn't like about the
> > reg/unreg calls is that it is independent of receiving data on an SRQ.
> > That is, a user can receive data on an SRQ through a TGT QP before they
> > have registered and after unregistering.
> That is correct, but th
On Wednesday 22 June 2011 19:14, Hefty, Sean wrote:
> This is partly true, and I haven't come up with a better way to handle this.
> Note that the patches allow the original creator of the TGT QP to destroy it
> by simply calling ibv_destroy_qp().
> This doesn't handle the process dying, but may
> I noticed (from the code in your git, xrc branch) that the XRC target QPs
> stick around until the XRC domain is de-allocated.
> There was a long thread about this in December, 2007, where the MPI
> community
> found this approach unacceptable (leading to accumulation of "dead" XRC
> TGT qp's).
>
Hi Sean,
Some initial feature feedback.
I noticed (from the code in your git, xrc branch) that the XRC target QPs
stick around until the XRC domain is de-allocated.
There was a long thread about this in December, 2007, where the MPI community
found this approach unacceptable (leading to accumulat
What about something along the lines of the following? This is 2 incomplete
patches squashed together, lacking any serious documentation. I believe this
will support existing apps and driver libraries, either as existing binaries or by
re-compiling unmodified source code.
A driver library calls ibv_registe
On Wed, May 18, 2011 at 10:30 AM, Hefty, Sean wrote:
>> As long as the version number in the ibv_context is increasing and not
>> branching then I think it is OK. 0 = what we have now. 1 = + XRC, 2 =
>> +XRC+ummunotify, etc. Drivers 0 out the function pointers they do not
>> support.
>
> I was thi
On Wed, May 18, 2011 at 06:13:54PM +, Hefty, Sean wrote:
> You need it in the normal send case as well, either outside of the
> union, or part of a new struct within the union.
Works for me..
union {
	[..]
	struct {
		uint64_t reserved1[3];
		uint32_t reserved2;
		uint32_t remote_q
> The size is 3*64 + 1*32 so there is a 32 bit pad, thus we can rewrite
> it as:
>
> union {
> 	struct {
> 		uint64_t remote_addr;
> 		uint32_t rkey;
> 		uint32_t xrc_remote_qpn;
>
On Wed, May 18, 2011 at 05:30:30PM +, Hefty, Sean wrote:
> > As long as the version number in the ibv_context is increasing and not
> > branching then I think it is OK. 0 = what we have now. 1 = + XRC, 2 =
> > +XRC+ummunotify, etc. Drivers 0 out the function pointers they do not
> > support.
>
> As long as the version number in the ibv_context is increasing and not
> branching then I think it is OK. 0 = what we have now. 1 = + XRC, 2 =
> +XRC+ummunotify, etc. Drivers 0 out the function pointers they do not
> support.
I was thinking more along this line, but I can see how using a named e
On Wed, May 18, 2011 at 09:44:01AM -0700, Roland Dreier wrote:
> and have support for named extensions, I think that would be even
> better. ie we could define a bunch of new XRC related stuff and
> then have some interface to the driver where we ask for the "XRC"
> extension (by name with a stri
On Mon, May 16, 2011 at 2:13 PM, Hefty, Sean wrote:
> libibverbs
> --
> We define a new device capability flag IBV_DEVICE_EXT_OPS, indicating that
> the library supports extended operations. If set, the provider library
> returns an extended structure from ibv_open_device():
>
>
> Great that you are taking this on! I will review this next week.
Hopefully I'll have some early patches sometime next week. See below for my
current thoughts based on how the implementation is progressing. My thoughts
change hourly.
> > From an architecture viewpoint, XRC adds 4 new XRC sp
Sean,
Great that you are taking this on! I will review this next week.
-Jack
On Tuesday 17 May 2011 00:13, Hefty, Sean wrote:
> I've been working on a set of XRC patches aimed at upstream inclusion to the
> kernel, libibverbs, and librdmacm. I'm using existing patches as the major
> starting