I am a bit concerned here. In the current usage model, target QPs are destroyed when their reference count goes to zero (ib_reg_xrc_recv_qp and ibv_xrc_create_qp increment the reference count, while ib_unreg_xrc_recv_qp decrements it). In this model, the TGT QP user/consumer does not
On Thursday 11 August 2011 01:20, Hefty, Sean wrote:
To help with OFED feature level compatibility, I'm in the process of adding a
new call to ibverbs:
struct ib_qp_open_attr {
	void (*event_handler)(struct ib_event *, void *);
	void *qp_context;
	u32 qp_num;
};
I think it's a good idea to support both usage models.
Regards,
Pasha.
Things only get complicated when the domain-allocator process allocates a
single domain and simply
uses that single domain for all jobs (i.e., the domain is never de-allocated
for the lifetime of the
allocating process, and the allocating process is the server for all jobs).
On Tuesday 02 August 2011 19:29, Shamis, Pavel wrote:
The XRC domain is created by the process that starts first. All the other processes that belong to the same MPI session and reside on the same node join the domain. The TGT QP is created by the process that receives an inbound connection first, and it is
BTW, did we have the same limitation/feature (only the creating process is allowed to modify) in the original XRC driver?
I'm not certain about the implementation, but the OFED APIs would allow any process within the xrc domain to modify the qp.
Hmm, is there a way to destroy the QP when the original
Well, actually I was thinking about APM. If the creator exits, we do not have a way to upload an alternative path.
Correct - that would be a limitation. You would need to move to a new tgt qp.
In a general solution, this involves not only allowing other processes to modify the QP, but also
Well, in Open MPI we have XRC code that uses APM. If Mellanox cares about the feature, they would have to rework this part of the code in Open MPI. I don't know about other apps.
But does the APM implementation expect a process other than the creator to be able to modify the QP?
On Wed, Aug 03, 2011 at 05:16:17PM -0400, Shamis, Pavel wrote:
Where does the ib_verbs async event for APM state change get routed for XRC?
The OFED APIs route QP events to all processes which register for that qp number.
Does the event have enough info to identify all the necessary parts?
The event carries the qp number only.
Can the process that
On Thu, Aug 04, 2011 at 12:06:24AM +, Hefty, Sean wrote:
?? How do you register for an event? There is only
ibv_get_async_event(3) - I thought it returned all events relevant to
the associated verbs context.
The OFED APIs for managing XRC receive QPs are:
int (*create_xrc_rcv_qp)(struct ibv_qp_init_attr *init_attr, uint32_t
On Monday 01 August 2011 21:28, Hefty, Sean wrote:
From Pavel Shamis:
We do have unregister on finalization. But this code doesn't introduce any synchronization across processes on the same node, since the kernel manages the receive qp. If the reference counter is moved to the app's responsibility, it will force the app to manage the reference counter at the app level.
Hi Jack,
Please see my comments below
If the target QP is opened in the low-level driver, then it's owned by the group of processes that share the same XRC domain.
Can you define what you mean by 'owned'?
With the latest patches, the target qp is created in the kernel. Data received on the target qp can go to any process sharing the
On Aug 2, 2011, at 5:25 PM, Hefty, Sean wrote:
Actually, I think it is really not such a good idea to manage the reference counter across OOB communication.
But this is exactly what the current API *requires* that users of XRC do!!! And I agree, it's not a good idea. :)
IMHO, I don't see a good reason to redefine the existing API. I'm afraid that such
Please see my notes below.
I've tried to come up with a clean way to determine the lifetime of an xrc tgt qp, and I think the best approach is still:
1. Allow the creating process to destroy it at any time, and
2a. If not explicitly destroyed, the tgt qp is bound to the lifetime of the
On Wednesday 20 July 2011 21:51, Hefty, Sean wrote:
On Thursday 21 July 2011 10:38, Jack Morgenstein wrote:
Having a new OFED support BOTH interfaces is a nightmare I don't even want to
think about!
I over-reacted here, sorry about that. I know that it will be difficult
to support both the old and the new interface. However, to support the
On Jul 21, 2011, at 3:38 AM, Jack Morgenstein wrote:
If MPI can use a different XRC domain per job (and deallocate the domain
at the job's end), this would solve the tgt qp lifetime problem (-- by
destroying all the tgt qp's when the xrc domain is deallocated).
What happens if the MPI job
On Jul 21, 2011, at 8:47 AM, Jack Morgenstein wrote:
[snip]
When the last user of an XRC domain exits cleanly (or crashes), the domain
should be destroyed.
In this case, with Sean's design, the tgt qp's for the XRC domain should also
be destroyed.
Sounds perfect.
--
Jeff Squyres
If you use file descriptors for the XRC domain, then when the last user of the domain exits, the domain gets destroyed (at least this is how it works in OFED; Sean's code looks the same).
In this case, the kernel cleanup code for the process should close the XRC domains opened by that process, so
I've tried to come up with a clean way to determine the lifetime of an xrc tgt qp, and I think the best approach is still:
1. Allow the creating process to destroy it at any time, and
2a. If not explicitly destroyed, the tgt qp is bound to the lifetime of the xrc domain
or
2b. The creating
On Wednesday 22 June 2011 22:57, Hefty, Sean wrote:
We can report the creation of a tgt qp on an xrcd as an async event.
To whom?
To all users of the xrcd. IMO, if we require undefined, out-of-band communication to use XRC, then we have an incomplete solution. It's just too bad that
Hi Sean,
Some initial feature feedback.
I noticed (from the code in your git, xrc branch) that the XRC target QPs
stick around until the XRC domain is de-allocated.
There was a long thread about this in December, 2007, where the MPI community found this approach unacceptable (leading to accumulation of dead XRC TGT qp's). They
On Wednesday 22 June 2011 19:14, Hefty, Sean wrote:
This is partly true, and I haven't come up with a better way to handle this.
Note that the patches allow the original creator of the TGT QP to destroy it
by simply calling ibv_destroy_qp().
This doesn't handle the process dying, but maybe
After looking at the implementation more, what I didn't like about the reg/unreg calls is that they are independent of receiving data on an SRQ. That is, a user can receive data on an SRQ through a TGT QP before they have registered and after unregistering.
That is correct, but the
I read over the threads that you referenced. I do understand what the reg/unreg calls were trying to do. In short, I agree with your original approach of letting the tgt qp hang around while the xrcd exists, and I'm not convinced what HP MPI was trying to do should drive a more
For MPI, I would expect an xrcd to be associated with a single job instance.
So did I, but they said that this was not the case, and they were very pleased with the final (more complicated implementation-wise) interface.
We need to get them involved in this discussion ASAP.
I agree. But
Sean,
Great that you are taking this on! I will review this next week.
-Jack
On Tuesday 17 May 2011 00:13, Hefty, Sean wrote:
I've been working on a set of XRC patches aimed at upstream inclusion to the
kernel, libibverbs, and librdmacm. I'm using existing patches as the major
starting
Hopefully I'll have some early patches sometime next week. See below for my
current thoughts based on how the implementation is progressing. My thoughts
change hourly.
From an architecture viewpoint, XRC adds 4 new XRC
On Mon, May 16, 2011 at 2:13 PM, Hefty, Sean sean.he...@intel.com wrote:
libibverbs
--
We define a new device capability flag IBV_DEVICE_EXT_OPS, indicating that
the library supports extended operations. If set, the provider library
returns an extended structure from
On Wed, May 18, 2011 at 09:44:01AM -0700, Roland Dreier wrote:
and have support for named extensions, I think that would be even
better. ie we could define a bunch of new XRC related stuff and
then have some interface to the driver where we ask for the XRC
extension (by name with a string)
As long as the version number in the ibv_context is increasing and not
branching then I think it is OK. 0 = what we have now. 1 = + XRC, 2 =
+XRC+ummunotify, etc. Drivers 0 out the function pointers they do not
support.
I was thinking more along this line, but I can see how using a named
On Wed, May 18, 2011 at 05:30:30PM +, Hefty, Sean wrote:
The size is 3*64 + 1*32, so there is a 32-bit pad; thus we can rewrite it as:
union {
	struct {
		uint64_t remote_addr;
		uint32_t rkey;
		uint32_t xrc_remote_qpn;
	}
On Wed, May 18, 2011 at 06:13:54PM +, Hefty, Sean wrote:
You need it in the normal send case as well, either outside of the
union, or part of a new struct within the union.
Works for me..
union {
	[..]
	struct {
		uint64_t reserved1[3];
		uint32_t reserved2;
		uint32_t
On Wed, May 18, 2011 at 10:30 AM, Hefty, Sean sean.he...@intel.com wrote:
What about something along the lines of the following? This is 2 incomplete patches squashed together, lacking any serious documentation. I believe this will support existing apps and driver libraries, either as binaries or by re-compiling unmodified source code.
A driver library calls