Re: [RFC] XRC upstream merge reboot

2011-08-11 Thread Shamis, Pavel
I think it's a good idea to support both usage models. Regards, Pasha. >> Things only get complicated when the domain-allocator process allocates a >> single domain and simply >> uses that single domain for all jobs (i.e., the domain is never de-allocated >> for the lifetime of the >> allocating

RE: [RFC] XRC upstream merge reboot

2011-08-03 Thread Shamis, Pavel
> > > Well, in Open MPI we have XRC code that uses APM. > > If Mellanox cares about the feature, they would have to rework this part of > > code in Open MPI. > > I don't know about other apps. > > But does the APM implementation expect some other process other than > the creator to be able to mod

RE: [RFC] XRC upstream merge reboot

2011-08-03 Thread Shamis, Pavel
> > Well, actually I was thinking about APM. If the "creator" exits, we do not > > have a way to > > upload alternative path. > > Correct - that would be a limitation. You would need to move to a new tgt > qp. Well, in Open MPI we have XRC code that uses APM. If Mellanox cares about the featur

RE: [RFC] XRC upstream merge reboot

2011-08-03 Thread Shamis, Pavel
> > > BTW, did we have the same limitation/feature (only creating process is > allowed > > to modify) in the original XRC driver? > > I'm not certain about the implementation, but the OFED APIs would allow > any process within the xrc domain to modify the qp. > > > Hmm, is there a way to destroy the QP

Re: [RFC] XRC upstream merge reboot

2011-08-02 Thread Shamis, Pavel
On Aug 2, 2011, at 5:25 PM, Hefty, Sean wrote: >> If the target QP is opened in low level driver, then it's owned by group of >> processes that share the same XRC domain. > > Can you define what you mean by 'owned'? > > With the latest patches, the target qp is created in the kernel. Data > re

Re: [RFC] XRC upstream merge reboot

2011-08-02 Thread Shamis, Pavel
>> We do have unregister on finalization. But this code doesn't introduce any >> synchronization across processes on the same node, since kernel manages the >> receive qp. If the reference counter will be moved to app responsibility, it >> will enforce the app to manage the reference counter on app

Re: [RFC] XRC upstream merge reboot

2011-08-02 Thread Shamis, Pavel
Hi Jack, Please see my comments below > From Pavel Shamis: >>> We do have unregister on finalization. But this code doesn't introduce any >>> synchronization across processes on the same node, since kernel manages the >>> receive qp. If the reference counter will be moved to app responsibility, it

Re: [RFC] XRC upstream merge reboot

2011-08-01 Thread Shamis, Pavel
>> Actually I think it is really not such a good idea to manage the reference counter >> across OOB communication. > > But this is exactly what the current API *requires* that users of XRC do!!! > And I agree, it's not a good idea. :) We do have unregister on finalization. But this code doesn't introduce

Re: [RFC] XRC upstream merge reboot

2011-07-26 Thread Shamis, Pavel
Please see my notes below. >>> I've tried to come up with a clean way to determine the lifetime of an xrc >> tgt qp, >>> and I think the best approach is still: >>> >>> 1. Allow the creating process to destroy it at any time, and >>> >>> 2a. If not explicitly destroyed, the tgt qp is bound to t