Roland Dreier wrote:
> I'm not sure about this -- does this lead to duplication of code like
> keeping track of outstanding requests? Or are you exporting some
> really low-level interface from ib_sa?
This did not lead to code duplication. As you guessed, I exported the
send_mad() interface.
>
> * Remove the multicast APIs from ib_sa.h. Multicast definitions are
> * relocated from ib_sa to the ib_multicast module. (This provides
> * cleaner encapsulation of the multicast services.)
I'm not sure about this -- does this lead to duplication of code like
keeping track of outstanding re
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> * Added client register/unregister routines.
BTW, we want this stuff for CM an CMA as well, don't we?
Could be nice to have that module unloading race in CM and CMA closed
for 2.6.19.
--
MST
___
openib-gene
Based on the feedback, I've made the following changes to this patch:
* Remove the multicast APIs from ib_sa.h. Multicast definitions are relocated
from ib_sa to the ib_multicast module. (This provides cleaner encapsulation of
the multicast services.)
* Added support for non-equal MTU, rate, a
Eitan Zahavi wrote:
> I agree that layering on top is easier. But does it really solve the
> bug? I think not. If you would REPLACE the API and not provide both options
> (above and below refcount enforcement ) it would make sense to me.
We disagree on the philosophy here. I view ib_mad as multi
Sean Hefty wrote:
> Eitan Zahavi wrote:
>
>> I disagree. If you sniff at the MAD level you can simply react to the
>> lower level messages.
>>
>
> First, when designing this, I did consider using the MAD snooping ability,
> and
> changing what could be done with snooping. However, the m
>I'm not sure -- who are the customers for this? What do they want to
>do really?
The main customers I'm aware of are MPI developers. For direct IB multicast
usage, the national labs are converting some of their MPI algorithms to use IB
multicast. Their current approach is to create a multicast
Sean> On this same thought, do you have an idea of an interface
Sean> that you'd accept to export raw IB multicast support up to
Sean> userspace?
I'm not sure -- who are the customers for this? What do they want to
do really?
- R.
___
ope
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: RE: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add
> ib_multicast module to track join requests from the same port
>
> >Another comment that I'd like not to get in the noise is that we need
> >to h
>Another comment that I'd like not to get in the noise is that we need
>to handle the full set of SA queries, not just EQ.
I think that functionality can be added separately, but ib_multicast is only
intended to handle Set / Delete methods. Get and GetTable methods would still
go through the ib_s
Roland Dreier wrote:
> We want a way for unprivileged userspace to be able to use multicast.
> Usually I say "just use a privileged daemon in userspace" but I think
> in this case we actually need coordination between the kernel and
> userspace to track _all_ multicast joins, so it does make sense
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add
> ib_multicast module to track join requests from the same port
>
> Michael S. Tsirkin wrote:
> > Ah. I get it.
> > If my callback will return e
>But unlike the sa races which were unfixable without API changes,
>here users can synchronize the removal of the mc object.
>So I think what you describe is a user error.
The user can ensure that an id is only destroyed once. What they cannot ensure
is whether their callback is still running.
>
Michael S. Tsirkin wrote:
> Ah. I get it.
> If my callback will return error, I must make sure I won't destroy the cm_id.
> But this means that I don't get the protection on destroy that was
> checking that callbacks have all gone.
>
>
> So, let's solve it in the same way we did for sa?
Yes - I
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: RE: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add
> ib_multicast module to track join requests from the same port
>
> >But unlike the sa races which were unfixable without API changes,
> >here users
>But what is the issue? some kind of race?
If we look at just the ib_multicast patches as an example...
Calling ib_join_multicast allocates a struct ib_multicast that must be freed.
Here's the relevant portion of ipoib's join callback:
@@ -325,11 +328,10 @@ ipoib_mcast_sendonly_join_complete(int
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: RE: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add
> ib_multicast module to track join requests from the same port
>
> >But what is the issue? some kind of race?
>
> If we look at just the ib_mult
Eitan Zahavi wrote:
> I disagree. If you sniff at the MAD level you can simply react to the
> lower level messages.
First, when designing this, I did consider using the MAD snooping ability, and
changing what could be done with snooping. However, the multicast handling is
not simply sniffing M
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> The ib_cm and rdma_cm have the issue if a client uses the return value from
> the callback to destroy their cm_id's.
But what is the issue? some kind of race?
--
MST
___
openib-general mailing list
openib-
Michael S. Tsirkin wrote:
> Hmm, sorry, I forgot.
> Could you restate what the ib_cm/rdma_cm problem is, please?
> Shouldn't we solve that, too?
The ib_multicast API needs register/unregister calls to prevent module unload
races.
The ib_cm and rdma_cm have the issue if a client uses the return v
Eitan Zahavi wrote:
> Roland Dreier wrote:
>> Eitan> 1 that you know about. Others did not make it into the
>> Eitan> kernel but are quite productive to those running them.
>>
>> What are those others?
> CFS is an example.
Lustre o2ibnld is using RC only and from what i know is coded over
Sean Hefty wrote:
> Eitan Zahavi wrote:
>
>> So if it is then there is no problem sniffing it and refcounting.
>>
>
> The MADs cannot simply be sniffed and counted. MADs which affect the same
> multicast group should not always be sent. Join operations must be
> serialized
> against le
Roland Dreier wrote:
> Eitan> 1 that you know about. Others did not make it into the
> Eitan> kernel but are quite productive to those running them.
>
> What are those others?
>
CFS is an example.
> Eitan> Changing top API for ULPs and Clients is simpler to
> Eitan> implement but
Roland Dreier wrote:
> Eitan> If the tracking (ref counting) was done at the MAD level -
> Eitan> no change to IPoIB would have been required ...
>
> It doesn't seem very feasible to implement a complete local copy of
> the SA (in the kernel no less) so that we can allow unprivileged
> proc
Roland Dreier wrote:
> Or> Its not a rush its a move for enabling user space code that
> Or> can offload IP Multicast. We have a library doing that which
> Or> is coded over the gen1 stack and is now in porting for the
> Or> gen2 stack.
>
> OK -- I would like to hear your experienc
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add
> ib_multicast module to track join requests from the same port
>
> Sean Hefty wrote:
> >>>+
Sean Hefty wrote:
>>>+ int (*callback)(int status,
>>>+ struct ib_multicast
>>>+ *multicast),
>>>+ void *context);
>>>+
>>
>
Eitan Zahavi wrote:
> So if it is then there is no problem sniffing it and refcounting.
The MADs cannot simply be sniffed and counted. MADs which affect the same
multicast group should not always be sent. Join operations must be serialized
against leave operations, especially when the join/lea
Roland Dreier wrote:
> Yes, I would like to see another consumer, just to get a sanity check
> that you have implemented the right thing.
I think aiming for 2.6.20 is reasonable, but I'm fine if this doesn't end up
making it. I'd like to get it into a branch targeting some kernel release
(even
Eitan> User level code can be run by root. It can access QP1 and
Eitan> bypass your nice API. Also you ignore current kernel
Eitan> implementations that exist and already perform QP1 access
Eitan> via the SA client code in the kernel.
root can already do anything at all so I don't
Or> Its not a rush its a move for enabling user space code that
Or> can offload IP Multicast. We have a library doing that which
Or> is coded over the gen1 stack and is now in porting for the
Or> gen2 stack.
OK -- I would like to hear your experiences porting on top of this.
- R.
Eitan> 1 that you know about. Others did not make it into the
Eitan> kernel but are quite productive to those running them.
What are those others?
Eitan> Changing top API for ULPs and Clients is simpler to
Eitan> implement but provide wrong tradeoff for functionality that
Eita
Michael> Why is this even a good idea? If you are looking for
Michael> reasons using mutlicast module in ipoib is good, I would
Michael> say blocking unpriviledged userspace from joining IPoIB
Michael> GID and snoopig on all mcast traffic sounds like a better
Michael> idea. BT
Eitan> Hi Or, Maybe I did not explain myself right. The idea is
Eitan> not to implement it in the mad.c code but rather to
Eitan> implement it at the lowest level: The problem with a new
Eitan> API is that a single ULP/applications which does direct
Eitan> umad or QP1 access wi
Eitan> If the tracking (ref counting) was done at the MAD level -
Eitan> no change to IPoIB would have been required ...
It doesn't seem very feasible to implement a complete local copy of
the SA (in the kernel no less) so that we can allow unprivileged
processes to send on QP1.
- R.
__
Sean Hefty wrote:
> Eitan Zahavi wrote:
>
>> User level code can be run by root. It can access QP1 and bypass your nice
>> API. Also you ignore current kernel implementations that exist and already
>> perform QP1 access via the SA client code in the kernel.
>>
>
> Yep - and that same app c
Eitan Zahavi wrote:
> User level code can be run by root. It can access QP1 and bypass your nice
> API. Also you ignore current kernel implementations that exist and already
> perform QP1 access via the SA client code in the kernel.
Yep - and that same app can, today, delete the multicast groups
On Wed, 2006-10-11 at 12:24, Eitan Zahavi wrote:
> Sean Hefty wrote:
> > Michael S. Tsirkin wrote:
> >
> >> Why is this even a good idea?
> >> If you are looking for reasons using mutlicast module in ipoib is good, I
> >> would
> >> say blocking unpriviledged userspace from joining IPoIB GID an
User level code can be run by root. It can access QP1 and bypass your
nice API.
Also you ignore current kernel implementations that exist and already
perform QP1 access via the SA client code in the kernel.
Sean Hefty wrote:
> Eitan Zahavi wrote:
>
>> The point is not the fact you need to lay
The point is not the fact you need to layer. But you can not enforce ref
counting by adding a top layer everybody can bypass.
It simply breaks on the first client that goes directly to the lower level.
Do you have a solution for this problem?
The layer you need to add is BELOW the current interfac
Hal Rosenstock wrote:
> On Wed, 2006-10-11 at 12:24, Eitan Zahavi wrote:
>
>> Sean Hefty wrote:
>>
>>> Michael S. Tsirkin wrote:
>>>
>>>
Why is this even a good idea?
If you are looking for reasons using mutlicast module in ipoib is good, I
would
say blocking
Eitan Zahavi wrote:
> The point is not the fact you need to layer. But you can not enforce ref
> counting by adding a top layer everybody can bypass.
> It simply breaks on the first client that goes directly to the lower level.
> Do you have a solution for this problem?
> The layer you need to add
Michael S. Tsirkin wrote:
> We already do state tracking to join/leave groups properly.
> It would be easier to review this change if you left killing off
> mcast->done and other such improvements for a separate patch.
The multicast module does reference counting and completion handling for us.
M
Or Gerlitz wrote:
> Just to make sure, by "ipoib multicast group" you mean an MGID which is
> derives from an IP Multicast address and the routing info on the device
> this address/subnets is routed to (ie the ipoib device pkey which you
> can get from the broadcast MAC of this device)?
I mean
Sean Hefty wrote:
> Michael S. Tsirkin wrote:
>
>> Why is this even a good idea?
>> If you are looking for reasons using mutlicast module in ipoib is good, I
>> would
>> say blocking unpriviledged userspace from joining IPoIB GID and snoopig on
>> all
>> mcast traffic sounds like a better idea
Sean Hefty wrote:
> Why is it a bad idea? The architecture allows this. However, none of
> the proposed patches allows a userspace app to join an ipoib multicast
> group.
Just to make sure, by "ipoib multicast group" you mean an MGID which is
derives from an IP Multicast address and the rou
Michael S. Tsirkin wrote:
>>+struct ib_multicast {
>>+ struct ib_sa_mcmember_rec rec;
>>+ ib_sa_comp_mask comp_mask;
>>+ int (*callback)(int status,
>>+ struct ib_multicast *multicast);
>>+ void*
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> > Finally, the patch in question also seems to introduce more cleanups and
> > such. It would be less controversial if it was just an API change.
>
> The cleanups are part of the change. Once ib_join_multicast() has been
> invoked,
> ib_free_multicas
Michael S. Tsirkin wrote:
> Why is this even a good idea?
> If you are looking for reasons using mutlicast module in ipoib is good, I
> would
> say blocking unpriviledged userspace from joining IPoIB GID and snoopig on all
> mcast traffic sounds like a better idea. BTW, Sean, I think this is some
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add
> ib_multicast module to track join requests from the same port
>
> Michael S. Tsirkin wrote:
> > It is somewhat unfortunate that we are duplicating
Eitan Zahavi wrote:
> Maybe I did not explain myself right.
> The idea is not to implement it in the mad.c code but rather to
> implement it at the lowest level:
> The problem with a new API is that a single ULP/applications which does
> direct umad or QP1 access will break the reference count.
>
Michael S. Tsirkin wrote:
> It is somewhat unfortunate that we are duplicating the SA logic
> at the endnode in kernel memory here - current sa module has
> the advantage in that it just packs all data into a mad and sends it out.
> Something to think about.
What alternative do you propose? The S
Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> to make the testing
> of uCMA consumers robust, applying this patch set on the for-2.6.20
> branch of the IB git tree is essential.
Not really.
Just build a git tree and put whatever you want there.
--
MST
___
Hi Or,
Maybe I did not explain myself right.
The idea is not to implement it in the mad.c code but rather to
implement it at the lowest level:
The problem with a new API is that a single ULP/applications which does
direct umad or QP1 access will break the reference count.
Implementing at the lo
Michael S. Tsirkin wrote:
> Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
>> This porting relies on Sean's work to allow for joining/leaving "IP
>> Multicast" mcast groups from user space. And anyway, when you expose
>> something to user space, you might not "see" the usage as it can be not
>> an
Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> > Let's see it get some use before switching a basic component over.
>
> This porting relies on Sean's work to allow for joining/leaving "IP
> Multicast" mcast groups from user space. And anyway, when you expose
> something to user space, you might no
Michael S. Tsirkin wrote:
> Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
>> The ipoib change is a must to allow for user space consumers to
>> join/leave groups whose MGID is "of IP multicast origin".
>> So one process can have its IP mulitcast traffic carried out by ipoib
>> and another process c
Eitan Zahavi wrote:
> If the tracking (ref counting) was done at the MAD level - no change to
> IPoIB would have been required ...
Maybe.
You could also implement all the ib stack core in one module...
The openib designers have chosen not to do so and rather break it into
smaller modules namely
Or Gerlitz wrote:
> Michael S. Tsirkin wrote:
>
>> Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
>>
>>> Add an ib_multicast module to perform reference counting of multicast
>>> join / leave requests. Modify ib_ipoib to use the multicast module.
>>> Signed-off-by: Sean Hefty <[EMAIL PROTECTED
Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> The ipoib change is a must to allow for user space consumers to
> join/leave groups whose MGID is "of IP multicast origin".
>
> So one process can have its IP mulitcast traffic carried out by ipoib
> and another process can use librdmacm to join the sa
Michael S. Tsirkin wrote:
> Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
>> Add an ib_multicast module to perform reference counting of multicast
>> join / leave requests. Modify ib_ipoib to use the multicast module.
>> Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
> On the ipoib change - whya re w
OK, I'll talk about the ipoib change part here.
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> >You might be right. But I wander whether we'll regret it later that
> >we switched to the slower generic thing when we already had a stable,
> >streamlined version.
>
> To be direct, I'm not sure that I'
IPoIB discussion aside, some more comments on the API:
> +struct ib_multicast {
> + struct ib_sa_mcmember_rec rec;
> + ib_sa_comp_mask comp_mask;
> + int (*callback)(int status,
> + struct ib_multicast *multicast);
> +
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> >Multiple distinct mcmember_rec queries could get us the same mcast group.
> >Say MTU > 256 and MTU > 512 look different but actually will get
> >same group in practice.
> >
> >Say 2 clients ask for these 2 queries.
> >What will be in the ib_sa_mcmember_
Hi
I also agree ref-counting of multicast membership is required.
However, inventing a new API is not required and actually harmful as it
delays availability by requiring every application out there to migrate
to yet another API.
Instead it would be preferable to implement this as part of mad.c
>No, I was only speaking about IPoIB part of the patch.
>I didn't look at atomic usage inside the new module yet.
I'm not clear on what you code you're referring to. No new atomics or locks
were added to ipoib. I just tried to rely solely on the flags to determine
state.
>You might be right. Bu
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> >And I'm worried about the extensive use of atomic operations
> >this patch introduces - both performance and race-condition-wise.
> >Can't we stick to simple locking? Replacing completion with a single
> >BUSY bit looks especially scary.
>
> I reworked
>It would make some sense to keep this API separate if the mcast module would
>only be targeted at userspace users - kernel consumers likely
>do not share mcast groups, so they could avoid the overhead.
>But since this patch seems to move all kernels users to the new API to -
>why does this need to
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to
> track join requests from the same port
>
> The IB SA tracks multicast join / leave requests on a per port basis.
> In order to support multiple users of the same multicast gro
The IB SA tracks multicast join / leave requests on a per port basis.
In order to support multiple users of the same multicast group from
the same port, we need to perform local reference counting on each
of the nodes.
Add an ib_multicast module to perform reference counting of multicast
join / le
70 matches
Mail list logo