Re: [openib-general] IB mcast question

2006-08-15 Thread Dotan Barak
Hi guys. On Monday 14 August 2006 23:33, Sean Hefty wrote: Steve Wise wrote: So is this replicating done in the mthca hca? As just an FYI, I didn't see anything wrong in the mthca driver either when I was looking at this problem. Since one app is getting the mcast packet, can I

Re: [openib-general] IB mcast question

2006-08-15 Thread Steve Wise
can you send me this code? I suspect the main difference is that I'm using librdmacm to join and leave mcast groups.

Re: [openib-general] IB mcast question

2006-08-15 Thread Steve Wise
Just throwing out ideas here: Maybe something in the ib_sa_mcmember_rec is prohibiting replication on the HCA? And maybe ib_multicast is incorrectly building this record... struct ib_sa_mcmember_rec { union ib_gid mgid; union ib_gid port_gid; __be32 qkey;

Re: [openib-general] IB mcast question

2006-08-15 Thread Roland Dreier
Steve> Just throwing out ideas here: Maybe something in the ib_sa_mcmember_rec is prohibiting replication on the HCA? And maybe ib_multicast is incorrectly building this record... Shouldn't make a difference -- if one copy of the packet arrives at the HCA then

Re: [openib-general] IB mcast question

2006-08-15 Thread Steve Wise
How about qp attributes? pkeys? qkeys? On Tue, 2006-08-15 at 07:15 -0700, Roland Dreier wrote: Steve> Just throwing out ideas here: Maybe something in the ib_sa_mcmember_rec is prohibiting replication on the HCA? And maybe ib_multicast is incorrectly building this

Re: [openib-general] IB mcast question

2006-08-15 Thread Roland Dreier
Steve> How about qp attributes? pkeys? qkeys? Good question -- yes, the QPs will need to be set up with the right keys for packets to appear. It's definitely something to check. If different mcmembers are used for the first join of the group and subsequent joins by another QP, that could

Re: [openib-general] IB mcast question

2006-08-15 Thread Sean Hefty
Steve> How about qp attributes? pkeys? qkeys? Good question -- yes, the QPs will need to be set up with the right keys for packets to appear. It's definitely something to check. The qkeys used by the RDMA CM sound like they may be the problem. I'll verify this and see how to fix it if so.

Re: [openib-general] IB mcast question

2006-08-15 Thread Sean Hefty
The qkeys used by the RDMA CM sound like they may be the problem. I'll verify this and see how to fix it if so. If I set the qkeys for the QPs and MCMemberRecord to 0, I can get this to work now. The RDMA CM uses a qkey = port number for UD QPs, and a qkey = IPv4 address for MCMemberRecords. A
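To illustrate why mismatched qkeys can silently drop datagrams, here is a minimal sketch in plain C. It is a model only, not libibverbs or RDMA CM code, and the function names are made up for illustration. It assumes the usual InfiniBand UD rule that a receiving QP accepts a packet only if the packet's Q_Key equals the QP's own Q_Key. It also models a general InfiniBand rule that this thread excerpt does not state: when the Q_Key in the send work request has its high-order bit set, the sending QP's own Q_Key is placed in the packet instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 marks a "controlled" Q_Key in InfiniBand. */
#define QKEY_CONTROLLED_BIT 0x80000000u

/* Q_Key carried in the outgoing UD packet (hypothetical helper).
 * If the work request's Q_Key has the high bit set, the sending
 * QP's own Q_Key is used instead of the requested one. */
static uint32_t ud_packet_qkey(uint32_t wr_qkey, uint32_t sender_qp_qkey)
{
    return (wr_qkey & QKEY_CONTROLLED_BIT) ? sender_qp_qkey : wr_qkey;
}

/* A receiving UD QP accepts a packet only when the Q_Keys match. */
static bool ud_qp_accepts(uint32_t packet_qkey, uint32_t receiver_qp_qkey)
{
    return packet_qkey == receiver_qp_qkey;
}

/* Would a datagram posted with wr_qkey be accepted by this receiver? */
static bool ud_delivers(uint32_t wr_qkey, uint32_t sender_qp_qkey,
                        uint32_t receiver_qp_qkey)
{
    return ud_qp_accepts(ud_packet_qkey(wr_qkey, sender_qp_qkey),
                         receiver_qp_qkey);
}
```

With every qkey set to 0, `ud_delivers(0, 0, 0)` is true, which matches the working configuration described above. With different qkeys on the sender and the receiving QPs, the model reports no delivery.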

Re: [openib-general] IB mcast question

2006-08-15 Thread Steve Wise
On Tue, 2006-08-15 at 09:58 -0700, Sean Hefty wrote: The qkeys used by the RDMA CM sound like they may be the problem. I'll verify this and see how to fix it if so. If I set the qkeys for the QPs and MCMemberRecord to 0, I can get this to work now. The RDMA CM uses a qkey = port number

Re: [openib-general] IB mcast question

2006-08-15 Thread Hal Rosenstock
On Tue, 2006-08-15 at 12:58, Sean Hefty wrote: The qkeys used by the RDMA CM sound like they may be the problem. I'll verify this and see how to fix it if so. If I set the qkeys for the QPs and MCMemberRecord to 0, I can get this to work now. The RDMA CM uses a qkey = port number for UD

Re: [openib-general] IB mcast question

2006-08-15 Thread Hal Rosenstock
On Tue, 2006-08-15 at 14:18, Sean Hefty wrote: A potential fix I see for this is to use the same qkey for all UD QPs and multicast groups created by the RDMA CM. Otherwise we restrict UD QPs to using a single destination (remote UD QP or multicast group.) Doesn't the QKey need to be the

Re: [openib-general] IB mcast question

2006-08-15 Thread Sean Hefty
In my IP-centric mind, the sender specifies the ip mcast address and a remote port. All hosts with subscribers to the ip mcast address get the packet, and all sockets on those hosts that are bound to the dst_port receive a copy. Other sockets on those hosts that joined the ip mcast group but are
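The IP-side behaviour described here can be written down as a small model. This is only a sketch of the rule as stated in the message (a host gets the packet if any of its sockets subscribed, and every socket bound to the destination port gets a copy). It is not real socket code, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one UDP socket on a host. */
struct model_sock {
    uint16_t bound_port;   /* local port the socket is bound to */
    bool     joined_group; /* did this socket join the mcast group? */
};

/* The host receives the mcast packet if any socket subscribed. */
static bool host_subscribed(const struct model_sock *socks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (socks[i].joined_group)
            return true;
    return false;
}

/* Number of sockets that get a copy: all sockets bound to dst_port,
 * provided the host is subscribed to the group at all. */
static size_t count_deliveries(const struct model_sock *socks, size_t n,
                               uint16_t dst_port)
{
    size_t copies = 0;

    if (!host_subscribed(socks, n))
        return 0;
    for (size_t i = 0; i < n; i++)
        if (socks[i].bound_port == dst_port)
            copies++;
    return copies;
}
```

Under this model, two readers bound to the same port both get a copy, which is the behaviour the IB multicast test was expected to mirror.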

Re: [openib-general] IB mcast question

2006-08-15 Thread Sean Hefty
Is the IP address only used locally to construct the MGID? What does the MGID look like? What signature does it use, if any? The IP address may also be used to look up routing information in order to bind to a local device. The address is then used locally to construct the MGID. The MGID

Re: [openib-general] IB mcast question

2006-08-15 Thread Hal Rosenstock
On Tue, 2006-08-15 at 14:33, Sean Hefty wrote: Is the IP address only used locally to construct the MGID? What does the MGID look like? What signature does it use, if any? The IP address may also be used to look up routing information in order to bind to a local device. The address is

Re: [openib-general] IB mcast question

2006-08-15 Thread Sean Hefty
One of the reserved bytes in the MGID is 1 rather than 0 and it's using an IPv4 signature (0x401b)? It uses a signature of 0x4001 to avoid conflicts with ipoib groups. Where does the qkey come from on the creation of the group? The qkey is the same as the IPv4 address. I need to spend some
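As a rough illustration of how the signature keeps the two kinds of groups apart, the sketch below builds an MGID from an IPv4 multicast address. It assumes an IPoIB-style layout: 0xff12 prefix, a 16-bit signature, and the low 28 bits of the IPv4 address in the last four bytes. The remaining bytes are left at zero, so the reserved byte mentioned above is not reproduced, and the actual RDMA CM mapping may differ. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MGID_SIG_IPOIB_IPV4 0x401bu /* signature used by IPoIB for IPv4 */
#define MGID_SIG_RDMA_CM    0x4001u /* signature mentioned in this thread */

/* Build a 16-byte MGID (assumed layout, see the note above).
 * ipv4 is the multicast address in host byte order, e.g. 0xe00a0a0a. */
static void build_mgid(uint8_t mgid[16], uint16_t signature, uint32_t ipv4)
{
    uint32_t low28 = ipv4 & 0x0fffffffu;

    assert(mgid != NULL);
    memset(mgid, 0, 16);
    mgid[0]  = 0xff;                      /* multicast GID prefix */
    mgid[1]  = 0x12;                      /* flags and scope */
    mgid[2]  = (uint8_t)(signature >> 8); /* signature, high byte */
    mgid[3]  = (uint8_t)(signature & 0xff);
    mgid[12] = (uint8_t)(low28 >> 24);    /* low 28 bits of the group */
    mgid[13] = (uint8_t)(low28 >> 16);
    mgid[14] = (uint8_t)(low28 >> 8);
    mgid[15] = (uint8_t)low28;
}
```

Because only the signature bytes differ, the same IPv4 group maps to two distinct MGIDs, one per signature, so an RDMA CM group cannot collide with the ipoib group for the same address.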

Re: [openib-general] IB mcast question

2006-08-15 Thread Steve Wise
[adding back to list] On Tue, 2006-08-15 at 11:59 -0700, Sean Hefty wrote: For type SOCK_DGRAM (UDP), the socket will receive packets from multiple subscribed ip mcast groups iff the dst_port of the incoming packet matches the port to which the socket is bound... This is what I was

Re: [openib-general] IB mcast question

2006-08-15 Thread Sean Hefty
Why are these separated? Isn't an address handle needed for each destination QP? If so, then why is the remote qpn/qkey also needed to transmit a datagram? The address handle doesn't include QPN/QKey information. Maybe think of them more as specifying the path to some port. - Sean
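The separation Sean describes can be pictured with a small model: the address handle only says how to reach a port, while the QPN and qkey pick the queue pair behind that port. The types below are hypothetical stand-ins, not the libibverbs structures, although libibverbs likewise carries the address handle, remote QPN, and remote qkey as separate fields of a UD send work request. The multicast QPN value 0xFFFFFF is the standard InfiniBand one.

```c
#include <stdint.h>

#define MODEL_MCAST_QPN 0xffffffu /* QPN used for all multicast sends */

/* What an address handle describes: the path to a port. */
struct model_path {
    uint16_t dlid;     /* destination LID (unicast or multicast) */
    uint8_t  dgid[16]; /* destination GID, e.g. the MGID */
};

/* A full UD destination: a path plus the QP selection on top of it. */
struct model_ud_dest {
    const struct model_path *path;
    uint32_t remote_qpn;  /* 24-bit queue pair number */
    uint32_t remote_qkey;
};

/* Address one specific QP reachable over the given path. */
static struct model_ud_dest make_unicast_dest(const struct model_path *path,
                                              uint32_t qpn, uint32_t qkey)
{
    struct model_ud_dest d = { path, qpn & 0xffffffu, qkey };
    return d;
}

/* Address a multicast group: same kind of path, fixed multicast QPN. */
static struct model_ud_dest make_mcast_dest(const struct model_path *path,
                                            uint32_t qkey)
{
    struct model_ud_dest d = { path, MODEL_MCAST_QPN, qkey };
    return d;
}
```

In this model one path can be shared by several destinations that differ only in QPN and qkey, which is one reason to keep the two pieces of information apart.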

Re: [openib-general] IB mcast question

2006-08-15 Thread Steve Wise
On Tue, 2006-08-15 at 12:39 -0700, Sean Hefty wrote: Why are these separated? Isn't an address handle needed for each destination QP? If so, then why is the remote qpn/qkey also needed to transmit a datagram? The address handle doesn't include QPN/QKey information. Maybe think of them

Re: [openib-general] IB mcast question

2006-08-15 Thread Roland Dreier
Steve> I was able to create a mcast group with the mc qkey==0xe00a0a0a, and 3 apps joined this group, but their qp qkeys were 0 (I changed ucma_init_ud_qp() to set the qp qkey to 0). One app sent to the mcgroup ah/qkey/qpn and the other two received the

Re: [openib-general] IB mcast question

2006-08-15 Thread Steve Wise
On Tue, 2006-08-15 at 13:17 -0700, Roland Dreier wrote: Steve> I was able to create a mcast group with the mc qkey==0xe00a0a0a, and 3 apps joined this group, but their qp qkeys were 0 (I changed ucma_init_ud_qp() to set the qp qkey to 0). One app sent to the

Re: [openib-general] IB mcast question

2006-08-15 Thread Hal Rosenstock
On Tue, 2006-08-15 at 16:07, Steve Wise wrote: On Tue, 2006-08-15 at 12:39 -0700, Sean Hefty wrote: Why are these separated? Isn't an address handle needed for each destination QP? If so, then why is the remote qpn/qkey also needed to transmit a datagram? The address handle doesn't

Re: [openib-general] IB mcast question

2006-08-15 Thread Sean Hefty
Steve> I was able to create a mcast group with the mc qkey==0xe00a0a0a, and 3 apps joined this group, but their qp qkeys were 0 (I changed ucma_init_ud_qp() to set the qp qkey to 0). One app sent to the mcgroup ah/qkey/qpn and the other two received

Re: [openib-general] IB mcast question

2006-08-14 Thread Sean Hefty
However, if I run 2 instances of the app that reads mcasts and dumps them to stdout, I only get the mcast packets delivered to one of the applications. Namely the first one who joins the group seems to get the mcasts. I know for UDP/IP multicast, all applications bound to the same port and

Re: [openib-general] IB mcast question

2006-08-14 Thread Roland Dreier
Steve> However, if I run 2 instances of the app that reads mcasts and dumps them to stdout, I only get the mcast packets delivered to one of the applications. Namely the first one who joins the group seems to get the mcasts. I know for UDP/IP multicast,

Re: [openib-general] IB mcast question

2006-08-14 Thread Roland Dreier
Sean> My testing revealed the same issue, and I was unable to locate the root cause of the problem. I was not able to confirm that this configuration had ever been successfully tested. Are you positive ibv_attach_mcast() is called on all the QPs, and that the MGID

Re: [openib-general] IB mcast question

2006-08-14 Thread Steve Wise
On Mon, 2006-08-14 at 12:43 -0700, Roland Dreier wrote: Sean> My testing revealed the same issue, and I was unable to locate the root cause of the problem. I was not able to confirm that this configuration had ever been successfully tested. Are you positive

Re: [openib-general] IB mcast question

2006-08-14 Thread Steve Wise
On Mon, 2006-08-14 at 12:31 -0700, Sean Hefty wrote: However, if I run 2 instances of the app that reads mcasts and dumps them to stdout, I only get the mcast packets delivered to one of the applications. Namely the first one who joins the group seems to get the mcasts. I know for UDP/IP

Re: [openib-general] IB mcast question

2006-08-14 Thread Sean Hefty
Roland Dreier wrote: Are you positive ibv_attach_mcast() is called on all the QPs, and that the MGID is passed correctly in to all calls? Yes - ibv_attach_mcast() is being called with the same MLID, MGID by both receiving processes. That doesn't necessarily mean that there's not a bug in

Re: [openib-general] IB mcast question

2006-08-14 Thread Steve Wise
On Mon, 2006-08-14 at 12:42 -0700, Roland Dreier wrote: Steve> However, if I run 2 instances of the app that reads mcasts and dumps them to stdout, I only get the mcast packets delivered to one of the applications. Namely the first one who joins the group

Re: [openib-general] IB mcast question

2006-08-14 Thread Hal Rosenstock
On Mon, 2006-08-14 at 12:42 -0700, Roland Dreier wrote: Steve> However, if I run 2 instances of the app that reads mcasts and dumps them to stdout, I only get the mcast packets

Re: [openib-general] IB mcast question

2006-08-14 Thread Sean Hefty
Steve Wise wrote: So is this replicating done in the mthca hca? As just an FYI, I didn't see anything wrong in the mthca driver either when I was looking at this problem. Since one app is getting the mcast packet, can I assume the opensm code is doing the right thing switch/port wise?

Re: [openib-general] IB mcast question

2006-08-14 Thread Steve Wise
On Mon, 2006-08-14 at 13:33 -0700, Sean Hefty wrote: Steve Wise wrote: So is this replicating done in the mthca hca? As just an FYI, I didn't see anything wrong in the mthca driver either when I was looking at this problem. Ok. I added printks in the mcast attach/detach and they're

Re: [openib-general] IB mcast question

2006-08-14 Thread Steve Wise
Only the first join request should make it to the SA. The second join request is fulfilled by ib_multicast. This is what makes ib_multicast suspect. I'll look into this module... ib_multicast takes care of sending the join/leave info to the SA, right? It keeps track of _when_

Re: [openib-general] IB mcast question

2006-08-14 Thread Roland Dreier
Steve> So is this replicating done in the mthca hca? Yes, it should be. There may be a bug in the mthca kernel multicast code for handling multiple QPs attached to the same group. Steve> Since one app is getting the mcast packet, can I assume the opensm code is doing the right

Re: [openib-general] IB mcast question

2006-08-14 Thread Sean Hefty
Steve Wise wrote: ib_multicast takes care of sending the join/leave info to the SA, right? It keeps track of _when_ to leave, for instance. So since opensm -is- getting the join and setting up the group, and the mcast packet is being passed to the first member who joined, then I don't think

Re: [openib-general] IB mcast question

2006-08-14 Thread Hal Rosenstock
On Mon 8/14/2006 5:24 PM, Roland Dreier wrote: Steve> So is this replicating done in the mthca hca? Yes, it should be. There may be a bug in the mthca kernel multicast code for handling multiple QPs attached

Re: [openib-general] IB mcast question

2006-08-14 Thread Steve Wise
On Tue, 2006-08-15 at 00:38 +0300, Hal Rosenstock wrote: This is not the main issue (the lack of replication is) but I don't think a subsequent join from the same port does any harm but ib_multicast shouldn't be doing this. It would matter in terms of the leave though. The osm logs seem to

Re: [openib-general] IB mcast question

2006-08-14 Thread Steve Wise
I added some debug printks in mthca_multicast_attach(). Roland, does this look ok to you? It seems correct to me: # dmesg mthca_multicast_attach qp_num 406 gid ff124001:000a0a0a lid c003 mthca_multicast_attach line 167 - found mgm, hash a20, prev , index a20

Re: [openib-general] IB mcast question

2006-08-14 Thread Roland Dreier
I added some debug printks in mthca_multicast_attach(). Roland, does this look ok to you? It seems correct to me: # dmesg mthca_multicast_attach qp_num 406 gid ff124001:000a0a0a lid c003 mthca_multicast_attach line 167 - found mgm, hash a20, prev , index