Hi guys.
On Monday 14 August 2006 23:33, Sean Hefty wrote:
Steve Wise wrote:
So is this replicating done in the mthca hca?
As just an FYI, I didn't see anything wrong in the mthca driver either when I
was looking at this problem.
Since one app is getting the mcast packet, can I
can you send me this code?
I suspect the main difference is that I'm using librdmacm to join and
leave mcast groups.
Just throwing out ideas here:
Maybe something in the ib_sa_mcmember_rec is prohibiting replication on
the HCA? And maybe ib_multicast is incorrectly building this record...
struct ib_sa_mcmember_rec {
union ib_gid mgid;
union ib_gid port_gid;
__be32 qkey;
Steve Just throwing out ideas here: Maybe something in the
Steve ib_sa_mcmember_rec is prohibiting replication on the HCA?
Steve And maybe ib_multicast is incorrectly building this
Steve record...
Shouldn't make a difference -- if one copy of the packet arrives at
the HCA then
How about qp attributes?
pkeys?
qkeys?
On Tue, 2006-08-15 at 07:15 -0700, Roland Dreier wrote:
Steve Just throwing out ideas here: Maybe something in the
Steve ib_sa_mcmember_rec is prohibiting replication on the HCA?
Steve And maybe ib_multicast is incorrectly building this
Steve How about qp attributes? pkeys? qkeys?
Good question -- yes, the QPs will need to be set up with the right
keys for packets to appear. It's definitely something to check.
If different mcmembers are used for the first join of the group and
subsequent joins by another QP, that could
Steve How about qp attributes? pkeys? qkeys?
Good question -- yes, the QPs will need to be set up with the right
keys for packets to appear. It's definitely something to check.
The qkeys used by the RDMA CM sound like they may be the problem. I'll verify
this and see how to fix it if so.
The qkeys used by the RDMA CM sound like they may be the problem. I'll verify
this and see how to fix it if so.
If I set the qkeys for the QPs and MCMemberRecord to 0, I can get this to work
now. The RDMA CM uses a qkey = port number for UD QPs, and a qkey = IPv4
address for MCMemberRecords.
A
On Tue, 2006-08-15 at 09:58 -0700, Sean Hefty wrote:
The qkeys used by the RDMA CM sound like they may be the problem. I'll verify
this and see how to fix it if so.
If I set the qkeys for the QPs and MCMemberRecord to 0, I can get this to work
now. The RDMA CM uses a qkey = port number
On Tue, 2006-08-15 at 12:58, Sean Hefty wrote:
The qkeys used by the RDMA CM sound like they may be the problem. I'll verify
this and see how to fix it if so.
If I set the qkeys for the QPs and MCMemberRecord to 0, I can get this to work
now. The RDMA CM uses a qkey = port number for UD
On Tue, 2006-08-15 at 14:18, Sean Hefty wrote:
A potential fix I see for this is to use the same qkey for all UD QPs and
multicast groups created by the RDMA CM. Otherwise we restrict UD QPs to using
a single destination (remote UD QP or multicast group).
Doesn't the QKey need to be the
In my IP-centric mind, the sender specifies the ip mcast address and a
remote port. All hosts with subscribers to the ip mcast address get the
packet, and all sockets on those hosts who are bound to the dst_port
receive a copy. Other sockets on those hosts that joined the ipmcast
group but are
Is the IP address only used locally to construct the MGID ? What does
the MGID look like ? What signature does it use if any ?
The IP address may also be used to look up routing information in order to
bind to a local device. The address is then used locally to construct the MGID.
The MGID
On Tue, 2006-08-15 at 14:33, Sean Hefty wrote:
Is the IP address only used locally to construct the MGID ? What does
the MGID look like ? What signature does it use if any ?
The IP address may also be used to look up routing information in order to
bind to a local device. The address is
One of the reserved bytes in the MGID is 1 rather than 0 and it's using
an IPv4 signature (0x401b) ?
It uses a signature of 0x4001 to avoid conflicts with ipoib groups.
Where does the qkey come from on the creation of the group ?
The qkey is the same as the IPv4 address.
I need to spend some
[adding back to list]
On Tue, 2006-08-15 at 11:59 -0700, Sean Hefty wrote:
For type SOCK_DGRAM (UDP), the socket will receive packets from multiple
subscribed ip mcast groups iff the dst_port of the incoming packet
matches the port to which the socket is bound...
This is what I was
Why are these separated? Isn't an address handle needed for each
destination QP? If so, then why is the remote qpn/qkey also needed to
transmit a datagram?
The address handle doesn't include QPN/QKey information. Maybe think of them
more as specifying the path to some port.
- Sean
On Tue, 2006-08-15 at 12:39 -0700, Sean Hefty wrote:
Why are these separated? Isn't an address handle needed for each
destination QP? If so, then why is the remote qpn/qkey also needed to
transmit a datagram?
The address handle doesn't include QPN/QKey information. Maybe think of them
Steve I was able to create a mcast group with the mc
Steve qkey==0xe00a0a0a, and 3 apps joined this group, but their
Steve qp qkeys were 0 (I changed ucma_init_ud_qp() to set the qp
Steve qkey to 0). One app sent to the mcgroup ah/qkey/qpn and
Steve the other two received the
On Tue, 2006-08-15 at 13:17 -0700, Roland Dreier wrote:
Steve I was able to create a mcast group with the mc
Steve qkey==0xe00a0a0a, and 3 apps joined this group, but their
Steve qp qkeys were 0 (I changed ucma_init_ud_qp() to set the qp
Steve qkey to 0). One app sent to the
On Tue, 2006-08-15 at 16:07, Steve Wise wrote:
On Tue, 2006-08-15 at 12:39 -0700, Sean Hefty wrote:
Why are these separated? Isn't an address handle needed for each
destination QP? If so, then why is the remote qpn/qkey also needed to
transmit a datagram?
The address handle doesn't
Steve I was able to create a mcast group with the mc
Steve qkey==0xe00a0a0a, and 3 apps joined this group, but their
Steve qp qkeys were 0 (I changed ucma_init_ud_qp() to set the qp
Steve qkey to 0). One app sent to the mcgroup ah/qkey/qpn and
Steve the other two received
However, if I run 2 instances of the app that reads mcasts and dumps
them to stdout, I only get the mcast packets delivered to one of the
applications. Namely the first one who joins the group seems to get the
mcasts. I know for UDP/IP multicast, all applications bound to the same
port and
Steve However, if I run 2 instances of the app that reads mcasts
Steve and dumps them to stdout, I only get the mcast packets
Steve delivered to one of the applications. Namely the first one
Steve who joins the group seems to get the mcasts. I know for
Steve UDP/IP multicast,
Sean My testing revealed the same issue, and I was unable to
Sean locate the root cause of the problem. I was not able to
Sean confirm that this configuration had ever been successfully
Sean tested.
Are you positive ibv_attach_mcast() is called on all the QPs, and that
the MGID
On Mon, 2006-08-14 at 12:43 -0700, Roland Dreier wrote:
Sean My testing revealed the same issue, and I was unable to
Sean locate the root cause of the problem. I was not able to
Sean confirm that this configuration had ever been successfully
Sean tested.
Are you positive
On Mon, 2006-08-14 at 12:31 -0700, Sean Hefty wrote:
However, if I run 2 instances of the app that reads mcasts and dumps
them to stdout, I only get the mcast packets delivered to one of the
applications. Namely the first one who joins the group seems to get the
mcasts. I know for UDP/IP
Roland Dreier wrote:
Are you positive ibv_attach_mcast() is called on all the QPs, and that
the MGID is passed correctly in to all calls?
Yes - ibv_attach_mcast() is being called with the same MLID, MGID by both
receiving processes. That doesn't necessarily mean that there's not a bug in
On Mon, 2006-08-14 at 12:42 -0700, Roland Dreier wrote:
Steve However, if I run 2 instances of the app that reads mcasts
Steve and dumps them to stdout, I only get the mcast packets
Steve delivered to one of the applications. Namely the first one
Steve who joins the group
PM
To: Roland Dreier
Cc: openib-general
Subject: Re: [openib-general] IB mcast question
On Mon, 2006-08-14 at 12:42 -0700, Roland Dreier wrote:
Steve However, if I run 2 instances of the app that reads mcasts
Steve and dumps them to stdout, I only get the mcast packets
Steve
Steve Wise wrote:
So is this replicating done in the mthca hca?
As just an FYI, I didn't see anything wrong in the mthca driver either when I
was looking at this problem.
Since one app is getting the mcast packet, can I assume the opensm code
is doing the right thing switch/port wise?
On Mon, 2006-08-14 at 13:33 -0700, Sean Hefty wrote:
Steve Wise wrote:
So is this replicating done in the mthca hca?
As just an FYI, I didn't see anything wrong in the mthca driver either when I
was looking at this problem.
Ok. I added printks in the mcast attach/detach and they're
Only the first join request should make it to the SA. The second join
request
is fulfilled by ib_multicast. This is what makes ib_multicast suspect.
I'll look into this module...
ib_multicast takes care of sending the join/leave info to the SA, right?
It keeps track of _when_
Steve So is this replicating done in the mthca hca?
Yes, it should be. There may be a bug in the mthca kernel multicast
code for handling multiple QPs attached to the same group.
Steve Since one app is getting the mcast packet, can I assume the
Steve opensm code is doing the right
Steve Wise wrote:
ib_multicast takes care of sending the join/leave info to the SA, right?
It keeps track of _when_ to leave, for instance. So since opensm -is-
getting the join and setting up the group, and the mcast packet is being
passed to the first member who joined, then I don't think
of Roland Dreier
Sent: Mon 8/14/2006 5:24 PM
To: Steve Wise
Cc: openib-general
Subject: Re: [openib-general] IB mcast question
Steve So is this replicating done in the mthca hca?
Yes, it should be. There may be a bug in the mthca kernel multicast
code for handling multiple QPs attached
On Tue, 2006-08-15 at 00:38 +0300, Hal Rosenstock wrote:
This is not the main issue (the lack of replication is) but I don't
think a subsequent join from the same port does any harm but
ib_multicast shouldn't be doing this. It would matter in terms of the
leave though.
The osm logs seem to
I added some debug printks in mthca_multicast_attach().
Roland, does this look ok to you? It seems correct to me:
# dmesg
mthca_multicast_attach qp_num 406 gid ff124001:000a0a0a lid c003
mthca_multicast_attach line 167 - found mgm, hash a20, prev , index a20
I added some debug printks in mthca_multicast_attach().
Roland, does this look ok to you? It seems correct to me:
# dmesg
mthca_multicast_attach qp_num 406 gid ff124001:000a0a0a lid c003
mthca_multicast_attach line 167 - found mgm, hash a20, prev , index