Is this a correct summary of this thread?
* IPoIB uses an InfiniBand multicast group to fake ethernet broadcast
* This is optional, I'm not sure what functionality is lost without it
* MVAPICH uses a multicast group for some MPI collectives
* This can be turned off by setting env var DISABLE_
On Tue, 2006-02-21 at 12:08, Fabian Tillier wrote:
> On 21 Feb 2006 11:23:45 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > Hi Fab,
> >
> > On Tue, 2006-02-21 at 11:15, Fabian Tillier wrote:
> > > On 21 Feb 2006 09:42:10 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > > On Tue, 2006-02
On 21 Feb 2006 11:23:45 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> Hi Fab,
>
> On Tue, 2006-02-21 at 11:15, Fabian Tillier wrote:
> > On 21 Feb 2006 09:42:10 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > On Tue, 2006-02-21 at 01:10, Fabian Tillier wrote:
> > > > The lack of detaile
Hi Fab,
On Tue, 2006-02-21 at 11:15, Fabian Tillier wrote:
> On 21 Feb 2006 09:42:10 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > On Tue, 2006-02-21 at 01:10, Fabian Tillier wrote:
> > > The lack of detailed error reporting in SA queries could stand to be
> > > improved, and something as s
On 21 Feb 2006 09:42:10 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> On Tue, 2006-02-21 at 01:10, Fabian Tillier wrote:
> > The lack of detailed error reporting in SA queries could stand to be
> > improved, and something as simple as the SA returning a component mask
> > indicating which comp
On Mon, 2006-02-20 at 21:53, Fabian Tillier wrote:
> On 2/20/06, Roland Dreier <[EMAIL PROTECTED]> wrote:
> >Fabian> Second, if so, how is IPoIB supposed to interact with
> >Fabian> subnet managers that don't pre-create an empty broadcast
> >Fabian> group?
> >
> >Fabian> Shouldn't I
On Mon, 2006-02-20 at 21:14, Fabian Tillier wrote:
[snip...]
> Shouldn't IPoIB first do a GET for the broadcast group, and use those
> settings if it exists, otherwise create it?
That's one possible algorithm but not the only one.
-- Hal
> Thanks,
>
> - Fab
> __
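The get-then-create approach Fab asks about can be sketched as a toy model. Everything here is hypothetical: `sa_groups` is a plain dictionary standing in for the SA's multicast records, and the function and parameter names are made up for illustration. As Hal says, this is one possible algorithm but not the only one.

```python
def join_broadcast_group(sa_groups, mgid, local_defaults):
    """Toy model: look the group up first, create it only if it is missing.

    sa_groups      -- dict standing in for the SA's multicast records (assumption)
    mgid           -- key identifying the broadcast group
    local_defaults -- parameters to use when the group has to be created
    Returns (params, created).
    """
    existing = sa_groups.get(mgid)
    if existing is not None:
        # Group already exists: adopt its settings instead of our own.
        return dict(existing), False
    # No record found: create the group with locally chosen parameters.
    sa_groups[mgid] = dict(local_defaults)
    return dict(local_defaults), True
```

With a pre-created group the caller gets the existing parameters back unchanged; with an empty table the first caller creates the record and later callers reuse it.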
Hi Fab,
On Tue, 2006-02-21 at 01:10, Fabian Tillier wrote:
> On 2/20/06, Roland Dreier <[EMAIL PROTECTED]> wrote:
> >Fabian> What is the behavior of SMs that pre-create the group in
> >Fabian> response to a GET query for the MC group parameters? Does
> >Fabian> the query return a reco
On 2/20/06, Roland Dreier <[EMAIL PROTECTED]> wrote:
>Fabian> What is the behavior of SMs that pre-create the group in
>Fabian> response to a GET query for the MC group parameters? Does
>Fabian> the query return a record, or does it fail with no
>Fabian> records?
>
> I guess it dep
Fabian> The only parameter that can be problematic is the QKey, but
Fabian> it's not a problem for it to just make one up, as long as
Fabian> it's a privileged one. All other parameters can be taken
Fabian> from the local port info.
Actually all of the extra parameters (Q_Key, SL, f
On 2/20/06, Roland Dreier <[EMAIL PROTECTED]> wrote:
>Fabian> Second, if so, how is IPoIB supposed to interact with
>Fabian> subnet managers that don't pre-create an empty broadcast
>Fabian> group?
>
>Fabian> Shouldn't IPoIB first do a GET for the broadcast group,
>Fabian> and u
Fabian> If I understand the code correctly, IPoIB depends on the
Fabian> broadcast MC group existing, as it only ever issues a MC
Fabian> join that does not create the group to the SA.
Fabian> First, is this correct?
Yes.
Fabian> Second, if so, how is IPoIB supposed to intera
If I understand the code correctly, IPoIB depends on the broadcast MC
group existing, as it only ever issues a MC join that does not create
the group to the SA.
First, is this correct?
Second, if so, how is IPoIB supposed to interact with subnet managers
that don't pre-create an empty broadcast g
Hi, Roland!
What's going on with ipoib patches in contrib/mellanox?
There are still 9 patches outstanding; most of them are really simple
and should be a safe bet even for 2.6.16.
There's also mthca_cosmetic_icm_page_size.patch there which looks
like a safe one.
Other patches might be good candidat
Hal Rosenstock wrote:
Hi Eitan,
On Mon, 2006-02-13 at 10:23, Eitan Zahavi wrote:
Hi,
I had a long discussion today with Michael, Yael and Tziporet regarding
this issue.
We have got to the following conclusions/proposal:
1. As we use only GID[0] (that can not change) and a QP that is reserved
Hi Eitan,
On Mon, 2006-02-13 at 10:23, Eitan Zahavi wrote:
> Hi,
>
> I had a long discussion today with Michael, Yael and Tziporet regarding
> this issue.
> We have got to the following conclusions/proposal:
>
> 1. As we use only GID[0] (that can not change) and a QP that is reserved
> for the i
Hi,
I had a long discussion today with Michael, Yael and Tziporet regarding
this issue.
We have got to the following conclusions/proposal:
1. As we use only GID[0] (that can not change) and a QP that is reserved
for the interface even if it is down we actually "never" change IPoIB
MAC (unless you
Eitan Zahavi wrote:
Hi
The issue with IPoIB address change is not just LID change but also QP
change.
(IPoIB defines the MAC to be QP,GID).
Anytime you do ifconfig down/up you might get a new QP and thus you need
to refresh the ARP...
I second Mike K. and propose we use gratuitous ARP reply w
On Sun, 2006-02-12 at 08:25, Eitan Zahavi wrote:
> Hi
>
> The issue with IPoIB address change is not just LID change but also QP
> change.
> (IPoIB defines the MAC to be QP,GID).
>
> Anytime you do ifconfig down/up you might get a new QP and thus you need
> to refresh the ARP...
>
> I second Mi
Hi
The issue with IPoIB address change is not just LID change but also QP
change.
(IPoIB defines the MAC to be QP,GID).
Anytime you do ifconfig down/up you might get a new QP and thus you need
to refresh the ARP...
I second Mike K. and propose we use gratuitous ARP reply whenever an
IPoIB inter
At 09:43 AM 2/10/2006, Grant Grundler wrote:
On Fri, Feb 10, 2006 at 11:05:34AM -0500, Hal Rosenstock wrote:
> > Hi, Roland!
> > One issue we have with IPoIB is that IPoIB may cache a remote node path
> > for a long time. Remote LID may get changed e.g. if the SM is changed,
> > and IPoIB might l
On Fri, 2006-02-10 at 12:43, Grant Grundler wrote:
> On Fri, Feb 10, 2006 at 11:05:34AM -0500, Hal Rosenstock wrote:
> > > Hi, Roland!
> > > One issue we have with IPoIB is that IPoIB may cache a remote node path
> > > for a long time. Remote LID may get changed e.g. if the SM is changed,
> > > and
On Fri, Feb 10, 2006 at 11:05:34AM -0500, Hal Rosenstock wrote:
> > Hi, Roland!
> > One issue we have with IPoIB is that IPoIB may cache a remote node path
> > for a long time. Remote LID may get changed e.g. if the SM is changed,
> > and IPoIB might lose connectivity.
I wonder if this is why when
On Wed, 2006-02-08 at 15:14, Michael S. Tsirkin wrote:
> Hi, Roland!
> One issue we have with IPoIB is that IPoIB may cache a remote node path for a
> long time. Remote LID may get changed e.g. if the SM is changed, and IPoIB
> might lose connectivity.
The remote LID may get changed for other r
Hi, Roland!
One issue we have with IPoIB is that IPoIB may cache a remote node path for a
long time. Remote LID may get changed e.g. if the SM is changed, and IPoIB might
lose connectivity.
One simple way to address this would be to have a list of all
address handles per net device and kill them o
With latest OpenIB svn on an i386, when shutting down the machine with
IPoIB, I got the following on the console:
BUG: spinlock lockup on CPU #0, ipoib/6181, cefeca80
The traceback showed:
__ipoib_reap_ah+0x24/0xdb
ipoib_reap_ah+0xb
This was only the last message. The others scrolled off the s
I just went over the patches again in detail.
Here's the list of patches from
https://openib.org/svn/trunk/contrib/mellanox/patches
Quoting Michael S. Tsirkin <[EMAIL PROTECTED]>:
> Fixes for oopses that we saw in testing:
> ipoib_up_flag_race.patch
ipoib_up_flag_race.patch is removed.
It is rep
Roland, where exactly does the following math come from?
static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh)
{
	return (struct ipoib_neigh **) (neigh->ha + 24 -
					(offsetof(struct neighbour, ha) & 4));
}
1. What does & 4 do here?
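One way to read the arithmetic, offered as an interpretation and not as Roland's answer: an IPoIB hardware address is 20 bytes, so the helper appears to stash a pointer in the unused tail of `ha`. The `& 4` term is 4 when `ha` starts at an offset that is 4 modulo 8 and 0 otherwise, so subtracting it from 24 lands on an 8-byte boundary either way. This assumes `ha` is at least 4-byte aligned within an 8-byte-aligned struct. The Python sketch below only checks the offsets; `stash_offset` is a made-up name.

```python
def stash_offset(ha_offset):
    """Offset (from the start of the struct) where the pointer would live.

    ha_offset plays the role of offsetof(struct neighbour, ha); it is assumed
    to be a multiple of 4.
    """
    return ha_offset + 24 - (ha_offset & 4)


def describe(ha_offset):
    """Return (bytes past the start of ha, whether the slot is 8-byte aligned)."""
    slot = stash_offset(ha_offset)
    return slot - ha_offset, slot % 8 == 0
```

For an `ha` offset that is a multiple of 8 the slot sits 24 bytes into `ha`; for an offset that is 4 modulo 8 it sits 20 bytes in, directly after the address.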
Roland, from some ipoib oopses that I see, it seems,
that ipoib_mcast_join_finish is running when
priv->dev->broadcast is NULL.
Any idea how could that be the case?
--
MST
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mail
From: [EMAIL PROTECTED] on behalf of Sean Hubbell
Sent: Wed 11/16/2005 9:14 AM
To: openib-general@openib.org
Subject: [openib-general] IPoIB
Hello,
I ran across something that continues to puzzle me. We upgraded to the
latest infiniband source code tree as of
From: [EMAIL PROTECTED] on behalf of Sean Hubbell
Sent: Wed 11/16/2005 9:14 AM
To: openib-general@openib.org
Subject: [openib-general] IPoIB
Hello,
I ran across something that continues to puzzle me. We upgraded to the
latest infiniband source code tree as of yesterday and I
Hello,
I ran across something that continues to puzzle me. We upgraded to the
latest infiniband source code tree as of yesterday and I tried to run my
program that has been working for months using the new infiniband
modules. Here is what I am seeing:
1) I can ping and ibping the head node
Hello, Roland!
I am still seeing IPoIB oopsing about once a week around
ipoib_mcast_join_complete (oops below).
While looking at it, a question occurred to me:
what protects the following code in ipoib_mcast_stop_thread
list_for_each_entry(mcast, &priv->multicast_list, list) {
Hello, Roland!
While debugging a (gen1) problem with IPoIB,
I have noticed the following code in function neigh_update:
net/core/neighbour.c:1015
if (lladdr != neigh->ha) {
	memcpy(&neigh->ha, lladdr, dev->addr_len);
	neigh_update_hhs(neigh);
Hi!
I saw this in /var/log/messages recently.
Unfortunately I can't say exactly what I did to trigger this problem.
Roland, it's the same thing we were seeing a couple of months ago that went
unresolved, isn't it?
Unable to handle kernel NULL pointer dereference at 0488 RIP:
{:ib_ipoib:ip
On Thursday 29 September 2005 23:44, Woodruff, Robert J wrote:
> I would try 2 nodes point to point. If that works, then
> I suspect the switch. I did see an issue with one of our MT2400 switches
> with IPoIB connectivity. We replaced the switch and it
> seemed to fix the problem, so we did not inv
On Thursday 29 September 2005 23:44, you wrote:
> I would try 2 nodes point to point. If that works, then
> I suspect the switch. I did see an issue with one of our MT2400 switches
> with IPoIB connectivity. We replaced the switch and it
> seemed to fix the problem, so we did not investigate furthe
Hal wrote,
>> > Also, what is your HCA firmware version ?
>>
>> $ cat /sys/class/infiniband/mthca0/fw_ver
>> 3.3.3
>That's the most recent.
>-- Hal
I would try 2 nodes point to point. If that works, then
I suspect the switch. I did see an issue with one of our MT2400 switches
with IPoIB connec
On Thu, 2005-09-29 at 17:01, Thomas Moschny wrote:
> On Thursday 29 September 2005 22:08, you wrote:
> > On Thu, 2005-09-29 at 16:01, Thomas Moschny wrote:
> > > Maybe a switch firmware problem? We once observed a complete switch
> > > lockup that shut down all communication.
> >
> > Could be. Do y
On Thursday 29 September 2005 22:08, you wrote:
> On Thu, 2005-09-29 at 16:01, Thomas Moschny wrote:
> > Maybe a switch firmware problem? We once observed a complete switch
> > lockup that shut down all communication.
>
> Could be. Do you know what rev of firmware you are running ? Is it 0.7.0
> ?
>Also, what is your HCA firmware version ?
>-- Hal
Good point. I have seen IPoIB connectivity issues in the past
when dealing with down rev FW.
I just re-tested IPoIB on my IPF machines and they seem to
work OK for me. I suspect either the HCA FW rev or the switch.
[EMAIL PROTECTED] SPECS]# ca
Thomas wrote,
>Yes, it's a single MTS-2400 with 24 ports.
>Maybe a switch firmware problem? We once observed a complete switch lockup
>that shut down all communication.
If you suspect a bad switch, do you have another one you could try ?
or you can try to direct connect a couple of nodes.
woo
On Thu, 2005-09-29 at 16:01, Thomas Moschny wrote:
> Maybe a switch firmware problem? We once observed a complete switch lockup
> that shut down all communication.
Could be. Do you know what rev of firmware you are running ? Is it 0.7.0
? (MTS-2400 is Anafa-2 based).
Also, what is your HCA firmw
On Thursday 29 September 2005 21:25, Hal Rosenstock wrote:
> In the log, I do see several nodes successfully join the IPoIB broadcast
> group and the multicast tree for this got setup (I didn't actually
> validate the tree itself).
>
> PortGid.0xfe80 : 0x0002c9021575
On Thu, 2005-09-29 at 15:11, Thomas Moschny wrote:
> On Thursday 29 September 2005 20:32, you wrote:
> > Can you ping the subnet broadcast address (e.g. ping -b 192.168.0.255 if
> > the ib0 is 192.168.0.x) ?
>
> The only answer I get is from the sender itself:
>
> $ ping -b 192.168.204.255
> WAR
On Thu, 2005-09-29 at 14:00, Thomas Moschny wrote:
> Hi,
>
> Do I have to do something special in order to configure IPoverIB besides
> loading the ib_ipoib kernel module (and its dependencies), and calling
> ifconfig ib0 up?
No, that should be sufficient.
> On our machines, the module
Hi,
Do I have to do something special in order to configure IPoverIB besides
loading the ib_ipoib kernel module (and its dependencies), and calling
ifconfig ib0 up?
On our machines, the modules load fine, opensm runs, ports are in active
state, no error messages from ifconfig. However,
On Tue, 2005-09-27 at 04:11, Abhijit Gadgil wrote:
> Hi All,
>
> I am new to IPoIB. I have a query, as per the IPoIB Architecture
> document, whenever an IPoIB interface is brought up, it needs to do a
> Full Member Join to the "broadcast" Multicast group. Where exactly in
> the code, is this taki
Title: RE: [openib-general] IPoIB question
> Further, I am putting SM in testability 'debug' mode (DEBUG=10 in
> /etc/opensm.conf), however I am still not seeing any dump of messages
> about FullMember join whenever I try restarting the IB interfaces. What
> should be
Hi All,
I am new to IPoIB. I have a query, as per the IPoIB Architecture
document, whenever an IPoIB interface is brought up, it needs to do a
Full Member Join to the "broadcast" Multicast group. Where exactly in
the code, is this taking place? I have been able to trace a little bit -
eg. in ipoib
On Tue, 2005-09-20 at 07:31, Ali Ayoub wrote:
> Hi all,
> How can I retrieve the MAC address for a specific IPoIB interface?
ip addr show dev ib0
19: ib0: mtu 2044 qdisc pfifo_fast qlen 128
link/[32]
00:0e:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:05:59 brd
00:ff:ff:ff:ff:12:40:1b:ff:f
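The complete `link/[32]` address shown above can be split into its parts. This is a sketch that assumes the 20-byte IPoIB link-layer layout of 1 flags/reserved byte, a 3-byte QPN, and the 16-byte port GID, which matches the "QP,GID" description used elsewhere on this list. `parse_ipoib_hwaddr` is a hypothetical helper, not part of any existing tool.

```python
def parse_ipoib_hwaddr(text):
    """Split a colon-separated 20-byte IPoIB address into (flags, qpn, gid)."""
    octets = bytes(int(part, 16) for part in text.split(":"))
    if len(octets) != 20:
        raise ValueError("expected 20 octets, got %d" % len(octets))
    flags = octets[0]                          # assumed: reserved/flags byte
    qpn = int.from_bytes(octets[1:4], "big")   # assumed: 24-bit queue pair number
    # Remaining 16 bytes are the GID, printed as eight 16-bit groups.
    gid = ":".join(octets[i:i + 2].hex() for i in range(4, 20, 2))
    return flags, qpn, gid


flags, qpn, gid = parse_ipoib_hwaddr(
    "00:0e:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:05:59")
print(hex(qpn), gid)
```

Under that layout, a new QP after `ifconfig down/up` changes bytes 1 to 3 of the address while the GID part stays the same.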
Title: IPoIB interface MAC
Hi all,
How can I retrieve the MAC address for a specific IPoIB interface? Using ifconfig doesn't produce good results; here is the ifconfig output for machines with GEN2:
SUSE 9.3, 2.6.13
ib0 Link encap:UNSPEC HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-
Hi Roland,
The following is what I am seeing:
SM brings the subnet up.
IPoIB does its multicast registration.
That all works fine.
Sometime later, the SM does a SM Set of PortInfo which causes IPoIB to
first deregister all its multicasts and then register them.
What I see is the following:
If t
On Tue, 2005-09-13 at 11:54, Jack Morgenstein wrote:
> I noticed that at startup, IPoIB attempts a send-only join to the MGID
> ff12:401b::0:0:0:0:16 (equivalent to IP 224.0.0.22 -- the IGMP
> multicast group -- see
> http://www.iana.org/assignments/multicast-addresses).
>
> 1. Why is this a s
Title: ipoib send-only join to IGMP multicast group
I noticed that at startup, IPoIB attempts a send-only join to the MGID ff12:401b::0:0:0:0:16 (equivalent to IP 224.0.0.22 -- the IGMP multicast group -- see http://www.iana.org/assignments/multicast-addresses).
1. Why is this a send-only
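The correspondence between 224.0.0.22 and that MGID can be sketched as follows. It assumes the IPoIB layout in which the MGID starts with `ff1<scope>:401b:<P_Key>`, followed by zeros and the low 28 bits of the IPv4 group address. The P_Key of 0xffff and scope 2 are assumptions for illustration (the MGID as quoted above appears to omit the P_Key field), and `ipv4_mcast_to_mgid` is a hypothetical name.

```python
import ipaddress


def ipv4_mcast_to_mgid(ip, pkey=0xFFFF, scope=0x2):
    """Build the 128-bit MGID for an IPv4 multicast address (sketch)."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError("%s is not an IPv4 multicast address" % ip)
    low28 = int(addr) & 0x0FFFFFFF       # drop the fixed 1110 prefix
    value = (0xFF10 | scope) << 112      # 0xff, flags=1, scope nibble
    value |= 0x401B << 96                # IPv4-over-IB signature
    value |= pkey << 80                  # partition key (assumed value)
    value |= low28                       # group ID in the low bits
    raw = value.to_bytes(16, "big")
    return ":".join(raw[i:i + 2].hex() for i in range(0, 16, 2))


print(ipv4_mcast_to_mgid("224.0.0.22"))
```

The final group of the result is `0016`, matching the trailing `16` in the MGID from the message.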
Hi Sean,
Here's my (somewhat long winded) analysis of your osm.log:
First I see:
Sep 02 13:46:34 [AB43F140] -> osm_vendor_bind: Unable to register class 129
version 1.
Sep 02 13:46:34 [AB43F140] -> osm_vendor_bind: ]
Sep 02 13:46:34 [AB43F140] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific
Hi, Roland!
The following crash was triggered by ifconfig down.
The crash site is at db7:
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:225
db3: 49 8b 45 70          mov    0x70(%r13),%rax
include/linux/byteorder/swab.h:147
db7: 8b 40 20             mov    0x20(%rax),
Hi, Roland!
I have seen the following oops recently, typically after
restarting opensm on the same machine. This is on ipoib rev 3113
Pls note I'm running with my two event patches.
The oops seems to be around offset db7 below:
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:223
da4: 49
Attached is an updated draft (will be posting to internet drafts after the
current ietf ends) for ipoib-connected mode based on the discussions on
ipoib wg, openib (IB on Linux), and other communications. Two threads
that saw good discussion are given below. I believe the attached updated
draft ca
[IPoIB] Add support for MTU module parameter. This is so there can be a
non default MTU at boot up if the administrator so desires (prior to
being able to invoke ifconfig).
Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>
Index: ipoib_main.c
==
On 6/10/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
>Hal> dev/core.c netdev_wait_allrefs says: * Any protocol or device
>Hal> that holds a reference should register * for netdevice
>Hal> notification, and cleanup and put back the * reference if
>Hal> they receive an UNREGISTER even
Hal> dev/core.c netdev_wait_allrefs says: * Any protocol or device
Hal> that holds a reference should register * for netdevice
Hal> notification, and cleanup and put back the * reference if
Hal> they receive an UNREGISTER event.
Hal> Is it correct that IPoIB does not need to re
Hi Roland and Troy,
Last week, Troy reported the following:
On Fri, 2005-06-03 at 16:33, Troy Benjegerdes wrote:
> > > Also, I have two machines in a state right now where they are
> > > printing out:
> > >
> > > kernel: unregister_netdevice: waiting for ib0 to become free.
> > > Usage count =
Hi Roland,
I have a question about ipoib_main.c::ipoib_change_mtu:
static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
{
	struct ipoib_dev_priv *priv = netdev_priv(dev);
	if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN)
		return -EINVAL;
Shouldn't the
Tom> Should it be the case that bringing down ib0 should kill off
Tom> the other pkey devices:
Looks like a bug -- I'll take a look.
- R.
Should it be the case that bringing down ib0 should kill off the other
pkey devices:
[EMAIL PROTECTED] ~]# netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
10.6.98.0 0.0.0.0 255.255.255.0 U 0 0 0 eth
On Mon, Apr 04, 2005 at 06:48:19PM -0400, Hal Rosenstock wrote:
> Do you mean IB or IP bridge/router ? IB bridges are switches. IB routers
> forward at the IB network layer and are not completely specified. I
> suspect you mean an IP router with one or more IPoIB interfaces.
Yes, I was thinking IP
On Mon, 2005-04-04 at 18:35, Grant Grundler wrote:
> On Mon, Apr 04, 2005 at 06:08:03PM -0400, Hal Rosenstock wrote:
> > A while ago, Tom brought up the issue of IPoIB link level broadcasting
> > from user space (with the arping tool). Is it possible to do this from
> > kernel space?
>
> I would t
On Mon, Apr 04, 2005 at 06:08:03PM -0400, Hal Rosenstock wrote:
> A while ago, Tom brought up the issue of IPoIB link level broadcasting
> from user space (with the arping tool). Is it possible to do this from
> kernel space?
I would think any driver can call hard_xmit() for any "NIC".
pktgen.c do
A while ago, Tom brought up the issue of IPoIB link level broadcasting
from user space (with the arping tool). Is it possible to do this from
kernel space ? For example, how would/could sendto() work when sending
to a IPoIB link layer address ? If all we wanted to support was
broadcast, perhaps the
On Wed, Mar 23, 2005 at 04:36:37PM -0800, Bob Woodruff wrote:
> I think on these nodes I have some very old PCI-X HCAs (A0 silicon)
> that I cannot even upgrade to the newest firmware.
Ok - good to know. AFAICT, I only have rev A1 silicon.
> I have also seen a switch get into a weird state from t
Grant wrote>
>Yup. The switch was hosed and cycling power got it back to life again:
>[EMAIL PROTECTED]:~$ cat /sys/class/infiniband/mthca0/ports/*/state
>4: ACTIVE
>1: DOWN
>[EMAIL PROTECTED]:~$ cat /sys/class/infiniband/mthca0/ports/*/state
>4: ACTIVE
>4: ACTIVE
>Of course, ping works too.
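The before/after check above can be scripted. This sketch assumes only that each sysfs `state` file yields a line of the form `4: ACTIVE`, as in the output shown; the function names are made up for illustration.

```python
def parse_port_state(line):
    """Turn a line like '4: ACTIVE' into (4, 'ACTIVE')."""
    number, _, name = line.strip().partition(":")
    return int(number), name.strip()


def all_ports_active(lines):
    """True only if there is at least one port and every port reports ACTIVE."""
    states = [parse_port_state(line) for line in lines if line.strip()]
    return bool(states) and all(name == "ACTIVE" for _, name in states)
```

Feeding it the two outputs from the message gives False for the first (one port DOWN) and True for the second.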
On Wed, Mar 23, 2005 at 12:33:05PM -0800, Roland Dreier wrote:
> Grant> *nod*. I try the above first...then cycle power and see if
> Grant> it comes back to life. The switch has been on since
> Grant> December or so.
>
> If you have a serial console or ethernet configured for the switch
Grant> *nod*. I try the above first...then cycle power and see if
Grant> it comes back to life. The switch has been on since
Grant> December or so.
If you have a serial console or ethernet configured for the switch,
you can check if it still looks happy as well. It wouldn't really
sur
On Wed, Mar 23, 2005 at 12:16:08PM -0800, Roland Dreier wrote:
> It looks like the driver is working but the SM isn't bringing the
> ports to the active state. The problem could still be on the host or
> the switch unfortunately. What do you see in the files
>
> /sys/class/infiniband/mthca0/
On Wed, Mar 23, 2005 at 03:16:43PM -0500, Hal Rosenstock wrote:
> Hi Grant,
> > iowa:/usr/src/linux-2.6# cat /sys/class/infiniband/mthca0/ports/*/state
> > 1: DOWN
> > 2: INIT
>
> Looks like port 2 is plugged in. It needs to get to ACTIVE before IPoIB
> will work. Is the SM enabled in the TS switc
Hi Grant,
On Wed, 2005-03-23 at 15:08, Grant Grundler wrote:
> Hi,
> I wanted to run netpipe and basics aren't working. I haven't
> tried the SVN tree in over a month. It could have been broken
> for ia64 for a while. Sorry for lagging on that...
>
> I'm running 2.6.11 kernel with TOB svn bits an
> iowa:/usr/src/linux-2.6# cat /sys/class/infiniband/mthca0/ports/*/state
> 1: DOWN
> 2: INIT
It looks like the driver is working but the SM isn't bringing the
ports to the active state. The problem could still be on the host or
the switch unfortunately. What do you see in the files
Hi,
I wanted to run netpipe and basics aren't working. I haven't
tried the SVN tree in over a month. It could have been broken
for ia64 for a while. Sorry for lagging on that...
I'm running 2.6.11 kernel with TOB svn bits and building
the IB modules "in tree". Just replaced the drivers/infiniband
Here is use of pfmon 3.1 to sample address events.
In short, nothing here jumps out at me and screams
for a big opportunity to optimize. But I don't fully
understand the data and instruction flow either.
Maybe someone else sees more opportunity.
e.g. I'm wondering if netfilter is a significant
pe
Hal Rosenstock wrote:
Why key off bit in wc.wr_id rather than use wc.opcode to determine whether
the completed operation was a receive or transmit ?
The opcode isn't set if status != success.
Also, when wc.status != success, wr_id cannot be trusted but is still
The wr_id is always valid, regardless o
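To illustrate why keying off a bit in `wr_id` works even for failed completions: the consumer picks `wr_id` when it posts the work request, so it can encode the direction there and read it back regardless of status, whereas the opcode isn't set on failure. A minimal sketch follows; the bit position and all names are chosen for illustration and are not taken from the IPoIB source.

```python
OP_RECV = 1 << 31  # hypothetical flag bit marking a receive work request


def make_wr_id(ring_index, is_recv):
    """Pack a ring index and a direction flag into one work-request ID."""
    if not 0 <= ring_index < OP_RECV:
        raise ValueError("ring index would collide with the flag bit")
    return ring_index | (OP_RECV if is_recv else 0)


def decode_wr_id(wr_id):
    """Recover (ring_index, is_recv) without looking at opcode or status."""
    return wr_id & ~OP_RECV, bool(wr_id & OP_RECV)
```

The completion handler can then branch on the decoded flag first and inspect the status afterwards.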
Hi Roland,
I have a couple of questions on the IPoIB completion handler:
Why key off bit in wc.wr_id rather than use wc.opcode to determine whether
the completed operation was a receive or transmit ?
Also, when wc.status != success, wr_id cannot be trusted but is still
used to determine operation
On Tue, 2005-01-04 at 11:18 -0800, Roland Dreier wrote:
> Josh> I'm getting great numbers from IPoIB...right up until it
> Josh> dies. The system is x86_64 and PCIe with latest stuff under
> Josh> 2.6.10. Streaming tests with both netperf and NetPIPE die
> Josh> almost instantly (
Josh> I'm getting great numbers from IPoIB...right up until it
Josh> dies. The system is x86_64 and PCIe with latest stuff under
Josh> 2.6.10. Streaming tests with both netperf and NetPIPE die
Josh> almost instantly (NetPIPE actually lasts for a couple
Josh> iterations). Ping
I'm getting great numbers from IPoIB...right up until it dies.
The system is x86_64 and PCIe with latest stuff under 2.6.10. Streaming
tests with both netperf and NetPIPE die almost instantly (NetPIPE
actually lasts for a couple iterations). Ping-pong tests with NetPIPE
seem to consistently die
Ido> 1. We can divide the single CQ into two separate completion
Ido> queues: one for the RQ and the other for SQ. Then we can
Ido> change the CQ policy affiliated with the SQ into
Ido> IB_CQ_CONSUMER_REARM and in mainstream not arm the CQ. In
Ido> such case the poll_cq_tq wil
Hello
I have been investigating the performance of the IPoIB for a while and I
have discovered 3 interesting points
which I think are worth implementing in gen2.
1. We can divide the single CQ into two separate completion queues: one for
the RQ and the other for SQ.
Then we can change the CQ p
>Are you at 4.6.1 now on all your PCIe HCAs ? Not sure whether 4.5.3 does indeed work
>but it sounds like something has changed for the worse since Roland had it working.
>Thanks.
>-- Hal
I had problems with the 4.3.5 firmware and it seemed to work with my early
version of the 4.6.0-rc4 f
Hi Josh,
[You wrote:]
I've tried with the latest from
SVN as well as the exact same kernel/rootFS you were using, and I still
can't ping between nodes. I'm just doing 'modprobe ib_mthca; modprobe
ib_ipoib; ifconfig ...; ping ...' on the same two nodes. Am I missing
something?
Anything in /var/
Josh> Roland, Alright...I'm not sure what's going on. I just now
Josh> got a chance to test this out and it still isn't working.
Josh> I've tried with the latest from SVN as well as the exact
Josh> same kernel/rootFS you were using, and I still can't ping
Josh> between nodes. I
Roland,
Alright...I'm not sure what's going on. I just now got a chance to test
this out and it still isn't working. I've tried with the latest from
SVN as well as the exact same kernel/rootFS you were using, and I still
can't ping between nodes. I'm just doing 'modprobe ib_mthca; modprobe
ib_ip
Hi Roland,
It looks to me like after obtaining the PathRecord, the static rate is
not used when the AV is created. Shouldn't it be ? Is there an issue
with doing this ? There is a similar issue with the multicast AVs as
well. I know there is an assumption that everything is 4x but I am not
sure th
I've looked at the remote side to understand what it was (or wasn't
doing). The partial connectivity stems from an issue in resolving the
path on the remote side.
I have a proposal:
Rather than a single SA Get(PathRecord) with a 1 second timeout, what
about a retry or two with a smaller (0.33 - 0.
>Are you running the latest code from svn? I fixed a bug this morning
>that would cause problems with more than 2 nodes.
>Thanks,
> Roland
With the 1348 version I just downloaded, I can now ping
from all nodes to all other nodes. I will now try to install and run some
MPI tests and/or other
Subject: Re: [openib-general] IPoIB oops on path record completion
Robert> I also seem to be having some partial connectivity
Robert> problems. The first 2 nodes seem to be able to
Robert> communicate, but adding the 3rd and 4th nodes, they cannot
Robert> pi
Robert> I also seem to be having some partial connectivity
Robert> problems. The first 2 nodes seem to be able to
Robert> communicate, but adding the 3rd and 4th nodes, they cannot
Robert> ping the first 2.
Are you running the latest code from svn? I fixed a bug this morning
that
On Thu, 2004-12-16 at 12:35, Roland Dreier wrote:
> Are you running the latest code from svn? I fixed a bug this morning
> that would cause problems with more than 2 nodes.
I am.
-- Hal
>Still have the partial connectivity problem. I can see the ARP going out
>on the broadcast group followed by ARPs coming in on the broadcast
>group followed by the PathRecord requests/responses with the SA followed
>by the unicast ARP and ICMP. After the unicast ARP to one of the nodes,
>it is
Title: RE: [openib-general] IPoIB still not working
They're on 4.5.3.
-JE
-Original Message-
From: Hal Rosenstock [mailto:[EMAIL PROTECTED]]
Sent: Thu 12/16/2004 8:54 AM
To: England, Joshua J
Cc: Roland Dreier; Robert J Woodruff; [EMAIL PROTECTED]
Subject: RE: [openib-ge
On Wed, 2004-12-15 at 13:22, England, Joshua J wrote:
> I'll definitely pound on the stuff and let you know if anything
> breaks.
You are using the 4.3.5 firmware, right ? I want to put the proper info
into the IPoIB FAQ. Thanks.
-- Hal