Re: [openib-general] IPoIB broadcast MC group membership

2006-02-21 Thread Greg Lindahl
Is this a correct summary of this thread? * IPoIB uses an InfiniBand multicast group to fake ethernet broadcast * This is optional, I'm not sure what functionality is lost without it * MVAPICH uses a multicast group for some MPI collectives * This can be turned off by setting env var DISABLE_

Re: [openib-general] IPoIB broadcast MC group membership

2006-02-21 Thread Hal Rosenstock
On Tue, 2006-02-21 at 12:08, Fabian Tillier wrote: > On 21 Feb 2006 11:23:45 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote: > > Hi Fab, > > > > On Tue, 2006-02-21 at 11:15, Fabian Tillier wrote: > > > On 21 Feb 2006 09:42:10 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote: > > > > On Tue, 2006-02

Re: [openib-general] IPoIB broadcast MC group membership

2006-02-21 Thread Fabian Tillier
On 21 Feb 2006 11:23:45 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote: > Hi Fab, > > On Tue, 2006-02-21 at 11:15, Fabian Tillier wrote: > > On 21 Feb 2006 09:42:10 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote: > > > On Tue, 2006-02-21 at 01:10, Fabian Tillier wrote: > > > > The lack of detaile

Re: [openib-general] IPoIB broadcast MC group membership

2006-02-21 Thread Hal Rosenstock
Hi Fab, On Tue, 2006-02-21 at 11:15, Fabian Tillier wrote: > On 21 Feb 2006 09:42:10 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote: > > On Tue, 2006-02-21 at 01:10, Fabian Tillier wrote: > > > The lack of detailed error reporting in SA queries could stand to be > > > improved, and something as s

Re: [openib-general] IPoIB broadcast MC group membership

2006-02-21 Thread Fabian Tillier
On 21 Feb 2006 09:42:10 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote: > On Tue, 2006-02-21 at 01:10, Fabian Tillier wrote: > > The lack of detailed error reporting in SA queries could stand to be > > improved, and something as simple as the SA returning a component mask > > indicating which comp

Re: [openib-general] IPoIB broadcast MC group membership

2006-02-21 Thread Hal Rosenstock
On Mon, 2006-02-20 at 21:53, Fabian Tillier wrote: > On 2/20/06, Roland Dreier <[EMAIL PROTECTED]> wrote: > >Fabian> Second, if so, how is IPoIB supposed to interact with > >Fabian> subnet managers that don't pre-create an empty broadcast > >Fabian> group? > > > >Fabian> Shouldn't I

Re: [openib-general] IPoIB broadcast MC group membership

2006-02-21 Thread Hal Rosenstock
On Mon, 2006-02-20 at 21:14, Fabian Tillier wrote: [snip...] > Shouldn't IPoIB first do a GET for the broadcast group, and use those > settings if it exists, otherwise create it? That's one possible algorithm but not the only one. -- Hal
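The get-then-create order Fabian asks about can be sketched as follows. This is only an illustration: `FakeSA`, `join_broadcast_group`, and the parameter names are hypothetical stand-ins, not the IPoIB driver's code or any subnet manager's API, and as Hal notes it is one possible algorithm rather than the only one.

```python
class FakeSA:
    """Hypothetical in-memory stand-in for a subnet administrator (SA)."""

    def __init__(self):
        self.groups = {}  # MGID string -> dict of group parameters

    def get(self, mgid):
        # Return the existing group record, or None if the group is absent.
        return self.groups.get(mgid)

    def create(self, mgid, params):
        # Create the group with the caller's parameters and return the record.
        self.groups[mgid] = dict(params)
        return self.groups[mgid]


def join_broadcast_group(sa, mgid, local_defaults):
    """GET the group first; create it from local defaults only if missing."""
    record = sa.get(mgid)
    if record is None:
        record = sa.create(mgid, local_defaults)
    return record
```

With this ordering, a node that arrives after the group exists adopts the existing settings instead of imposing its own, which is the behaviour the question is driving at.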

Re: [openib-general] IPoIB broadcast MC group membership

2006-02-21 Thread Hal Rosenstock
Hi Fab, On Tue, 2006-02-21 at 01:10, Fabian Tillier wrote: > On 2/20/06, Roland Dreier <[EMAIL PROTECTED]> wrote: > >Fabian> What is the behavior of SMs that pre-create the group in > >Fabian> response to a GET query for the MC group parameters? Does > >Fabian> the query return a reco

Re: [openib-general] IPoIB broadcast MC group membership

2006-02-20 Thread Fabian Tillier
On 2/20/06, Roland Dreier <[EMAIL PROTECTED]> wrote: >Fabian> What is the behavior of SMs that pre-create the group in >Fabian> response to a GET query for the MC group parameters? Does >Fabian> the query return a record, or does it fail with no >Fabian> records? > > I guess it dep

Re: [openib-general] IPoIB broadcast MC group membership

2006-02-20 Thread Roland Dreier
Fabian> The only parameter that can be problematic is the QKey, but Fabian> it's not a problem for it to just make one up, as long as Fabian> it's a privileged one. All other parameters can be taken Fabian> from the local port info. Actually all of the extra parameters (Q_Key, SL, f

Re: [openib-general] IPoIB broadcast MC group membership

2006-02-20 Thread Fabian Tillier
On 2/20/06, Roland Dreier <[EMAIL PROTECTED]> wrote: >Fabian> Second, if so, how is IPoIB supposed to interact with >Fabian> subnet managers that don't pre-create an empty broadcast >Fabian> group? > >Fabian> Shouldn't IPoIB first do a GET for the broadcast group, >Fabian> and u

Re: [openib-general] IPoIB broadcast MC group membership

2006-02-20 Thread Roland Dreier
Fabian> If I understand the code correctly, IPoIB depends on the Fabian> broadcast MC group existing, as it only ever issues a MC Fabian> join that does not create the group to the SA. Fabian> First, is this correct? Yes. Fabian> Second, if so, how is IPoIB supposed to intera

[openib-general] IPoIB broadcast MC group membership

2006-02-20 Thread Fabian Tillier
If I understand the code correctly, IPoIB depends on the broadcast MC group existing, as it only ever issues a MC join that does not create the group to the SA. First, is this correct? Second, if so, how is IPoIB supposed to interact with subnet managers that don't pre-create an empty broadcast g

[openib-general] ipoib patches

2006-02-20 Thread Michael S. Tsirkin
Hi, Roland! What's going on with ipoib patches in contrib/mellanox? There are still 9 patches outstanding, most of them are really simple and should be a safe bet even for 2.6.16. There's also mthca_cosmetic_icm_page_size.patch there which looks like a safe one. Other patches might be good candidat

Re: [openib-general] IPoIB and lid change

2006-02-13 Thread Eitan Zahavi
Hal Rosenstock wrote: Hi Eitan, On Mon, 2006-02-13 at 10:23, Eitan Zahavi wrote: Hi, I had a long discussion today with Michael, Yael and Tziporet regarding this issue. We have got to the following conclusions/proposal: 1. As we use only GID[0] (that can not change) and a QP that is reserved

RE: [openib-general] IPoIB and lid change

2006-02-13 Thread Hal Rosenstock
Hi Eitan, On Mon, 2006-02-13 at 10:23, Eitan Zahavi wrote: > Hi, > > I had a long discussion today with Michael, Yael and Tziporet regarding > this issue. > We have got to the following conclusions/proposal: > > 1. As we use only GID[0] (that can not change) and a QP that is reserved > for the i

RE: [openib-general] IPoIB and lid change

2006-02-13 Thread Eitan Zahavi
Hi, I had a long discussion today with Michael, Yael and Tziporet regarding this issue. We have got to the following conclusions/proposal: 1. As we use only GID[0] (that can not change) and a QP that is reserved for the interface even if it is down we actually "never" change IPoIB MAC (unless you

Re: [openib-general] IPoIB and lid change

2006-02-12 Thread Tziporet Koren
Eitan Zahavi wrote: Hi The issue with IPoIB address change is not just LID change but also QP change. (IPoIB defines the MAC to be QP,GID.) Anytime you do ifconfig down/up you might get a new QP and thus you need to refresh the ARP... I second Mike K. and propose we use gratuitous ARP reply w

RE: [openib-general] IPoIB and lid change

2006-02-12 Thread Hal Rosenstock
On Sun, 2006-02-12 at 08:25, Eitan Zahavi wrote: > Hi > > The issue with IPoIB address change is not just LID change but also QP > change. > (IPoIB defines the MAC to be QP,GID.) > > Anytime you do ifconfig down/up you might get a new QP and thus you need > to refresh the ARP... > > I second Mi

RE: [openib-general] IPoIB and lid change

2006-02-12 Thread Eitan Zahavi
Hi The issue with IPoIB address change is not just LID change but also QP change. (IPoIB defines the MAC to be QP,GID.) Anytime you do ifconfig down/up you might get a new QP and thus you need to refresh the ARP... I second Mike K. and propose we use gratuitous ARP reply whenever an IPoIB inter

Re: [openib-general] IPoIB and lid change

2006-02-10 Thread Michael Krause
At 09:43 AM 2/10/2006, Grant Grundler wrote: On Fri, Feb 10, 2006 at 11:05:34AM -0500, Hal Rosenstock wrote: > > Hi, Roland! > > One issue we have with IPoIB is that IPoIB may cache a remote node path > > for a long time. Remote LID may get changed e.g. if the SM is changed, > > and IPoIB might l

Re: [openib-general] IPoIB and lid change

2006-02-10 Thread Hal Rosenstock
On Fri, 2006-02-10 at 12:43, Grant Grundler wrote: > On Fri, Feb 10, 2006 at 11:05:34AM -0500, Hal Rosenstock wrote: > > > Hi, Roland! > > > One issue we have with IPoIB is that IPoIB may cache a remote node path > > > for a long time. Remote LID may get changed e.g. if the SM is changed, > > > and

Re: [openib-general] IPoIB and lid change

2006-02-10 Thread Grant Grundler
On Fri, Feb 10, 2006 at 11:05:34AM -0500, Hal Rosenstock wrote: > > Hi, Roland! > > One issue we have with IPoIB is that IPoIB may cache a remote node path > > for a long time. Remote LID may get changed e.g. if the SM is changed, > > and IPoIB might lose connectivity. I wonder if this is why when

Re: [openib-general] IPoIB and lid change

2006-02-10 Thread Hal Rosenstock
On Wed, 2006-02-08 at 15:14, Michael S. Tsirkin wrote: > Hi, Roland! > One issue we have with IPoIB is that IPoIB may cache a remote node path for a > long time. Remote LID may get changed e.g. if the SM is changed, and IPoIB > might > lose connectivity. The remote LID may get changed for other r

[openib-general] IPoIB and lid change

2006-02-08 Thread Michael S. Tsirkin
Hi, Roland! One issue we have with IPoIB is that IPoIB may cache a remote node path for a long time. Remote LID may get changed e.g. if the SM is changed, and IPoIB might lose connectivity. One simple way to address this would be to have a list of all address handles per net device and kill them o

[openib-general] IPoIB BUG at shutdown with latest OpenIB svn

2006-01-15 Thread Hal Rosenstock
With latest OpenIB svn on an i386, when shutting down the machine with IPoIB, I got the following on the console: BUG: spinlock lockup on CPU #0, ipoib/6181, cefeca80 The traceback showed: __ipoib_reap_ah+0x24/0xdb ipoib_reap_ah+0xb This was only the last message. The others scrolled off the s

[openib-general] ipoib: outstanding patches

2006-01-11 Thread Michael S. Tsirkin
I just went over the patches again in detail. Here's the list of patches from https://openib.org/svn/trunk/contrib/mellanox/patches Quoting Michael S. Tsirkin <[EMAIL PROTECTED]>: > Fixes for oopses that we saw in testing: > ipoib_up_flag_race.patch ipoib_up_flag_race.patch is removed. It is rep

[openib-general] ipoib: question

2005-12-14 Thread Michael S. Tsirkin
Roland, where exactly does the following math come from? static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh) { return (struct ipoib_neigh **) (neigh->ha + 24 - (offsetof(struct neighbour, ha) & 4)); } 1. What does & 4 do here?
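One way to read the arithmetic, assuming `ha` starts at an offset that is a multiple of 4 and that the IPoIB address occupies 20 bytes: `& 4` tests whether that offset sits 4 bytes past an 8-byte boundary, and the subtraction then picks whichever of `ha + 20` or `ha + 24` is 8-byte aligned. The Python sketch below only mirrors the integer arithmetic; `neigh_slot_offset` and `check` are illustrative names, and any offsets passed in are made up rather than taken from a real `struct neighbour`.

```python
INFINIBAND_ALEN = 20  # length of an IPoIB hardware address in bytes


def neigh_slot_offset(ha_offset):
    """Mirror ha + 24 - (offsetof(struct neighbour, ha) & 4) as a byte offset."""
    return ha_offset + 24 - (ha_offset & 4)


def check(ha_offset):
    """Return (slot lies past the 20-byte address, slot is 8-byte aligned)."""
    slot = neigh_slot_offset(ha_offset)
    past_address = slot >= ha_offset + INFINIBAND_ALEN
    aligned = slot % 8 == 0
    return past_address, aligned
```

Under that assumption both properties hold whether `ha` itself is 8-byte aligned or only 4-byte aligned, so the stored pointer never overlaps the address and never lands on a misaligned slot.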

[openib-general] ipoib: ipoib_mcast_join_finish oops

2005-12-08 Thread Michael S. Tsirkin
Roland, from some ipoib oopses that I see, it seems that ipoib_mcast_join_finish is running when priv->dev->broadcast is NULL. Any idea how that could be the case? -- MST

Re: [openib-general] IPoIB

2005-11-16 Thread Sean Hubbell
From: [EMAIL PROTECTED] on behalf of Sean Hubbell Sent: Wed 11/16/2005 9:14 AM To: openib-general@openib.org Subject: [openib-general] IPoIB Hello, I ran across something that continues to puzzle me. We upgraded to the latest infiniband source code tree as of

RE: [openib-general] IPoIB

2005-11-16 Thread Hal Rosenstock
From: [EMAIL PROTECTED] on behalf of Sean Hubbell Sent: Wed 11/16/2005 9:14 AM To: openib-general@openib.org Subject: [openib-general] IPoIB Hello, I ran across something that continues to puzzle me. We upgraded to the latest infiniband source code tree as of yesterday and I

[openib-general] IPoIB

2005-11-16 Thread Sean Hubbell
Hello, I ran across something that continues to puzzle me. We upgraded to the latest infiniband source code tree as of yesterday and I tried to run my program that has been working for months using the new infiniband modules. Here is what I am seeing: 1) I can ping and ibping the head node

[openib-general] ipoib oops

2005-11-14 Thread Michael S. Tsirkin
Hello, Roland! I am still seeing IPoIB oopsing about once a week around ipoib_mcast_join_complete (oops below). While looking at it, a question occurred to me: what protects the following code in ipoib_mcast_stop_thread list_for_each_entry(mcast, &priv->multicast_list, list) {

[openib-general] IPoIB question/problem

2005-11-07 Thread Michael S. Tsirkin
Hello, Roland! While debugging a (gen1) problem with IPoIB, I have noticed the following code in function neigh_update: net/core/neighbour.c:1015 if (lladdr != neigh->ha) { memcpy(&neigh->ha, lladdr, dev->addr_len); neigh_update_hhs(neigh);

[openib-general] ipoib oops

2005-11-07 Thread Michael S. Tsirkin
Hi! I saw this in /var/log/messages recently. Unfortunately I can't say exactly what I did to trigger this problem. Roland, it's the same thing we were seeing a couple of months ago that went unresolved, isn't it? Unable to handle kernel NULL pointer dereference at 0488 RIP: {:ib_ipoib:ip

Re: [openib-general] IPoIB configuration

2005-09-30 Thread Thomas Moschny
On Thursday 29 September 2005 23:44, Woodruff, Robert J wrote: > I would try 2 nodes point to point. If that works, then > I suspect the switch. I did see an issue with one of our MT2400 switches > with IPoIB connectivity. We replaced the switch and it > seemed to fix the problem, so we did not inv

Re: [openib-general] IPoIB configuration

2005-09-29 Thread Thomas Moschny
On Thursday 29 September 2005 23:44, you wrote: > I would try 2 nodes point to point. If that works, then > I suspect the switch. I did see an issue with one of our MT2400 switches > with IPoIB connectivity. We replaced the switch and it > seemed to fix the problem, so we did not investigate furthe

RE: [openib-general] IPoIB configuration

2005-09-29 Thread Woodruff, Robert J
Hal wrote, >> > Also, what is your HCA firmware version ? >> >> $ cat /sys/class/infiniband/mthca0/fw_ver >> 3.3.3 >That's the most recent. >-- Hal I would try 2 nodes point to point. If that works, then I suspect the switch. I did see an issue with one of our MT2400 switches with IPoIB connec

Re: [openib-general] IPoIB configuration

2005-09-29 Thread Hal Rosenstock
On Thu, 2005-09-29 at 17:01, Thomas Moschny wrote: > On Thursday 29 September 2005 22:08, you wrote: > > On Thu, 2005-09-29 at 16:01, Thomas Moschny wrote: > > > Maybe a switch firmware problem? We once observed a complete switch > > > lockup that shut down all communication. > > > > Could be. Do y

Re: [openib-general] IPoIB configuration

2005-09-29 Thread Thomas Moschny
On Thursday 29 September 2005 22:08, you wrote: > On Thu, 2005-09-29 at 16:01, Thomas Moschny wrote: > > Maybe a switch firmware problem? We once observed a complete switch > > lockup that shut down all communication. > > Could be. Do you know what rev of firmware you are running ? Is it 0.7.0 > ?

RE: [openib-general] IPoIB configuration

2005-09-29 Thread Woodruff, Robert J
>Also, what is your HCA firmware version ? >-- Hal Good point. I have seen IPoIB connectivity issues in the past when dealing with down rev FW. I just re-tested IPoIB on my IPF machines and they seem to work OK for me. I suspect either the HCA FW rev or the switch. [EMAIL PROTECTED] SPECS]# ca

RE: [openib-general] IPoIB configuration

2005-09-29 Thread Woodruff, Robert J
Thomas wrote, >Yes, it's a single MTS-2400 with 24 ports. >Maybe a switch firmware problem? We once observed a complete switch lockup >that shut down all communication. If you suspect a bad switch, do you have another one you could try ? or you can try to direct connect a couple of nodes. woo

Re: [openib-general] IPoIB configuration

2005-09-29 Thread Hal Rosenstock
On Thu, 2005-09-29 at 16:01, Thomas Moschny wrote: > Maybe a switch firmware problem? We once observed a complete switch lockup > that shut down all communication. Could be. Do you know what rev of firmware you are running ? Is it 0.7.0 ? (MTS-2400 is Anafa-2 based). Also, what is your HCA firmw

Re: [openib-general] IPoIB configuration

2005-09-29 Thread Thomas Moschny
On Thursday 29 September 2005 21:25, Hal Rosenstock wrote: > In the log, I do see several nodes successfully join the IPoIB broadcast > group and the multicast tree for this got setup (I didn't actually > validate the tree itself). > > PortGid.0xfe80 : 0x0002c9021575

Re: [openib-general] IPoIB configuration

2005-09-29 Thread Hal Rosenstock
On Thu, 2005-09-29 at 15:11, Thomas Moschny wrote: > On Thursday 29 September 2005 20:32, you wrote: > > Can you ping the subnet broadcast address (e.g. ping -b 192.168.0.255 if > > the ib0 is 192.168.0.x) ? > > The only answer I get is from the sender itself: > > $ ping -b 192.168.204.255 > WAR

Re: [openib-general] IPoIB configuration

2005-09-29 Thread Hal Rosenstock
On Thu, 2005-09-29 at 14:00, Thomas Moschny wrote: > Hi, > > Do I have to do something special in order to configure IPoverIB besides > loading the ib_ipoib kernel module (and its dependencies), and calling > ifconfig ib0 up? No, that should be sufficient. > On our machines, the module

[openib-general] IPoIB configuration

2005-09-29 Thread Thomas Moschny
Hi, Do I have to do something special in order to configure IPoverIB besides loading the ib_ipoib kernel module (and its dependencies), and calling ifconfig ib0 up? On our machines, the modules load fine, opensm runs, ports are in active state, no error messages from ifconfig. However,

Re: [openib-general] IPoIB question

2005-09-27 Thread Hal Rosenstock
On Tue, 2005-09-27 at 04:11, Abhijit Gadgil wrote: > Hi All, > > I am new to IPoIB. I have a query, as per the IPoIB Architecture > document, whenever an IPoIB interface is brought up, it needs to do a > Full Member Join to the "broadcast" Multicast group. Where exactly in > the code, is this taki

RE: [openib-general] IPoIB question

2005-09-27 Thread Eitan Zahavi
> Further, I am putting SM in testability 'debug' mode (DEBUG=10 in > /etc/opensm.conf), however I am still not seeing any dump of messages > about FullMember join whenever I try restarting the IB interfaces. What > should be

[openib-general] IPoIB question

2005-09-27 Thread Abhijit Gadgil
Hi All, I am new to IPoIB. I have a query, as per the IPoIB Architecture document, whenever an IPoIB interface is brought up, it needs to do a Full Member Join to the "broadcast" Multicast group. Where exactly in the code, is this taking place? I have been able to trace a little bit - eg. in ipoib

Re: [openib-general] IPoIB interface MAC

2005-09-20 Thread Hal Rosenstock
On Tue, 2005-09-20 at 07:31, Ali Ayoub wrote: > Hi all, > How can I retrieve the MAC address for a specific IPoIB interface? ip addr show dev ib0 19: ib0: mtu 2044 qdisc pfifo_fast qlen 128 link/[32] 00:0e:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:05:59 brd 00:ff:ff:ff:ff:12:40:1b:ff:f
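The `link/[32]` value printed by `ip addr` is the 20-byte IPoIB hardware address, which elsewhere in this archive is described as "QP,GID". A minimal sketch of splitting it, assuming the usual layout of one flags/reserved byte, a 3-byte QP number, and a 16-byte port GID; `parse_ipoib_hwaddr` is a hypothetical helper name, not part of any tool mentioned here.

```python
def parse_ipoib_hwaddr(text):
    """Split a 20-byte IPoIB link-layer address into flags, QPN and GID."""
    raw = bytes(int(part, 16) for part in text.split(":"))
    if len(raw) != 20:
        raise ValueError("expected 20 bytes, got %d" % len(raw))
    flags = raw[0]                                   # first byte: flags/reserved
    qpn = int.from_bytes(raw[1:4], "big")            # next 3 bytes: QP number
    gid = ":".join(raw[i:i + 2].hex() for i in range(4, 20, 2))  # last 16 bytes
    return flags, qpn, gid
```

Applied to the address shown above, this yields QPN 0x0e0404 and a GID starting with the link-local prefix fe80.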

[openib-general] IPoIB interface MAC

2005-09-20 Thread Ali Ayoub
Hi all, How can I retrieve the MAC address for a specific IPoIB interface? Using ifconfig doesn't produce good results, here is ifconfig output for machines with GEN2: SUSE 9.3, 2.6.13 ib0 Link encap:UNSPEC HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-

[openib-general] IPoIB SA Multicast Reregistration

2005-09-16 Thread Hal Rosenstock
Hi Roland, The following is what I am seeing: SM brings the subnet up. IPoIB does its multicast registration. That all works fine. Sometime later, the SM does a SM Set of PortInfo which causes IPoIB to first deregister all its multicasts and then register them. What I see is the following: If t

Re: [openib-general] ipoib send-only join to IGMP multicast group

2005-09-13 Thread Hal Rosenstock
On Tue, 2005-09-13 at 11:54, Jack Morgenstein wrote: > I noticed that at startup, IPoIB attempts a send-only join to the MGID > ff12:401b::0:0:0:0:16 (equivalent to IP 224.0.0.22 -- the IGMP > multicast group -- see > http://www.iana.org/assignments/multicast-addresses). > > 1. Why is this a s

[openib-general] ipoib send-only join to IGMP multicast group

2005-09-13 Thread Jack Morgenstein
I noticed that at startup, IPoIB attempts a send-only join to the MGID ff12:401b::0:0:0:0:16 (equivalent to IP 224.0.0.22 -- the IGMP multicast group -- see http://www.iana.org/assignments/multicast-addresses). 1. Why is this a send-only
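The correspondence between 224.0.0.22 and an MGID ending in 16 (hex) can be reproduced with a short sketch. It assumes the IPoIB IPv4 multicast mapping as I understand it: a 0xff1 prefix plus a scope nibble, the 0x401b signature visible in the MGID above, a 16-bit P_Key, and the low 28 bits of the IP address in the low bits of the GID. The default `pkey` of 0xffff and the function name are assumptions for illustration.

```python
import ipaddress


def ipv4_multicast_to_mgid(ip, pkey=0xFFFF, scope=0x2):
    """Build the IPoIB MGID for an IPv4 multicast address as a colon-hex string."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError("%s is not an IPv4 multicast address" % ip)
    group_id = int(addr) & 0x0FFFFFFF      # low 28 bits of the IP address
    words = [
        0xFF10 | scope,                    # 0xff, flags nibble 1, scope nibble
        0x401B,                            # IPv4 signature seen in the MGID
        pkey,                              # partition key (assumed default)
        0, 0, 0,                           # unused middle words stay zero
        group_id >> 16,
        group_id & 0xFFFF,
    ]
    return ":".join("%04x" % w for w in words)
```

For 224.0.0.22 the low 28 bits are 0x0000016, so the last two 16-bit words come out as 0000:0016, matching the trailing 16 in the MGID quoted in the message.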

[openib-general] IPoIB Multicast Connectivity

2005-09-02 Thread Hal Rosenstock
Hi Sean, Here's my (somewhat long winded) analysis of your osm.log: First I see: Sep 02 13:46:34 [AB43F140] -> osm_vendor_bind: Unable to register class 129 version 1. Sep 02 13:46:34 [AB43F140] -> osm_vendor_bind: ] Sep 02 13:46:34 [AB43F140] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific

[openib-general] ipoib oops (again)

2005-08-31 Thread Michael S. Tsirkin
Hi, Roland! The following crash was triggered by ifconfig down. The crash site is at db7: drivers/infiniband/ulp/ipoib/ipoib_multicast.c:225 db3: 49 8b 45 70 mov 0x70(%r13),%rax include/linux/byteorder/swab.h:147 db7: 8b 40 20 mov 0x20(%rax),

[openib-general] ipoib oops

2005-08-24 Thread Michael S. Tsirkin
Hi, Roland! I have seen the following oops recently, typically after restarting opensm on the same machine. This is on ipoib rev 3113 Pls note I'm running with my two event patches. The oops seems to be around offset db7 below: drivers/infiniband/ulp/ipoib/ipoib_multicast.c:223 da4: 49

[openib-general] IPoIB -- connected mode update

2005-08-03 Thread Vivek Kashyap
Attached is an updated draft (will be posting to internet drafts after the current IETF ends) for ipoib-connected mode based on the discussions on ipoib wg, openib (IB on Linux), and other communications. Two threads that saw good discussion are given below. I believe the attached updated draft ca

[openib-general] [IPoIB] Add support for MTU module parameter

2005-06-16 Thread Hal Rosenstock
[IPoIB] Add support for MTU module parameter. This is so there can be a non default MTU at boot up if the administrator so desires (prior to being able to invoke ifconfig). Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]> Index: ipoib_main.c ==

Re: [openib-general] IPoIB: Waiting for ib0 to become free

2005-06-10 Thread William Jordan
On 6/10/05, Roland Dreier <[EMAIL PROTECTED]> wrote: >Hal> dev/core.c netdev_wait_allrefs says: * Any protocol or device >Hal> that holds a reference should register * for netdevice >Hal> notification, and cleanup and put back the * reference if >Hal> they receive an UNREGISTER even

Re: [openib-general] IPoIB: Waiting for ib0 to become free

2005-06-10 Thread Roland Dreier
Hal> dev/core.c netdev_wait_allrefs says: * Any protocol or device Hal> that holds a reference should register * for netdevice Hal> notification, and cleanup and put back the * reference if Hal> they receive an UNREGISTER event. Hal> Is it correct that IPoIB does not need to re

[openib-general] IPoIB: Waiting for ib0 to become free

2005-06-10 Thread Hal Rosenstock
Hi Roland and Troy, Last week, Troy reported the following: On Fri, 2005-06-03 at 16:33, Troy Benjegerdes wrote: > > > Also, I have two machines in a state right now where they are > > > printing out: > > > > > > kernel: unregister_netdevice: waiting for ib0 to become free. > > > Usage count =

[openib-general] IPoIB MTU Changing

2005-06-09 Thread Hal Rosenstock
Hi Roland, I have a question about ipoib_main.c::ipoib_change_mtu: static int ipoib_change_mtu(struct net_device *dev, int new_mtu) { struct ipoib_dev_priv *priv = netdev_priv(dev); if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) return -EINVAL; Shouldn't the

Re: [openib-general] ipoib: bringing ib0 down kills ib0.8001?

2005-06-03 Thread Roland Dreier
Tom> Should it be the case that bringing down ib0 should kill off Tom> the other pkey devices: Looks like a bug -- I'll take a look. - R.

[openib-general] ipoib: bringing ib0 down kills ib0.8001?

2005-06-03 Thread Tom Duffy
Should it be the case that bringing down ib0 should kill off the other pkey devices: [EMAIL PROTECTED] ~]# netstat -nr Kernel IP routing table Destination Gateway Genmask Flags MSS Window irtt Iface 10.6.98.0 0.0.0.0 255.255.255.0 U 0 0 0 eth

Re: [openib-general] IPoIB

2005-04-04 Thread Grant Grundler
On Mon, Apr 04, 2005 at 06:48:19PM -0400, Hal Rosenstock wrote: > Do you mean IB or IP bridge/router ? IB bridges are switches. IB routers > forward at the IB network layer and are not completely specified. I > suspect you mean an IP router with one or more IPoIB interfaces. Yes, I was thinking IP

Re: [openib-general] IPoIB

2005-04-04 Thread Hal Rosenstock
On Mon, 2005-04-04 at 18:35, Grant Grundler wrote: > On Mon, Apr 04, 2005 at 06:08:03PM -0400, Hal Rosenstock wrote: > > A while ago, Tom brought up the issue of IPoIB link level broadcasting > > from user space (with the arping tool). Is it possible to do this from > > kernel space? > > I would t

Re: [openib-general] IPoIB

2005-04-04 Thread Grant Grundler
On Mon, Apr 04, 2005 at 06:08:03PM -0400, Hal Rosenstock wrote: > A while ago, Tom brought up the issue of IPoIB link level broadcasting > from user space (with the arping tool). Is it possible to do this from > kernel space? I would think any driver can call hard_xmit() for any "NIC". pktgen.c do

[openib-general] IPoIB

2005-04-04 Thread Hal Rosenstock
A while ago, Tom brought up the issue of IPoIB link level broadcasting from user space (with the arping tool). Is it possible to do this from kernel space ? For example, how would/could sendto() work when sending to a IPoIB link layer address ? If all we wanted to support was broadcast, perhaps the

Re: [openib-general] ipoib/mthca broken on ia64?

2005-03-23 Thread Grant Grundler
On Wed, Mar 23, 2005 at 04:36:37PM -0800, Bob Woodruff wrote: > I think on these nodes I have some very old PCI-X HCAs (A0 silicon) > that I cannot even upgrade to the newest firmware. Ok - good to know. AFAICT, I only have rev A1 silicon. > I have also seen a switch get into a weird state from t

RE: [openib-general] ipoib/mthca broken on ia64?

2005-03-23 Thread Bob Woodruff
Grant wrote> >Yup. The switch was hosed and cycling power got it back to life again: >[EMAIL PROTECTED]:~$ cat /sys/class/infiniband/mthca0/ports/*/state >4: ACTIVE >1: DOWN >[EMAIL PROTECTED]:~$ cat /sys/class/infiniband/mthca0/ports/*/state >4: ACTIVE >4: ACTIVE >Of course, ping works too.

Re: [openib-general] ipoib/mthca broken on ia64?

2005-03-23 Thread Grant Grundler
On Wed, Mar 23, 2005 at 12:33:05PM -0800, Roland Dreier wrote: > Grant> *nod*. I try the above first...then cycle power and see if > Grant> it comes back to life. The switch has been on since > Grant> December or so. > > If you have a serial console or ethernet configured for the switch

Re: [openib-general] ipoib/mthca broken on ia64?

2005-03-23 Thread Roland Dreier
Grant> *nod*. I try the above first...then cycle power and see if Grant> it comes back to life. The switch has been on since Grant> December or so. If you have a serial console or ethernet configured for the switch, you can check if it still looks happy as well. It wouldn't really sur

Re: [openib-general] ipoib/mthca broken on ia64?

2005-03-23 Thread Grant Grundler
On Wed, Mar 23, 2005 at 12:16:08PM -0800, Roland Dreier wrote: > It looks like the driver is working but the SM isn't bringing the > ports to the active state. The problem could still be on the host or > the switch unfortunately. What do you see in the files > > /sys/class/infiniband/mthca0/

Re: [openib-general] ipoib/mthca broken on ia64?

2005-03-23 Thread Grant Grundler
On Wed, Mar 23, 2005 at 03:16:43PM -0500, Hal Rosenstock wrote: > Hi Grant, > > iowa:/usr/src/linux-2.6# cat /sys/class/infiniband/mthca0/ports/*/state > > 1: DOWN > > 2: INIT > > Looks like port 2 is plugged in. It needs to get to ACTIVE before IPoIB > will work. Is the SM enabled in the TS switc
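The check Hal describes (a port must reach ACTIVE before IPoIB will work) can be applied mechanically to the `state` output. A small sketch, assuming each line has the `code: NAME` form shown in these messages and that `cat .../ports/*/state` prints the ports in numeric order; both function names are illustrative, and the code parses text it is handed rather than reading sysfs itself.

```python
def parse_port_state(line):
    """Parse one sysfs state line such as '4: ACTIVE' into (code, name)."""
    code, name = line.split(":", 1)
    return int(code), name.strip()


def ports_not_active(lines):
    """Return the 1-based positions of ports whose state is not ACTIVE."""
    return [pos for pos, line in enumerate(lines, start=1)
            if parse_port_state(line)[1] != "ACTIVE"]
```

Fed the two lines from Grant's machine, it reports both ports, since neither DOWN nor INIT is enough for IPoIB traffic.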

Re: [openib-general] ipoib/mthca broken on ia64?

2005-03-23 Thread Hal Rosenstock
Hi Grant, On Wed, 2005-03-23 at 15:08, Grant Grundler wrote: > Hi, > I wanted to run netpipe and basics aren't working. I haven't > tried the SVN tree in over a month. It could have been broken > for ia64 for a while. Sorry for lagging on that... > > I'm running 2.6.11 kernel with TOB svn bits an

Re: [openib-general] ipoib/mthca broken on ia64?

2005-03-23 Thread Roland Dreier
> iowa:/usr/src/linux-2.6# cat /sys/class/infiniband/mthca0/ports/*/state > 1: DOWN > 2: INIT It looks like the driver is working but the SM isn't bringing the ports to the active state. The problem could still be on the host or the switch unfortunately. What do you see in the files

[openib-general] ipoib/mthca broken on ia64?

2005-03-23 Thread Grant Grundler
Hi, I wanted to run netpipe and basics aren't working. I haven't tried the SVN tree in over a month. It could have been broken for ia64 for a while. Sorry for lagging on that... I'm running 2.6.11 kernel with TOB svn bits and building the IB modules "in tree". Just replaced the drivers/infiniband

[openib-general] IPoIB/ia64 cacheline misses

2005-01-22 Thread Grant Grundler
Here is use of pfmon 3.1 to sample address events. In short, nothing here jumps out at me and screams for a big opportunity to optimize. But I don't fully understand the data and instruction flow either. Maybe someone else sees more opportunity. e.g. I'm wondering if netfilter is a significant pe

Re: [openib-general] IPoIB completion handler

2005-01-11 Thread Sean Hefty
Hal Rosenstock wrote: > Why key off bit in wc.wr_id rather than use wc.opcode to determine whether > the completed operation was a receive or transmit ? The opcode isn't set if status != success. > Also, when wc.status != success, wr_id cannot be trusted but is still The wr_id is always valid, regardless o
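Sean's answer suggests why tagging the work-request ID is attractive: if the opcode is not set on a failed completion but the wr_id is always valid, a marker bit in the wr_id lets the handler tell receives from sends regardless of status. A minimal sketch of that idea; the bit position and both function names are hypothetical, chosen for illustration rather than taken from the driver.

```python
OP_RECV_FLAG = 1 << 31  # hypothetical bit reserved to mark receive requests


def make_wr_id(index, is_recv):
    """Pack a ring index and a receive/send marker into one work-request ID."""
    return (index | OP_RECV_FLAG) if is_recv else index


def classify_completion(wr_id):
    """Return ('recv' or 'send', ring index) using only the wr_id."""
    kind = "recv" if wr_id & OP_RECV_FLAG else "send"
    return kind, wr_id & ~OP_RECV_FLAG
```

The handler then never needs to consult the opcode, so error completions can be routed to the right ring just like successful ones.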

[openib-general] IPoIB completion handler

2005-01-11 Thread Hal Rosenstock
Hi Roland, I have a couple of questions on the IPoIB completion handler: Why key off bit in wc.wr_id rather than use wc.opcode to determine whether the completed operation was a receive or transmit ? Also, when wc.status != success, wr_id cannot be trusted but is still used to determine operation

Re: [openib-general] IPoIB breaks

2005-01-04 Thread Josh England
On Tue, 2005-01-04 at 11:18 -0800, Roland Dreier wrote: > Josh> I'm getting great numbers from IPoIB...right up until it > Josh> dies. The system is x86_64 and PCIe with latest stuff under > Josh> 2.6.10. Streaming tests with both netperf and NetPIPE die > Josh> almost instantly (

Re: [openib-general] IPoIB breaks

2005-01-04 Thread Roland Dreier
Josh> I'm getting great numbers from IPoIB...right up until it Josh> dies. The system is x86_64 and PCIe with latest stuff under Josh> 2.6.10. Streaming tests with both netperf and NetPIPE die Josh> almost instantly (NetPIPE actually lasts for a couple Josh> iterations). Ping

[openib-general] IPoIB breaks

2005-01-04 Thread Josh England
I'm getting great numbers from IPoIB...right up until it dies. The system is x86_64 and PCIe with latest stuff under 2.6.10. Streaming tests with both netperf and NetPIPE die almost instantly (NetPIPE actually lasts for a couple iterations). Ping-pong tests with NetPIPE seem to consistently die

Re: [openib-general] IPoIB performance.

2004-12-27 Thread Roland Dreier
Ido> 1. We can divide the single CQ into two separate completion Ido> queues: one for the RQ and the other for SQ. Then we can Ido> change the CQ policy affiliated with the SQ into Ido> IB_CQ_CONSUMER_REARM and in mainstream not arm the CQ. In Ido> such case the poll_cq_tq wil

[openib-general] IPoIB performance.

2004-12-27 Thread Ido Bukspan
Hello I have been investigating the performance of the IPoIB for a while and I have discovered 3 interesting points which I think are worth implementing in gen2. 1. We can divide the single CQ into two separate completion queues: one for the RQ and the other for SQ. Then we can change the CQ p

RE: [openib-general] IPoIB still not working

2004-12-21 Thread Woodruff, Robert J
>Are you at 4.6.1 now on all your PCIe HCAs ? Not sure whether 4.5.3 does indeed work >but it sounds like something has changed for the worse since Roland had it working. >Thanks. >-- Hal I had problems with the 4.3.5 firmware and it seemed to work with my early version of the 4.6.0-rc4 f

RE: [openib-general] IPoIB still not working

2004-12-21 Thread Hal Rosenstock
Hi Josh, [You wrote:] I've tried with the latest from SVN as well as the exact same kernel/rootFS you were using, and I still can't ping between nodes. I'm just doing 'modprobe ib_mthca; modprobe ib_ipoib; ifconfig ...; ping ...' on the same two nodes. Am I missing something? Anything in /var/

Re: [openib-general] IPoIB still not working

2004-12-21 Thread Roland Dreier
Josh> Roland, Alright...I'm not sure what's going on. I just now Josh> got a chance to test this out and it still isn't working. Josh> I've tried with the latest from SVN as well as the exact Josh> same kernel/rootFS you were using, and I still can't ping Josh> between nodes. I

Re: [openib-general] IPoIB still not working

2004-12-20 Thread Josh England
Roland, Alright...I'm not sure what's going on. I just now got a chance to test this out and it still isn't working. I've tried with the latest from SVN as well as the exact same kernel/rootFS you were using, and I still can't ping between nodes. I'm just doing 'modprobe ib_mthca; modprobe ib_ip

[openib-general] IPoIB Path Static Rate

2004-12-16 Thread Hal Rosenstock
Hi Roland, It looks to me like after obtaining the PathRecord, the static rate is not used when the AV is created. Shouldn't it be ? Is there an issue with doing this ? There is a similar issue with the multicast AVs as well. I know there is an assumption that everything is 4x but I am not sure th

[openib-general] IPoIB Partial Connectivity Scenario

2004-12-16 Thread Hal Rosenstock
I've looked at the remote side to understand what it was (or wasn't) doing. The partial connectivity stems from an issue in resolving the path on the remote side. I have a proposal: Rather than a single SA Get(PathRecord) with a 1 second timeout, what about a retry or two with a smaller (0.33 - 0.

RE: [openib-general] IPoIB oops on path record completion

2004-12-16 Thread Woodruff, Robert J
>Are you running the latest code from svn? I fixed a bug this morning >that would cause problems with more than 2 nodes. >Thanks, > Roland With the 1348 version I just downloaded, I can now ping from all nodes to all other nodes. I will now try to install and run some MPI tests and/or other

RE: [openib-general] IPoIB oops on path record completion

2004-12-16 Thread Woodruff, Robert J
Robert> I also seem to be having some partial connectivity Robert> problems. The first 2 nodes seem to be able to Robert> communicate, but adding the 3rd and 4th nodes, they cannot Robert> pi

Re: [openib-general] IPoIB oops on path record completion

2004-12-16 Thread Roland Dreier
Robert> I also seem to be having some partial connectivity Robert> problems. The first 2 nodes seem to be able to Robert> communicate, but adding the 3rd and 4th nodes, they cannot Robert> ping the first 2. Are you running the latest code from svn? I fixed a bug this morning that

Re: [openib-general] IPoIB oops on path record completion

2004-12-16 Thread Hal Rosenstock
On Thu, 2004-12-16 at 12:35, Roland Dreier wrote: > Are you running the latest code from svn? I fixed a bug this morning > that would cause problems with more than 2 nodes. I am. -- Hal

RE: [openib-general] IPoIB oops on path record completion

2004-12-16 Thread Woodruff, Robert J
>Still have the partial connectivity problem. I can see the ARP going out >on the broadcast group followed by ARPs coming in on the broadcast >group followed by the PathRecord requests/responses with the SA followed >by the unicast ARP and ICMP. After the unicast ARP to one of the nodes, >it is

RE: [openib-general] IPoIB still not working

2004-12-16 Thread England, Joshua J
They're on 4.5.3. -JE -Original Message- From: Hal Rosenstock [mailto:[EMAIL PROTECTED]] Sent: Thu 12/16/2004 8:54 AM To: England, Joshua J Cc: Roland Dreier; Robert J Woodruff; [EMAIL PROTECTED] Subject: RE: [openib-ge

RE: [openib-general] IPoIB still not working

2004-12-16 Thread Hal Rosenstock
On Wed, 2004-12-15 at 13:22, England, Joshua J wrote: > I'll definitely pound on the stuff and let you know if anything > breaks. You are using the 4.3.5 firmware, right ? I want to put the proper info into the IPoIB FAQ. Thanks. -- Hal
