Re: [E1000-devel] [PATCH RFC 0/2] e1000e: 82574 also needs ASPM L1 completely disabled

2012-04-29 Thread Nix
On 24 Apr 2012, Jesse Brandeburg outgrape: Please let us know the results of your testing, we will let you know if we see any issues as well. Alas, it has no effect at all here; L0s and L1 claim to be being disabled at boot time, but if you ask with lspci you see that they are not. I strongly
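Several of these messages turn on the gap between what the boot log claims and what `lspci -vvv` shows on the device's `LnkCtl:` line. The following is a minimal Python sketch of reading that line. It assumes the `LnkCtl: ASPM ... Enabled;` and `LnkCtl: ASPM Disabled;` wording quoted in these threads, and other pciutils versions may word it differently. The function name `aspm_states` is hypothetical.

```python
import re

# Matches the Link Control line printed by `lspci -vvv`, e.g.
#   "LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+"
#   "LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+"
# The "LnkCap:" line also mentions ASPM (what the link *supports*), but it is
# deliberately not matched: only LnkCtl says what is actually enabled.
_LNKCTL_ASPM = re.compile(r"LnkCtl:\s+ASPM\s+([^;]+);")


def aspm_states(lspci_text: str) -> set:
    """Return the ASPM states that lspci reports as enabled in LnkCtl.

    An empty set means lspci reports ASPM as disabled. Raises ValueError if
    no LnkCtl line is present (for instance when lspci was not run as root
    and so could not read the PCI Express capability).
    """
    match = _LNKCTL_ASPM.search(lspci_text)
    if match is None:
        raise ValueError("no 'LnkCtl: ASPM ...;' line found")
    words = match.group(1).split()
    # "L0s L1 Enabled" -> {"L0s", "L1"}; "Disabled" -> set()
    return {w for w in words if w in ("L0s", "L1")}
```

In practice you would pass it the output of something like `lspci -vvv -s 03:00.0` run as root (03:00.0 being the slot of the 82574L in the November 2010 listing). A non-empty result after the driver has logged that it disabled ASPM is the mismatch these threads describe.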

Re: [E1000-devel] [PATCH 1/2] e1000e: Disable ASPM L1 on 82574

2012-04-24 Thread Nix
On 23 Apr 2012, Chris Boot uttered the following: ASPM on the 82574 causes trouble. Currently the driver disables L0s for this NIC but only disables L1 if the MTU is 1500. This patch simply causes L1 to be disabled regardless of the MTU setting. FWIW, that existing code doesn't actually work

Re: [E1000-devel] e1000e interface hang on 82574L

2012-04-06 Thread Nix
On 6 Apr 2012, Bjorn Helgaas outgrape: If I understand correctly, e1000e attempts to disable ASPM to work around an 82574L hardware erratum, but the PCI core either doesn't disable ASPM or it gets re-enabled somehow. It gets re-enabled. If you explicitly do a setpci in the boot process to turn
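A boot-time `setpci` workaround like the one mentioned here comes down to clearing the ASPM Control field. This field is bits [1:0] of the PCI Express Link Control register, which sits at offset 0x10 in the device's PCI Express capability. The encodings are 00b disabled, 01b L0s, 10b L1, and 11b both. Below is a small Python sketch of that bit arithmetic, with hypothetical helper names. The generated command assumes a pciutils recent enough to accept the `CAP_EXP` capability name and the `value:mask` write form, so check `setpci(8)` on your system before running it. Re-check with lspci afterwards as well, since these threads report ASPM being switched back on.

```python
# ASPM Control field of the PCIe Link Control register (bits 1:0).
# The register sits at offset 0x10 within the PCI Express capability.
LNKCTL_OFFSET = 0x10
ASPM_L0S = 0x1
ASPM_L1 = 0x2
ASPM_MASK = ASPM_L0S | ASPM_L1


def decode_aspm(lnkctl: int) -> set:
    """Return the ASPM states enabled by a raw 16-bit LnkCtl value."""
    states = set()
    if lnkctl & ASPM_L0S:
        states.add("L0s")
    if lnkctl & ASPM_L1:
        states.add("L1")
    return states


def disable_aspm(lnkctl: int, states: int = ASPM_MASK) -> int:
    """Clear the requested ASPM bits, leaving every other LnkCtl bit alone."""
    return lnkctl & ~states & 0xFFFF


def setpci_command(slot: str, states: int = ASPM_MASK) -> str:
    """Build a setpci invocation that writes 0 to just the given ASPM bits.

    Uses setpci's value:mask form, so only bits set in the mask change.
    Assumes a pciutils that understands the CAP_EXP capability name.
    """
    return f"setpci -s {slot} CAP_EXP+{LNKCTL_OFFSET:x}.w=0000:{states:04x}"
```

For example, a raw LnkCtl of 0x0042 (Common Clock Configuration, bit 6, plus ASPM L1) decodes to `{"L1"}`, and clearing ASPM leaves 0x0040. `setpci_command("03:00.0")` produces `setpci -s 03:00.0 CAP_EXP+10.w=0000:0003`. The masked write is used so that the clock and other link bits are not disturbed.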

Re: [E1000-devel] e1000e interface hang on 82574L

2012-03-19 Thread Nix
On 19 Mar 2012, Carolyn Wyborny stated: So, at least we are clear in your situation, the ASPM needs to be disabled. Please let me know if there are continued problems after booting with pcie_aspm=off. If you look further down in

Re: [E1000-devel] e1000e interface hang on 82574L

2012-03-19 Thread Nix
On 19 Mar 2012, Carolyn Wyborny said: you'll see that I tested that, and it doesn't work :( even if it did work, it shouldn't be needed: the driver attempts to turn off PCIe ASPM on affected NICs, and fails, apparently because *something* turns it back on again. The driver attempts to disable

Re: [E1000-devel] e1000e interface hang on 82574L

2012-03-17 Thread Nix
On 17 Mar 2012, Chris Boot verbalised: Most notably it appears as though MSI-X is not enabled on the Supermicro, and ASPM L1 is. There appears to be no difference on the Supermicro as to the MSI-X status when booting with IntMode=1,1 compared to without it. This bug is an ASPM bug, not an

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-05-18 Thread Nix
On 17 May 2011, German Gomez stated: Sorry for replying to an old thread, but I'm getting exactly the same problem with a 00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit Network Connection (rev 03) FWIW I'm still getting it, and still 'fixing' it with a setpci after boot

Re: [E1000-devel] transmit hang under load 82574L

2011-03-23 Thread Nix
On 22 Mar 2011, Stephen Hemminger verbalised: All seems happy now, system has been chugging along for 5 days. Mine often ran that long before imploding, particularly if the NIC was relatively idle. The thing to ask is if ASPM is still turned on on the affected card... if it is, I suspect you'll

Re: [E1000-devel] transmit hang under load 82574L

2011-03-19 Thread Nix
On 17 Mar 2011, Bruce W. Allan said: OK, it looks like you are hitting the same issue described in http://sourceforge.net/tracker/?func=detail&aid=3170405&group_id=42302&atid=447449 where ASPM L0s is supposed to be disabled on the device (as indicated by dmesg output) but is really not (as

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-02-02 Thread Nix
On 1 Feb 2011, Bruce W. Allan spake thusly: -----Original Message----- From: Nix [mailto:n...@esperi.org.uk] I am... confuzzled, but am happy to try turning L0s/L1 off (if I can figure out how to do it: setpci is... not the most friendly of tools and I've never even looked at its manpage before

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-02-02 Thread Nix
On 1 Feb 2011, Bruce W. Allan stated: From: Jesse Brandeburg [mailto:jesse.brandeb...@gmail.com] Please, for our benefit, file a bug at e1000.sf.net (if you have not already) so you can attach the .config and full dmesg file from a non-working kernel, also please attach the full lspci -vvv

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Nix
On 31 Jan 2011, Bruce W. Allan said: From: Nix [mailto:n...@esperi.org.uk] I wonder if this has something to do with PCI ASPM? The driver turns ASPM off at least partially for this NIC, but if the NIC is being flipped into some sort of low-power state when transmission ceases for a while

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Nix
On 31 Jan 2011, n...@esperi.org.uk stated: On 31 Jan 2011, Bruce W. Allan said: Have you tried booting with pcie_aspm=off kernel parameter? I didn't know that parameter existed. Added, will reboot shortly: let us see what happens. :) No change: LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Nix
On 31 Jan 2011, Bruce W. Allan spake thusly: From: Nix [mailto:n...@esperi.org.uk] I'm not so sure anymore. In 2.6.35.4, everything works -- but in 2.6.35.4, the lspci output is *exactly the same*, i.e. even there lspci claims that ASPM L0s and L1 are enabled. This seems unlikely, since even

[E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-29 Thread Nix
Way back in November, in http://sourceforge.net/mailarchive/forum.php?thread_name=87k4kfq1at.fsf%40spindle.srvr.nix&forum_name=e1000-devel, I reported a problem with the 82574 in one of my machines freezing up at random. This problem continues in 2.6.37, and bisection has still failed because the

Re: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

2010-11-14 Thread Nix
On 8 Nov 2010, n...@esperi.org.uk stated: On 8 Nov 2010, Emil S. Tantilov verbalised: Nix wrote: For the record, cherry-picking ff10e13cd06f3dbe90e9fffc3c2dd2057a116e4b (the periodic phy-crash-and-reset check) atop 2.6.36 seems to have fixed it: at least, the machine has been up for a day

Re: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

2010-11-08 Thread Nix
On 4 Nov 2010, Jesse Brandeburg spake thusly: The above could be responsible for your issue. If you don't want to disable ASPM system wide, then you could just make sure to run a recent kernel with the ASPM patches, or get our e1000.sf.net e1000e driver and try it, as it will work around the

Re: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

2010-11-08 Thread Nix
On 8 Nov 2010, Emil S. Tantilov verbalised: Nix wrote: For the record, cherry-picking ff10e13cd06f3dbe90e9fffc3c2dd2057a116e4b (the periodic phy-crash-and-reset check) atop 2.6.36 seems to have fixed it: at least, the machine has been up for a day now without trouble. This commit doesn't

Re: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

2010-11-04 Thread Nix
On 4 Nov 2010, Jesse Brandeburg outgrape: On Mon, 2010-11-01 at 16:08 -0700, Nix wrote: 03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt

[E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

2010-10-31 Thread Nix
It's the weekend, the time when busy servers get upgraded without annoying the users. I was just congratulating myself on an upgrade to 2.6.36 with only a few problems (the NFS -ESTALE bug I have yet to localize, and a watchdog bug causing constant reboots which may well be the fault of the

Re: [E1000-devel] [in-tree drivers] freezing e1000e in 2.6.31 (SMP only? MSI? PAUSE!)

2009-11-09 Thread Nix
On 8 Nov 2009, n...@esperi.org.uk told this: On 6 Nov 2009, Emil S. Tantilov verbalised: Also try disabling Tx pause frames: ethtool -A fastnet tx off autoneg off Trying that now. No freezes yet, but I haven't really given it long enough. I just did a large number of

Re: [E1000-devel] [in-tree drivers] freezing e1000e in 2.6.31 (SMP only? MSI?)

2009-11-08 Thread Nix
On 6 Nov 2009, Emil S. Tantilov verbalised: Nix wrote: Ever since 2.6.31 was released, my gigabit e1000e link has been acting up. Notably, under sufficient load (generally, on this machine, NFS load), packets cease to be transferred, and the (MSI) interrupt count ceases to rise. Pulling
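One way to confirm the symptom described here, where the MSI interrupt count stops rising, is to compare two snapshots of `/proc/interrupts` taken a few seconds apart under load. The following is a minimal Python sketch. It assumes the usual layout: an `NN:` IRQ label, one counter column per CPU, the interrupt chip type, and then the handler name (for example `gordianet-rx-0` in the listing quoted further down this archive). All function names are hypothetical.

```python
def irq_totals(text: str) -> dict:
    """Sum the per-CPU counters of each /proc/interrupts line, keyed by the
    last column (normally the handler name, e.g. 'gordianet-rx-0')."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the "CPU0 CPU1 ..." header and anything without an "NN:" label.
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        total = 0
        for field in fields[1:]:
            if not field.isdigit():
                break  # first non-numeric column is the interrupt chip type
            total += int(field)
        totals[fields[-1]] = total
    return totals


def stalled_irqs(before: dict, after: dict, names) -> list:
    """Return the named handlers present in both snapshots whose totals
    did not change between them."""
    return [n for n in names
            if n in before and n in after and after[n] == before[n]]


def snapshot(path: str = "/proc/interrupts") -> dict:
    """Read and parse the current interrupt counters."""
    with open(path) as f:
        return irq_totals(f.read())
```

To use it, call `snapshot()` twice a few seconds apart while traffic (NFS, in this report) should be flowing, then pass both results to `stalled_irqs` along with the interface's handler names as they appear in your own `/proc/interrupts`. With a single MSI vector, e1000e typically names the handler after the interface (e.g. `fastnet`). A handler that stays flat while the link is supposedly busy matches the "count ceases to rise" symptom.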

[E1000-devel] 2.6.31 regression: e1000e jumbo frames no longer work: 'Unsupported MTU setting'

2009-09-26 Thread Nix
[Bruce, you have changes in net-next in this area, so you might have a clue what's going on here.] In 2.6.30.x, I was happily bringing up the 82574L cards in one server like this: ip link set fastnet up mtu 7200 As of 2.6.31.x, what I see is this: spindle:/root# ip link set mtu 7200 dev

Re: [E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 and x86-32 both) (in-tree e1000e at fault)

2009-07-04 Thread Nix
On 4 Jul 2009, n...@esperi.org.uk outgrape: On 1 Jul 2009, Jesse Brandeburg spake thusly: Just FYI, our development tree is internal only for our out of tree driver, but we send patches to the kernel ASAP, after they have passed testing. Aha! So... is it worth reporting bugs in mainline

Re: [E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 and x86-32 both) (in-tree e1000e at fault)

2009-06-01 Thread Nix
On 1 Jun 2009, David Miller uttered the following: From: Nix <n...@esperi.org.uk> Date: Mon, 01 Jun 2009 01:16:26 +0100 I plan to try out 2.6.29 (and back to 2.6.25 or thereabouts) tomorrow and see if it ever worked: if it did I'll bisect for it (rendered tricky by the out-of-tree e1000e

Re: [E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 only, not x86-32) (e1000e-related?)

2009-06-01 Thread Nix
On 1 Jun 2009, Jesse Brandeburg spake thusly:
 57:    0    0    0 7654    0    0    0    0   PCI-MSI-edge   gordianet-rx-0
 58:    0    0    0    0 8065    0    0    0   PCI-MSI-edge   gordianet-tx-0
 59:    0    0    0    0    3    0    0

[E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 only, not x86-32) (e1000e-related?)

2009-05-31 Thread Nix
I've just compiled a 64-bit kernel for a couple of quad-core Nehalems (one L5520, one Core i7) for the first time. Both were using 32-bit kernels happily before, and one (the Core i7) is happy afterwards: but the other sees two ksoftirqd threads saturating the CPU (well, half of it, this being a

Re: [E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 only, not x86-32) (e1000e-related?)

2009-05-31 Thread Nix
On 1 Jun 2009, Andrew Morton said: Let's cc netdev on this. Presumably it is a post-2.6.29 regression. I don't know: the earliest kernel this machine has ever run was 2.6.30rc5, and this failing 2.6.30rc7 kernel is the first 64-bit kernel I've ever run on it. So currently I have one single

[E1000-devel] [PATCH] cater for enumization of irqreturn_t in 2.6.30 (was: Re: nfsroot on multiple-e1000e serial-over-LAN system - deadlock?)

2009-05-21 Thread Nix
On 20 May 2009, n...@esperi.org.uk spake thusly: All is not well with the out-of-tree driver, though: 0.5.18.3 doesn't even build without the patch below, and screams loudly in the log at startup, e.g.:
[ 93.041327] irq event 57: bogus return value f70b5eb4
[ 93.046871] Pid: 0, comm: