Re: [E1000-devel] [PATCH RFC 0/2] e1000e: 82574 also needs ASPM L1 completely disabled

2012-04-29 Thread Nix
On 24 Apr 2012, Jesse Brandeburg outgrape: > Please let us know the results of your testing, we will let you know if > we see any issues as well. Alas, it has no effect at all here; L0s and L1 claim to be being disabled at boot time, but if you ask with lspci you see that they are not. I strongly

Re: [E1000-devel] [PATCH 1/2] e1000e: Disable ASPM L1 on 82574

2012-04-24 Thread Nix
On 23 Apr 2012, Chris Boot uttered the following: > ASPM on the 82574 causes trouble. Currently the driver disables L0s for > this NIC but only disables L1 if the MTU is >1500. This patch simply > causes L1 to be disabled regardless of the MTU setting. FWIW, that existing code doesn't actually wo

Re: [E1000-devel] e1000e interface hang on 82574L

2012-04-06 Thread Nix
On 6 Apr 2012, Henrique de Moraes Holschuh outgrape: > You probably need to disable it upstream of the 82574L as well. Here > (SuperMicro C7X58) I managed to get it to be stable by telling the BIOS > to disable L0s and L1 system-wide. > > But not all BIOSes will have that option... Indeed not :(

Re: [E1000-devel] e1000e interface hang on 82574L

2012-04-06 Thread Nix
On 6 Apr 2012, Bjorn Helgaas outgrape: > If I understand correctly, e1000e attempts to disable ASPM to work > around an 82574L hardware erratum, but the PCI core either doesn't > disable ASPM or it gets re-enabled somehow. It gets re-enabled. If you explicitly do a setpci in the boot process to tu
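
The setpci workaround mentioned above amounts to clearing the two ASPM Control bits (bits 1:0) of the PCIe Link Control register, which sits at offset 0x10 in the PCI Express capability. A minimal sketch of that bit manipulation follows; the function and constant names are illustrative assumptions, not taken from the thread or from the e1000e driver, and the code only computes values rather than touching hardware.

```python
# ASPM Control field of the PCIe Link Control register (bits 1:0).
# Names below are hypothetical, chosen for this sketch only.
ASPM_L0S = 0x1
ASPM_L1 = 0x2
ASPM_MASK = ASPM_L0S | ASPM_L1


def clear_aspm(link_control):
    """Return the 16-bit Link Control value with both ASPM bits cleared."""
    return link_control & ~ASPM_MASK & 0xFFFF


def decode_aspm(link_control):
    """List the ASPM states enabled in a Link Control value."""
    states = []
    if link_control & ASPM_L0S:
        states.append("L0s")
    if link_control & ASPM_L1:
        states.append("L1")
    return states
```

For example, a Link Control value of 0x0043 (common clock plus L0s and L1) becomes 0x0040 after `clear_aspm`. As the thread notes, writing such a value is no guarantee that it stays written: something may turn ASPM back on afterwards, so the register needs to be read back later.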

Re: [E1000-devel] e1000e interface hang on 82574L

2012-03-19 Thread Nix
On 19 Mar 2012, Carolyn Wyborny said: >>you'll see that I tested that, and it doesn't work :( even if it did >>work, it shouldn't be needed: the driver attempts to turn off PCIe ASPM >>on affected NICs, and fails, apparently because *something* turns it >>back on again. >> > The driver attempts to

Re: [E1000-devel] e1000e interface hang on 82574L

2012-03-19 Thread Nix
On 19 Mar 2012, Carolyn Wyborny stated: > So, at least we are clear in your situation, the ASPM needs to be > disabled. Please let me know if there are continued problems after > booting with pcie_aspm=off. If you look further down in

Re: [E1000-devel] e1000e interface hang on 82574L

2012-03-17 Thread Nix
On 17 Mar 2012, Chris Boot verbalised: > Most notably it appears as though MSI-X is not enabled on the > Supermicro, and ASPM L1 is. There appears to be no difference on the > Supermicro as to the MSI-X status when booting with IntMode=1,1 compared > to without it. This bug is an ASPM bug, not

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-05-18 Thread Nix
On 17 May 2011, German Gomez stated: > Sorry for replying to an old thread, but I'm getting exactly the same > problem with a > > 00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit Network > Connection (rev 03) FWIW I'm still getting it, and still 'fixing' it with a setpci after boo

Re: [E1000-devel] transmit hang under load 82574L

2011-03-23 Thread Nix
On 22 Mar 2011, Stephen Hemminger verbalised: > All seems happy now, system has been chugging along for 5 days. Mine often ran that long before imploding, particularly if the NIC was relatively idle. The thing to ask is if ASPM is still turned on on the affected card... if it is, I suspect you'll

Re: [E1000-devel] transmit hang under load 82574L

2011-03-19 Thread Nix
On 17 Mar 2011, Bruce W. Allan said: > OK, it looks like you are hitting the same issue described in > > > > where ASPM L0s is supposed to be disabled on the device (as indicated by > dmesg output) but is really n

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-02-09 Thread Nix
On 2 Feb 2011, n...@esperi.org.uk uttered the following: > On 1 Feb 2011, Bruce W. Allan stated: > >>>From: Jesse Brandeburg [mailto:jesse.brandeb...@gmail.com] >>>Please, for our benefit, file a bug at e1000.sf.net (if you have not >>>already) so you can attach the .config and full dmesg file fro

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-02-02 Thread Nix
On 1 Feb 2011, Bruce W. Allan stated: >>From: Jesse Brandeburg [mailto:jesse.brandeb...@gmail.com] >>Please, for our benefit, file a bug at e1000.sf.net (if you have not >>already) so you can attach the .config and full dmesg file from a >>non-working kernel, also please attach the full lspci -vvv

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-02-02 Thread Nix
On 1 Feb 2011, Bruce W. Allan spake thusly: >>-Original Message- >>From: Nix [mailto:n...@esperi.org.uk] >>I am... confuzzled, but am happy to try turning L0s/L1 off (if I can >>figure out how to do it: setpci is... not the most friendly of tools >>and

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Nix
On 31 Jan 2011, Bruce W. Allan spake thusly: >>From: Nix [mailto:n...@esperi.org.uk] >>I'm not so sure anymore. In 2.6.35.4, everything works -- but in 2.6.35.4, >>the lspci output is *exactly the same*, i.e. even there lspci claims that >>ASPM L0s and L1 are enable

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Nix
On 31 Jan 2011, Bruce W. Allan spake thusly: >>Because lspci simply reads the PCI configuration space (IIRC), I doubt it >>is reporting incorrect information. The e1000e driver uses the kernel >>API to disable ASPM (when CONFIG_PCIEASPM is enabled in the kernel config >>otherwise it writes direct

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Nix
On 31 Jan 2011, n...@esperi.org.uk stated: > On 31 Jan 2011, Bruce W. Allan said: >> Have you tried booting with pcie_aspm=off kernel parameter? > > I didn't know that parameter existed. Added, will reboot shortly: let us > see what happens. :) No change: LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes
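
Since the recurring question in these threads is whether ASPM is really off after boot, it can help to check the LnkCtl line mechanically rather than by eye. The sketch below parses a line in the format quoted above; the helper name is an assumption made for illustration, and the exact lspci output format may differ between pciutils versions.

```python
import re

# Hypothetical helper: pull the ASPM field out of an `lspci -vv` LnkCtl line.
# Matches e.g. "LnkCtl: ASPM L0s L1 Enabled; ..." or "LnkCtl: ASPM Disabled; ...".
_LNKCTL_RE = re.compile(r"LnkCtl:\s*ASPM\s+(.*?)(?:\s+Enabled)?;")


def aspm_states(lnkctl_line):
    """Return the set of ASPM states the line reports as enabled."""
    match = _LNKCTL_RE.search(lnkctl_line)
    if match is None:
        raise ValueError("not a LnkCtl line: %r" % lnkctl_line)
    field = match.group(1).strip()
    if field == "Disabled":
        return set()
    return set(field.split())
```

An empty set means the device reports ASPM as disabled; anything else means L0s and/or L1 is still enabled, whatever the driver printed at boot.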

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Nix
On 31 Jan 2011, Bruce W. Allan said: >>From: Nix [mailto:n...@esperi.org.uk] >>I wonder if this has something to do with PCI ASPM? The driver turns >>ASPM off at least partially for this NIC, but if the NIC is being >>flipped into some sort of low-power state when t

[E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-29 Thread Nix
Way back in November, in , I reported a problem with the 82754 in one of my machines freezing up at random. This problem continues in 2.6.37, and bisection has still failed because th

Re: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

2010-11-14 Thread Nix
On 8 Nov 2010, n...@esperi.org.uk stated: > On 8 Nov 2010, Emil S. Tantilov verbalised: > >> Nix wrote: >>> For the record, cherry-picking >>> ff10e13cd06f3dbe90e9fffc3c2dd2057a116e4b (the periodic >>> phy-crash-and-reset check) atop 2.6.36 seems to have

Re: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

2010-11-08 Thread Nix
On 8 Nov 2010, Emil S. Tantilov verbalised: > Nix wrote: >> For the record, cherry-picking >> ff10e13cd06f3dbe90e9fffc3c2dd2057a116e4b (the periodic >> phy-crash-and-reset check) atop 2.6.36 seems to have fixed it: at >> least, the machine has been up for a day now with

Re: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

2010-11-08 Thread Nix
On 4 Nov 2010, Jesse Brandeburg spake thusly: > The above could be responsible for your issue. If you don't want to > disable ASPM system wide, then you could just make sure to run a recent > kernel with the ASPM patches, or get our e1000.sf.net e1000e driver and > try it, as it will work around t

Re: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

2010-11-04 Thread Nix
On 4 Nov 2010, Jesse Brandeburg outgrape: > On Mon, 2010-11-01 at 16:08 -0700, Nix wrote: >> 03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network >> Connection > >> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+ >> ExtSy

[E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

2010-10-31 Thread Nix
It's the weekend, the time when busy servers get upgraded without annoying the users. I was just congratulating myself on an upgrade to 2.6.36 with only a few problems (the NFS -ESTALE bug I have yet to localize, and a watchdog bug causing constant reboots which may well be the fault of the daemon)

Re: [E1000-devel] [in-tree drivers] freezing e1000e in 2.6.31 (SMP only? MSI? PAUSE!)

2009-11-09 Thread Nix
On 8 Nov 2009, n...@esperi.org.uk told this: > On 6 Nov 2009, Emil S. Tantilov verbalised: >> Also try disabling Tx pause frames: >> ethtool -A fastnet tx off autoneg off > > Trying that now. No freezes yet, but I haven't really given it long > enough. I just did a large number of kernel-build-an

Re: [E1000-devel] [in-tree drivers] freezing e1000e in 2.6.31 (SMP only? MSI?)

2009-11-08 Thread Nix
On 6 Nov 2009, Emil S. Tantilov verbalised: > Nix wrote: >> Ever since 2.6.31 was released, my gigabit e1000e link has been acting >> up. Notably, under sufficient load (generally, on this machine, NFS >> load), packets cease to be transferred, and the (MSI) interrupt co

Re: [E1000-devel] 2.6.31 regression: e1000e jumbo frames no longer work: 'Unsupported MTU setting'

2009-09-28 Thread Nix
On 27 Sep 2009, Alexander Duyck said: > It looks like the problem is that the 82574 and 82583 seem to have > their max_hw_frame_size values swapped. You might try applying the > patch below. I am not sure if it will apply since I hand generated it Applies fine: works fine. Thank you! > using th

[E1000-devel] 2.6.31 regression: e1000e jumbo frames no longer work: 'Unsupported MTU setting'

2009-09-26 Thread Nix
[Bruce, you have changes in net-next in this area, so you might have a clue what's going on here.] In 2.6.30.x, I was happily bringing up the 82574L cards in one server like this: ip link set fastnet up mtu 7200 As of 2.6.31.x, what I see is this: spindle:/root# ip link set mtu 7200 dev fastne

Re: [E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 and x86-32 both) (in-tree e1000e at fault)

2009-07-04 Thread Nix
On 4 Jul 2009, n...@esperi.org.uk outgrape: > On 1 Jul 2009, Jesse Brandeburg spake thusly: > >> Just FYI, our development tree is internal only for our out of tree >> driver, but we send patches to the kernel ASAP, after they have passed >> testing. > > Aha! So... is it worth reporting bugs in ma

Re: [E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 and x86-32 both) (in-tree e1000e at fault)

2009-07-04 Thread Nix
On 1 Jul 2009, Jesse Brandeburg spake thusly: > Just FYI, our development tree is internal only for our out of tree > driver, but we send patches to the kernel ASAP, after they have passed > testing. Aha! So... is it worth reporting bugs in mainline that aren't in evidence when using the out-of-t

Re: [E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 and x86-32 both) (in-tree e1000e at fault)

2009-06-06 Thread Nix
On 2 Jun 2009, Waskiewicz Jr said: > http://e1000.sf.net What about a git tree so we can use -rc kernels without having to redo forward-porting work that someone else has probably already done? There must *be* a dev tree somewhere but so far I've had no success figuring out where. -

Re: [E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 only, not x86-32) (e1000e-related?)

2009-06-01 Thread Nix
On 1 Jun 2009, Jesse Brandeburg spake thusly:
>> 57: 0 0 0 7654 0 0 0 0 PCI-MSI-edge gordianet-rx-0
>> 58: 0 0 0 0 8065 0 0 0 PCI-MSI-edge gordianet-tx-0
>> 59: 0 0 0 0 3 0

Re: [E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 and x86-32 both) (in-tree e1000e at fault)

2009-06-01 Thread Nix
On 1 Jun 2009, David Miller uttered the following: > From: Nix > Date: Mon, 01 Jun 2009 01:16:26 +0100 > >> I plan to try out 2.6.29 (and back to 2.6.25 or thereabouts) tomorrow >> and see if it ever worked: if it did I'll bisect for it (rendered tricky >> by th

Re: [E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 only, not x86-32) (e1000e-related?)

2009-05-31 Thread Nix
On 1 Jun 2009, Andrew Morton said: > Let's cc netdev on this. > > Presumably it is a post-2.6.29 regression. I don't know: the earliest kernel this machine has ever run was 2.6.30rc5, and this failing 2.6.30rc7 kernel is the first 64-bit kernel I've ever run on it. So currently I have one single

[E1000-devel] 2.6.30rc7: ksoftirqd CPU saturation (x86-64 only, not x86-32) (e1000e-related?)

2009-05-31 Thread Nix
I've just compiled a 64-bit kernel for a couple of quad-core Nehalems (one L5520, one Core i7) for the first time. Both were using 32-bit kernels happily before, and one (the Core i7) is happy afterwards: but the other sees two ksoftirqd threads saturating the CPU (well, half of it, this being a 4-

[E1000-devel] [PATCH] cater for enumization of irqreturn_t in 2.6.30 (was: Re: nfsroot on multiple-e1000e serial-over-LAN system -> deadlock?)

2009-05-21 Thread Nix
On 20 May 2009, n...@esperi.org.uk spake thusly: > All is not well with the out-of-tree driver, though: 0.5.18.3 doesn't > even build without the patch below, and screams loudly in the log at > startup, e.g.: > > [ 93.041327] irq event 57: bogus return value f70b5eb4 > [ 93.046871] Pid: 0, com

Re: [E1000-devel] nfsroot on multiple-e1000e serial-over-LAN system -> deadlock?

2009-05-20 Thread Nix
(e1000-devel, this is with an 82574L in 100Mb/s mode and upstream git up-to-date as of a couple of days ago. Your driver works, modulo a small patch and some unpleasant screaming in the log on boot: the in-tree one doesn't work.) On 19 May 2009, n...@esperi.org.uk uttered the following: > But then