RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-12-16 Thread Mattis Lorentzon
Hi Russell, > Now because things have changed during the last merge window, I've got > an even bigger problem sorting through that patch set and getting it > back into a submittable state. I've just sent out v2 for it onto the > net...@vger.kernel.org mailing list. > > The initial version (marked

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-29 Thread Fabio Estevam
Hi Mattis, On Fri, Aug 29, 2014 at 7:57 AM, Mattis Lorentzon wrote: > Iain, > >> Interesting. We obviously have some differences in how we boot, my >> changes to your config to get it to boot basically amount to reverting the >> patch you attached and then enabling sata and mmc. So far I've been

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-29 Thread Mattis Lorentzon
Iain, > Interesting. We obviously have some differences in how we boot, my > changes to your config to get it to boot basically amount to reverting the > patch you attached and then enabling sata and mmc. So far I've been unable > to get your config to fail. Our version of U-Boot doesn't support

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-27 Thread Iain Paton
On 27/08/14 07:32, Mattis Lorentzon wrote: > Hi Iain, Russell and Fabio, > >> The config is attached. Note that there's a lot of additional stuff enabled >> as >> I'm aiming for a single general purpose kernel that covers i.MX6, AM3359, >> Allwinner A10/A20 along with several versions of boards u

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-26 Thread Mattis Lorentzon
Hi Iain, Russell and Fabio, > The config is attached. Note that there's a lot of additional stuff enabled as > I'm aiming for a single general purpose kernel that covers i.MX6, AM3359, > Allwinner A10/A20 along with several versions of boards using those > particular SoCs. > > Same kernel binary o

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-26 Thread Iain Paton
On 21/08/14 10:39, Iain Paton wrote: > On 19/08/14 07:03, Iain Paton wrote: >> On 17/08/14 22:46, Fabio Estevam wrote: >>> Iain, >>> >>> On Sun, Aug 17, 2014 at 6:34 PM, Iain Paton wrote: On 15/08/14 06:42, Mattis Lorentzon wrote: > We mostly run SSH with benchmarks using NFS, it can

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-26 Thread Iain Paton
On 25/08/14 11:18, Russell King - ARM Linux wrote: > On Wed, Aug 13, 2014 at 01:39:27PM +, Mattis Lorentzon wrote: >> All our tests seem to behave the same way on the Sabrelite as on our own >> board. >> A working theory is that the switch (3Com Switch 4400) triggers the >> degeneration >> of

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-25 Thread Russell King - ARM Linux
On Wed, Aug 13, 2014 at 01:39:27PM +, Mattis Lorentzon wrote: > All our tests seem to behave the same way on the Sabrelite as on our own > board. > A working theory is that the switch (3Com Switch 4400) triggers the > degeneration > of the network stack from which Linux does not seem to recov

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-22 Thread Iain Paton
On 22/08/14 01:01, Fabio Estevam wrote: > On Thu, Aug 21, 2014 at 6:39 AM, Iain Paton wrote: > >> two and a half days of running this against both a sabre-lite and a >> wandboard quad B1 and I still have no reason to think there's any >> sort of a problem. >> >> Up to now, my testing has been don

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-22 Thread Russell King - ARM Linux
On Thu, Aug 14, 2014 at 02:43:56PM +, Mattis Lorentzon wrote: > Fabio and Russell, > > > A working theory is that the switch (3Com Switch 4400) triggers the > > degeneration of the network stack from which Linux does not seem to > > recover, even if we later bypass the switch and directly conn

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-21 Thread Mattis Lorentzon
Fabio, > What is the silicon version of the mx6 in your sabrelite? What GCC version do > you use? The silicon version is PCIMX6Q6AVT10AA and the GCC version we use is arm-none-eabi-gcc (Fedora 2013.11.24-2.fc19) 4.8.1. Iain, > Up to now, my testing has been done with my own config, I'll now > r

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-21 Thread Fabio Estevam
On Thu, Aug 21, 2014 at 6:39 AM, Iain Paton wrote: > two and a half days of running this against both a sabre-lite and a > wandboard quad B1 and I still have no reason to think there's any > sort of a problem. > > Up to now, my testing has been done with my own config, I'll now > repeat the whole

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-21 Thread Iain Paton
On 19/08/14 07:03, Iain Paton wrote: > On 17/08/14 22:46, Fabio Estevam wrote: >> Iain, >> >> On Sun, Aug 17, 2014 at 6:34 PM, Iain Paton wrote: >>> On 15/08/14 06:42, Mattis Lorentzon wrote: >>> We mostly run SSH with benchmarks using NFS, it can probably be triggered by using only SSH

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-18 Thread Iain Paton
On 17/08/14 22:46, Fabio Estevam wrote: > Iain, > > On Sun, Aug 17, 2014 at 6:34 PM, Iain Paton wrote: >> On 15/08/14 06:42, Mattis Lorentzon wrote: >> >>> We mostly run SSH with benchmarks using NFS, it can probably be >>> triggered by using only SSH with the following loop: >>> >>> # while : ;

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-17 Thread Fabio Estevam
Iain, On Sun, Aug 17, 2014 at 6:34 PM, Iain Paton wrote: > On 15/08/14 06:42, Mattis Lorentzon wrote: > >> We mostly run SSH with benchmarks using NFS, it can probably be >> triggered by using only SSH with the following loop: >> >> # while : ; do ssh arm-card date; done > > Mattis, > > What sort

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-17 Thread Iain Paton
On 15/08/14 06:42, Mattis Lorentzon wrote: > We mostly run SSH with benchmarks using NFS, it can probably be > triggered by using only SSH with the following loop: > > # while : ; do ssh arm-card date; done Mattis, What sort of time does it take for you to see a problem? I've been running the

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-14 Thread Mattis Lorentzon
Fabio, > Do the stalls also happen on a pure 3.16 kernel? Yes, we just tried this out overnight and we get the same stalls here. We have seen similar problems on a Zynq-based board. It might be worth noting that a common chip between all three boards is, for example, the KSZ9021RN, while the FEC

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-14 Thread Fabio Estevam
On Thu, Aug 14, 2014 at 11:43 AM, Mattis Lorentzon wrote: > After a few more tests we have finally been able to trigger the exact same > stalls > on the Sabrelite board with a direct network connection (i.e. without the > switch). Do the stalls also happen on a pure 3.16 kernel? How can we re

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-14 Thread Mattis Lorentzon
Fabio and Russell, > A working theory is that the switch (3Com Switch 4400) triggers the > degeneration of the network stack from which Linux does not seem to > recover, even if we later bypass the switch and directly connect the board to > the server machine. After a few more tests we have final

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-13 Thread Mattis Lorentzon
Fabio and Russell, > In order to try to narrow down whether this is a board issue, could you try to > run the same kernel on a mx6q development board, such as mx6qsabresd, > cubox-i, wandboard, etc? Indeed, we have a Sabrelite development board and have run the same kernel configuration (please f

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-11 Thread Fabio Estevam
On Mon, Aug 11, 2014 at 10:32 AM, Mattis Lorentzon wrote: > Russell and Fabio, > >> I'd be interested to hear whether removing the >> >> interrupts-extended = ... >> >> property from your board's DT file, thereby causing you to revert back to the >> default I list above, also fixes the insta

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-11 Thread Mattis Lorentzon
Russell and Fabio, > I'd be interested to hear whether removing the > > interrupts-extended = ... > > property from your board's DT file, thereby causing you to revert back to the > default I list above, also fixes the instability you are seeing. We have tried to remove the board specific

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-08 Thread Russell King - ARM Linux
On Thu, Aug 07, 2014 at 01:12:48PM +0100, Russell King - ARM Linux wrote: > On Thu, Aug 07, 2014 at 11:11:06AM +, Mattis Lorentzon wrote: > > Russell, > > > > > Can you ascertain whether these stalls are a result of some failure of the > > > receive side or the transmit side - you should be ab

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-08 Thread Fabio Estevam
Mattis, On Thu, Aug 7, 2014 at 11:20 AM, Fabio Estevam wrote: > On Thu, Aug 7, 2014 at 9:12 AM, Russell King - ARM Linux > wrote: > >> Hmm, I'm slightly confused. On my iMX6Q, I have: >> >> 150: 581754 0 0 0 GIC 150 >> 2188000.ethernet >> 151: 0

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-07 Thread Troy Kisky
On 8/7/2014 7:38 AM, Fabio Estevam wrote: > On Thu, Aug 7, 2014 at 11:20 AM, Fabio Estevam wrote: > > ,but I am wondering if we should also do: > > --- a/arch/arm/boot/dts/imx6qdl-sabreauto.dtsi > +++ b/arch/arm/boot/dts/imx6qdl-sabreauto.dtsi > @@ -66,6 +66,7 @@ > pinctrl-0 = <&pinctrl_

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-07 Thread Fabio Estevam
On Thu, Aug 7, 2014 at 11:20 AM, Fabio Estevam wrote: > On an imx6q sabreauto I also get: > > 151: 0 0 0 0 GIC 151 > 2188000.ethernet > 166: 4577 0 0 0 gpio-mxc 6 > 2188000.ethernet > > and the GPIO1_6 interrupt come

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-07 Thread Fabio Estevam
On Thu, Aug 7, 2014 at 9:12 AM, Russell King - ARM Linux wrote: > Hmm, I'm slightly confused. On my iMX6Q, I have: > > 150: 581754 0 0 0 GIC 150 > 2188000.ethernet > 151: 0 0 0 0 GIC 151 > 2188000.ethernet Same h

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-07 Thread Russell King - ARM Linux
On Thu, Aug 07, 2014 at 11:11:06AM +, Mattis Lorentzon wrote: > Russell, > > > Can you ascertain whether these stalls are a result of some failure of the > > receive side or the transmit side - you should be able to tell that if you > > watch > > the packet counts via ifconfig on the stalled

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-07 Thread Mattis Lorentzon
Russell, > Can you ascertain whether these stalls are a result of some failure of the > receive side or the transmit side - you should be able to tell that if you > watch > the packet counts via ifconfig on the stalled card. Also, it would be useful > to > know whether the FEC interrupt was fir

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-06 Thread Russell King - ARM Linux
On Wed, Aug 06, 2014 at 11:10:06AM +, Mattis Lorentzon wrote: > Russell, > > > What is on the other end of the link? > > 16 ARM cards connected to a 3Com Switch 4400 connected to a Linux FC 20 > machine (Intel Corporation 82541PI Gigabit Ethernet Controller rev 05). > > There may be multiple

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-06 Thread Mattis Lorentzon
Russell, > What is on the other end of the link? 16 ARM cards connected to a 3Com Switch 4400 connected to a Linux FC 20 machine (Intel Corporation 82541PI Gigabit Ethernet Controller rev 05). There may be multiple problems. The backtrace has only been seen a few times, on two different cards. M

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-06 Thread Russell King - ARM Linux
On Tue, Aug 05, 2014 at 01:31:29PM +, Mattis Lorentzon wrote: > We have applied your V2 patch set of 30 patches on top of v3.16-rc2 and are > currently running some stability tests. > > During our first test round we triggered a timeout which caused the fec driver > to become unresponsive for

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-05 Thread Mattis Lorentzon
Hi Fabio, > Could this problem be the same one as reported at: > http://www.spinics.net/lists/arm-kernel/msg347914.html ? The problem you link to describes a permanent issue; our problem seems to be sporadic, as most of our tests work fine (at least for a while). > Which Ethernet PHY do you use?

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-05 Thread Fabio Estevam
On Tue, Aug 5, 2014 at 10:31 AM, Mattis Lorentzon wrote: > We have applied your V2 patch set of 30 patches on top of v3.16-rc2 and are > currently running some stability tests. > > During our first test round we triggered a timeout which caused the fec driver > to become unresponsive for several

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-08-05 Thread Mattis Lorentzon
Hi Russell! > Now because things have changed during the last merge window, I've got an > even bigger problem sorting through that patch set and getting it back into a > submittable state. I've just sent out v2 for it onto the > net...@vger.kernel.org mailing list. > > The initial version (marked

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-07-01 Thread Fredrik Noring
Hi Russell, > -Original Message- > > The initial version (marked RFC) attracted very little interest from > > testers, or acks. I'd very much like to have some testing of it, so > > if you want to try it out, I can provide you with a git URL, patches > > or a combined patch. > > Sure! A

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-06-30 Thread Nathan Lynch
On 06/30/2014 07:30 AM, Fredrik Noring wrote: >> >> On Fri, Jun 27, 2014 at 04:16:57PM +, Fredrik Noring wrote: >>> Please find below a trace that appeared once with 3.16-rc2. Perhaps it >>> is of some interest? >> >> It's not that serious... I know that the FEC ethernet driver is horrendously

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-06-30 Thread Fredrik Noring
Hi Russell, It seems to be a compiler issue, where (GCC) 4.8.2 does not produce a properly working kernel. Happily, (Fedora 2013.11.24-2.fc19) 4.8.1 appears to do a lot better. No crashes so far with v3.16-rc2! All the best, Fredrik > -Original Message- > Hi Fredrik, > > On Fri, Jun 27,

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-06-29 Thread Fredrik Noring
Hi Russell, > -Original Message- > It's not that serious... I know that the FEC ethernet driver is horrendously > racy (I have had a patch set for about the last six months which fixes some of > its problems) but as I've had a lot of patches to deal with, and it's been > pushed to the back

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-06-27 Thread Fredrik Noring
Hi Russell, > On Thu, Jun 26, 2014 at 04:14:24PM +0100, Russell King - ARM Linux wrote: > > That's a similar workload to the one which is mentioned in the > > previous report. I've just set a similar transfer going, but this > > will be a 16GB file. > > I've run this transfer several times, but s

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-06-27 Thread Russell King - ARM Linux
Hi Fredrik, On Fri, Jun 27, 2014 at 04:16:57PM +, Fredrik Noring wrote: > Please find below a trace that appeared once with 3.16-rc2. Perhaps it is of > some interest? It's not that serious... I know that the FEC ethernet driver is horrendously racy (I have had a patch set for about the last

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-06-27 Thread Russell King - ARM Linux
On Thu, Jun 26, 2014 at 04:14:24PM +0100, Russell King - ARM Linux wrote: > On Thu, Jun 26, 2014 at 02:44:52PM +, Mattis Lorentzon wrote: > > We have managed to trigger the Oops by just transferring a large file > > over nfs > > cat /mnt/foo > /dev/null > > where foo is a file that is approxima

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-06-26 Thread Russell King - ARM Linux
On Thu, Jun 26, 2014 at 02:44:52PM +, Mattis Lorentzon wrote: > Thank you for your reply, > > > On Wed, Jun 25, 2014 at 01:55:05PM +, Mattis Lorentzon wrote: > > > I have a similar issue with v3.16-rc2 as previously reported by Waldemar > > Brodkorb for v3.15-rc4. > > > https://lkml.org/lk

RE: Oops: 17 SMP ARM (v3.16-rc2)

2014-06-26 Thread Mattis Lorentzon
Thank you for your reply, > On Wed, Jun 25, 2014 at 01:55:05PM +, Mattis Lorentzon wrote: > > I have a similar issue with v3.16-rc2 as previously reported by Waldemar > Brodkorb for v3.15-rc4. > > https://lkml.org/lkml/2014/5/9/330 > > This URL returns no useful information. I find that lkml

Re: Oops: 17 SMP ARM (v3.16-rc2)

2014-06-26 Thread Russell King - ARM Linux
On Wed, Jun 25, 2014 at 01:55:05PM +, Mattis Lorentzon wrote: > Hello kernel people, You may wish to also copy linux-arm-ker...@lists.infradead.org, which is where ARM kernel people are. > I have a similar issue with v3.16-rc2 as previously reported by Waldemar > Brodkorb for v3.15-rc4. > ht