Re: reproducable panic eviction work queue

2015-07-22 Thread Frank Schreuder
On 7/22/2015 at 4:14 PM, Nikolay Aleksandrov wrote: On 07/22/2015 04:03 PM, Nikolay Aleksandrov wrote: On 07/22/2015 03:58 PM, Florian Westphal wrote: Nikolay Aleksandrov niko...@cumulusnetworks.com wrote: On 07/22/2015 10:17 AM, Frank Schreuder wrote: I got some additional information

Re: reproducable panic eviction work queue

2015-07-22 Thread Frank Schreuder
I got some additional information from syslog: Jul 22 09:49:33 dommy0 kernel: [ 675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42] Jul 22 09:49:42 dommy0 kernel: [ 685.114033] INFO: rcu_sched self-detected stall on CPU { 3} (t=39918 jiffies g=988 c=987

Re: reproducable panic eviction work queue

2015-07-22 Thread Frank Schreuder
On 7/21/2015 at 8:34 PM, Florian Westphal wrote: Frank Schreuder fschreu...@transip.nl wrote: [ inet frag evictor crash ] We believe we found the bug. This patch should fix it. We cannot share list for buckets and evictor, the flag member is subject to race conditions so flags

Re: reproducable panic eviction work queue

2015-07-22 Thread Nikolay Aleksandrov
On 07/22/2015 10:17 AM, Frank Schreuder wrote: I got some additional information from syslog: Jul 22 09:49:33 dommy0 kernel: [ 675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42] Jul 22 09:49:42 dommy0 kernel: [ 685.114033] INFO: rcu_sched self-detected

Re: reproducable panic eviction work queue

2015-07-22 Thread Frank Schreuder
Hi Nikolay, Thanks for this patch. I'm no longer able to reproduce this panic on our test environment! The server has been handling 120k fragmented UDP packets per second for over 40 minutes. So far everything is running stable without stack traces in the logs. All other panics happened within

Re: reproducable panic eviction work queue

2015-07-22 Thread Florian Westphal
Nikolay Aleksandrov niko...@cumulusnetworks.com wrote: On 07/22/2015 10:17 AM, Frank Schreuder wrote: I got some additional information from syslog: Jul 22 09:49:33 dommy0 kernel: [ 675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:42] Jul 22 09:49:42

Re: reproducable panic eviction work queue

2015-07-22 Thread Nikolay Aleksandrov
On 07/22/2015 04:03 PM, Nikolay Aleksandrov wrote: On 07/22/2015 03:58 PM, Florian Westphal wrote: Nikolay Aleksandrov niko...@cumulusnetworks.com wrote: On 07/22/2015 10:17 AM, Frank Schreuder wrote: I got some additional information from syslog: Jul 22 09:49:33 dommy0 kernel: [

Re: reproducable panic eviction work queue

2015-07-22 Thread Nikolay Aleksandrov
On 07/22/2015 03:58 PM, Florian Westphal wrote: Nikolay Aleksandrov niko...@cumulusnetworks.com wrote: On 07/22/2015 10:17 AM, Frank Schreuder wrote: I got some additional information from syslog: Jul 22 09:49:33 dommy0 kernel: [ 675.987890] NMI watchdog: BUG: soft lockup - CPU#3 stuck for

Re: reproducable panic eviction work queue

2015-07-21 Thread Florian Westphal
Frank Schreuder fschreu...@transip.nl wrote: [ inet frag evictor crash ] We believe we found the bug. This patch should fix it. We cannot share list for buckets and evictor, the flag member is subject to race conditions so flags INET_FRAG_EVICTED test is not reliable. It would be great if

Re: reproducable panic eviction work queue

2015-07-21 Thread Frank Schreuder
On 7/20/2015 04:30 PM Florian Westphal wrote: Frank Schreuder fschreu...@transip.nl wrote: On 7/18/2015 05:32 PM, Nikolay Aleksandrov wrote: On 07/18/2015 05:28 PM, Johan Schuijt wrote: Thx for your looking into this! Thank you for the report, I will try to reproduce this locally Could

Re: reproducable panic eviction work queue

2015-07-20 Thread Frank Schreuder
On 7/18/2015 05:32 PM, Nikolay Aleksandrov wrote: On 07/18/2015 05:28 PM, Johan Schuijt wrote: Thx for your looking into this! Thank you for the report, I will try to reproduce this locally Could you please post the full crash log ? Of course, please see attached file. Also could you

Re: reproducable panic eviction work queue

2015-07-20 Thread Florian Westphal
Frank Schreuder fschreu...@transip.nl wrote: On 7/18/2015 05:32 PM, Nikolay Aleksandrov wrote: On 07/18/2015 05:28 PM, Johan Schuijt wrote: Thx for your looking into this! Thank you for the report, I will try to reproduce this locally Could you please post the full crash log ? Of

Re: reproducable panic eviction work queue

2015-07-20 Thread Nikolay Aleksandrov
On 07/20/2015 02:47 PM, Frank Schreuder wrote: On 7/18/2015 05:32 PM, Nikolay Aleksandrov wrote: On 07/18/2015 05:28 PM, Johan Schuijt wrote: Thx for your looking into this! Thank you for the report, I will try to reproduce this locally Could you please post the full crash log ? Of

Re: reproducable panic eviction work queue

2015-07-18 Thread Johan Schuijt
Yes, we already found these and they are included in our kernel, but even with these patches we still receive the panic. - Johan On 18 Jul 2015, at 10:56, Eric Dumazet eric.duma...@gmail.com wrote: On Fri, 2015-07-17 at 21:18 +, Johan Schuijt wrote: Hey guys, We’re currently running

Re: reproducable panic eviction work queue

2015-07-18 Thread Eric Dumazet
On Fri, 2015-07-17 at 21:18 +, Johan Schuijt wrote: Hey guys, We’re currently running into a reproducible panic in the eviction work queue code when we pin all our eth* IRQs to different CPU cores (in order to scale our networking performance for our virtual servers). This only occurs

Re: reproducable panic eviction work queue

2015-07-18 Thread Nikolay Aleksandrov
On 07/18/2015 12:02 PM, Nikolay Aleksandrov wrote: On 07/18/2015 11:01 AM, Johan Schuijt wrote: Yes, we already found these and they are included in our kernel, but even with these patches we still receive the panic. - Johan On 18 Jul 2015, at 10:56, Eric Dumazet eric.duma...@gmail.com wrote:

Re: reproducable panic eviction work queue

2015-07-18 Thread Nikolay Aleksandrov
On 07/18/2015 05:28 PM, Johan Schuijt wrote: Thx for your looking into this! Thank you for the report, I will try to reproduce this locally Could you please post the full crash log ? Of course, please see attached file. Also could you test with a clean current kernel from Linus' tree

Re: reproducable panic eviction work queue

2015-07-18 Thread Johan Schuijt
With attachment this time, also not sure whether this is what you were referring to, so let me know if anything else is needed! - Johan On 18 Jul 2015, at 17:28, Johan Schuijt-Li jo...@transip.nl wrote: Thx for your looking into this! Thank you for the report, I will try to reproduce this

Re: reproducable panic eviction work queue

2015-07-18 Thread Johan Schuijt
Thx for your looking into this! Thank you for the report, I will try to reproduce this locally Could you please post the full crash log ? Of course, please see attached file. Also could you test with a clean current kernel from Linus' tree or Dave's -net ? Will do. These are available

Re: reproducable panic eviction work queue

2015-07-18 Thread Nikolay Aleksandrov
On 07/18/2015 11:01 AM, Johan Schuijt wrote: Yes, we already found these and they are included in our kernel, but even with these patches we still receive the panic. - Johan On 18 Jul 2015, at 10:56, Eric Dumazet eric.duma...@gmail.com wrote: On Fri, 2015-07-17 at 21:18 +, Johan