Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-10 Thread Christoph Lameter
On Thu, 10 Sep 2015, Nikolay Borisov wrote:

> > echo 1 >sanity_checks
>
> [root@kernighan linux-stable]# cd /sys/kernel/slab/kmalloc-32/
> [root@kernighan kmalloc-32]# echo 1 > sanity_checks
> [root@kernighan kmalloc-32]# cat sanity_checks
> 1
>
> So this works as expected when set by echo.
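The manual check quoted above (write the attribute, then read it back) can be scripted. This is only a minimal sketch, assuming a SLUB kernel that exposes `/sys/kernel/slab/<cache>/<attribute>` as shown in the quoted session. The function name `set_slab_attr` and the `SLAB_SYSFS` override variable are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a value to a SLUB sysfs attribute and verify it.
# SLAB_SYSFS defaults to the real sysfs location; it can be pointed at a
# scratch directory for a dry run.
SLAB_SYSFS="${SLAB_SYSFS:-/sys/kernel/slab}"

set_slab_attr() {
    cache="$1"; attr="$2"; value="$3"
    path="$SLAB_SYSFS/$cache/$attr"

    if [ ! -e "$path" ]; then
        echo "no such attribute: $path" >&2
        return 1
    fi
    # The write fails if the kernel (or file permissions) rejects the change.
    if ! echo "$value" > "$path"; then
        echo "cannot write to $path" >&2
        return 1
    fi
    # Read the attribute back: success only if the new value is in effect.
    actual=$(cat "$path")
    [ "$actual" = "$value" ]
}
```

Run as root, something like `set_slab_attr kmalloc-32 sanity_checks 1 && echo enabled` would repeat the quoted session in one step.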

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-10 Thread Nikolay Borisov
On 09/09/2015 05:01 PM, Christoph Lameter wrote:
> On Wed, 9 Sep 2015, Nikolay Borisov wrote:
>
>> [root@kernighan vm]# ./slabinfo -da kmalloc-32
>> Cannot write to dma-kmalloc-32/sanity
>> [root@kernighan vm]# ./slabinfo -dF kmalloc-32
>> Cannot write to dma-kmalloc-32/sanity
>>

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-09 Thread Christoph Lameter
On Wed, 9 Sep 2015, Vlastimil Babka wrote:

> On 09/09/2015 04:01 PM, Christoph Lameter wrote:
> > On Wed, 9 Sep 2015, Nikolay Borisov wrote:
> >
> > What does:
> >
> > echo 1 >trace
> >
> > do? Could crash the system due to overload of messages.
>
> Yes I've seen that happen. Did you consider

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-09 Thread Vlastimil Babka
On 09/09/2015 04:01 PM, Christoph Lameter wrote:
> On Wed, 9 Sep 2015, Nikolay Borisov wrote:
>
> What does:
>
> echo 1 >trace
>
> do? Could crash the system due to overload of messages.

Yes I've seen that happen. Did you consider hooking it to trace_printk() instead of printk()?

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-09 Thread Christoph Lameter
On Wed, 9 Sep 2015, Nikolay Borisov wrote:

> [root@kernighan vm]# ./slabinfo -da kmalloc-32
> Cannot write to dma-kmalloc-32/sanity
> [root@kernighan vm]# ./slabinfo -dF kmalloc-32
> Cannot write to dma-kmalloc-32/sanity
> [root@kernighan vm]# ./slabinfo -dz kmalloc-32
> kmalloc-32 not empty

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-09 Thread Nikolay Borisov
On 09/08/2015 06:15 PM, Christoph Lameter wrote:
> On Tue, 8 Sep 2015, Nikolay Borisov wrote:
>
>>> You have read https://www.kernel.org/doc/Documentation/vm/slub.txt?
>>
>> I've read that I'm also following the merge/nomerge thread on the DM
>> mailing list. I guess my understanding is wrong

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-08 Thread Christoph Lameter
On Tue, 8 Sep 2015, Nikolay Borisov wrote:

> > You have read https://www.kernel.org/doc/Documentation/vm/slub.txt?
>
> I've read that I'm also following the merge/nomerge thread on the DM
> mailing list. I guess my understanding is wrong in that if multiple slab
> caches are merged, then it's

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-08 Thread Nikolay Borisov
On 09/08/2015 05:27 PM, Christoph Lameter wrote:
> On Tue, 8 Sep 2015, Nikolay Borisov wrote:
>
>> Unfortunately I haven't found a way to reproduce it so the only option
>> would be to do this on a live server. However, the performance impact I
>> believe is going to be very prohibitive :(.

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-08 Thread Christoph Lameter
On Tue, 8 Sep 2015, Nikolay Borisov wrote:

> Unfortunately I haven't found a way to reproduce it so the only option
> would be to do this on a live server. However, the performance impact I
> believe is going to be very prohibitive :(. Alternatively what I could
> do is probably leave merging on

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-08 Thread Nikolay Borisov
On 09/08/2015 04:58 PM, Christoph Lameter wrote:
> On Mon, 7 Sep 2015, Nikolay Borisov wrote:
>
>> Did a bit more investigation and it turns out the
>> corruption is happening in slab_alloc_node, in the
>> 'else' branch when get_freepointer is being called:
>
> Please reboot the system and

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-08 Thread Christoph Lameter
On Mon, 7 Sep 2015, Nikolay Borisov wrote:

> Did a bit more investigation and it turns out the
> corruption is happening in slab_alloc_node, in the
> 'else' branch when get_freepointer is being called:

Please reboot the system and specify slub_debug on the kernel command line. This
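After rebooting with slub_debug as suggested, it is worth confirming that the parameter actually reached the kernel. This is a rough sketch that extracts the setting from a command-line string such as the contents of `/proc/cmdline`. The function name `slub_debug_setting` is hypothetical, and the value `FZ,kmalloc-32` in the usage comment is only an illustration of the `slub_debug=<flags>,<cache>` form described in Documentation/vm/slub.txt (F for sanity checks, Z for red zoning), not a setting taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the slub_debug setting found in a kernel
# command line string, or return 1 if the parameter is absent.
slub_debug_setting() {
    # Pad with spaces so the parameter only matches as a whole word.
    padded=" $1 "
    case "$padded" in
        *" slub_debug "*)
            # Bare parameter: debugging enabled with the default options.
            echo "default" ;;
        *" slub_debug="*)
            rest="${padded#* slub_debug=}"   # drop everything up to the value
            echo "${rest%% *}" ;;            # keep the value up to the next space
        *)
            return 1 ;;
    esac
}

# Usage on a live system:
#   slub_debug_setting "$(cat /proc/cmdline)"
# Illustrative input "ro slub_debug=FZ,kmalloc-32 quiet" yields "FZ,kmalloc-32".
```

Restricting debugging to a single cache this way is one option for keeping the overhead down on a live server, which is the concern raised elsewhere in the thread.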

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-07 Thread Holger Hoffstätte
On Mon, 07 Sep 2015 11:49:12 +, Holger Hoffstätte wrote:

> On Mon, 07 Sep 2015 14:30:49 +0300, Nikolay Borisov wrote:
>
>> If you have the vmlinux image for the kernel you were running at the
>> time the crash occurred, could you post the output of addr2line -f -e
>> path/to/vmlinux

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-07 Thread Holger Hoffstätte
On Mon, 07 Sep 2015 14:30:49 +0300, Nikolay Borisov wrote:

> If you have the vmlinux image for the kernel you were running at the
> time the crash occurred, could you post the output of addr2line -f -e
> path/to/vmlinux 8115bd4d to see if it also fails in
> get_freepointer.

Had to

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-07 Thread Nikolay Borisov
Hi,

If you have the vmlinux image for the kernel you were running at the time the crash occurred, could you post the output of addr2line -f -e path/to/vmlinux 8115bd4d to see if it also fails in get_freepointer.

Regards,
Nikolay

On 09/07/2015 01:37 PM, Holger Hoffstätte wrote:
> On Mon,
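The addr2line request above can be wrapped so that a missing or unreadable image is reported before the tool runs. This is a minimal sketch, assuming binutils `addr2line` is installed and that the vmlinux has debug info and matches the kernel that crashed. The function name `resolve_kernel_addr` is hypothetical, and the address is passed through exactly as the caller supplies it.

```shell
#!/bin/sh
# Hypothetical helper: resolve a kernel text address to its function name
# and source location using an unstripped vmlinux image.
resolve_kernel_addr() {
    vmlinux="$1"; addr="$2"

    if [ ! -r "$vmlinux" ]; then
        echo "cannot read vmlinux image: $vmlinux" >&2
        return 2
    fi
    # -f prints the enclosing function name, -e selects the image to inspect.
    addr2line -f -e "$vmlinux" "$addr"
}

# Usage (path and address are placeholders supplied by the caller):
#   resolve_kernel_addr path/to/vmlinux "$FAULT_ADDR"
```

Since helpers such as get_freepointer are typically inlined into their callers, adding addr2line's `-i` option (an addition not used in the thread) may help by listing the inlined call chain as well.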

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-07 Thread Nikolay Borisov
Did a bit more investigation and it turns out the corruption is happening in slab_alloc_node, in the 'else' branch when get_freepointer is being called:

0x81182a50 <+144>: movsxd rax,DWORD PTR [r12+0x20]
0x81182a55 <+149>: mov    rdi,QWORD PTR [r12]
0x81182a59

Re: Kernel 4.1.6 Panic due to slab corruption

2015-09-07 Thread Holger Hoffstätte
On Mon, 07 Sep 2015 11:41:17 +0300, Nikolay Borisov wrote:

> Hello,
>
> On one of our servers I've observed a kernel panic
> happening with the following backtrace:
>
> [654405.527070] BUG: unable to handle kernel paging request at
> 00028001
> [654405.527076] IP: []

Kernel 4.1.6 Panic due to slab corruption

2015-09-07 Thread Nikolay Borisov
Hello,

On one of our servers I've observed a kernel panic happening with the following backtrace:

[654405.527070] BUG: unable to handle kernel paging request at 00028001
[654405.527076] IP: [] kmem_cache_alloc_node+0x99/0x1e0
[654405.527085] PGD 14bef58067 PUD 2ab358067 PMD 0
