Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-22 Thread Linus Torvalds
On Mon, Jan 22, 2018 at 5:26 AM, Rasmus Villemoes wrote: > On 2018-01-19 19:42, Linus Torvalds wrote: >> >> I actually asked (long long ago) for an optional compiler warning for >> "pointer subtraction with non-power-of-2 sizes". Not because of it >> being undefined, but simply because it's

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-22 Thread Rasmus Villemoes
On 2018-01-19 19:42, Linus Torvalds wrote: > > I actually asked (long long ago) for an optional compiler warning for > "pointer subtraction with non-power-of-2 sizes". Not because of it > being undefined, but simply because it's expensive. The > divide->multiply thing doesn't always work, Huh? If

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-20 Thread Luc Van Oostenryck
On Sat, Jan 20, 2018 at 05:24:32AM +0000, Al Viro wrote: > On Sat, Jan 20, 2018 at 02:02:37AM +0000, Al Viro wrote: > > > Note that those sizes are rather sensitive to lockdep, spinlock debugging, > > etc. > > That they certainly are: on one of the testing .config I'm using it gave this: >

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-19 Thread Al Viro
On Sat, Jan 20, 2018 at 02:02:37AM +0000, Al Viro wrote: > Note that those sizes are rather sensitive to lockdep, spinlock debugging, > etc. That they certainly are: on one of the testing .config I'm using it gave this: 1104 sizeof struct page = 56 81 sizeof struct

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-19 Thread Al Viro
On Fri, Jan 19, 2018 at 02:53:25PM -0800, Linus Torvalds wrote: > It would probably be good to add the size too, just to explain why > it's potentially expensive. > > That said, apparently we do have hundreds of them, with just > cpufreq_frequency_table having a ton. Maybe some are hidden in

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-19 Thread Linus Torvalds
On Fri, Jan 19, 2018 at 2:12 PM, Al Viro wrote: > On Fri, Jan 19, 2018 at 10:42:18AM -0800, Linus Torvalds wrote: >> >> We *should* be careful about it. I guess sparse could be made to warn, >> but I'm afraid that we have so many of these things that a warning >> isn't reasonable. > > You mean

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-19 Thread Al Viro
On Fri, Jan 19, 2018 at 10:42:18AM -0800, Linus Torvalds wrote: > On Fri, Jan 19, 2018 at 4:55 AM, Matthew Wilcox wrote: > > > > So really we should be casting 'b' and 'a' to uintptr_t to be fully > > compliant with the spec. > > That's an unnecessary technicality. > > Any compiler that doesn't

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-19 Thread Linus Torvalds
On Fri, Jan 19, 2018 at 4:55 AM, Matthew Wilcox wrote: > > So really we should be casting 'b' and 'a' to uintptr_t to be fully > compliant with the spec. That's an unnecessary technicality. Any compiler that doesn't get pointer inequality testing right is not worth even worrying about. We

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-19 Thread Matthew Wilcox
On Fri, Jan 19, 2018 at 02:49:55AM +0300, Kirill A. Shutemov wrote: > > So that's why you can't do pointer diffs between two arrays. Not > > because you can't subtract the two pointers, but because the > > *division* part of the C pointer diff rules leads to issues. > > Thanks a lot for the
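
The point quoted above, that the *division* step of a pointer difference is what goes wrong, can be illustrated with a minimal sketch. It assumes a 32-bit target and a 40-byte element size; both values, and the names `ELEM_SIZE`, `INV5`, `as_signed32` and `elem_diff_by_inverse`, are hypothetical and not taken from the thread. The sketch emulates the common trick of replacing an exact division by a shift plus a multiplication by a modular inverse, which is only valid when the byte difference really is a multiple of the element size.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: elements are 40 bytes (8 * 5), not a power of two. */
#define ELEM_SIZE 40u

/* Multiplicative inverse of 5 modulo 2^32: 5 * 0xCCCCCCCD wraps around to 1. */
#define INV5 0xCCCCCCCDu

/* Reinterpret a 32-bit pattern as a two's-complement signed value, portably. */
static int64_t as_signed32(uint32_t v)
{
    return (v & 0x80000000u) ? (int64_t)v - 4294967296LL : (int64_t)v;
}

/*
 * Emulate "exact division by 40" the way a compiler may implement a pointer
 * difference: arithmetic shift right by 3, then multiply by the inverse of 5.
 * byte_diff is the raw address difference, given as a 32-bit pattern.
 */
static int64_t elem_diff_by_inverse(uint32_t byte_diff)
{
    /* Sanity check: INV5 really is the inverse of 5 modulo 2^32. */
    assert((uint32_t)(5u * INV5) == 1u);

    /* Arithmetic shift right by 3, built from a logical shift plus sign fill. */
    uint32_t sign_fill = (byte_diff & 0x80000000u) ? 0xE0000000u : 0u;
    uint32_t shifted = (byte_diff >> 3) | sign_fill;

    /* Multiply modulo 2^32; the product is truncated to its low 32 bits. */
    uint32_t product = (uint32_t)((uint64_t)shifted * INV5);
    return as_signed32(product);
}
```

With these assumptions, `elem_diff_by_inverse(2u * ELEM_SIZE)` gives 2 and `elem_diff_by_inverse(0u - ELEM_SIZE)` gives -1, as expected for pointers into the same array. A byte difference of 48, which is not a multiple of 40, gives a huge negative value rather than something close to 1. That is the general shape of a nonsensical diff between unrelated arrays, although the sketch does not claim to reproduce the exact value reported in the thread.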

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-19 Thread Kirill A. Shutemov
On Fri, Jan 19, 2018 at 12:07:47PM +0000, Michal Hocko wrote: > > From 861f68c555b87fd6c0ccc3428ace91b7e185b73a Mon Sep 17 00:00:00 2001 > > From: "Kirill A. Shutemov" > > Date: Thu, 18 Jan 2018 18:24:07 +0300 > > Subject: [PATCH] mm, page_vma_mapped: Drop faulty pointer arithmetics in > >

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-19 Thread Michal Hocko
On Fri 19-01-18 14:49:17, Kirill A. Shutemov wrote: > On Fri, Jan 19, 2018 at 11:33:42AM +0100, Michal Hocko wrote: > > On Fri 19-01-18 13:02:59, Kirill A. Shutemov wrote: > > > On Thu, Jan 18, 2018 at 06:22:13PM +0100, Michal Hocko wrote: > > > > On Thu 18-01-18 18:40:26, Kirill A. Shutemov

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-19 Thread Kirill A. Shutemov
On Fri, Jan 19, 2018 at 11:33:42AM +0100, Michal Hocko wrote: > On Fri 19-01-18 13:02:59, Kirill A. Shutemov wrote: > > On Thu, Jan 18, 2018 at 06:22:13PM +0100, Michal Hocko wrote: > > > On Thu 18-01-18 18:40:26, Kirill A. Shutemov wrote: > > > [...] > > > > + /* > > > > +* Make

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-19 Thread Michal Hocko
On Fri 19-01-18 13:02:59, Kirill A. Shutemov wrote: > On Thu, Jan 18, 2018 at 06:22:13PM +0100, Michal Hocko wrote: > > On Thu 18-01-18 18:40:26, Kirill A. Shutemov wrote: > > [...] > > > + /* > > > + * Make sure that pages are in the same section before doing pointer > > > + * arithmetics. > >

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-19 Thread Kirill A. Shutemov
On Thu, Jan 18, 2018 at 06:22:13PM +0100, Michal Hocko wrote: > On Thu 18-01-18 18:40:26, Kirill A. Shutemov wrote: > [...] > > + /* > > +* Make sure that pages are in the same section before doing pointer > > +* arithmetics. > > +*/ > > + if (page_to_section(pvmw->page) !=

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Tetsuo Handa
Kirill A. Shutemov wrote: > Something like this? > > > From 251e124630da82482e8b320c73162ce89af04d5d Mon Sep 17 00:00:00 2001 > From: "Kirill A. Shutemov" > Date: Thu, 18 Jan 2018 18:24:07 +0300 > Subject: [PATCH] mm, page_vma_mapped: Fix pointer arithmetics in check_pte() > > Tetsuo reported

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Kirill A. Shutemov
On Thu, Jan 18, 2018 at 09:26:25AM -0800, Linus Torvalds wrote: > On Thu, Jan 18, 2018 at 8:56 AM, Kirill A. Shutemov > wrote: > > > > I can't say I fully grasp how 'diff' got this value and how it leads to both > > checks being false. > > I think the problem is that page difference when they

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Linus Torvalds
On Thu, Jan 18, 2018 at 9:26 AM, Luck, Tony wrote: >> Both are real page. But why do you expect pages to be 64-byte aligned? >> Both are aligned to 64-bit as they suppose to be IIUC. > > On a 64-bit kernel sizeof struct page == 64 (after much work by people to > trim out excess stuff). So I

RE: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Luck, Tony
> Both are real page. But why do you expect pages to be 64-byte aligned? > Both are aligned to 64-bit as they suppose to be IIUC. On a 64-bit kernel sizeof struct page == 64 (after much work by people to trim out excess stuff). So I thought we made sure to align the base address of blocks of

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Linus Torvalds
On Thu, Jan 18, 2018 at 8:56 AM, Kirill A. Shutemov wrote: > > I can't say I fully grasp how 'diff' got this value and how it leads to both > checks being false. I think the problem is that page difference when they are in different sections. When you do pte_page(*pvmw->pte) - pvmw->page

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Michal Hocko
On Thu 18-01-18 18:40:26, Kirill A. Shutemov wrote: [...] > + /* > + * Make sure that pages are in the same section before doing pointer > + * arithmetics. > + */ > + if (page_to_section(pvmw->page) != page_to_section(page)) > + return false; OK, THPs shouldn't

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Linus Torvalds
On Thu, Jan 18, 2018 at 6:38 AM, Dave Hansen wrote: > On 01/18/2018 05:12 AM, Kirill A. Shutemov wrote: >> - if (pte_page(*pvmw->pte) - pvmw->page >= >> - hpage_nr_pages(pvmw->page)) { > > Is ->pte guaranteed to map a page which is within the same section

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Kirill A. Shutemov
On Thu, Jan 18, 2018 at 03:58:30PM +0100, Andrea Arcangeli wrote: > On Thu, Jan 18, 2018 at 06:45:00AM -0800, Dave Hansen wrote: > > On 01/18/2018 04:25 AM, Kirill A. Shutemov wrote: > > > [ 10.084024] diff: -858690919 > > > [ 10.084258] hpage_nr_pages: 1 > > > [ 10.084386] check1: 0 > > > [

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Kirill A. Shutemov
On Thu, Jan 18, 2018 at 06:45:00AM -0800, Dave Hansen wrote: > On 01/18/2018 04:25 AM, Kirill A. Shutemov wrote: > > [ 10.084024] diff: -858690919 > > [ 10.084258] hpage_nr_pages: 1 > > [ 10.084386] check1: 0 > > [ 10.084478] check2: 0 > ... > > diff --git a/mm/page_vma_mapped.c

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Andrea Arcangeli
On Thu, Jan 18, 2018 at 06:45:00AM -0800, Dave Hansen wrote: > On 01/18/2018 04:25 AM, Kirill A. Shutemov wrote: > > [ 10.084024] diff: -858690919 > > [ 10.084258] hpage_nr_pages: 1 > > [ 10.084386] check1: 0 > > [ 10.084478] check2: 0 > ... > > diff --git a/mm/page_vma_mapped.c

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Dave Hansen
On 01/18/2018 06:45 AM, Kirill A. Shutemov wrote: > On Thu, Jan 18, 2018 at 06:38:10AM -0800, Dave Hansen wrote: >> On 01/18/2018 05:12 AM, Kirill A. Shutemov wrote: >>> - if (pte_page(*pvmw->pte) - pvmw->page >= >>> - hpage_nr_pages(pvmw->page)) { >> Is ->pte

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Kirill A. Shutemov
On Thu, Jan 18, 2018 at 06:38:10AM -0800, Dave Hansen wrote: > On 01/18/2018 05:12 AM, Kirill A. Shutemov wrote: > > - if (pte_page(*pvmw->pte) - pvmw->page >= > > - hpage_nr_pages(pvmw->page)) { > > Is ->pte guaranteed to map a page which is within the same

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Dave Hansen
On 01/18/2018 04:25 AM, Kirill A. Shutemov wrote: > [ 10.084024] diff: -858690919 > [ 10.084258] hpage_nr_pages: 1 > [ 10.084386] check1: 0 > [ 10.084478] check2: 0 ... > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c > index d22b84310f6d..57b4397f1ea5 100644 > ---

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Dave Hansen
On 01/18/2018 05:12 AM, Kirill A. Shutemov wrote: > - if (pte_page(*pvmw->pte) - pvmw->page >= > - hpage_nr_pages(pvmw->page)) { Is ->pte guaranteed to map a page which is within the same section as pvmw->page? Otherwise, with sparsemem (non-vmemmap), the

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Kirill A. Shutemov
On Thu, Jan 18, 2018 at 04:12:10PM +0300, Kirill A. Shutemov wrote: > On Thu, Jan 18, 2018 at 03:25:50PM +0300, Kirill A. Shutemov wrote: > > On Thu, Jan 18, 2018 at 05:12:45PM +0900, Tetsuo Handa wrote: > > > Tetsuo Handa wrote: > > > > OK. I missed the mark. I overlooked that 4.11 already has

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Kirill A. Shutemov
On Thu, Jan 18, 2018 at 03:25:50PM +0300, Kirill A. Shutemov wrote: > On Thu, Jan 18, 2018 at 05:12:45PM +0900, Tetsuo Handa wrote: > > Tetsuo Handa wrote: > > > OK. I missed the mark. I overlooked that 4.11 already has this problem. > > > > > > I needed to bisect between 4.10 and 4.11, and I got

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Kirill A. Shutemov
On Thu, Jan 18, 2018 at 05:12:45PM +0900, Tetsuo Handa wrote: > Tetsuo Handa wrote: > > OK. I missed the mark. I overlooked that 4.11 already has this problem. > > > > I needed to bisect between 4.10 and 4.11, and I got plausible culprit. > > > > I haven't completed bisecting between

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Tetsuo Handa
Tetsuo Handa wrote: > OK. I missed the mark. I overlooked that 4.11 already has this problem. > > I needed to bisect between 4.10 and 4.11, and I got plausible culprit. > > I haven't completed bisecting between b4fb8f66f1ae2e16 and c470abd4fde40ea6, > but > b4fb8f66f1ae2e16 ("mm, page_alloc:

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 2:00 PM, Dave Hansen wrote: > > I thought that page_zone_id() stuff was there to prevent this kind of > cross-zone stuff from happening. Ahh, that was the part I missed. Yeah looks like that checks things properly. Although the mask generation is *so* confusing that I

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-17 Thread Dave Hansen
On 01/17/2018 01:51 PM, Linus Torvalds wrote: > In fact, it seems to be such a fundamental bug that I suspect I'm > entirely wrong, and full of shit. So it's an interesting and not > _obviously_ incorrect theory, but I suspect I must be missing > something. I'll just note that a few of the pfns I

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-17 Thread Dave Hansen
On 01/17/2018 01:39 PM, Linus Torvalds wrote: > > So maybe something like this to test the theory? > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 76c9688b6a0a..f919a5548943 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -756,6 +756,8 @@ static inline

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 1:39 PM, Linus Torvalds wrote: > > In fact, the whole > >pfn_valid_within(buddy_pfn) > > test looks very odd. Maybe the pfn of the buddy is valid, but it's not > in the same zone? Then we'd combine the two pages in two different > zones into one combined page. It

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 3:08 AM, Tetsuo Handa wrote: > > I needed to bisect between 4.10 and 4.11, and I got plausible culprit. > [...] > git bisect bad b4fb8f66f1ae2e167d06c12d018025a8d4d3ba7e > # first bad commit: [b4fb8f66f1ae2e167d06c12d018025a8d4d3ba7e] mm, > page_alloc: Add missing check

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-17 Thread Tetsuo Handa
Linus Torvalds wrote: > > It turned out that CONFIG_FLATMEM was irrelevant. I just did not hit it. > > So have you actually been able to see the problem with FLATMEM, or is > this based on the bisect? Because I really think the bisect is pretty > much guaranteed to be wrong. Oops, this "it" is

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-16 Thread Linus Torvalds
On Tue, Jan 16, 2018 at 9:33 AM, Tetsuo Handa wrote: > > Since I got a faster reproducer, I tried full bisection between 4.11 and > 4.12-rc1. > But I have no idea why bisection arrives at c0332694903a37cf. I don't think your reproducer is 100% reliable. And bisection is great because it's very

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-16 Thread Linus Torvalds
On Tue, Jan 16, 2018 at 12:06 AM, Dave Hansen wrote: > On 01/15/2018 06:14 PM, Linus Torvalds wrote: >> But I'm adding Dave Hansen explicitly to the cc, in case he has any >> ideas. Not because I blame him, but he's touched the sparsemem code >> fairly recently, so maybe he'd have some idea on

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-16 Thread Tetsuo Handa
Linus Torvalds wrote: > On Mon, Jan 15, 2018 at 5:15 PM, Tetsuo Handa > wrote: > > > > I can't reproduce this with CONFIG_FLATMEM=y . But I'm not sure whether > > we are hitting a bug in CONFIG_SPARSEMEM=y code, for the bug is highly > > timing dependent. > > Hmm. Maybe. But sparsemem really

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-16 Thread Ingo Molnar
* Dave Hansen wrote: > Did anyone else notice the > > [ 31.068198] ? vmalloc_sync_all+0x150/0x150 > > present in a bunch of the stack traces? That should be pretty uncommon. I think that's pretty unusual: > Is it just part of the normal do_page_fault() stack and the stack >

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-16 Thread Dave Hansen
On 01/15/2018 06:14 PM, Linus Torvalds wrote: > But I'm adding Dave Hansen explicitly to the cc, in case he has any > ideas. Not because I blame him, but he's touched the sparsemem code > fairly recently, so maybe he'd have some idea on adding sanity > checking to the sparsemem version of

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-15 Thread Linus Torvalds
On Mon, Jan 15, 2018 at 5:15 PM, Tetsuo Handa wrote: > > I can't reproduce this with CONFIG_FLATMEM=y . But I'm not sure whether > we are hitting a bug in CONFIG_SPARSEMEM=y code, for the bug is highly > timing dependent. Hmm. Maybe. But sparsemem really also generates *much* more complex code

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-15 Thread Tetsuo Handa
Linus Torvalds wrote: > On Sun, Jan 14, 2018 at 3:54 AM, Tetsuo Handa > wrote: > > This memory corruption bug occurs even on CONFIG_SMP=n CONFIG_PREEMPT_NONE=y > > kernel. This bug highly depends on timing and thus too difficult to bisect. > > This bug seems to exist at least since Linux 4.8