Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-08 Thread Vlastimil Babka
On 06/03/2016 05:10 PM, Andrea Arcangeli wrote: Hello Michal, CC'ed Hugh, On Fri, Jun 03, 2016 at 04:46:00PM +0200, Michal Hocko wrote: What do you think about the external dependencies mentioned above. Do you think this is a sufficient argument wrt. occasional higher latencies? It's a trade

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-07 Thread Michal Hocko
On Fri 03-06-16 17:10:01, Andrea Arcangeli wrote: > Hello Michal, > > CC'ed Hugh, > > On Fri, Jun 03, 2016 at 04:46:00PM +0200, Michal Hocko wrote: > > What do you think about the external dependencies mentioned above. Do > > you think this is a sufficient argument wrt. occasional higher > > late

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-06 Thread Michal Hocko
On Sat 04-06-16 16:51:14, Sergey Senozhatsky wrote: > Hello, > > On (06/03/16 15:49), Michal Hocko wrote: > > __khugepaged_exit is called during the final __mmput and it employs a > > complex synchronization dances to make sure it doesn't race with the > > khugepaged which might be scanning this m

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-04 Thread Sergey Senozhatsky
Hello, On (06/03/16 15:49), Michal Hocko wrote: > __khugepaged_exit is called during the final __mmput and it employs a > complex synchronization dances to make sure it doesn't race with the > khugepaged which might be scanning this mm at the same time. This is > all caused by the fact that khugep

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Andrea Arcangeli
Hello Michal, CC'ed Hugh, On Fri, Jun 03, 2016 at 04:46:00PM +0200, Michal Hocko wrote: > What do you think about the external dependencies mentioned above. Do > you think this is a sufficient argument wrt. occasional higher > latencies? It's a tradeoff and both latencies would be short and unco

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Michal Hocko
On Fri 03-06-16 15:51:54, Andrea Arcangeli wrote: > On Thu, Jun 02, 2016 at 02:21:10PM +0200, Michal Hocko wrote: > > Testing with the patch makes some sense as well, but I would like to > > hear from Andrea whether the approach is good because I am wondering why > > he hasn't done that before - it

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Andrea Arcangeli
On Thu, Jun 02, 2016 at 02:21:10PM +0200, Michal Hocko wrote: > Testing with the patch makes some sense as well, but I would like to > hear from Andrea whether the approach is good because I am wondering why > he hasn't done that before - it feels so much simpler than the current > code. The down_

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Michal Hocko
On Fri 03-06-16 15:45:09, Michal Hocko wrote: > On Fri 03-06-16 22:38:13, Sergey Senozhatsky wrote: > [...] > > Michal, I'll try to test during the weekend (away from the affected box > > now), but in the worst case it can as late as next Thursday (gonna travel > > next week). > > No problem. I wo

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Michal Hocko
On Fri 03-06-16 22:38:13, Sergey Senozhatsky wrote: [...] > Michal, I'll try to test during the weekend (away from the affected box > now), but in the worst case it can as late as next Thursday (gonna travel > next week). No problem. I would really like to hear from Andrea before we give this a se

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Sergey Senozhatsky
On (06/03/16 12:05), Michal Hocko wrote:
> > > RIP collect_mm_slot() + 0x42/0x84
> > > khugepaged
> >
> > So is this really collect_mm_slot called directly from khugepaged or is
> > some inlining going on there?
inlining I suppose.
> > > prepare_to_wait_event
> > > maybe_pmd_mkwrite
> > >

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Michal Hocko
On Fri 03-06-16 11:55:49, Michal Hocko wrote: > On Fri 03-06-16 17:43:47, Sergey Senozhatsky wrote: > > On (06/03/16 09:25), Michal Hocko wrote: > > > > it's quite hard to trigger the bug (somehow), so I can't > > > > follow up with more information as of now. > > > > either I did something very s

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Michal Hocko
On Fri 03-06-16 17:43:47, Sergey Senozhatsky wrote: > On (06/03/16 09:25), Michal Hocko wrote: > > > it's quite hard to trigger the bug (somehow), so I can't > > > follow up with more information as of now. > > either I did something very silly fixing up the patch, or the > patch may be causing ge

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Sergey Senozhatsky
On (06/03/16 09:25), Michal Hocko wrote:
> > it's quite hard to trigger the bug (somehow), so I can't
> > follow up with more information as of now.
either I did something very silly fixing up the patch, or the
patch may be causing general protection faults on my system.
RIP collect_mm_slot() + 0

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Michal Hocko
On Fri 03-06-16 16:15:51, Sergey Senozhatsky wrote:
> Hello,
>
> On (06/02/16 11:21), Michal Hocko wrote:
> [..]
> > @@ -2863,6 +2854,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> >
> >  		collect_mm_slot(mm_slot);
> >  	}
> > +	mmput(mm);
> >

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-03 Thread Sergey Senozhatsky
Hello,
On (06/02/16 11:21), Michal Hocko wrote:
[..]
> @@ -2863,6 +2854,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
>
>  		collect_mm_slot(mm_slot);
>  	}
> +	mmput(mm);
>
>  	return progress;
>  }
this possibly sleeping mmput() is called fro

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-02 Thread Sergey Senozhatsky
On (06/03/16 10:29), Sergey Senozhatsky wrote:
> > if (allocstall == curr_allocstall && swap != 0) {
> > if (!__collapse_huge_page_swapin(mm, vma, address, pmd)) {
> > {
> > : if (ret & VM_FAULT_RETRY) {
> > :

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-02 Thread Sergey Senozhatsky
On (06/03/16 10:00), Sergey Senozhatsky wrote:
> a good find by Vlastimil.
>
> Ebru, can you also re-visit __collapse_huge_page_swapin()? it's called
> from collapse_huge_page() under the down_read(&mm->mmap_sem), is there
> any reason to do the nested down_read(&mm->mmap_sem)?
>
> collapse_huge_

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-02 Thread Sergey Senozhatsky
On (06/02/16 21:58), Ebru Akagunduz wrote:
[..]
> > I think it's this patch:
> >
> > http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-make-swapin-readahead-under-down_read-of-mmap_sem.patch
> >
> > Some parts of the code in collapse_huge_page() that were under
> > down_write(mmap_sem) are under do

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-02 Thread Ebru Akagunduz
On Thu, Jun 02, 2016 at 03:24:05PM +0200, Vlastimil Babka wrote: > [+CC's] > > On 06/02/2016 03:48 AM, Sergey Senozhatsky wrote: > >On (06/01/16 13:11), Stephen Rothwell wrote: > >>Hi all, > >> > >>Changes since 20160531: > >> > >>My fixes tree contains: > >> > >> of: silence warnings due to max(

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-02 Thread Vlastimil Babka
[+CC's] On 06/02/2016 03:48 AM, Sergey Senozhatsky wrote: On (06/01/16 13:11), Stephen Rothwell wrote: Hi all, Changes since 20160531: My fixes tree contains: of: silence warnings due to max() usage The arm tree gained a conflict against Linus' tree. Non-merge commits (relative to Linus'

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-02 Thread Michal Hocko
On Thu 02-06-16 21:08:57, Sergey Senozhatsky wrote:
> Hello Michal,
>
> On (06/02/16 11:21), Michal Hocko wrote:
> [..]
> > > [ 2856.323052] INFO: task cc1:4582 blocked for more than 21 seconds.
> > > [ 2856.323055] Not tainted 4.7.0-rc1-next-20160601-dbg-00012-g52c180e-dirty #453
> >

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-02 Thread Sergey Senozhatsky
Hello Michal,
On (06/02/16 11:21), Michal Hocko wrote:
[..]
> > [ 2856.323052] INFO: task cc1:4582 blocked for more than 21 seconds.
> > [ 2856.323055] Not tainted 4.7.0-rc1-next-20160601-dbg-00012-g52c180e-dirty #453
> > [ 2856.323056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"

Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-02 Thread Michal Hocko
[CCing Andrea] On Thu 02-06-16 10:48:35, Sergey Senozhatsky wrote: > On (06/01/16 13:11), Stephen Rothwell wrote: > > Hi all, > > > > Changes since 20160531: > > > > My fixes tree contains: > > > > of: silence warnings due to max() usage > > > > The arm tree gained a conflict against Linus'

[linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup

2016-06-01 Thread Sergey Senozhatsky
On (06/01/16 13:11), Stephen Rothwell wrote: > Hi all, > > Changes since 20160531: > > My fixes tree contains: > > of: silence warnings due to max() usage > > The arm tree gained a conflict against Linus' tree. > > Non-merge commits (relative to Linus' tree): 1100 > 936 files changed, 38159