Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-06-20 Thread Stanislav Meduna
On 19.06.2013 10:06, Peter Zijlstra wrote: >> On 19.06.2013 07:20, Linus Torvalds wrote: >>> There's the fast_tlb race that Peter fixed in commit 29eb77825cc7 >>> ("arch, mm: Remove tlb_fast_mode()"). I'm not seeing how it would >>> cause infinite TLB faults, but it definitely causes potentially >

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-06-19 Thread Peter Zijlstra
On Wed, Jun 19, 2013 at 09:36:39AM +0200, Stanislav Meduna wrote: > On 19.06.2013 07:20, Linus Torvalds wrote: > > >> No crash in 2 days running with preempt none... > > > > Is this UP? > > Yes it is. > > > There's the fast_tlb race that Peter fixed in commit 29eb77825cc7 > > ("arch, mm: Remove

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-06-19 Thread Stanislav Meduna
On 19.06.2013 07:20, Linus Torvalds wrote: >> No crash in 2 days running with preempt none... > > Is this UP? Yes it is. > There's the fast_tlb race that Peter fixed in commit 29eb77825cc7 > ("arch, mm: Remove tlb_fast_mode()"). I'm not seeing how it would > cause infinite TLB faults, but it de

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-06-18 Thread Linus Torvalds
On Tue, Jun 18, 2013 at 9:13 AM, Stanislav Meduna wrote: > > No crash in 2 days running with preempt none... Is this UP? There's the fast_tlb race that Peter fixed in commit 29eb77825cc7 ("arch, mm: Remove tlb_fast_mode()"). I'm not seeing how it would cause infinite TLB faults, but it definitel

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-06-18 Thread Stanislav Meduna
On 16.06.2013 23:34, Stanislav Meduna wrote: > Right now a test with the same kernel with preempt none > is running to see whether the problem also happens with this > application there (due to the timing sensitivity only a positive > result has a significance). No crash in 2 days running with pr

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-06-16 Thread Stanislav Meduna
Hi all, I was able to reproduce the page fault problem with a relatively simple application, for now on the Geode platform. It can be downloaded at http://www.meduna.org/tmp/PageFault.tar.gz Basically the test application does: - 4 threads that do nothing but periodically sleep - 1 thread loo

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-24 Thread Stanislav Meduna
On 24.05.2013 15:55, Stanislav Meduna wrote: >> Just to rule something out, are you using >> transparent huge pages on those systems? > > On my present test system they are configured in, but I am > not using them. Ah, _transparent_ huge pages. No, that is not enabled.

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-24 Thread Stanislav Meduna
On 24.05.2013 15:06, Rik van Riel wrote: > Just to rule something out, are you using > transparent huge pages on those systems? On my present test system they are configured in, but I am not using them. # cat /proc/meminfo | grep Huge HugePages_Total: 0 HugePages_Free: 0 HugePages_R

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-24 Thread Rik van Riel
On 05/24/2013 04:29 AM, Stanislav Meduna wrote: On 23.05.2013 14:19, Rik van Riel wrote: static inline void __native_flush_tlb_single(unsigned long addr) { __flush_tlb(); } I will give it some more testing time. That is a good idea. Still no crash, so this one indeed seems to c

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-24 Thread Stanislav Meduna
On 24.05.2013 10:29, Stanislav Meduna wrote: static inline void __native_flush_tlb_single(unsigned long addr) { __flush_tlb(); } >> >>> I will give it some more testing time. >> >> That is a good idea. > > Still no crash, so this one indeed seems to change things. Ta

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-24 Thread Stanislav Meduna
On 23.05.2013 14:19, Rik van Riel wrote: >>> static inline void __native_flush_tlb_single(unsigned long addr) >>> { >>> __flush_tlb(); >>> } > >> I will give it some more testing time. > > That is a good idea. Still no crash, so this one indeed seems to change things. If I understand

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-23 Thread H. Peter Anvin
On 05/23/2013 10:36 AM, Steven Rostedt wrote: > On Thu, 2013-05-23 at 10:24 -0700, H. Peter Anvin wrote: >> On 05/23/2013 08:27 AM, Steven Rostedt wrote: >>> On Thu, 2013-05-23 at 08:06 -0700, H. Peter Anvin wrote: >>> We don't even need the jump_label infrastructure -- we have static_cpu

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-23 Thread Steven Rostedt
On Thu, 2013-05-23 at 10:24 -0700, H. Peter Anvin wrote: > On 05/23/2013 08:27 AM, Steven Rostedt wrote: > > On Thu, 2013-05-23 at 08:06 -0700, H. Peter Anvin wrote: > > > >> We don't even need the jump_label infrastructure -- we have > >> static_cpu_has*() which actually predates jump_label altho

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-23 Thread H. Peter Anvin
On 05/23/2013 08:27 AM, Steven Rostedt wrote: > On Thu, 2013-05-23 at 08:06 -0700, H. Peter Anvin wrote: > >> We don't even need the jump_label infrastructure -- we have >> static_cpu_has*() which actually predates jump_label although it uses >> the same underlying ideas. > > Ah right. I wonder i

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-23 Thread Steven Rostedt
On Thu, 2013-05-23 at 08:06 -0700, H. Peter Anvin wrote: > We don't even need the jump_label infrastructure -- we have > static_cpu_has*() which actually predates jump_label although it uses > the same underlying ideas. Ah right. I wonder if it would be worth consolidating a lot of these "modifyi

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-23 Thread H. Peter Anvin
On 05/23/2013 06:29 AM, Steven Rostedt wrote: > On Thu, 2013-05-23 at 08:19 -0400, Rik van Riel wrote: > >> We can add a bit in the architecture bits that >> we use to check against other CPU and system >> errata, and conditionally flush the whole TLB >> from __native_flush_tlb_single(). > > If w

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-23 Thread Stanislav Meduna
On 23.05.2013 16:50, Linus Torvalds wrote: > Another question: I'm assuming this is all 32-bit, is it with PAE > enabled? That changes some of the TLB flushing, and we had one bug > related to that, maybe there are others.. 32 bit, no PAE. -- Stano

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-23 Thread Linus Torvalds
On Thu, May 23, 2013 at 7:45 AM, Linus Torvalds wrote: > > Page faults that don't cause us to map a page (ie a spurious one, or > one that just updates dirty/accessed bits) don't show up as even minor > faults. Think of the major/minor as "mapping activity" not a page > fault count. Actually, I t

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-23 Thread Linus Torvalds
On Thu, May 23, 2013 at 1:07 AM, Stanislav Meduna wrote: > > It did not crash overnight, but it also does not show any > minor fault counted for the threads Page faults that don't cause us to map a page (ie a spurious one, or one that just updates dirty/accessed bits) don't show up as even minor
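
Since this part of the thread turns on what the minor-fault counter does and does not show, here is a minimal userspace sketch of sampling that counter around a block of work. It uses getrusage() and the ru_minflt field, which are standard; the helper names minor_faults() and faults_from_touching() are made up for illustration and are not from the test application discussed in the thread. It only demonstrates how to read the counter and makes no claim about which kinds of faults the kernel accounts there.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Illustrative helper: minor-fault count of the calling process so far,
 * or -1 if getrusage() fails. */
static long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Illustrative helper: map npages of fresh anonymous memory, write one
 * byte to each page, and return how much the minor-fault counter moved.
 * Returns -1 on error. */
static long faults_from_touching(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    volatile unsigned char *p;
    long before, after;
    size_t i;

    if (page <= 0 || npages == 0)
        return -1;
    len = npages * (size_t)page;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    before = minor_faults();
    for (i = 0; i < npages; i++)
        p[i * (size_t)page] = 1;   /* first write to each page */
    after = minor_faults();

    munmap((void *)p, len);

    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Calling faults_from_touching(16) and printing the result gives a quick feel for how the counter moves on a given kernel; sampling minor_faults() before and after a suspect code path is the same idea applied to a real workload. For per-thread numbers on Linux, RUSAGE_THREAD (with _GNU_SOURCE) can be used in place of RUSAGE_SELF.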

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-23 Thread Steven Rostedt
On Thu, 2013-05-23 at 08:19 -0400, Rik van Riel wrote: > We can add a bit in the architecture bits that > we use to check against other CPU and system > errata, and conditionally flush the whole TLB > from __native_flush_tlb_single(). If we find that some CPUs have issues and others do not, and w
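
To make the quoted proposal concrete, below is a rough userspace model of "check an errata bit, and flush the whole TLB instead of a single entry when it is set". This is only a sketch of the control flow, not kernel code: the flag bug_invlpg_unreliable and the model_* functions are hypothetical stand-ins, and the counters exist only so the behaviour can be observed outside the kernel. How the real bit would be defined and tested (errata flag, static_cpu_has*(), jump labels) is exactly what the replies in this thread are debating.

```c
#include <stdbool.h>

/* Hypothetical errata flag. In a kernel it would be set once while
 * identifying the CPU; here it is a plain variable. */
static bool bug_invlpg_unreliable;

/* Counters standing in for the real hardware operations. */
static unsigned long full_flushes;
static unsigned long single_flushes;

/* Stand-in for a full TLB flush. */
static void model_flush_tlb_all(void)
{
    full_flushes++;
}

/* Stand-in for a single-entry invalidation of addr. */
static void model_invlpg(unsigned long addr)
{
    (void)addr;
    single_flushes++;
}

/* Model of the proposed behaviour: on affected CPUs a single-page
 * flush request is widened to a full flush. */
static void model_flush_tlb_single(unsigned long addr)
{
    if (bug_invlpg_unreliable)
        model_flush_tlb_all();
    else
        model_invlpg(addr);
}
```

The trade-off is the usual one for errata workarounds: unaffected CPUs keep the cheap single-entry flush, while affected ones pay for a full flush on every call.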

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-23 Thread Rik van Riel
On 05/23/2013 04:07 AM, Stanislav Meduna wrote: On 22.05.2013 20:43, Rik van Riel wrote: Some CPUs have had errata when it comes to flushing large pages that have been split into small pages by hardware, e.g. due to MTRR conflicts. In that case, fragments of the large page may have been left i

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-23 Thread Stanislav Meduna
On 22.05.2013 20:43, Rik van Riel wrote: >> Some CPUs have had errata when it comes to flushing large pages that >> have been split into small pages by hardware, e.g. due to MTRR >> conflicts. In that case, fragments of the large page may have been left >> in the TLB. Can I somehow find if this

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-22 Thread Stanislav Meduna
On 22.05.2013 20:35, Rik van Riel wrote: > I'm stumped. > > If the Geode knows how to flush single TLB entries, it > should do that when flush_tlb_page is called. > > If it does not know, it should throw an invalid instruction > exception, and not quietly complete the instruction without > doing

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-22 Thread Rik van Riel
On 05/22/2013 02:42 PM, H. Peter Anvin wrote: On 05/22/2013 11:35 AM, Rik van Riel wrote: On 05/22/2013 02:21 PM, Stanislav Meduna wrote: On 22.05.2013 20:11, Steven Rostedt wrote: Did you apply both patches? Without the first one, this one is meaningless. Sure. BTW, back when I tried to p

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-22 Thread H. Peter Anvin
On 05/22/2013 11:35 AM, Rik van Riel wrote: > On 05/22/2013 02:21 PM, Stanislav Meduna wrote: >> On 22.05.2013 20:11, Steven Rostedt wrote: >> >>> Did you apply both patches? Without the first one, this one is >>> meaningless. >> >> Sure. >> >> BTW, back when I tried to pinpoint it I also tried add

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-22 Thread Rik van Riel
On 05/22/2013 02:21 PM, Stanislav Meduna wrote: On 22.05.2013 20:11, Steven Rostedt wrote: Did you apply both patches? Without the first one, this one is meaningless. Sure. BTW, back when I tried to pinpoint it I also tried adding flush_tlb_page(vma, address) at the beginning of handle_pt

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-22 Thread Stanislav Meduna
On 22.05.2013 20:11, Steven Rostedt wrote: > Did you apply both patches? Without the first one, this one is > meaningless. Sure. BTW, back when I tried to pinpoint it I also tried adding flush_tlb_page(vma, address) at the beginning of handle_pte_fault, which as I read should be basically the

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-22 Thread Steven Rostedt
On Wed, 2013-05-22 at 20:04 +0200, Stanislav Meduna wrote: > On 22.05.2013 19:41, Rik van Riel wrote: > > >> I think you should also remove the > >> > >> if (flags & FAULT_FLAG_WRITE) > > Done > > >>> Can you test the attached patch? > > Nope. Fails with the same symptoms, min_flt skyro

Re: [PATCH] mm: fix up a spurious page fault whenever it happens

2013-05-22 Thread Stanislav Meduna
On 22.05.2013 19:41, Rik van Riel wrote: >> I think you should also remove the >> >> if (flags & FAULT_FLAG_WRITE) Done >>> Can you test the attached patch? Nope. Fails with the same symptoms, min_flt skyrockets, the throttler activates and after 2 seconds all is well again. This is on