Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-23 Thread Benjamin Herrenschmidt
> Thanks for the warning, Ben, but I don't see a problem there: that's > in your separate ioremap_mm, which is rather like init_mm, and won't > ever go through exit_mmap, and doesn't need its page tables freed - > isn't that right? Right. > But it was worth auditing the different architectures f

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-23 Thread Hugh Dickins
On Wed, 23 Mar 2005, Benjamin Herrenschmidt wrote: > On Tue, 2005-03-22 at 16:37 +0000, Hugh Dickins wrote: > > > > I cannot see those arches doing pte_allocs outside their vmas, > > that of course could cause it. And nr_ptes is initialized to 0 > > once by memset and again by assignment, so it s

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Wed, 23 Mar 2005 13:10:42 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: > The ugly thing you get with an inclusive ceiling is that your masking > becomes more difficult I think. Good point. > I might try to attack this from another angle and see if I can come up > with something. Great, let m

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Nick Piggin
David S. Miller wrote: On Tue, 22 Mar 2005 17:10:13 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: Hugh Dickins <[EMAIL PROTECTED]> wrote: On Tue, 22 Mar 2005, Luck, Tony wrote: > > But I'm still confused by all the math on addr/end at each > level. You think the rest of us are not ;-? umm, give

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Wed, 23 Mar 2005 00:51:02 +0000 (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: > This actual example helped to focus my mind a lot, thank you. No problem, I needed to work through specific examples to see things clearly too. > > and things seem to behave. I'll try to analyze things > > furthe

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 17:10:13 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > Hugh Dickins <[EMAIL PROTECTED]> wrote: > > > > On Tue, 22 Mar 2005, Luck, Tony wrote: > > > > > > But I'm still confused by all the math on addr/end at each > > > level. > > > > You think the rest of us are not ;-

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Andrew Morton
Hugh Dickins <[EMAIL PROTECTED]> wrote: > > On Tue, 22 Mar 2005, Luck, Tony wrote: > > > > But I'm still confused by all the math on addr/end at each > > level. > > You think the rest of us are not ;-? umm, given the difficulty which you guys are having with this, I get a bit worried about c

RE: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Hugh Dickins
On Tue, 22 Mar 2005, Luck, Tony wrote: > > But I'm still confused by all the math on addr/end at each > level. You think the rest of us are not ;-? > Rounding up/down at each level should presumably be > based on the size of objects at the next level. So the pgd > code should round using PUD_MA

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Hugh Dickins
On Tue, 22 Mar 2005, David S. Miller wrote: > On Tue, 22 Mar 2005 21:51:39 +0000 (GMT) > Hugh Dickins <[EMAIL PROTECTED]> wrote: > > > I still can't see what's wrong with the code that's already > > there. My brain is seizing up, I'm taking a break. > > Ok, meanwhile I'll do a brain dump of what

RE: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Hugh Dickins
On Tue, 22 Mar 2005, Luck, Tony wrote: > > Alternatively you could modify the use of floor/ceiling as they > are passed down from the top level to indicate the progressively > greater address ranges that have been dealt with ... but I'm not > completely convinced that gives you enough information.

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Wed, 23 Mar 2005 11:19:38 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: > > dramatically, shell performance is way down on sparc64. > > I'll post before and after numbers in a bit. Note, this is > > just with Hugh's base patch plus bug fixes. > > > > That's interesting. The only "extra" stuff

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Nick Piggin
David S. Miller wrote: On Wed, 23 Mar 2005 10:32:10 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: I think David's on the right track - I think there's something a bit wrong at the top. In my reply to Andrew in this thread I posted a patch which may at least get things working... We have to do the

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
Ok, this patch, on top of Hugh's original freepgt patch, gets me a working system. It includes Hugh's bug fix, plus the ceiling masking roll-over fix of mine. It should get ppc working too, I bet. --- mm/memory.c.hugh 2005-03-22 16:01:07.000000000 -0800 +++ mm/memory.c 2005-03-22 16:00:08.00
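The roll-over comes from using 0 as the ceiling for "nothing above". Masking a real ceiling down to a table boundary can also produce 0, so the two cases get confused. Below is a userspace sketch of a check that keeps them apart, assuming that convention. It is not the patch itself, and can_free_table() and its parameters are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * May the page-table page covering [start, end) be freed?
 * floor:   end of the nearest mapping below (0 if none).
 * ceiling: start of the nearest mapping above; 0 means "none", i.e. the
 *          top of the address space (assumed convention).
 * size:    span covered by one such table (a power of two).
 * Assumes start and end - 1 fall inside the same table.
 */
static bool can_free_table(uint64_t start, uint64_t end,
                           uint64_t floor, uint64_t ceiling, uint64_t size)
{
    const uint64_t mask = ~(size - 1);

    start &= mask;                 /* first address the table covers */
    if (start < floor)
        return false;              /* a lower neighbour shares the table */
    if (ceiling) {
        ceiling &= mask;
        if (!ceiling)
            return false;          /* a real ceiling rounded down to 0:
                                      must not be read as "no ceiling" */
    }
    /* The "- 1"s make ceiling == 0 compare as the highest address. */
    if (end - 1 > ceiling - 1)
        return false;              /* an upper neighbour shares the table */
    return true;
}
```

Take 1 MiB tables as an example. can_free_table(0x10000, 0x20000, 0, 0x80000, 0x100000) must return false, because the next mapping shares the table. Its ceiling masks to 0, and only the `if (!ceiling)` test stops that from being read as "nothing above".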

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
Ok, here are (finally, I've been debugging this so much purely to see these things) some lmbench numbers with Hugh's base patch on sparc64. Ignore my previous comments about shell performance getting worse, it's some difference that makes things run more slowly in single user mode compared to a f

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 15:53:08 -0800 "Luck, Tony" <[EMAIL PROTECTED]> wrote: > But I'm still confused by all the math on addr/end at each > level. Rounding up/down at each level should presumably be > based on the size of objects at the next level. So the pgd > code should round using PUD_MASK, pu

RE: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Luck, Tony
>How it works is that it knows the extent in each direction >where mappings do not exist. > >Once we know we have a clear span up to the next PMD_SIZE >modulo (and PUD_SIZE and so on and so forth) we know we >can liberate the page table chunks covered by such ranges. Ok ... I see that now (I was m

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Wed, 23 Mar 2005 10:32:10 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: > I think David's on the right track - I think there's something a > bit wrong at the top. In my reply to Andrew in this thread I > posted a patch which may at least get things working... We have to do the "if (ceiling)" ch

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Benjamin Herrenschmidt
On Tue, 2005-03-22 at 12:21 -0800, David S. Miller wrote: > On Tue, 22 Mar 2005 19:36:46 +0000 (GMT) > Hugh Dickins <[EMAIL PROTECTED]> wrote: > > > I notice that although both i386 and sparc64 use pgtable-nopud.h, the > > i386 pud_clear does nothing at all and the sparc64 pud_clear resets to 0. >

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Nick Piggin
Hugh Dickins wrote: On Tue, 22 Mar 2005, David S. Miller wrote: On Tue, 22 Mar 2005 19:36:46 +0000 (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: I notice that although both i386 and sparc64 use pgtable-nopud.h, the i386 pud_clear does nothing at all and the sparc64 pud_clear resets to 0. This was

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 14:40:55 -0800 "Luck, Tony" <[EMAIL PROTECTED]> wrote: > Then I don't see how we decide when to clear a pointer at each > level. Are there counters of how many entries are active in each > table at all levels (pgd/pud/pmd/pte)? No, there are no counters. How it works is that
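Dave's description can be modelled directly: there are no counters, only the extent of the unmapped gap in each direction. Below is a hypothetical sketch that counts the table pages freed when one vma is unmapped and its neighbours remain. The struct vma here is a simplified stand-in, not the kernel's vm_area_struct.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vma { uint64_t start, end; };  /* [start, end), sorted, end != 0 */

/*
 * Count the page-table pages of span `size` that become freeable when
 * v[victim] is unmapped and its neighbours stay. No per-table counters
 * are used: a table is freeable when it lies entirely inside the gap
 * [floor, ceiling) between the surviving neighbours.
 */
static unsigned freeable_tables(const struct vma *v, size_t n,
                                size_t victim, uint64_t size)
{
    uint64_t floor = victim > 0 ? v[victim - 1].end : 0;
    uint64_t ceiling = victim + 1 < n ? v[victim + 1].start : 0; /* 0: top */
    uint64_t t = v[victim].start & ~(size - 1);
    unsigned count = 0;

    do {
        uint64_t t_end = t + size;             /* wraps to 0 at the top */
        bool clear_below = t >= floor;
        bool clear_above = ceiling == 0 || (t_end != 0 && t_end <= ceiling);
        if (clear_below && clear_above)
            count++;
        t = t_end;
    } while (t != 0 && t < v[victim].end);
    return count;
}
```

Take 1 MiB tables and vmas at [0x10000,0x20000), [0x100000,0x300000) and [0x380000,0x390000). Unmapping the middle vma frees two tables, because both lie entirely between 0x20000 and 0x380000.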

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Benjamin Herrenschmidt
On Tue, 2005-03-22 at 16:37 +0000, Hugh Dickins wrote: > On Tue, 22 Mar 2005, Andrew Morton wrote: > > > > With these six patches the ppc64 is hitting the BUG in exit_mmap(): > > > > BUG_ON(mm->nr_ptes);/* This is just debugging */ > > > > fairly early in boot. > > So ppc64 is in th

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 21:51:39 +0000 (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: > I still can't see what's wrong with the code that's already > there. My brain is seizing up, I'm taking a break. Ok, meanwhile I'll do a brain dump of what I think this code should be doing. Let's take an example

RE: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Luck, Tony
>> be changed to use pgd_addr_end() to gather up all the vma that >> are mapped by a single pgd instead of just spanning out the next >> PMD_SIZE? > >Oh, I don't think so. I suppose it could be done at this level, >but then the lower levels would go back to searching through lots >of unnecessary c
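For reference, the boundary stepping that the pgd_addr_end()/pmd_addr_end() helpers perform can be reproduced in userspace. This sketch uses hypothetical names and a fixed 64-bit address type. Subtracting 1 on both sides of the comparison keeps two cases correct: an end of 0 (the top of the address space) and a boundary that wraps to 0.

```c
#include <stdint.h>

/*
 * The next table boundary after addr, or end if that comes first.
 * Comparing boundary - 1 with end - 1 keeps end == 0 (top of the
 * address space) and a boundary that wraps to 0 working.
 */
static uint64_t span_addr_end(uint64_t addr, uint64_t end, uint64_t size)
{
    uint64_t boundary = (addr + size) & ~(size - 1);
    return (boundary - 1 < end - 1) ? boundary : end;
}

/* Number of tables of span `size` that a walk over [addr, end) visits. */
static unsigned tables_visited(uint64_t addr, uint64_t end, uint64_t size)
{
    unsigned n = 0;
    do {
        addr = span_addr_end(addr, end, size);
        n++;
    } while (addr != end);
    return n;
}
```

tables_visited(0x10000, 0x300000, 0x100000) steps through 0x100000, 0x200000 and 0x300000, so a walk over that range touches three 1 MiB tables.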

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Hugh Dickins
On Tue, 22 Mar 2005, David S. Miller wrote: > On Tue, 22 Mar 2005 19:36:46 +0000 (GMT) > Hugh Dickins <[EMAIL PROTECTED]> wrote: > > > I notice that although both i386 and sparc64 use pgtable-nopud.h, the > > i386 pud_clear does nothing at all and the sparc64 pud_clear resets to 0. > > This was a

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
Hugh, I got tired of rebooting just to get address walking traces :-) So I wrote a little simulator. Basically, it's free_pgtables() with the page table pointer stuff ripped out. You run it like this: ./simulator vma_file Where vma_file is a text file composed of lines of the form: START END
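The simulator itself isn't included in this excerpt. Below is a hypothetical sketch of its input side only. It assumes each START END pair is written in hexadecimal, which the listing doesn't confirm.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct vma { uint64_t start, end; };

/*
 * Parse one "START END" line of a VMA file. Hex is assumed, with or
 * without a 0x prefix (strtoull with base 16 accepts both). Lines that
 * do not hold two numbers with START < END are rejected.
 */
static bool parse_vma_line(const char *line, struct vma *out)
{
    char *after_start, *after_end;
    unsigned long long start = strtoull(line, &after_start, 16);
    if (after_start == line)
        return false;                       /* no first number */
    unsigned long long end = strtoull(after_start, &after_end, 16);
    if (after_end == after_start || end <= start)
        return false;                       /* no second number, or empty */
    out->start = start;
    out->end = end;
    return true;
}
```

A driver would read the file with fgets(), call parse_vma_line() on each line, and pass the resulting array, in address order, to whatever model of free_pgtables() is being exercised.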

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 19:36:46 +0000 (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: > I notice that although both i386 and sparc64 use pgtable-nopud.h, the > i386 pud_clear does nothing at all and the sparc64 pud_clear resets to 0. Aha! And ppc does as well via asm-generic/4level-fixup.h which is p

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 19:36:46 +0000 (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: > I notice that although both i386 and sparc64 use pgtable-nopud.h, the > i386 pud_clear does nothing at all and the sparc64 pud_clear resets to 0. This was a dead end. I386 doesn't do anything with pud_clear() in o

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Hugh Dickins
On Tue, 22 Mar 2005, David S. Miller wrote: > On Tue, 22 Mar 2005 11:21:25 -0800 > "David S. Miller" <[EMAIL PROTECTED]> wrote: > > > I'm trying to analyze my traces some more. > > I think I see what's going wrong. On the first > address space traversal (free_pgd_range()), we > clear out the pgd

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 11:21:25 -0800 "David S. Miller" <[EMAIL PROTECTED]> wrote: > I'm trying to analyze my traces some more. I think I see what's going wrong. On the first address space traversal (free_pgd_range()), we clear out the pgd, even though there are still more PMD's to process in that
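Below is a toy model of that failure mode. It is not kernel code, and every name in it is hypothetical. Two passes share one top-level entry. If the first pass clears that entry, the second pass cannot reach its pte tables. A counter standing in for mm->nr_ptes is then left nonzero at exit, which is the condition the exit_mmap() BUG_ON reports.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PTE_TABLES 4   /* hypothetical: pte tables under one pgd entry */

struct toy_mm {
    bool pgd_present;                      /* the one shared pgd entry */
    bool pte_present[TOY_NR_PTE_TABLES];   /* pte tables reachable below it */
    long nr_ptes;                          /* stands in for mm->nr_ptes */
};

static struct toy_mm toy_mm_new(void)
{
    struct toy_mm mm = { .pgd_present = true, .nr_ptes = TOY_NR_PTE_TABLES };
    for (size_t i = 0; i < TOY_NR_PTE_TABLES; i++)
        mm.pte_present[i] = true;
    return mm;
}

/*
 * One teardown pass for a group of vmas whose pte tables are [first, last].
 * A cleared pgd entry hides everything below it, so a pass that runs after
 * an early clear frees nothing and nr_ptes stays above zero.
 */
static void toy_free_group(struct toy_mm *mm, size_t first, size_t last,
                           bool clear_pgd_every_pass, bool last_pass)
{
    if (!mm->pgd_present)
        return;
    for (size_t i = first; i <= last; i++) {
        if (mm->pte_present[i]) {
            mm->pte_present[i] = false;
            mm->nr_ptes--;
        }
    }
    if (clear_pgd_every_pass || last_pass)
        mm->pgd_present = false;
}
```

In this model, clearing the shared entry only on the last pass brings the count back to 0.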

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 11:01:44 -0800 "David S. Miller" <[EMAIL PROTECTED]> wrote: > Hmmm... Thinking some more, one thing that is unique in the PPC64 and SPARC64 cases is that we are executing primarily 32-bit tasks and in such cases one PGD maps the entire address space. I wonder if the free_pgta
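To make the 32-bit case concrete, assume a top-level entry spans 2^pgdir_shift bytes; the actual sparc64 and ppc64 shifts are not stated here. A hypothetical helper can then count how many entries a task's vmas touch. For any shift of 32 or more, every address below 4 GiB lands in entry 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vma { uint64_t start, end; };   /* sorted, non-overlapping, end != 0 */

/* Distinct top-level entries touched by the vmas, for pgdir_shift > 0. */
static size_t pgd_entries_used(const struct vma *v, size_t n,
                               unsigned pgdir_shift)
{
    size_t count = 0;
    uint64_t last = 0;
    bool have_last = false;

    for (size_t i = 0; i < n; i++) {
        uint64_t first_idx = v[i].start >> pgdir_shift;
        uint64_t last_idx = (v[i].end - 1) >> pgdir_shift;
        for (uint64_t idx = first_idx; idx <= last_idx; idx++) {
            if (!have_last || idx != last) {   /* sorted input: new entry */
                count++;
                last = idx;
                have_last = true;
            }
        }
    }
    return count;
}
```

Take illustrative vmas at 0x10000, 0x70000000 and 0xeffe0000. pgd_entries_used() returns 1 for a shift of 32 and 3 for a shift of 28. In the single-entry case, only the pass that handles the last vma group may clear the shared entry.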

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
Ok, here is a full dump of a free_pgtables() run that fails to clear out all the PMD's. It gets called with this VMA list (each entry is a vm_start/vm_end tuple) [0x00010000:0x000a4000] [0x000b2000:0x000b8000] [0x000b8000:0x000de000] [0x70000000:0x7001a000] [0x70028000:0x7002a000] [0x7002c000:0x

RE: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Hugh Dickins
On Tue, 22 Mar 2005, Luck, Tony wrote: > >> For example, you may have a single page (start,end) address range > >> to free, but if this is enclosed by a large enough (floor,ceiling) > >> then it may free an entire pgd entry. > >> > >> I assume the intention of the API would be to provide the full

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 16:37:09 +0000 (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: > If you and David could try the lame patch below, > it'll at least give us a slight clue of where to be looking - > every mm exiting with nr_ptes 1 means something different from > every mm exiting with nr_ptes -1 me

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 15:14:54 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: > For example, you may have a single page (start,end) address range > to free, but if this is enclosed by a large enough (floor,ceiling) > then it may free an entire pgd entry. > > I assume the intention of the API would be

RE: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Luck, Tony
>> For example, you may have a single page (start,end) address range >> to free, but if this is enclosed by a large enough (floor,ceiling) >> then it may free an entire pgd entry. >> >> I assume the intention of the API would be to provide the full >> pgd width in that case? > >Yes, that is what s

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 06:08:38 +0000 (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: > > It just wants the range of page tables liberated. I guess > > essentially PMD_SIZE is the granularity. > > I _think_ that answer means that my current code is fine in this respect. > But I'm not entirely convinc

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread David S. Miller
On Tue, 22 Mar 2005 05:47:13 +0000 (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: > > 1) start --> end straddles sparc64 address space hole > > That's an interesting remark. I hadn't noticed the signed long type. > I believe the vma gathering in free_pgtables will have no problem with > that, but

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Hugh Dickins
On Tue, 22 Mar 2005, Andrew Morton wrote: > > With these six patches the ppc64 is hitting the BUG in exit_mmap(): > > BUG_ON(mm->nr_ptes);/* This is just debugging */ > > fairly early in boot. So ppc64 is in the same boat as sparc64 (yet ia64 okay so far). Sorry, I'm still clueless

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Nick Piggin
Andrew Morton wrote: With these six patches the ppc64 is hitting the BUG in exit_mmap(): BUG_ON(mm->nr_ptes);/* This is just debugging */ fairly early in boot. No doubt Hugh will have this fixed before long... but if you have time to spare, you may just try hitting it on the head and ma

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Andrew Morton
With these six patches the ppc64 is hitting the BUG in exit_mmap(): BUG_ON(mm->nr_ptes);/* This is just debugging */ fairly early in boot. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at h

RE: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-21 Thread Luck, Tony
Builds clean and boots on ia64. I haven't tried any hugetlb operations on it though. -Tony

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-21 Thread Nick Piggin
Hugh Dickins wrote: On Mon, 21 Mar 2005, David S. Miller wrote: On Tue, 22 Mar 2005 15:14:54 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: Question, Dave: flush_tlb_pgtables after Hugh's patch is also possibly not being called with enough range to cover all page tables that have been freed. Good q

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-21 Thread Hugh Dickins
On Mon, 21 Mar 2005, David S. Miller wrote: > On Tue, 22 Mar 2005 15:14:54 +1100 > Nick Piggin <[EMAIL PROTECTED]> wrote: > > > Question, Dave: flush_tlb_pgtables after Hugh's patch is also > > possibly not being called with enough range to cover all page > > tables that have been freed. Good que

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-21 Thread Hugh Dickins
On Mon, 21 Mar 2005, David S. Miller wrote: > > flush_tlb_pgtables() on sparc64 has a BUG() check which > is basically: > > BUG((long)start > (long)end); > > This catches two cases of bogus arguments: > > 1) start --> end straddles sparc64 address space hole That's an interesting remark.
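The effect of the signed comparison can be checked in userspace. This sketch assumes a 64-bit layout whose upper region has the top bit set, as on sparc64 above its address space hole. flush_range_is_bogus() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compare start and end as signed 64-bit values, like the quoted check.
 * Addresses above the hole have the top bit set and so are "negative":
 * a range that starts below the hole and ends above it looks inverted,
 * as does any range with start > end within one half.
 */
static bool flush_range_is_bogus(uint64_t start, uint64_t end)
{
    /* Converting values above INT64_MAX to int64_t is implementation-
       defined in C; mainstream compilers wrap them to negative values. */
    return (int64_t)start > (int64_t)end;
}
```

A range from 0x10000 up to an address such as 0xfffff80000000000 is flagged, even though start < end as unsigned values. A range lying wholly above the hole is not flagged.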

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-21 Thread Hugh Dickins
Many thanks for the testing. On Mon, 21 Mar 2005, David S. Miller wrote: > > This adjustment of addr relative to floor is very > strange, it can advance "addr" (and thus "start") > past the end of the VMA we are unmapping. Not strange, it's just trying to skip a pointless iteration. > In fact,
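The floor adjustment under discussion can be sketched in userspace, together with a guard that makes an advanced addr harmless. The names are hypothetical. The code is modelled on the floor/ceiling trimming described in the thread, not copied from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Trim [*addr, *end) to the tables that might be freeable, given the
 * neighbouring mappings' floor and ceiling (ceiling 0 = top). Returns
 * false when no table is left to visit. size is the span of one table.
 * Assumes floor <= *addr and, unless ceiling is 0, *end <= ceiling.
 */
static bool trim_to_gap(uint64_t *addr, uint64_t *end,
                        uint64_t floor, uint64_t ceiling, uint64_t size)
{
    const uint64_t mask = ~(size - 1);

    *addr &= mask;
    if (*addr < floor) {           /* first table is shared below: skip it */
        *addr += size;
        if (!*addr)
            return false;          /* wrapped past the top */
    }
    if (ceiling) {
        ceiling &= mask;
        if (!ceiling)
            return false;          /* masked ceiling is not "no ceiling" */
    }
    if (*end - 1 > ceiling - 1)    /* last table is shared above: drop it */
        *end -= size;
    /* The skip above can move addr past end; then there is nothing to do. */
    if (*addr > *end - 1)
        return false;
    return true;
}
```

Suppose a single table is shared with mappings on both sides. Then addr is pushed past end, and the function returns false instead of entering the loop. That is the pointless iteration Hugh describes skipping.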

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-21 Thread David S. Miller
On Tue, 22 Mar 2005 15:14:54 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: > Question, Dave: flush_tlb_pgtables after Hugh's patch is also > possibly not being called with enough range to cover all page > tables that have been freed. > > For example, you may have a single page (start,end) address

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-21 Thread Nick Piggin
On Mon, 2005-03-21 at 15:02 -0800, David S. Miller wrote: > Anyways, there's the full analysis, what do you make > of this Hugh? :-) Impressive, and my name isn't even Hugh. Question, Dave: flush_tlb_pgtables after Hugh's patch is also possibly not being called with enough range to cover all pag
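Nick's example can be made concrete with an assumed three-level layout whose tables span 2 MiB, 1 GiB and 512 GiB; these values are chosen for illustration only. This hypothetical helper widens a one-page range to the span of page tables that would actually be freed, given the gap [floor, ceiling).

```c
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* end == 0 means "top" */

/* Assumed table spans, innermost level first. */
static const uint64_t level_span[] = {
    UINT64_C(1) << 21, UINT64_C(1) << 30, UINT64_C(1) << 39
};

/*
 * Widen [start, end) to the page-table span freed when the gap between the
 * neighbouring mappings is [floor, ceiling) (ceiling 0 = top). Returns the
 * input unchanged if even the smallest table is shared.
 * Assumes start and end - 1 lie in the same smallest table.
 */
static struct range widen_flush_range(uint64_t start, uint64_t end,
                                      uint64_t floor, uint64_t ceiling)
{
    struct range r = { start, end };

    for (size_t i = 0; i < sizeof level_span / sizeof level_span[0]; i++) {
        uint64_t lo = start & ~(level_span[i] - 1);
        uint64_t hi = lo + level_span[i];          /* may wrap to 0 */
        if (lo < floor)
            break;                                 /* shared with a lower mapping */
        if (ceiling && hi - 1 > ceiling - 1)
            break;                                 /* shared with a higher mapping */
        r.start = lo;
        r.end = hi;
    }
    return r;
}
```

Unmapping the page at 0x40000000 with no neighbours widens the range to the whole first 512 GiB entry. A previous mapping ending at 0x3fe00000 limits it to the 1 GiB table at 0x40000000.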

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-21 Thread David S. Miller
On Mon, 21 Mar 2005 14:31:36 -0800 "Luck, Tony" <[EMAIL PROTECTED]> wrote: > Builds clean and boots on ia64. > > I haven't tried any hugetlb operations on it though. It works on ia64 because it doesn't actually do anything in flush_tlb_pgtables(), I bet. Hugh, I know the exact trigger case, it'

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-21 Thread David S. Miller
On Mon, 21 Mar 2005 20:52:44 +0000 (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: Hugh, I'm getting some problems on sparc64 here: > +static inline void free_pgd_range(struct mmu_gather *tlb, > + unsigned long addr, unsigned long end, > + unsigned long floor