Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-06 Thread Ingo Molnar
* Andrew Morton <[EMAIL PROTECTED]> wrote: > > i've attached an updated version of trace-it.c, which will turn this > > off itself, using a sysctl. I also made WAKEUP_TIMING default-off. > > ok. http://userweb.kernel.org/~akpm/to-ingo.txt is the trace of > > taskset -c 0 ./jakubs-test-a

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-06 Thread Andrew Morton
On Fri, 6 Apr 2007 11:08:22 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote: > * Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > getting a good trace of it is easy: pick up the latest -rt kernel > > > from: > > > > > > http://redhat.com/~mingo/realtime-preempt/ > > > > > > enable EVENT_TRACING

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-06 Thread Ingo Molnar
* Andrew Morton <[EMAIL PROTECTED]> wrote: > > getting a good trace of it is easy: pick up the latest -rt kernel > > from: > > > > http://redhat.com/~mingo/realtime-preempt/ > > > > enable EVENT_TRACING in that kernel, run the workload and do: > > > > scripts/trace-it > to-ingo.txt >

Re: missing madvise functionality

2007-04-05 Thread Nick Piggin
Ulrich Drepper wrote: Nick Piggin wrote: Cool. According to my thinking, madvise(MADV_DONTNEED) even in today's kernels using down_write(mmap_sem) for MADV_DONTNEED is better than mmap/mprotect, which have more fundamental locking requirements, more overhead and no benefits (except debugging, I

Re: missing madvise functionality

2007-04-05 Thread Ulrich Drepper
Nick Piggin wrote: > Cool. According to my thinking, madvise(MADV_DONTNEED) even in today's > kernels using down_write(mmap_sem) for MADV_DONTNEED is better than > mmap/mprotect, which have more fundamental locking requirements, more > overhead and no benefits (except debugging, I suppose). It's a

Re: missing madvise functionality

2007-04-05 Thread Nick Piggin
Ulrich Drepper wrote: In case somebody wants to play around with Rik patch or another madvise-based patch, I have x86-64 glibc binaries which can use it: http://people.redhat.com/drepper/rpms These are based on the latest Fedora rawhide version. They should work on older systems, too, but yo

Re: missing madvise functionality

2007-04-05 Thread Nick Piggin
Rik van Riel wrote: Nick Piggin wrote: Oh, also: something like this patch would help out MADV_DONTNEED, as it means it can run concurrently with page faults. I think the locking will work (but needs forward porting). Ironically, your patch decreases throughput on my quad core test system, w

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Andrew Morton wrote: #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS I wonder which way you're using, and whether using the other way changes things. I'm using the default Fedora config file, which has NR_CPUS defined to 64 and CONFIG_SPLIT_PTLOCK_CPUS to 4, so I am using the split locks. However,

Re: missing madvise functionality

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 14:38:30 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > Nick Piggin wrote: > > > Oh, also: something like this patch would help out MADV_DONTNEED, as it > > means it can run concurrently with page faults. I think the locking will > > work (but needs forward porting). > > Iro

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-05 Thread Andrew Morton
On Thu, 5 Apr 2007 21:11:29 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote: > > * David Howells <[EMAIL PROTECTED]> wrote: > > > But short of recording the lock sequence, I don't think there's anyway > > to find out for sure. printk probably won't cut it as a recording > > mechanism because its

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 13:48:58 +0100 David Howells <[EMAIL PROTECTED]> wrote: > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > What we effectively have is 32 threads on a single CPU all doing > > > > for (ever) { > > down_write() > > up_write() > > down

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-05 Thread Ingo Molnar
* David Howells <[EMAIL PROTECTED]> wrote: > But short of recording the lock sequence, I don't think there's anyway > to find out for sure. printk probably won't cut it as a recording > mechanism because its overheads are too great. getting a good trace of it is easy: pick up the latest -rt k

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Nick Piggin wrote: Oh, also: something like this patch would help out MADV_DONTNEED, as it means it can run concurrently with page faults. I think the locking will work (but needs forward porting). Ironically, your patch decreases throughput on my quad core test system, with Jakub's test case.

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Jakub Jelinek wrote: + /* FIXME: POSIX says that MADV_DONTNEED cannot throw away data. */ case MADV_DONTNEED: + case MADV_FREE: error = madvise_dontneed(vma, prev, start, end); break; I think you should only use the new behavior for madvise M

Re: missing madvise functionality

2007-04-05 Thread Ulrich Drepper
In case somebody wants to play around with Rik patch or another madvise-based patch, I have x86-64 glibc binaries which can use it: http://people.redhat.com/drepper/rpms These are based on the latest Fedora rawhide version. They should work on older systems, too, but you screw up your updates.

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Andrew Morton wrote: On Thu, 05 Apr 2007 03:39:29 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: Rik van Riel wrote: MADV_DONTNEED, unpatched, 1000 loops real 0m13.672s user 0m1.217s sys 0m45.712s MADV_DONTNEED, with patch, 1000 loops real 0m4.169s user 0m2.033s sys 0m3

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-05 Thread David Howells
Andrew Morton <[EMAIL PROTECTED]> wrote: > > What we effectively have is 32 threads on a single CPU all doing > > for (ever) { > down_write() > up_write() > down_read() > up_read(); > } That's not quite so. In that test progra

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Eric Dumazet wrote: Could you please add this patch and see if it helps on your machine ? [PATCH] VM : mm_struct's mmap_cache should be close to mmap_sem Avoids cache line dirtying I could, but I already know it's not going to help much. How do I know this? I already have 66% idle time whe

Re: missing madvise functionality

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 03:39:29 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > Rik van Riel wrote: > > > MADV_DONTNEED, unpatched, 1000 loops > > > > real 0m13.672s > > user 0m1.217s > > sys 0m45.712s > > > > > > MADV_DONTNEED, with patch, 1000 loops > > > > real 0m4.169s > > user

Re: missing madvise functionality

2007-04-05 Thread Eric Dumazet
On Thu, 05 Apr 2007 03:31:24 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > Jakub Jelinek wrote: > > > My guess is that all the page zeroing is pretty expensive as well and > > takes significant time, but I haven't profiled it. > > With the attached patch (Andrew, I'll change the details around

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Rik van Riel wrote: MADV_DONTNEED, unpatched, 1000 loops real 0m13.672s user 0m1.217s sys 0m45.712s MADV_DONTNEED, with patch, 1000 loops real 0m4.169s user 0m2.033s sys 0m3.224s I just noticed something fun with these numbers. Without the patch, the system (a quad cor

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Jakub Jelinek wrote: My guess is that all the page zeroing is pretty expensive as well and takes significant time, but I haven't profiled it. With the attached patch (Andrew, I'll change the details around if you want - I just wanted something to test now), your test case run time went down co

Re: missing madvise functionality

2007-04-05 Thread Eric Dumazet
Ulrich Drepper wrote: Eric Dumazet wrote: Database workload, where the user multi threaded app is constantly accessing GBytes of data, so L2 cache hit is very small. If you want to oprofile it, with say a CPU_CLK_UNHALTED:5000 event, then find_vma() is in the top 5. We did have a workload w

Re: missing madvise functionality

2007-04-05 Thread Jakub Jelinek
On Thu, Apr 05, 2007 at 03:31:24AM -0400, Rik van Riel wrote: > >My guess is that all the page zeroing is pretty expensive as well and > >takes significant time, but I haven't profiled it. > > With the attached patch (Andrew, I'll change the details around > if you want - I just wanted something t

Re: missing madvise functionality

2007-04-05 Thread Eric Dumazet
On Thu, 05 Apr 2007 04:31:55 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > Eric Dumazet wrote: > > > Could you please add this patch and see if it helps on your machine ? > > > > [PATCH] VM : mm_struct's mmap_cache should be close to mmap_sem > > > > Avoids cache line dirtying > > I could, b

Re: missing madvise functionality

2007-04-05 Thread Ulrich Drepper
Eric Dumazet wrote: > Database workload, where the user multi threaded app is constantly > accessing GBytes of data, so L2 cache hit is very small. If you want to > oprofile it, with say a CPU_CLK_UNHALTED:5000 event, then find_vma() is > in the top 5. We did have a workload with lots of Java and

Re: missing madvise functionality

2007-04-05 Thread Eric Dumazet
Nick Piggin wrote: Eric Dumazet wrote: >> This was not a working patch, just to throw the idea, since the answers I got showed I was not understood. In this case, find_extend_vma() should of course have one struct vm_area_cache * argument, like find_vma() One single cache on one mm is not

Re: missing madvise functionality

2007-04-04 Thread William Lee Irwin III
On Wed, 4 Apr 2007 06:09:18 -0700 William Lee Irwin III <[EMAIL PROTECTED]> wrote: >> Oh dear. On Wed, Apr 04, 2007 at 11:51:05AM -0700, Andrew Morton wrote: > what's all this about? I rewrote Jakub's testcase and included it as a MIME attachment. Current working version inline below. Also at

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Nick Piggin wrote: Jakub Jelinek wrote: On Wed, Apr 04, 2007 at 05:46:12PM +1000, Nick Piggin wrote: Does mmap(PROT_NONE) actually free the memory? Yes. /* Clear old maps */ error = -ENOMEM; munmap_back: vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent)

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Eric Dumazet wrote: On Wed, 04 Apr 2007 20:05:54 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: @@ -1638,7 +1652,7 @@ find_extend_vma(struct mm_struct * mm, u unsigned long start; addr &= PAGE_MASK; - vma = find_vma(mm,addr); + vma = find_vma(mm,addr,&current->vmacache);

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Hugh Dickins wrote: On Wed, 4 Apr 2007, Rik van Riel wrote: Hugh Dickins wrote: (I didn't understand how Rik would achieve his point 5, _no_ lock contention while repeatedly re-marking these pages, but never mind.) The CPU marks them accessed&dirty when they are reused. The VM only moves

preemption and rwsems (was: Re: missing madvise functionality)

2007-04-04 Thread Andrew Morton
On Tue, 3 Apr 2007 16:29:37 -0400 Jakub Jelinek <[EMAIL PROTECTED]> wrote: > #include > #include > #include > #include > > void * > tf (void *arg) > { > (void) arg; > size_t ps = sysconf (_SC_PAGE_SIZE); > void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE, > MAP_P

Re: missing madvise functionality

2007-04-04 Thread Andrew Morton
On Wed, 04 Apr 2007 14:08:47 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > > There are other ways of doing it - I guess we could use a new page flag to > > indicate that this is one-of-those-pages, and add new code to handle it in > > all the right places. > > That's w

Re: missing madvise functionality

2007-04-04 Thread Andrew Morton
On Wed, 4 Apr 2007 06:09:18 -0700 William Lee Irwin III <[EMAIL PROTECTED]> wrote: > > On Tue, Apr 03, 2007 at 04:29:37PM -0400, Jakub Jelinek wrote: > > void * > > tf (void *arg) > > { > > (void) arg; > > size_t ps = sysconf (_SC_PAGE_SIZE); > > void *p = mmap (NULL, 128 * ps, PROT_READ |

Re: missing madvise functionality

2007-04-04 Thread Anton Blanchard
Hi, > Oh. I was assuming that we'd want to unmap these pages from pagetables and > mark then super-easily-reclaimable. So a later touch would incur a minor > fault. > > But you think that we should leave them mapped into pagetables so no such > fault occurs. That would be very nice. The issue

Re: missing madvise functionality

2007-04-04 Thread Hugh Dickins
On Wed, 4 Apr 2007, Andrew Morton wrote: > > The treatment is identical to clean swapcache pages, with the sole > exception that they don't actually consume any swap space - hence the fake > swapcache entry thing. I see, sneaking through try_to_unmap's anon PageSwapCache assumptions as simply as

Re: missing madvise functionality

2007-04-04 Thread Rik van Riel
Andrew Morton wrote: There are other ways of doing it - I guess we could use a new page flag to indicate that this is one-of-those-pages, and add new code to handle it in all the right places. That's what I did. I'm currently working on the zap_page_range() side of things. One thing which w

Re: missing madvise functionality

2007-04-04 Thread Andrew Morton
On Wed, 4 Apr 2007 10:15:41 +0100 (BST) Hugh Dickins <[EMAIL PROTECTED]> wrote: > On Tue, 3 Apr 2007, Andrew Morton wrote: > > > > All of which indicates that if we can remove the down_write(mmap_sem) from > > this glibc operation, things should get a lot better - there will be no > > additional

Re: missing madvise functionality

2007-04-04 Thread Hugh Dickins
On Wed, 4 Apr 2007, Rik van Riel wrote: > Hugh Dickins wrote: > > > (I didn't understand how Rik would achieve his point 5, _no_ lock > > contention while repeatedly re-marking these pages, but never mind.) > > The CPU marks them accessed&dirty when they are reused. > > The VM only moves the reu

Re: missing madvise functionality

2007-04-04 Thread Rik van Riel
Hugh Dickins wrote: (I didn't understand how Rik would achieve his point 5, _no_ lock contention while repeatedly re-marking these pages, but never mind.) The CPU marks them accessed&dirty when they are reused. The VM only moves the reused pages back to the active list on memory pressure. Th

Re: missing madvise functionality

2007-04-04 Thread Hugh Dickins
On Wed, 4 Apr 2007, Marko Macek wrote: > Ulrich Drepper wrote: > > A solution for this problem is a madvise() operation with the following > > property: > > > > - the content of the address range can be discarded > > > > - if an access to a page in the range happens in the future it must > >

Re: missing madvise functionality

2007-04-04 Thread William Lee Irwin III
On Wed, Apr 04, 2007 at 06:09:18AM -0700, William Lee Irwin III wrote: > for (--i; i >= 0; --i) { > if (pthread_join(th[i], NULL)) { > perror("main: pthread_join failed"); > ret = EXIT_FAILURE; > } > } Obligatory b

Re: missing madvise functionality

2007-04-04 Thread William Lee Irwin III
On Tue, Apr 03, 2007 at 04:29:37PM -0400, Jakub Jelinek wrote: > void * > tf (void *arg) > { > (void) arg; > size_t ps = sysconf (_SC_PAGE_SIZE); > void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE, > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); > if (p == MAP_FAILED) > e

Re: missing madvise functionality

2007-04-04 Thread Eric Dumazet
On Wed, 04 Apr 2007 20:05:54 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: > > > @@ -1638,7 +1652,7 @@ find_extend_vma(struct mm_struct * mm, u > > unsigned long start; > > > > addr &= PAGE_MASK; > > - vma = find_vma(mm,addr); > > + vma = find_vma(mm,addr,&current->vmacache); > > if (!v

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Eric Dumazet wrote: Well, I believe this one is too expensive. I was thinking of a light one : This one seems worse. Passing your vm_area_cache around everywhere, which is just intrusive and dangerous because it becomes decoupled from the mm struct you are passing around. Watch this: @@ -16

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Eric Dumazet wrote: On Wed, 04 Apr 2007 18:55:18 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: Peter Zijlstra wrote: On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote: Eric Dumazet wrote: I do think such workloads might benefit from a vma_cache not shared by all threads but private

Re: missing madvise functionality

2007-04-04 Thread Eric Dumazet
On Wed, 04 Apr 2007 18:55:18 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: > Peter Zijlstra wrote: > > On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote: > > > >>Eric Dumazet wrote: > > > > > >>>I do think such workloads might benefit from a vma_cache not shared by > >>>all threads but priva

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
William Lee Irwin III wrote: On Wed, Apr 04, 2007 at 06:55:18PM +1000, Nick Piggin wrote: + rcu_read_lock(); + do { + t->vma_cache_sequence = -1; + t = next_thread(t); + } while (t != curr); + rc

Re: missing madvise functionality

2007-04-04 Thread Hugh Dickins
On Tue, 3 Apr 2007, Andrew Morton wrote: > > All of which indicates that if we can remove the down_write(mmap_sem) from > this glibc operation, things should get a lot better - there will be no > additional context switches at all. > > And we can surely do that if all we're doing is looking up pa

Re: missing madvise functionality

2007-04-04 Thread William Lee Irwin III
On Wed, Apr 04, 2007 at 06:55:18PM +1000, Nick Piggin wrote: > + rcu_read_lock(); > + do { > + t->vma_cache_sequence = -1; > + t = next_thread(t); > + } while (t != curr); > + rcu_read_unlock(); LD_ASSUME_KERNE

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Peter Zijlstra wrote: On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote: Eric Dumazet wrote: I do think such workloads might benefit from a vma_cache not shared by all threads but private to each thread. A sequence could invalidate the cache(s). ie instead of a mm->mmap_cache, having

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Jakub Jelinek wrote: On Wed, Apr 04, 2007 at 05:46:12PM +1000, Nick Piggin wrote: Does mmap(PROT_NONE) actually free the memory? Yes. /* Clear old maps */ error = -ENOMEM; munmap_back: vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent); if (vma && v

Re: missing madvise functionality

2007-04-04 Thread Peter Zijlstra
On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote: > Eric Dumazet wrote: > > I do think such workloads might benefit from a vma_cache not shared by > > all threads but private to each thread. A sequence could invalidate the > > cache(s). > > > > ie instead of a mm->mmap_cache, having a mm->s

Re: missing madvise functionality

2007-04-04 Thread Jakub Jelinek
On Wed, Apr 04, 2007 at 05:46:12PM +1000, Nick Piggin wrote: > Does mmap(PROT_NONE) actually free the memory? Yes. /* Clear old maps */ error = -ENOMEM; munmap_back: vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent); if (vma && vma->vm_start < addr + len

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Nick Piggin wrote: Ulrich Drepper wrote: People might remember the thread about mysql not scaling and pointing the finger quite happily at glibc. Well, the situation is not like that. The problem is glibc has to work around kernel limitations. If the malloc implementation detects that a larg

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Ulrich Drepper wrote: People might remember the thread about mysql not scaling and pointing the finger quite happily at glibc. Well, the situation is not like that. The problem is glibc has to work around kernel limitations. If the malloc implementation detects that a large chunk of previously

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-04 Thread Eric Dumazet
On Tue, 03 Apr 2007 23:54:42 -0700 Ulrich Drepper <[EMAIL PROTECTED]> wrote: > Eric Dumazet wrote: > > You were CC on this one, you can find an archive here : > > You cc:ed my gmail account. I don't pick out mails sent to me there. > If you want me to look at something you have to send it to my

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-03 Thread Ulrich Drepper
Eric Dumazet wrote: > You were CC on this one, you can find an archive here : You cc:ed my gmail account. I don't pick out mails sent to me there. If you want me to look at something you have to send it to my @redhat.com address. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain Vi

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-03 Thread Eric Dumazet
Ulrich Drepper wrote: Nick Piggin wrote: Sad. Although Ulrich did seem interested at one point I think? Ulrich, do you agree at least with the interface that Eric is proposing? I have no idea what you're talking about. You were CC on this one, you can find an archive here : http://lkml.

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-03 Thread Nick Piggin
Ulrich Drepper wrote: Nick Piggin wrote: Sad. Although Ulrich did seem interested at one point I think? Ulrich, do you agree at least with the interface that Eric is proposing? I have no idea what you're talking about. Private futexes. -- SUSE Labs, Novell Inc. - To unsubscribe from this

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-03 Thread Ulrich Drepper
Nick Piggin wrote: > Sad. Although Ulrich did seem interested at one point I think? Ulrich, > do you agree at least with the interface that Eric is proposing? I have no idea what you're talking about. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ signature.asc Desc

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-03 Thread Nick Piggin
(sorry to change the subject, I was initially going to send the threaded vma cache patches on list, but then decided they didn't have enough changelog!) Andrew Morton wrote: On Wed, 04 Apr 2007 16:09:40 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: Andrew, do you have any objections to putting

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-03 Thread Andrew Morton
On Wed, 04 Apr 2007 16:09:40 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: > Andrew, do you have any objections to putting Eric's fairly > important patch at least into -mm? you know what to do ;)

[patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-03 Thread Nick Piggin
Eric Dumazet wrote: Nick Piggin wrote: Eric Dumazet wrote: I do think such workloads might benefit from a vma_cache not shared by all threads but private to each thread. A sequence could invalidate the cache(s). ie instead of a mm->mmap_cache, having a mm->sequence, and each thread h

Re: missing madvise functionality

2007-04-03 Thread Eric Dumazet
Nick Piggin wrote: Eric Dumazet wrote: I do think such workloads might benefit from a vma_cache not shared by all threads but private to each thread. A sequence could invalidate the cache(s). ie instead of a mm->mmap_cache, having a mm->sequence, and each thread having a current->mmap_c

Re: missing madvise functionality

2007-04-03 Thread Rik van Riel
Marko Macek wrote: Ulrich Drepper wrote: A solution for this problem is a madvise() operation with the following property: - the content of the address range can be discarded - if an access to a page in the range happens in the future it must succeed. The old page content can be provi

Re: missing madvise functionality

2007-04-03 Thread Marko Macek
Ulrich Drepper wrote: A solution for this problem is a madvise() operation with the following property: - the content of the address range can be discarded - if an access to a page in the range happens in the future it must succeed. The old page content can be provided or a new, empty

Re: missing madvise functionality

2007-04-03 Thread Nick Piggin
Eric Dumazet wrote: Andrew Morton wrote: On Tue, 3 Apr 2007 16:29:37 -0400 Jakub Jelinek <[EMAIL PROTECTED]> wrote: On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote: Andrew Morton wrote: Ulrich, could you suggest a little test app which would demonstrate this behaviour?

Re: missing madvise functionality

2007-04-03 Thread Andrew Morton
On Tue, 3 Apr 2007 14:49:48 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote: > > int > > main (void) > > { > > pthread_t th[32]; > > int i; > > for (i = 0; i < 32; i++) > > if (pthread_create (&th[i], NULL, tf, NULL)) > > exit (4); > > for (i = 0; i < 32; i++) > > pthread_join

Re: missing madvise functionality

2007-04-03 Thread Ulrich Drepper
Andi Kleen wrote: > If you know in advance you need them it might be possible to > batch that. e.g. MADV_WILLNEED could be extended to > work on anonymous memory and establish the mappings in the syscall. > Would that be useful? Not in the exact way you think. The problem is that not all pages

Re: missing madvise functionality

2007-04-03 Thread Eric Dumazet
Andrew Morton wrote: On Tue, 3 Apr 2007 16:29:37 -0400 Jakub Jelinek <[EMAIL PROTECTED]> wrote: On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote: Andrew Morton wrote: Ulrich, could you suggest a little test app which would demonstrate this behaviour? It's not really reliably

Re: missing madvise functionality

2007-04-03 Thread Andrew Morton
On Tue, 3 Apr 2007 14:49:48 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote: > > int > > main (void) > > { > > pthread_t th[32]; > > int i; > > for (i = 0; i < 32; i++) > > if (pthread_create (&th[i], NULL, tf, NULL)) > > exit (4); > > for (i = 0; i < 32; i++) > > pthread_join

Re: missing madvise functionality

2007-04-03 Thread Andi Kleen
On Tue, Apr 03, 2007 at 02:46:09PM -0700, Ulrich Drepper wrote: > Eric Dumazet wrote: > > A page fault is not that expensive. But clearing N*PAGE_SIZE bytes is, > > because it potentially evicts a large part of CPU cache. > > *A* page fault is not that expensive. The problem is that you get a > p

Re: missing madvise functionality

2007-04-03 Thread Ulrich Drepper
Arnd Bergmann wrote: > I thought this is what the read_zero_pagealigned hack [1] was used > for (read from /dev/zero replaces target pages with empty_zero_page). But that's not what we want. If I understand that code correctly it's the same as the current MADV_DONTNEED. It will simply remove the

Re: missing madvise functionality

2007-04-03 Thread Jörn Engel
On Tue, 3 April 2007 23:10:14 +0200, Eric Dumazet wrote: > > mmap()/brk() must give fresh NULL pages, but maybe madvise(MADV_DONTNEED) > can relax this requirement (if the pages were reclaimed, then a page fault > could bring a new page with random content) ...provided that it doesn't leak info

Re: missing madvise functionality

2007-04-03 Thread Arnd Bergmann
On Tuesday 03 April 2007, Ulrich Drepper wrote: > The problem is glibc has to work around kernel limitations.  If the > malloc implementation detects that a large chunk of previously allocated > memory is now free and unused it wants to return the memory to the > system.  What we currently have to

Re: missing madvise functionality

2007-04-03 Thread Andrew Morton
On Tue, 3 Apr 2007 16:29:37 -0400 Jakub Jelinek <[EMAIL PROTECTED]> wrote: > On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote: > > Andrew Morton wrote: > > > Ulrich, could you suggest a little test app which would demonstrate this > > > behaviour? > > > > It's not really reliably po

Re: missing madvise functionality

2007-04-03 Thread Ulrich Drepper
Eric Dumazet wrote: > A page fault is not that expensive. But clearing N*PAGE_SIZE bytes is, > because it potentially evicts a large part of CPU cache. *A* page fault is not that expensive. The problem is that you get a page fault for every single page. For 200k allocated you get 50 page faults.

Re: missing madvise functionality

2007-04-03 Thread Eric Dumazet
Rik van Riel wrote: Eric Dumazet wrote: Rik van Riel wrote: Andrew Morton wrote: Oh. I was assuming that we'd want to unmap these pages from pagetables and mark then super-easily-reclaimable. So a later touch would incur a minor fault. But you think that we should leave them mapped

Re: missing madvise functionality

2007-04-03 Thread Rik van Riel
Jeremy Fitzhardinge wrote: Eric Dumazet wrote: mmap()/brk() must give fresh NULL pages, but maybe madvise(MADV_DONTNEED) can relax this requirement (if the pages were reclaimed, then a page fault could bring a new page with random content) Only if those pages were originally from that process

Re: missing madvise functionality

2007-04-03 Thread Jeremy Fitzhardinge
Eric Dumazet wrote: > mmap()/brk() must give fresh NULL pages, but maybe > madvise(MADV_DONTNEED) can relax this requirement (if the pages were > reclaimed, then a page fault could bring a new page with random content) Only if those pages were originally from that process. Otherwise you've got a

Re: missing madvise functionality

2007-04-03 Thread Andrew Morton
On Tue, 03 Apr 2007 17:00:09 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > > Oh. I was assuming that we'd want to unmap these pages from pagetables and > > mark then super-easily-reclaimable. So a later touch would incur a minor > > fault. > > > > But you think that

Re: missing madvise functionality

2007-04-03 Thread Rik van Riel
Eric Dumazet wrote: Rik van Riel wrote: Andrew Morton wrote: Oh. I was assuming that we'd want to unmap these pages from pagetables and mark then super-easily-reclaimable. So a later touch would incur a minor fault. But you think that we should leave them mapped into pagetables so no

Re: missing madvise functionality

2007-04-03 Thread Eric Dumazet
Rik van Riel wrote: Andrew Morton wrote: Oh. I was assuming that we'd want to unmap these pages from pagetables and mark then super-easily-reclaimable. So a later touch would incur a minor fault. But you think that we should leave them mapped into pagetables so no such fault occurs. L

Re: missing madvise functionality

2007-04-03 Thread Rik van Riel
Andrew Morton wrote: Oh. I was assuming that we'd want to unmap these pages from pagetables and mark then super-easily-reclaimable. So a later touch would incur a minor fault. But you think that we should leave them mapped into pagetables so no such fault occurs. Leaving the pages mapped i

Re: missing madvise functionality

2007-04-03 Thread Ulrich Drepper
Andrew Morton wrote: > But whatever we do, with the current MM design we need to at least take the > mmap_sem for reading so we can descend the vma tree and locate the > pageframes. And if that locking is the main problem then none of this is > likely to help. At least it's done only once for the

Re: missing madvise functionality

2007-04-03 Thread Andrew Morton
On Tue, 03 Apr 2007 13:17:09 -0700 Ulrich Drepper <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > Ulrich, could you suggest a little test app which would demonstrate this > > behaviour? > > It's not really reliably possible to demonstrate this with a small > program using malloc. You'd nee

Re: missing madvise functionality

2007-04-03 Thread Rik van Riel
Jakub Jelinek wrote: My guess is that all the page zeroing is pretty expensive as well and takes significant time, but I haven't profiled it. I'm pretty sure that page freeing, reallocating and zeroing is more expensive than just letting the page sit there and only reclaim it lazily when we ne

Re: missing madvise functionality

2007-04-03 Thread Jakub Jelinek
On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote: > Andrew Morton wrote: > > Ulrich, could you suggest a little test app which would demonstrate this > > behaviour? > > It's not really reliably possible to demonstrate this with a small > program using malloc. You'd need something li

Re: missing madvise functionality

2007-04-03 Thread Ulrich Drepper
Andrew Morton wrote: > Ulrich, could you suggest a little test app which would demonstrate this > behaviour? It's not really reliably possible to demonstrate this with a small program using malloc. You'd need something like this mysql test case which Rik said is not hard to run by yourself. If s

Re: missing madvise functionality

2007-04-03 Thread Andi Kleen
> It might, a bit. Both mmap() and mprotect() currently take mmap_sem() for > writing. If we're careful, we could probably arrange for MADV_ULRICH to > take it for reading, which will help a little bit, hopefully. The cache line bounces would be still there. Not sure that would help MySQL all th

Re: missing madvise functionality

2007-04-03 Thread Andrew Morton
On Tue, 3 Apr 2007 19:28:41 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote: > On Tue, Apr 03, 2007 at 10:20:02AM -0700, Ulrich Drepper wrote: > > Andi Kleen wrote: > > > Why do you need a lock for that? I don't see any problem with > > > two threads doing that in parallel. The kernel would > > > seria

Re: missing madvise functionality

2007-04-03 Thread Rik van Riel
Ulrich Drepper wrote: Rik van Riel wrote: I already started looking into implementing this. Basically: [...] Sounds good. Except: 1) on MADV_DONTNEED, mark pages clean, not accessed and move them to some "dontneed" LRU list. LRU is likely the wrong answer. The longer a page has not

Re: missing madvise functionality

2007-04-03 Thread Andi Kleen
On Tue, Apr 03, 2007 at 10:20:02AM -0700, Ulrich Drepper wrote: > Andi Kleen wrote: > > Why do you need a lock for that? I don't see any problem with > > two threads doing that in parallel. The kernel would > > serialize it internally and one would fail, but that shouldn't > > be a problem. > > T

Re: missing madvise functionality

2007-04-03 Thread Ulrich Drepper
Andi Kleen wrote: > Why do you need a lock for that? I don't see any problem with > two threads doing that in parallel. The kernel would > serialize it internally and one would fail, but that shouldn't > be a problem. There is no lock at all at userlevel. I'm talking about locks in the kernel.

Re: missing madvise functionality

2007-04-03 Thread Andi Kleen
Ulrich Drepper <[EMAIL PROTECTED]> writes: > to free: mmap(PROT_NONE) over the area Why do you need a lock for that? I don't see any problem with two threads doing that in parallel. The kernel would serialize it internally and one would fail, but that shouldn't be a problem. Of course ha

Re: missing madvise functionality

2007-04-03 Thread Ulrich Drepper
Rik van Riel wrote: > I already started looking into implementing this. > > Basically: > [...] Sounds good. Except: > 1) on MADV_DONTNEED, mark pages clean, not accessed and move them >to some "dontneed" LRU list. LRU is likely the wrong answer. The longer a page has not been reused the

Re: missing madvise functionality

2007-04-03 Thread Rik van Riel
Ulrich Drepper wrote: That's it. The current MADV_DONTNEED doesn't cut it because it zaps the pages, causing *all* future reuses to create page faults. This is what I guess happens in the mysql test case where the pages where unused and freed but then almost immediately reused. The page fault

missing madvise functionality

2007-04-03 Thread Ulrich Drepper
People might remember the thread about mysql not scaling and pointing the finger quite happily at glibc. Well, the situation is not like that. The problem is glibc has to work around kernel limitations. If the malloc implementation detects that a large chunk of previously allocated memory is now