* Andrew Morton <[EMAIL PROTECTED]> wrote:
> > i've attached an updated version of trace-it.c, which will turn this
> > off itself, using a sysctl. I also made WAKEUP_TIMING default-off.
>
> ok. http://userweb.kernel.org/~akpm/to-ingo.txt is the trace of
>
> taskset -c 0 ./jakubs-test-a
On Fri, 6 Apr 2007 11:08:22 +0200
Ingo Molnar <[EMAIL PROTECTED]> wrote:
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > > getting a good trace of it is easy: pick up the latest -rt kernel
> > > from:
> > >
> > > http://redhat.com/~mingo/realtime-preempt/
> > >
> > > enable EVENT_TRACING
* Andrew Morton <[EMAIL PROTECTED]> wrote:
> > getting a good trace of it is easy: pick up the latest -rt kernel
> > from:
> >
> > http://redhat.com/~mingo/realtime-preempt/
> >
> > enable EVENT_TRACING in that kernel, run the workload and do:
> >
> > scripts/trace-it > to-ingo.txt
>
Ulrich Drepper wrote:
Nick Piggin wrote:
Cool. According to my thinking, madvise(MADV_DONTNEED) even in today's
kernels using down_write(mmap_sem) for MADV_DONTNEED is better than
mmap/mprotect, which have more fundamental locking requirements, more
overhead and no benefits (except debugging, I
Nick Piggin wrote:
> Cool. According to my thinking, madvise(MADV_DONTNEED) even in today's
> kernels using down_write(mmap_sem) for MADV_DONTNEED is better than
> mmap/mprotect, which have more fundamental locking requirements, more
> overhead and no benefits (except debugging, I suppose).
It's a
Ulrich Drepper wrote:
In case somebody wants to play around with Rik's patch or another
madvise-based patch, I have x86-64 glibc binaries which can use it:
http://people.redhat.com/drepper/rpms
These are based on the latest Fedora rawhide version. They should work
on older systems, too, but yo
Rik van Riel wrote:
Nick Piggin wrote:
Oh, also: something like this patch would help out MADV_DONTNEED, as it
means it can run concurrently with page faults. I think the locking will
work (but needs forward porting).
Ironically, your patch decreases throughput on my quad core
test system, w
Andrew Morton wrote:
#if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
I wonder which way you're using, and whether using the other way changes
things.
I'm using the default Fedora config file, which has
NR_CPUS defined to 64 and CONFIG_SPLIT_PTLOCK_CPUS
to 4, so I am using the split locks.
However,
On Thu, 05 Apr 2007 14:38:30 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:
> Nick Piggin wrote:
>
> > Oh, also: something like this patch would help out MADV_DONTNEED, as it
> > means it can run concurrently with page faults. I think the locking will
> > work (but needs forward porting).
>
> Iro
On Thu, 5 Apr 2007 21:11:29 +0200
Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * David Howells <[EMAIL PROTECTED]> wrote:
>
> > But short of recording the lock sequence, I don't think there's any way
> > to find out for sure. printk probably won't cut it as a recording
> > mechanism because its
On Thu, 05 Apr 2007 13:48:58 +0100
David Howells <[EMAIL PROTECTED]> wrote:
> Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> >
> > What we effectively have is 32 threads on a single CPU all doing
> >
> > for (ever) {
> > down_write()
> > up_write()
> > down
* David Howells <[EMAIL PROTECTED]> wrote:
> But short of recording the lock sequence, I don't think there's any way
> to find out for sure. printk probably won't cut it as a recording
> mechanism because its overheads are too great.
getting a good trace of it is easy: pick up the latest -rt k
Nick Piggin wrote:
Oh, also: something like this patch would help out MADV_DONTNEED, as it
means it can run concurrently with page faults. I think the locking will
work (but needs forward porting).
Ironically, your patch decreases throughput on my quad core
test system, with Jakub's test case.
Jakub Jelinek wrote:
+ /* FIXME: POSIX says that MADV_DONTNEED cannot throw away data. */
case MADV_DONTNEED:
+ case MADV_FREE:
error = madvise_dontneed(vma, prev, start, end);
break;
I think you should only use the new behavior for madvise M
In case somebody wants to play around with Rik's patch or another
madvise-based patch, I have x86-64 glibc binaries which can use it:
http://people.redhat.com/drepper/rpms
These are based on the latest Fedora rawhide version. They should work
on older systems, too, but you screw up your updates.
Andrew Morton wrote:
On Thu, 05 Apr 2007 03:39:29 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote:
Rik van Riel wrote:
MADV_DONTNEED, unpatched, 1000 loops
real 0m13.672s
user 0m1.217s
sys 0m45.712s
MADV_DONTNEED, with patch, 1000 loops
real 0m4.169s
user 0m2.033s
sys 0m3
Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> What we effectively have is 32 threads on a single CPU all doing
>
> for (ever) {
> down_write()
> up_write()
> down_read()
> up_read();
> }
That's not quite so. In that test progra
Eric Dumazet wrote:
Could you please add this patch and see if it helps on your machine ?
[PATCH] VM : mm_struct's mmap_cache should be close to mmap_sem
Avoids cache line dirtying
I could, but I already know it's not going to help much.
How do I know this? I already have 66% idle time whe
On Thu, 05 Apr 2007 03:39:29 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote:
> Rik van Riel wrote:
>
> > MADV_DONTNEED, unpatched, 1000 loops
> >
> > real 0m13.672s
> > user 0m1.217s
> > sys 0m45.712s
> >
> >
> > MADV_DONTNEED, with patch, 1000 loops
> >
> > real 0m4.169s
> > user
On Thu, 05 Apr 2007 03:31:24 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:
> Jakub Jelinek wrote:
>
> > My guess is that all the page zeroing is pretty expensive as well and
> > takes significant time, but I haven't profiled it.
>
> With the attached patch (Andrew, I'll change the details around
Rik van Riel wrote:
MADV_DONTNEED, unpatched, 1000 loops
real 0m13.672s
user 0m1.217s
sys 0m45.712s
MADV_DONTNEED, with patch, 1000 loops
real 0m4.169s
user 0m2.033s
sys 0m3.224s
I just noticed something fun with these numbers.
Without the patch, the system (a quad cor
Jakub Jelinek wrote:
My guess is that all the page zeroing is pretty expensive as well and
takes significant time, but I haven't profiled it.
With the attached patch (Andrew, I'll change the details around
if you want - I just wanted something to test now), your test
case run time went down co
Ulrich Drepper a écrit :
Eric Dumazet wrote:
Database workload, where the user multi threaded app is constantly
accessing GBytes of data, so L2 cache hit is very small. If you want to
oprofile it, with say a CPU_CLK_UNHALTED:5000 event, then find_vma() is
in the top 5.
We did have a workload w
On Thu, Apr 05, 2007 at 03:31:24AM -0400, Rik van Riel wrote:
> >My guess is that all the page zeroing is pretty expensive as well and
> >takes significant time, but I haven't profiled it.
>
> With the attached patch (Andrew, I'll change the details around
> if you want - I just wanted something t
On Thu, 05 Apr 2007 04:31:55 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:
> Eric Dumazet wrote:
>
> > Could you please add this patch and see if it helps on your machine ?
> >
> > [PATCH] VM : mm_struct's mmap_cache should be close to mmap_sem
> >
> > Avoids cache line dirtying
>
> I could, b
Eric Dumazet wrote:
> Database workload, where the user multi threaded app is constantly
> accessing GBytes of data, so L2 cache hit is very small. If you want to
> oprofile it, with say a CPU_CLK_UNHALTED:5000 event, then find_vma() is
> in the top 5.
We did have a workload with lots of Java and
Nick Piggin a écrit :
Eric Dumazet wrote:
>> This was not a working patch, just to throw the idea, since the
answers I got showed I was not understood.
In this case, find_extend_vma() should of course have one struct
vm_area_cache * argument, like find_vma()
One single cache on one mm is not
On Wed, 4 Apr 2007 06:09:18 -0700 William Lee Irwin III <[EMAIL PROTECTED]>
wrote:
>> Oh dear.
On Wed, Apr 04, 2007 at 11:51:05AM -0700, Andrew Morton wrote:
> what's all this about?
I rewrote Jakub's testcase and included it as a MIME attachment.
Current working version inline below. Also at
Nick Piggin wrote:
Jakub Jelinek wrote:
On Wed, Apr 04, 2007 at 05:46:12PM +1000, Nick Piggin wrote:
Does mmap(PROT_NONE) actually free the memory?
Yes.
/* Clear old maps */
error = -ENOMEM;
munmap_back:
vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent)
Eric Dumazet wrote:
On Wed, 04 Apr 2007 20:05:54 +1000
Nick Piggin <[EMAIL PROTECTED]> wrote:
@@ -1638,7 +1652,7 @@ find_extend_vma(struct mm_struct * mm, u
unsigned long start;
addr &= PAGE_MASK;
- vma = find_vma(mm,addr);
+ vma = find_vma(mm,addr,&current->vmacache);
Hugh Dickins wrote:
On Wed, 4 Apr 2007, Rik van Riel wrote:
Hugh Dickins wrote:
(I didn't understand how Rik would achieve his point 5, _no_ lock
contention while repeatedly re-marking these pages, but never mind.)
The CPU marks them accessed&dirty when they are reused.
The VM only moves
On Tue, 3 Apr 2007 16:29:37 -0400
Jakub Jelinek <[EMAIL PROTECTED]> wrote:
> #include
> #include
> #include
> #include
>
> void *
> tf (void *arg)
> {
> (void) arg;
> size_t ps = sysconf (_SC_PAGE_SIZE);
> void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE,
> MAP_P
On Wed, 04 Apr 2007 14:08:47 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:
> Andrew Morton wrote:
>
> > There are other ways of doing it - I guess we could use a new page flag to
> > indicate that this is one-of-those-pages, and add new code to handle it in
> > all the right places.
>
> That's w
On Wed, 4 Apr 2007 06:09:18 -0700 William Lee Irwin III <[EMAIL PROTECTED]>
wrote:
>
> On Tue, Apr 03, 2007 at 04:29:37PM -0400, Jakub Jelinek wrote:
> > void *
> > tf (void *arg)
> > {
> > (void) arg;
> > size_t ps = sysconf (_SC_PAGE_SIZE);
> > void *p = mmap (NULL, 128 * ps, PROT_READ |
Hi,
> Oh. I was assuming that we'd want to unmap these pages from pagetables and
> mark them super-easily-reclaimable. So a later touch would incur a minor
> fault.
>
> But you think that we should leave them mapped into pagetables so no such
> fault occurs.
That would be very nice. The issue
On Wed, 4 Apr 2007, Andrew Morton wrote:
>
> The treatment is identical to clean swapcache pages, with the sole
> exception that they don't actually consume any swap space - hence the fake
> swapcache entry thing.
I see, sneaking through try_to_unmap's anon PageSwapCache assumptions
as simply as
Andrew Morton wrote:
There are other ways of doing it - I guess we could use a new page flag to
indicate that this is one-of-those-pages, and add new code to handle it in
all the right places.
That's what I did. I'm currently working on the
zap_page_range() side of things.
One thing which w
On Wed, 4 Apr 2007 10:15:41 +0100 (BST) Hugh Dickins <[EMAIL PROTECTED]> wrote:
> On Tue, 3 Apr 2007, Andrew Morton wrote:
> >
> > All of which indicates that if we can remove the down_write(mmap_sem) from
> > this glibc operation, things should get a lot better - there will be no
> > additional
On Wed, 4 Apr 2007, Rik van Riel wrote:
> Hugh Dickins wrote:
>
> > (I didn't understand how Rik would achieve his point 5, _no_ lock
> > contention while repeatedly re-marking these pages, but never mind.)
>
> The CPU marks them accessed&dirty when they are reused.
>
> The VM only moves the reu
Hugh Dickins wrote:
(I didn't understand how Rik would achieve his point 5, _no_ lock
contention while repeatedly re-marking these pages, but never mind.)
The CPU marks them accessed&dirty when they are reused.
The VM only moves the reused pages back to the active list
on memory pressure. Th
On Wed, 4 Apr 2007, Marko Macek wrote:
> Ulrich Drepper wrote:
> > A solution for this problem is a madvise() operation with the following
> > property:
> >
> > - the content of the address range can be discarded
> >
> > - if an access to a page in the range happens in the future it must
> >
On Wed, Apr 04, 2007 at 06:09:18AM -0700, William Lee Irwin III wrote:
> for (--i; i >= 0; --i) {
> if (pthread_join(th[i], NULL)) {
> perror("main: pthread_join failed");
> ret = EXIT_FAILURE;
> }
> }
Obligatory b
On Tue, Apr 03, 2007 at 04:29:37PM -0400, Jakub Jelinek wrote:
> void *
> tf (void *arg)
> {
> (void) arg;
> size_t ps = sysconf (_SC_PAGE_SIZE);
> void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> if (p == MAP_FAILED)
> e
On Wed, 04 Apr 2007 20:05:54 +1000
Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > @@ -1638,7 +1652,7 @@ find_extend_vma(struct mm_struct * mm, u
> > unsigned long start;
> >
> > addr &= PAGE_MASK;
> > - vma = find_vma(mm,addr);
> > + vma = find_vma(mm,addr,&current->vmacache);
> > if (!v
Eric Dumazet wrote:
Well, I believe this one is too expensive. I was thinking of a light one :
This one seems worse. Passing your vm_area_cache around everywhere, which
is just intrusive and dangerous because it becomes decoupled from the mm
struct you are passing around. Watch this:
@@ -16
Eric Dumazet wrote:
On Wed, 04 Apr 2007 18:55:18 +1000
Nick Piggin <[EMAIL PROTECTED]> wrote:
Peter Zijlstra wrote:
On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote:
Eric Dumazet wrote:
I do think such workloads might benefit from a vma_cache not shared by
all threads but private
On Wed, 04 Apr 2007 18:55:18 +1000
Nick Piggin <[EMAIL PROTECTED]> wrote:
> Peter Zijlstra wrote:
> > On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote:
> >
> >>Eric Dumazet wrote:
> >
> >
> >>>I do think such workloads might benefit from a vma_cache not shared by
> >>>all threads but priva
William Lee Irwin III wrote:
On Wed, Apr 04, 2007 at 06:55:18PM +1000, Nick Piggin wrote:
+ rcu_read_lock();
+ do {
+ t->vma_cache_sequence = -1;
+ t = next_thread(t);
+ } while (t != curr);
+ rc
On Tue, 3 Apr 2007, Andrew Morton wrote:
>
> All of which indicates that if we can remove the down_write(mmap_sem) from
> this glibc operation, things should get a lot better - there will be no
> additional context switches at all.
>
> And we can surely do that if all we're doing is looking up pa
On Wed, Apr 04, 2007 at 06:55:18PM +1000, Nick Piggin wrote:
> + rcu_read_lock();
> + do {
> + t->vma_cache_sequence = -1;
> + t = next_thread(t);
> + } while (t != curr);
> + rcu_read_unlock();
LD_ASSUME_KERNE
Peter Zijlstra wrote:
On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote:
Eric Dumazet wrote:
I do think such workloads might benefit from a vma_cache not shared by
all threads but private to each thread. A sequence could invalidate the
cache(s).
ie instead of a mm->mmap_cache, having
Jakub Jelinek wrote:
On Wed, Apr 04, 2007 at 05:46:12PM +1000, Nick Piggin wrote:
Does mmap(PROT_NONE) actually free the memory?
Yes.
/* Clear old maps */
error = -ENOMEM;
munmap_back:
vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);
if (vma && v
On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote:
> Eric Dumazet wrote:
> > I do think such workloads might benefit from a vma_cache not shared by
> > all threads but private to each thread. A sequence could invalidate the
> > cache(s).
> >
> > ie instead of a mm->mmap_cache, having a mm->s
On Wed, Apr 04, 2007 at 05:46:12PM +1000, Nick Piggin wrote:
> Does mmap(PROT_NONE) actually free the memory?
Yes.
/* Clear old maps */
error = -ENOMEM;
munmap_back:
vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);
if (vma && vma->vm_start < addr + len
Nick Piggin wrote:
Ulrich Drepper wrote:
People might remember the thread about mysql not scaling and pointing
the finger quite happily at glibc. Well, the situation is not like that.
The problem is glibc has to work around kernel limitations. If the
malloc implementation detects that a larg
Ulrich Drepper wrote:
People might remember the thread about mysql not scaling and pointing
the finger quite happily at glibc. Well, the situation is not like that.
The problem is glibc has to work around kernel limitations. If the
malloc implementation detects that a large chunk of previously
On Tue, 03 Apr 2007 23:54:42 -0700
Ulrich Drepper <[EMAIL PROTECTED]> wrote:
> Eric Dumazet wrote:
> > You were CC on this one, you can find an archive here :
>
> You cc:ed my gmail account. I don't pick out mails sent to me there.
> If you want me to look at something you have to send it to my
Eric Dumazet wrote:
> You were CC on this one, you can find an archive here :
You cc:ed my gmail account. I don't pick out mails sent to me there.
If you want me to look at something you have to send it to my
@redhat.com address.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain Vi
Ulrich Drepper a écrit :
Nick Piggin wrote:
Sad. Although Ulrich did seem interested at one point I think? Ulrich,
do you agree at least with the interface that Eric is proposing?
I have no idea what you're talking about.
You were CC on this one, you can find an archive here :
http://lkml.
Ulrich Drepper wrote:
Nick Piggin wrote:
Sad. Although Ulrich did seem interested at one point I think? Ulrich,
do you agree at least with the interface that Eric is proposing?
I have no idea what you're talking about.
Private futexes.
--
SUSE Labs, Novell Inc.
-
To unsubscribe from this
Nick Piggin wrote:
> Sad. Although Ulrich did seem interested at one point I think? Ulrich,
> do you agree at least with the interface that Eric is proposing?
I have no idea what you're talking about.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
(sorry to change the subject, I was initially going to send the
threaded vma cache patches on list, but then decided they didn't
have enough changelog!)
Andrew Morton wrote:
On Wed, 04 Apr 2007 16:09:40 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:
Andrew, do you have any objections to putting
On Wed, 04 Apr 2007 16:09:40 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:
> Andrew, do you have any objections to putting Eric's fairly
> important patch at least into -mm?
you know what to do ;)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
Eric Dumazet wrote:
Nick Piggin a écrit :
Eric Dumazet wrote:
I do think such workloads might benefit from a vma_cache not shared
by all threads but private to each thread. A sequence could
invalidate the cache(s).
ie instead of a mm->mmap_cache, having a mm->sequence, and each
thread h
Nick Piggin a écrit :
Eric Dumazet wrote:
I do think such workloads might benefit from a vma_cache not shared by
all threads but private to each thread. A sequence could invalidate
the cache(s).
ie instead of a mm->mmap_cache, having a mm->sequence, and each thread
having a current->mmap_c
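
The per-thread cache idea quoted above can be illustrated with a small userspace sketch. This is not the patch under discussion: the structure and function names (`thread_cache`, `cached_lookup`, `mm_changed`, `slow_lookups`) are hypothetical, the vma tree is replaced by a plain array, the lookup returns only a vma that contains the address, and all locking and memory ordering is left out. It only shows how a per-mm sequence number could invalidate every thread's private cache without touching the other threads' data.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a vm_area_struct: covers [vm_start, vm_end). */
struct vma {
    unsigned long vm_start;
    unsigned long vm_end;
};

/* Simplified stand-in for mm_struct. */
struct mm {
    unsigned long sequence;      /* bumped whenever the mappings change */
    struct vma *vmas;            /* array instead of the real vma tree */
    size_t nr_vmas;
    unsigned long slow_lookups;  /* instrumentation for this sketch only */
};

/* Hypothetical per-thread cache: one remembered vma plus the
 * mm sequence number that was current when it was stored. */
struct thread_cache {
    unsigned long sequence;
    struct vma *vma;
};

/* Full lookup: linear scan here, a tree walk in a real kernel. */
static struct vma *slow_lookup(struct mm *mm, unsigned long addr)
{
    mm->slow_lookups++;
    for (size_t i = 0; i < mm->nr_vmas; i++) {
        struct vma *v = &mm->vmas[i];
        if (addr >= v->vm_start && addr < v->vm_end)
            return v;
    }
    return NULL;
}

/* Use the thread's private cache if it is still valid for this mm
 * generation and covers addr; otherwise fall back and refill it. */
static struct vma *cached_lookup(struct mm *mm, struct thread_cache *tc,
                                 unsigned long addr)
{
    struct vma *v = tc->vma;

    if (v && tc->sequence == mm->sequence &&
        addr >= v->vm_start && addr < v->vm_end)
        return v;

    v = slow_lookup(mm, addr);
    tc->vma = v;
    tc->sequence = mm->sequence;
    return v;
}

/* Called whenever a mapping is added, removed or resized: every
 * thread's cache becomes stale at once, with a single write. */
static void mm_changed(struct mm *mm)
{
    mm->sequence++;
}
```

In this sketch a thread that keeps faulting inside the same mapping never writes to shared state, which is the property the proposal is after; a real implementation would also have to handle sequence wraparound and concurrent updates.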
Marko Macek wrote:
Ulrich Drepper wrote:
A solution for this problem is a madvise() operation with the following
property:
- the content of the address range can be discarded
- if an access to a page in the range happens in the future it must
succeed. The old page content can be provi
Ulrich Drepper wrote:
A solution for this problem is a madvise() operation with the following
property:
- the content of the address range can be discarded
- if an access to a page in the range happens in the future it must
succeed. The old page content can be provided or a new, empty
Eric Dumazet wrote:
Andrew Morton a écrit :
On Tue, 3 Apr 2007 16:29:37 -0400
Jakub Jelinek <[EMAIL PROTECTED]> wrote:
On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote:
Andrew Morton wrote:
Ulrich, could you suggest a little test app which would demonstrate
this
behaviour?
On Tue, 3 Apr 2007 14:49:48 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:
> > int
> > main (void)
> > {
> > pthread_t th[32];
> > int i;
> > for (i = 0; i < 32; i++)
> > if (pthread_create (&th[i], NULL, tf, NULL))
> > exit (4);
> > for (i = 0; i < 32; i++)
> > pthread_join
Andi Kleen wrote:
> If you know in advance you need them it might be possible to
> batch that. e.g. MADV_WILLNEED could be extended to
> work on anonymous memory and establish the mappings in the syscall.
> Would that be useful?
Not in the exact way you think. The problem is that not all pages
Andrew Morton a écrit :
On Tue, 3 Apr 2007 16:29:37 -0400
Jakub Jelinek <[EMAIL PROTECTED]> wrote:
On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote:
Andrew Morton wrote:
Ulrich, could you suggest a little test app which would demonstrate this
behaviour?
It's not really reliably
On Tue, 3 Apr 2007 14:49:48 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:
> > int
> > main (void)
> > {
> > pthread_t th[32];
> > int i;
> > for (i = 0; i < 32; i++)
> > if (pthread_create (&th[i], NULL, tf, NULL))
> > exit (4);
> > for (i = 0; i < 32; i++)
> > pthread_join
On Tue, Apr 03, 2007 at 02:46:09PM -0700, Ulrich Drepper wrote:
> Eric Dumazet wrote:
> > A page fault is not that expensive. But clearing N*PAGE_SIZE bytes is,
> > because it potentially evicts a large part of CPU cache.
>
> *A* page fault is not that expensive. The problem is that you get a
> p
Arnd Bergmann wrote:
> I thought this is what the read_zero_pagealigned hack [1] was used
> for (read from /dev/zero replaces target pages with empty_zero_page).
But that's not what we want. If I understand that code correctly it's
the same as the current MADV_DONTNEED. It will simply remove the
On Tue, 3 April 2007 23:10:14 +0200, Eric Dumazet wrote:
>
> mmap()/brk() must give fresh NULL pages, but maybe madvise(MADV_DONTNEED)
> can relax this requirement (if the pages were reclaimed, then a page fault
> could bring a new page with random content)
...provided that it doesn't leak info
On Tuesday 03 April 2007, Ulrich Drepper wrote:
> The problem is glibc has to work around kernel limitations. If the
> malloc implementation detects that a large chunk of previously allocated
> memory is now free and unused it wants to return the memory to the
> system. What we currently have to
On Tue, 3 Apr 2007 16:29:37 -0400
Jakub Jelinek <[EMAIL PROTECTED]> wrote:
> On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote:
> > Andrew Morton wrote:
> > > Ulrich, could you suggest a little test app which would demonstrate this
> > > behaviour?
> >
> > It's not really reliably po
Eric Dumazet wrote:
> A page fault is not that expensive. But clearing N*PAGE_SIZE bytes is,
> because it potentially evicts a large part of CPU cache.
*A* page fault is not that expensive. The problem is that you get a
page fault for every single page. For 200k allocated you get 50 page
faults.
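
The "50 page faults" figure can be checked with a short sketch. It assumes a 4 KiB page size (common on x86 and x86-64, but not stated in the message) and reads "200k" as 200 KiB; `pages_touched` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages an allocation of `bytes` spans, rounding up.
 * If every page in the range was zapped, touching the whole range
 * again costs one page fault per page, so this is also the number
 * of faults. */
static size_t pages_touched(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}
```

Under those assumptions, `pages_touched(200 * 1024, 4096)` evaluates to 50, matching the number quoted above.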
Rik van Riel a écrit :
Eric Dumazet wrote:
Rik van Riel a écrit :
Andrew Morton wrote:
Oh. I was assuming that we'd want to unmap these pages from
pagetables and
mark them super-easily-reclaimable. So a later touch would incur a
minor
fault.
But you think that we should leave them mapped
Jeremy Fitzhardinge wrote:
Eric Dumazet wrote:
mmap()/brk() must give fresh NULL pages, but maybe
madvise(MADV_DONTNEED) can relax this requirement (if the pages were
reclaimed, then a page fault could bring a new page with random content)
Only if those pages were originally from that process
Eric Dumazet wrote:
> mmap()/brk() must give fresh NULL pages, but maybe
> madvise(MADV_DONTNEED) can relax this requirement (if the pages were
> reclaimed, then a page fault could bring a new page with random content)
Only if those pages were originally from that process. Otherwise you've
got a
On Tue, 03 Apr 2007 17:00:09 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:
> Andrew Morton wrote:
>
> > Oh. I was assuming that we'd want to unmap these pages from pagetables and
> > mark them super-easily-reclaimable. So a later touch would incur a minor
> > fault.
> >
> > But you think that
Eric Dumazet wrote:
Rik van Riel a écrit :
Andrew Morton wrote:
Oh. I was assuming that we'd want to unmap these pages from
pagetables and
mark them super-easily-reclaimable. So a later touch would incur a
minor
fault.
But you think that we should leave them mapped into pagetables so no
Rik van Riel a écrit :
Andrew Morton wrote:
Oh. I was assuming that we'd want to unmap these pages from
pagetables and
mark them super-easily-reclaimable. So a later touch would incur a minor
fault.
But you think that we should leave them mapped into pagetables so no such
fault occurs.
L
Andrew Morton wrote:
Oh. I was assuming that we'd want to unmap these pages from pagetables and
mark them super-easily-reclaimable. So a later touch would incur a minor
fault.
But you think that we should leave them mapped into pagetables so no such
fault occurs.
Leaving the pages mapped i
Andrew Morton wrote:
> But whatever we do, with the current MM design we need to at least take the
> mmap_sem for reading so we can descend the vma tree and locate the
> pageframes. And if that locking is the main problem then none of this is
> likely to help.
At least it's done only once for the
On Tue, 03 Apr 2007 13:17:09 -0700
Ulrich Drepper <[EMAIL PROTECTED]> wrote:
> Andrew Morton wrote:
> > Ulrich, could you suggest a little test app which would demonstrate this
> > behaviour?
>
> It's not really reliably possible to demonstrate this with a small
> program using malloc. You'd nee
Jakub Jelinek wrote:
My guess is that all the page zeroing is pretty expensive as well and
takes significant time, but I haven't profiled it.
I'm pretty sure that page freeing, reallocating and zeroing
is more expensive than just letting the page sit there and
only reclaim it lazily when we ne
On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote:
> Andrew Morton wrote:
> > Ulrich, could you suggest a little test app which would demonstrate this
> > behaviour?
>
> It's not really reliably possible to demonstrate this with a small
> program using malloc. You'd need something li
Andrew Morton wrote:
> Ulrich, could you suggest a little test app which would demonstrate this
> behaviour?
It's not really reliably possible to demonstrate this with a small
program using malloc. You'd need something like this mysql test case
which Rik said is not hard to run by yourself.
If s
> It might, a bit. Both mmap() and mprotect() currently take mmap_sem() for
> writing. If we're careful, we could probably arrange for MADV_ULRICH to
> take it for reading, which will help a little bit, hopefully.
The cache line bounces would be still there. Not sure that would help MySQL
all th
On Tue, 3 Apr 2007 19:28:41 +0200
Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Tue, Apr 03, 2007 at 10:20:02AM -0700, Ulrich Drepper wrote:
> > Andi Kleen wrote:
> > > Why do you need a lock for that? I don't see any problem with
> > > two threads doing that in parallel. The kernel would
> > > seria
Ulrich Drepper wrote:
Rik van Riel wrote:
I already started looking into implementing this.
Basically:
[...]
Sounds good. Except:
1) on MADV_DONTNEED, mark pages clean, not accessed and move them
to some "dontneed" LRU list.
LRU is likely the wrong answer. The longer a page has not
On Tue, Apr 03, 2007 at 10:20:02AM -0700, Ulrich Drepper wrote:
> Andi Kleen wrote:
> > Why do you need a lock for that? I don't see any problem with
> > two threads doing that in parallel. The kernel would
> > serialize it internally and one would fail, but that shouldn't
> > be a problem.
>
> T
Andi Kleen wrote:
> Why do you need a lock for that? I don't see any problem with
> two threads doing that in parallel. The kernel would
> serialize it internally and one would fail, but that shouldn't
> be a problem.
There is no lock at all at userlevel. I'm talking about locks in the
kernel.
Ulrich Drepper <[EMAIL PROTECTED]> writes:
> to free: mmap(PROT_NONE) over the area
Why do you need a lock for that? I don't see any problem with
two threads doing that in parallel. The kernel would
serialize it internally and one would fail, but that shouldn't
be a problem.
Of course ha
Rik van Riel wrote:
> I already started looking into implementing this.
>
> Basically:
> [...]
Sounds good. Except:
> 1) on MADV_DONTNEED, mark pages clean, not accessed and move them
>to some "dontneed" LRU list.
LRU is likely the wrong answer. The longer a page has not been reused
the
Ulrich Drepper wrote:
That's it. The current MADV_DONTNEED doesn't cut it because it zaps the
pages, causing *all* future reuses to create page faults. This is what
I guess happens in the mysql test case where the pages were unused and
freed but then almost immediately reused. The page fault
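
The zapping behaviour described above can be observed from userspace with a minimal sketch. It assumes Linux semantics for `MADV_DONTNEED` on a private anonymous mapping (the next access faults in a fresh zero-filled page); the helper name `dontneed_zaps_page` is hypothetical and not taken from the thread.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a private anonymous page reads back as zero after
 * madvise(MADV_DONTNEED), 0 if the old content survived, and -1 if
 * one of the system calls failed. */
static int dontneed_zaps_page(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;

    unsigned char *p = mmap(NULL, (size_t)ps, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* First touch: faults the page in and fills it with a pattern. */
    memset(p, 0xAA, (size_t)ps);

    /* Tell the kernel the content is no longer needed. */
    if (madvise(p, (size_t)ps, MADV_DONTNEED) != 0) {
        munmap(p, (size_t)ps);
        return -1;
    }

    /* Second touch: on Linux this takes another page fault and
     * sees a zero-filled page instead of the 0xAA pattern. */
    int zeroed = (p[0] == 0);

    munmap(p, (size_t)ps);
    return zeroed;
}
```

Each reuse of such a range pays for a fault plus page zeroing, which is the cost the proposed madvise() variant is meant to avoid when memory is not under pressure.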
People might remember the thread about mysql not scaling and pointing
the finger quite happily at glibc. Well, the situation is not like that.
The problem is glibc has to work around kernel limitations. If the
malloc implementation detects that a large chunk of previously allocated
memory is now
99 matches