On Fri, May 16, 2008 at 01:52:03AM +0200, Nick Piggin wrote:
On Thu, May 15, 2008 at 10:33:57AM -0700, Christoph Lameter wrote:
On Thu, 15 May 2008, Nick Piggin wrote:
Oh, I get that confused because of the mixed up naming conventions
there: unmap_page_range should actually be called
On Fri, May 16, 2008 at 06:23:06AM -0500, Robin Holt wrote:
On Fri, May 16, 2008 at 01:52:03AM +0200, Nick Piggin wrote:
On Thu, May 15, 2008 at 10:33:57AM -0700, Christoph Lameter wrote:
On Thu, 15 May 2008, Nick Piggin wrote:
Oh, I get that confused because of the mixed up naming
On Wed, May 14, 2008 at 06:26:25AM -0500, Robin Holt wrote:
On Wed, May 14, 2008 at 06:11:22AM +0200, Nick Piggin wrote:
I guess that you have found a way to perform TLB flushing within coherent
domains over the numalink interconnect without sleeping. I'm sure it would
be possible to
We are pursuing Linus' suggestion currently. This discussion is
completely unrelated to that work.
On Thu, May 15, 2008 at 09:57:47AM +0200, Nick Piggin wrote:
I'm not sure if you're thinking about what I'm thinking of. With the
scheme I'm imagining, all you will need is some way to raise an
Robin Holt wrote:
Then we need to deposit the information needed to do the invalidate.
Lastly, we would need to interrupt. Unfortunately, here we have a
thundering herd. There could be up to 16256 processors interrupting the
same processor. That will be a lot of work. It will need to look
On Thu, 15 May 2008, Nick Piggin wrote:
Oh, I get that confused because of the mixed up naming conventions
there: unmap_page_range should actually be called zap_page_range. But
at any rate, yes we can easily zap pagetables without holding mmap_sem.
How is that synchronized with code that
On Thu, May 15, 2008 at 10:33:57AM -0700, Christoph Lameter wrote:
On Thu, 15 May 2008, Nick Piggin wrote:
Oh, I get that confused because of the mixed up naming conventions
there: unmap_page_range should actually be called zap_page_range. But
at any rate, yes we can easily zap pagetables
On Wed, May 14, 2008 at 06:11:22AM +0200, Nick Piggin wrote:
On Tue, May 13, 2008 at 10:32:38AM -0500, Robin Holt wrote:
On Tue, May 13, 2008 at 10:06:44PM +1000, Nick Piggin wrote:
On Thursday 08 May 2008 10:38, Robin Holt wrote:
In order to invalidate the remote page table entries, we
On Wed, 14 May 2008, Robin Holt wrote:
Are you suggesting the sending side would not need to sleep or the
receiving side?
One thing to realize is that most of the time (read: pretty much *always*)
when we have the problem of wanting to sleep inside a spinlock, the
solution is actually to
On Wed, May 14, 2008 at 08:18:21AM -0700, Linus Torvalds wrote:
On Wed, 14 May 2008, Robin Holt wrote:
Are you suggesting the sending side would not need to sleep or the
receiving side?
One thing to realize is that most of the time (read: pretty much *always*)
when we have the problem
On Wed, 14 May 2008, Robin Holt wrote:
Would it be acceptable to always put a sleepable stall in even if the
code path did not require the pages be unwritable prior to continuing?
If we did that, I would be freed from having a pool of invalidate
threads ready for XPMEM to use for that
On Wed, 14 May 2008, Linus Torvalds wrote:
One thing to realize is that most of the time (read: pretty much *always*)
when we have the problem of wanting to sleep inside a spinlock, the
solution is actually to just move the sleeping to outside the lock, and
then have something else that
On Wed, 14 May 2008, Christoph Lameter wrote:
The problem is that the code in rmap.c try_to_unmap() and friends loops
over reverse maps after taking a spinlock. The mm_struct is only known
after the rmap has been accessed. This means *inside* the spinlock.
So you queue them. That's what
On Thursday 08 May 2008 10:38, Robin Holt wrote:
On Wed, May 07, 2008 at 02:36:57PM -0700, Linus Torvalds wrote:
On Wed, 7 May 2008, Andrea Arcangeli wrote:
I think the spinlock->rwsem conversion is ok under config option, as
you can see I complained myself to various of those patches and
On Thursday 08 May 2008 11:34, Andrea Arcangeli wrote:
Sorry for not having completely answered to this. I initially thought
stop_machine could work when you mentioned it, but I don't think it
can even removing xpmem block-inside-mmu-notifier-method requirements.
For stop_machine to solve
On Tue, May 13, 2008 at 10:06:44PM +1000, Nick Piggin wrote:
On Thursday 08 May 2008 10:38, Robin Holt wrote:
In order to invalidate the remote page table entries, we need to message
(uses XPC) to the remote side. The remote side needs to acquire the
importing process's mmap_sem and call
On Tue, May 13, 2008 at 10:32:38AM -0500, Robin Holt wrote:
On Tue, May 13, 2008 at 10:06:44PM +1000, Nick Piggin wrote:
On Thursday 08 May 2008 10:38, Robin Holt wrote:
In order to invalidate the remote page table entries, we need to message
(uses XPC) to the remote side. The remote
On Tue, 2008-05-13 at 22:14 +1000, Nick Piggin wrote:
I don't see why you're bending over so far backwards to accommodate
this GRU thing that we don't even have numbers for and could actually
potentially be batched up in other ways (eg. using mmu_gather or
mmu_gather-like idea).
I
On Fri, May 09, 2008 at 08:37:29PM +0200, Peter Zijlstra wrote:
Another possibility, would something like this work?
/*
* null out the begin function, no new begin calls can be made
*/
rcu_assign_pointer(my_notifier.invalidate_start_begin, NULL);
/*
* lock/unlock all rmap
On Fri, 2008-05-09 at 20:55 +0200, Andrea Arcangeli wrote:
On Fri, May 09, 2008 at 08:37:29PM +0200, Peter Zijlstra wrote:
Another possibility, would something like this work?
/*
* null out the begin function, no new begin calls can be made
*/
On Thu, 8 May 2008, Andrea Arcangeli wrote:
Actually I looked both at the struct and at the slab alignment just in
case it was changed recently. Now after reading your mail I also
compiled it just in case.
Put the flag after the spinlock, not after the list_head.
Also, we'd need to make
On Thu, May 08, 2008 at 09:11:33AM -0700, Linus Torvalds wrote:
Btw, this is an issue only on 32-bit x86, because on 64-bit one we already
have the padding due to the alignment of the 64-bit pointers in the
list_head (so there's already empty space there).
On 32-bit, the alignment of
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1210115136 -7200
# Node ID 6b384bb988786aa78ef07440180e4b2948c4c6a2
# Parent 58f716ad4d067afb6bdd1b5f7042e19d854aae0d
anon-vma-rwsem
Convert the anon_vma spinlock to a rw semaphore. This allows concurrent
traversal of reverse
On Wed, 7 May 2008, Andrea Arcangeli wrote:
Convert the anon_vma spinlock to a rw semaphore. This allows concurrent
traversal of reverse maps for try_to_unmap() and page_mkclean(). It also
allows the calling of sleeping functions from reverse map traversal as
needed for the notifier
On Wed, May 07, 2008 at 01:56:23PM -0700, Linus Torvalds wrote:
This also looks very debatable indeed. The only performance numbers quoted
are:
This results in e.g. the Aim9 brk performance test going down by 10-15%.
which just seems like a total disaster.
The whole series looks
On Wed, 7 May 2008, Andrea Arcangeli wrote:
I think the spinlock->rwsem conversion is ok under config option, as
you can see I complained myself to various of those patches and I'll
take care they're in a mergeable state the moment I submit them. What
XPMEM requires are different semantics
On Wed, May 07, 2008 at 02:36:57PM -0700, Linus Torvalds wrote:
had to do any blocking I/O during vmtruncate before, now we have to.
I really suspect we don't really have to, and that it would be better to
just fix the code that does that.
I'll let you discuss with Christoph and Robin
On Thu, 8 May 2008 00:22:05 +0200
Andrea Arcangeli [EMAIL PROTECTED] wrote:
No, the simple solution is to just make up a whole new upper-level lock,
and get that lock *first*. You can then take all the multiple locks at a
lower level in any order you damn well please.
Unfortunately
And I don't see a problem in making the conversion from
spinlock->rwsem only if CONFIG_XPMEM=y, as I doubt XPMEM works on
anything but ia64.
That is currently true but we are also working on XPMEM for x86_64.
The new XPMEM code should be posted within a few weeks.
--- jack
On Wed, May 07, 2008 at 03:31:03PM -0700, Andrew Morton wrote:
Nope. We only need to take the global lock before taking *two or more* of
the per-vma locks.
I really wish I'd thought of that.
I don't see how you can avoid taking the system-wide-global lock
before every single
On Thu, 8 May 2008 00:44:06 +0200
Andrea Arcangeli [EMAIL PROTECTED] wrote:
On Wed, May 07, 2008 at 03:31:03PM -0700, Andrew Morton wrote:
Nope. We only need to take the global lock before taking *two or more* of
the per-vma locks.
I really wish I'd thought of that.
I don't see how
On Wed, May 07, 2008 at 03:44:24PM -0700, Linus Torvalds wrote:
On Thu, 8 May 2008, Andrea Arcangeli wrote:
Unfortunately the lock you're talking about would be:
static spinlock_t global_lock = ...
There's no way to make it more granular.
Right. So what?
It's still about
To remove mm_lock without adding a horrible system-wide lock before
every i_mmap_lock etc.. we've to remove
invalidate_range_begin/end. Then we can return to an older approach of
doing only invalidate_page and serializing it with the PT lock against
get_user_pages. That works fine for KVM but GRU
On Thu, 8 May 2008, Andrea Arcangeli wrote:
mmu_notifier_register only runs when windows or linux or macosx
boots. Who could ever care about the msec spent in mm_lock compared to
the time it takes linux to boot?
Andrea, you're *this* close to going to my list of people who it is not
worth
On Thu, 2008-05-08 at 00:44 +0200, Andrea Arcangeli wrote:
Please note, we can't allow a thread to be in the middle of
zap_page_range while mmu_notifier_register runs.
You said yourself that mmu_notifier_register can be as slow as you
want ... what about you use stop_machine for it ? I'm not
On Wed, 7 May 2008, Linus Torvalds wrote:
The code that can take many locks, will have to get the global lock *and*
order the types, but that's still trivial. It's something like
spin_lock(global_lock);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if
Hi Andrew,
On Wed, May 07, 2008 at 03:59:14PM -0700, Andrew Morton wrote:
CPU0:                       CPU1:
                            spin_lock(global_lock)
spin_lock(a->lock);         spin_lock(b->lock);
                            <== mmu_notifier_register()
spin_lock(b->lock);
On Thu, May 08, 2008 at 09:28:38AM +1000, Benjamin Herrenschmidt wrote:
On Thu, 2008-05-08 at 00:44 +0200, Andrea Arcangeli wrote:
Please note, we can't allow a thread to be in the middle of
zap_page_range while mmu_notifier_register runs.
You said yourself that mmu_notifier_register
On Wed, 7 May 2008, Christoph Lameter wrote:
Multiple vmas may share the same mapping or refer to the same anonymous
vma. The above code will deadlock since we may take some locks multiple
times.
Ok, so that actually _is_ a problem. It would be easy enough to also add
just a flag to the
On Wed, May 07, 2008 at 02:36:57PM -0700, Linus Torvalds wrote:
On Wed, 7 May 2008, Andrea Arcangeli wrote:
I think the spinlock->rwsem conversion is ok under config option, as
you can see I complained myself to various of those patches and I'll
take care they're in a mergeable state the
On Wed, May 07, 2008 at 05:03:30PM -0700, Linus Torvalds wrote:
On Wed, 7 May 2008, Christoph Lameter wrote:
Multiple vmas may share the same mapping or refer to the same anonymous
vma. The above code will deadlock since we may take some locks multiple
times.
Ok, so that
On Wed, 7 May 2008, Linus Torvalds wrote:
On Wed, 7 May 2008, Christoph Lameter wrote:
Multiple vmas may share the same mapping or refer to the same anonymous
vma. The above code will deadlock since we may take some locks multiple
times.
Ok, so that actually _is_ a problem. It
On Wed, 7 May 2008, Robin Holt wrote:
In order to invalidate the remote page table entries, we need to message
(uses XPC) to the remote side. The remote side needs to acquire the
importing process's mmap_sem and call zap_page_range(). Between the
messaging and the acquiring a sleeping
On Thu, 8 May 2008, Andrea Arcangeli wrote:
Hi Andrew,
On Wed, May 07, 2008 at 03:59:14PM -0700, Andrew Morton wrote:
CPU0:                       CPU1:
                            spin_lock(global_lock)
spin_lock(a->lock);         spin_lock(b->lock);
                            <==
On Wed, 7 May 2008, Christoph Lameter wrote:
Set the vma flag when we locked it and then skip when we find it locked
right? This would be in addition to the global lock?
Yes. And clear it before unlocking (and again, testing if it's already
clear - you mustn't unlock twice, so you must
On Wed, 7 May 2008, Linus Torvalds wrote:
and you're now done. You have your mm_lock() (which still needs to be
renamed - it should be a mmu_notifier_lock() or something like that),
but you don't need the insane sorting. At most you apparently need a way
to recognize duplicates (so that
On Wed, May 07, 2008 at 06:02:49PM -0700, Linus Torvalds wrote:
You replace mm_lock() with the sequence that Andrew gave you (and I
described):
spin_lock(global_lock)
.. get all locks UNORDERED ..
spin_unlock(global_lock)
and you're now done. You have your mm_lock()
On Wed, 7 May 2008, Christoph Lameter wrote:
On Wed, 7 May 2008, Linus Torvalds wrote:
and you're now done. You have your mm_lock() (which still needs to be
renamed - it should be a mmu_notifier_lock() or something like that),
but you don't need the insane sorting. At most you
Sorry for not having completely answered to this. I initially thought
stop_machine could work when you mentioned it, but I don't think it
can even removing xpmem block-inside-mmu-notifier-method requirements.
For stop_machine to solve this (besides being slower and potentially
not more safe as
On Wed, 7 May 2008, Christoph Lameter wrote:
(That said, we're not running out of vm flags yet, and if we were, we
could just add another word. We're already wasting that space right now on
64-bit by calling it unsigned long).
We sure have enough flags.
Oh, btw, I was wrong - we
On Wed, May 07, 2008 at 06:39:48PM -0700, Linus Torvalds wrote:
On Wed, 7 May 2008, Christoph Lameter wrote:
(That said, we're not running out of vm flags yet, and if we were, we
could just add another word. We're already wasting that space right now
on
64-bit by calling it
On Thu, 8 May 2008, Andrea Arcangeli wrote:
So because the bitflag can't prevent taking the same lock twice on two
different vmas in the same mm, we still can't remove the sorting
Andrea.
Take five minutes. Take a deep breath. And *think* about actually reading
what I wrote.
The
On Wed, May 07, 2008 at 06:57:05PM -0700, Linus Torvalds wrote:
Take five minutes. Take a deep breath. And *think* about actually reading
what I wrote.
The bitflag *can* prevent taking the same lock twice. It just needs to be
in the right place.
It's not that I didn't read it, but to do
Andrea, I'm not interested. I've stated my standpoint: the code being
discussed is crap. We're not doing that. Not in the core VM.
I gave solutions that I think aren't crap, but I already also stated that
I have no problems not merging it _ever_ if no solution can be found. The
whole issue
On Wed, May 07, 2008 at 06:12:32PM -0700, Christoph Lameter wrote:
Andrea's mm_lock could have wider impact. It is the first effective
way that I have seen of temporarily holding off reclaim from an address
space. It sure is a brute force approach.
The only improvement I can imagine on
On Thu, 8 May 2008, Andrea Arcangeli wrote:
to the sort function to break the loop. After that we remove the 512
vma cap and mm_lock is free to run as long as it wants like
/dev/urandom, nobody could care less how long it will run before
returning as long as it reacts to signals.
Look Linus
On Wed, May 07, 2008 at 08:10:33PM -0700, Christoph Lameter wrote:
On Thu, 8 May 2008, Andrea Arcangeli wrote:
to the sort function to break the loop. After that we remove the 512
vma cap and mm_lock is free to run as long as it wants like
/dev/urandom, nobody could care less how long it
On Thu, 8 May 2008, Andrea Arcangeli wrote:
But removing sort isn't worth it if it takes away ram from the VM even
when global_mm_lock will never be called.
Andrea, you really are a piece of work. Your arguments have been bogus
crap that didn't even understand what was going on from the
On Wed, May 07, 2008 at 09:14:45PM -0700, Linus Torvalds wrote:
IOW, you didn't even look at it, did you?
Actually I looked both at the struct and at the slab alignment just in
case it was changed recently. Now after reading your mail I also
compiled it just in case.
2.6.26-rc1
# name
On Thu, May 8, 2008 at 8:20 AM, Andrea Arcangeli [EMAIL PROTECTED] wrote:
Actually I looked both at the struct and at the slab alignment just in
case it was changed recently. Now after reading your mail I also
compiled it just in case.
@@ -27,6 +27,7 @@ struct anon_vma {
struct
On Thu, May 8, 2008 at 8:27 AM, Pekka Enberg [EMAIL PROTECTED] wrote:
You might want to read carefully what Linus wrote:
The one that already has a 4 byte padding thing on x86-64 just after the
spinlock? And that on 32-bit x86 (with less than 256 CPU's) would have two
bytes of padding
On Thu, May 08, 2008 at 08:30:20AM +0300, Pekka Enberg wrote:
On Thu, May 8, 2008 at 8:27 AM, Pekka Enberg [EMAIL PROTECTED] wrote:
You might want to read carefully what Linus wrote:
The one that already has a 4 byte padding thing on x86-64 just after the
spinlock? And that on 32-bit
# HG changeset patch
# User Andrea Arcangeli [EMAIL PROTECTED]
# Date 1209740186 -7200
# Node ID 0be678c52e540d5f5d5fd9af549b57b9bb018d32
# Parent de28c85baef11b90c993047ca851a2f52c85a5be
anon-vma-rwsem
Convert the anon_vma spinlock to a rw semaphore. This allows concurrent
traversal of reverse