On Tue, 4 Mar 2008, Nick Piggin wrote:
Then put it into the arch code for TLB invalidation. Paravirt ops gives
good examples on how to do that.
Put what into arch code?
The mmu notifier code.
What about a completely different approach... XPmem runs over NUMAlink,
right? Why not
On Wednesday 05 March 2008 05:58, Christoph Lameter wrote:
On Tue, 4 Mar 2008, Nick Piggin wrote:
Then put it into the arch code for TLB invalidation. Paravirt ops gives
good examples on how to do that.
Put what into arch code?
The mmu notifier code.
It isn't arch specific.
On Mon, 3 Mar 2008, Nick Piggin wrote:
Your skeleton is just registering notifiers and saying
/* you fill the hard part in */
If somebody needs a skeleton in order just to register the notifiers,
then almost by definition they are unqualified to write the hard
part ;)
It's also providing
On Thursday 28 February 2008 09:35, Christoph Lameter wrote:
On Wed, 20 Feb 2008, Nick Piggin wrote:
On Friday 15 February 2008 17:49, Christoph Lameter wrote:
Also, what we are going to need here are not skeleton drivers
that just do all the *easy* bits (of registering their callbacks),
On Thu, Feb 28, 2008 at 04:59:59PM -0800, Christoph Lameter wrote:
And thus the device driver may stop receiving data on a UP system? It will
never get the ack.
Not sure to follow, sorry.
My idea was:
post the invalidate in the mmio region of the device
smp_call_function()
while
On Fri, 29 Feb 2008, Andrea Arcangeli wrote:
On Thu, Feb 28, 2008 at 04:59:59PM -0800, Christoph Lameter wrote:
And thus the device driver may stop receiving data on a UP system? It will
never get the ack.
Not sure to follow, sorry.
My idea was:
post the invalidate in the mmio
On Fri, Feb 29, 2008 at 11:55:17AM -0800, Christoph Lameter wrote:
post the invalidate in the mmio region of the device
smp_call_function()
while (mmio device wait-bitflag is on);
So the device driver on UP can only operate through interrupts? If you are
hogging the only cpu
On Fri, 29 Feb 2008, Andrea Arcangeli wrote:
Agreed. I just thought xpmem needed an invalidate-by-page, but
I'm glad if xpmem can go in sync with the KVM/GRU/DRI model in this
regard.
That means we need both the anon_vma locks and the i_mmap_lock to become
semaphores. I think semaphores are
On Fri, 29 Feb 2008, Andrea Arcangeli wrote:
On Fri, Feb 29, 2008 at 01:03:16PM -0800, Christoph Lameter wrote:
That means we need both the anon_vma locks and the i_mmap_lock to become
semaphores. I think semaphores are better than mutexes. Rik and Lee saw
some performance improvements
On Fri, Feb 29, 2008 at 01:34:34PM -0800, Christoph Lameter wrote:
On Fri, 29 Feb 2008, Andrea Arcangeli wrote:
On Fri, Feb 29, 2008 at 01:03:16PM -0800, Christoph Lameter wrote:
That means we need both the anon_vma locks and the i_mmap_lock to become
semaphores. I think semaphores are
On Fri, Feb 29, 2008 at 01:03:16PM -0800, Christoph Lameter wrote:
That means we need both the anon_vma locks and the i_mmap_lock to become
semaphores. I think semaphores are better than mutexes. Rik and Lee saw
some performance improvements because list can be traversed in parallel
when
On Fri, Feb 29, 2008 at 02:12:57PM -0800, Christoph Lameter wrote:
On Fri, 29 Feb 2008, Andrea Arcangeli wrote:
AFAICT the rw semaphore fastpath is similar in performance to a rw
spinlock.
read side is taken in the slow path.
Slowpath meaning VM slowpath or lock slow path? Its
On Thu, Feb 28, 2008 at 01:52:50AM +0100, Andrea Arcangeli wrote:
On Wed, Feb 27, 2008 at 04:14:08PM -0800, Christoph Lameter wrote:
Erm. This would also be needed by RDMA etc.
The only RDMA I know is Quadrics, and Quadrics apparently doesn't need
to schedule inside the invalidate methods
On Thu, 28 Feb 2008, Andrea Arcangeli wrote:
On Wed, Feb 27, 2008 at 05:03:21PM -0800, Christoph Lameter wrote:
RDMA works across a network and I would assume that it needs confirmation
that a connection has been torn down before pages can be unmapped.
Depends on the latency of the
On Thu, Feb 28, 2008 at 10:43:54AM -0800, Christoph Lameter wrote:
What about invalidate_page()?
That would just spin waiting an ack (just like the smp-tlb-flushing
invalidates in numa already does).
Thinking more about this, we could also parallelize it with an
invalidate_page_before/end. If
On Fri, 29 Feb 2008, Andrea Arcangeli wrote:
On Thu, Feb 28, 2008 at 10:43:54AM -0800, Christoph Lameter wrote:
What about invalidate_page()?
That would just spin waiting an ack (just like the smp-tlb-flushing
invalidates in numa already does).
And thus the device driver may stop
On Tue, 19 Feb 2008, Andrea Arcangeli wrote:
Yes, that's why I kept maintaining my patch and I posted the last
revision to Andrew. I use pte/tlb locking of the core VM, it's
unintrusive and obviously safe. Furthermore it can be extended with
Christoph's stuff in a 100% backwards compatible
On Wed, 20 Feb 2008, Nick Piggin wrote:
On Friday 15 February 2008 17:49, Christoph Lameter wrote:
The invalidation of address ranges in a mm_struct needs to be
performed when pages are removed or permissions etc change.
If invalidate_range_begin() is called with locks held then we
On Wed, 20 Feb 2008, Andrea Arcangeli wrote:
Well, xpmem requirements are complex. As a side effect of the
simplicity of my approach, my patch is 100% safe since #v1. Now it
also works for GRU and it cluster invalidates.
The patch has to satisfy RDMA, XPMEM, GRU and KVM. I keep hearing that
Also, what we are going to need here are not skeleton drivers
that just do all the *easy* bits (of registering their callbacks),
but actual fully working examples that do everything that any
real driver will need to do. If not for the sanity of the driver
writer, then for the sanity
On Wed, Feb 27, 2008 at 02:23:29PM -0800, Christoph Lameter wrote:
How would that work? You rely on the pte locking. Thus calls are all in an
I don't rely on the pte locking in #v7, exactly to satisfy GRU
(so far purely theoretical) performance complaints.
atomic context. I think we need a
On Wed, 27 Feb 2008, Christoph Lameter wrote:
Could you be specific? This refers to page migration? Hmmm... Guess we
would need to inc the refcount there instead?
Argh. No its the callback list scanning. Yuck. No one noticed.
On Wed, Feb 27, 2008 at 02:35:59PM -0800, Christoph Lameter wrote:
Could you be specific? This refers to page migration? Hmmm... Guess we
If the reader schedules, the synchronize_rcu will return on the other
cpu and the objects in the list will be freed and overwritten, and
when the task is
On Thu, 28 Feb 2008, Andrea Arcangeli wrote:
3. Keep the refcount elevated until pages are freed in another execution
context.
Page refcount is not enough (the mmu_notifier_release will run on
another cpu the moment after i_mmap_lock is unlocked) but mm_users may
prevent us from changing
On Wed, Feb 27, 2008 at 02:39:46PM -0800, Christoph Lameter wrote:
On Wed, 20 Feb 2008, Andrea Arcangeli wrote:
Well, xpmem requirements are complex. As a side effect of the
simplicity of my approach, my patch is 100% safe since #v1. Now it
also works for GRU and it cluster invalidates.
On Wed, Feb 27, 2008 at 04:14:08PM -0800, Christoph Lameter wrote:
Erm. This would also be needed by RDMA etc.
The only RDMA I know is Quadrics, and Quadrics apparently doesn't need
to schedule inside the invalidate methods AFAIK, so I doubt the above
is true. It'd be interesting to know if IB is
On Thu, 28 Feb 2008, Andrea Arcangeli wrote:
On Wed, Feb 27, 2008 at 04:14:08PM -0800, Christoph Lameter wrote:
Erm. This would also be needed by RDMA etc.
The only RDMA I know is Quadrics, and Quadrics apparently doesn't need
to schedule inside the invalidate methods AFAIK, so I doubt the
On Wed, Feb 27, 2008 at 05:03:21PM -0800, Christoph Lameter wrote:
RDMA works across a network and I would assume that it needs confirmation
that a connection has been torn down before pages can be unmapped.
Depends on the latency of the network, for example with page pinning
it can even try
On Friday 15 February 2008 17:49, Christoph Lameter wrote:
The invalidation of address ranges in a mm_struct needs to be
performed when pages are removed or permissions etc change.
If invalidate_range_begin() is called with locks held then we
pass a flag into invalidate_range() to indicate
On Tue, Feb 19, 2008 at 07:54:14PM +1100, Nick Piggin wrote:
As far as sleeping inside callbacks goes... I think there are big
problems with the patch (the sleeping patch and the external rmap
patch). I don't think it is workable in its current state. Either
we have to make some big changes to
On Friday 15 February 2008 17:49, Christoph Lameter wrote:
The invalidation of address ranges in a mm_struct needs to be
performed when pages are removed or permissions etc change.
If invalidate_range_begin() is called with locks held then we
pass a flag into invalidate_range() to indicate
On Wed, Feb 20, 2008 at 10:08:49AM +1100, Nick Piggin wrote:
You can't sleep inside rcu_read_lock()!
I must say that for a patch that is up to v8 or whatever and is
posted twice a week to such a big cc list, it is kind of slack to
not even test it and expect other people to review it.
Well,
On Wed, Feb 20, 2008 at 02:11:41PM +1100, Nick Piggin wrote:
On Wednesday 20 February 2008 14:00, Robin Holt wrote:
On Wed, Feb 20, 2008 at 02:00:38AM +0100, Andrea Arcangeli wrote:
On Wed, Feb 20, 2008 at 10:08:49AM +1100, Nick Piggin wrote:
Also, how to you resolve the case where you
On Fri, 15 Feb 2008, Andrew Morton wrote:
On Thu, 14 Feb 2008 22:49:01 -0800 Christoph Lameter [EMAIL PROTECTED]
wrote:
The invalidation of address ranges in a mm_struct needs to be
performed when pages are removed or permissions etc change.
hm. Do they? Why? If I'm in the process
On Thu, 14 Feb 2008 22:49:01 -0800 Christoph Lameter [EMAIL PROTECTED] wrote:
The invalidation of address ranges in a mm_struct needs to be
performed when pages are removed or permissions etc change.
hm. Do they? Why? If I'm in the process of zero-copy writing a hunk of
memory out to
The invalidation of address ranges in a mm_struct needs to be
performed when pages are removed or permissions etc change.
If invalidate_range_begin() is called with locks held then we
pass a flag into invalidate_range() to indicate that no sleeping is
possible. Locks are only held for truncate
On Wed, Jan 30, 2008 at 05:46:21PM -0800, Christoph Lameter wrote:
Well the GRU uses follow_page() instead of get_user_pages. Performance is
a major issue for the GRU.
GRU is a external TLB, we have to allocate RAM instead but we do it
through the regular userland paging mechanism.
On Tue, Jan 29, 2008 at 06:28:05PM -0600, Jack Steiner wrote:
On Tue, Jan 29, 2008 at 04:20:50PM -0800, Christoph Lameter wrote:
On Wed, 30 Jan 2008, Andrea Arcangeli wrote:
invalidate_range after populate allows access to memory for which ptes
were zapped and the refcount was
Robin, if you don't mind, could you please post or upload somewhere
your GPLv2 code that registers itself in Christoph's V2 notifiers? Or
is it top secret? I wouldn't mind to have a look so I can better
understand what's the exact reason you're sleeping besides attempting
GFP_KERNEL
On Wed, Jan 30, 2008 at 10:11:24AM -0600, Robin Holt wrote:
Robin, if you don't mind, could you please post or upload somewhere
your GPLv2 code that registers itself in Christoph's V2 notifiers? Or
is it top secret? I wouldn't mind to have a look so I can better
understand what's the exact
On Wed, Jan 30, 2008 at 06:04:52PM +0100, Andrea Arcangeli wrote:
On Wed, Jan 30, 2008 at 10:11:24AM -0600, Robin Holt wrote:
...
The three issues we need to simultaneously solve is revoking the remote
page table/tlb information while still in a sleepable context and not
having the remote
On Wed, 30 Jan 2008, Robin Holt wrote:
I think I need to straighten this discussion out in my head a little bit.
Am I correct in assuming Andrea's original patch set did not have any SMP
race conditions for KVM? If so, then we need to start looking at how to
implement Christoph's and my
On Wed, Jan 30, 2008 at 11:41:29AM -0800, Christoph Lameter wrote:
On Wed, 30 Jan 2008, Jack Steiner wrote:
I see what you mean. I need to review to mail to see why this changed
but in the original discussions with Christoph, the invalidate_range
callouts were supposed to be made BEFORE
On Wed, 30 Jan 2008, Jack Steiner wrote:
Seems that we cannot rely on the invalidate_ranges for correctness at all?
We need to have invalidate_page() always. invalidate_range() is only an
optimization.
I don't understand your point about "an optimization". How would invalidate_range
as
On Wed, Jan 30, 2008 at 11:50:26AM -0800, Christoph Lameter wrote:
On Wed, 30 Jan 2008, Andrea Arcangeli wrote:
XPMEM requires with invalidate_range (sleepy) +
before_invalidate_range (sleepy). invalidate_all should also be called
before_release (both sleepy).
It sounds we need full
On Wed, Jan 30, 2008 at 11:50:26AM -0800, Christoph Lameter wrote:
Then we have
invalidate_range_start(mm)
and
invalidate_range_finish(mm, start, end)
in addition to the invalidate rmap_notifier?
---
include/linux/mmu_notifier.h |7 +--
1 file changed, 5 insertions(+),
On Thu, 31 Jan 2008, Andrea Arcangeli wrote:
- void (*invalidate_range)(struct mmu_notifier *mn,
+ void (*invalidate_range_begin)(struct mmu_notifier *mn,
struct mm_struct *mm,
-unsigned long start, unsigned long end,
On Thu, 31 Jan 2008, Andrea Arcangeli wrote:
On Wed, Jan 30, 2008 at 04:01:31PM -0800, Christoph Lameter wrote:
How do we offload that? Before the scan of the rmaps we do not have the
mmstruct. So we'd need another notifier_rmap_callback.
My assumption is that that int lock exists just
Patch to
1. Remove sync on notifier_release. Must be called when only a
single process remain.
2. Add invalidate_range_start/end. This should allow safe removal
of ranges of external ptes without having to resort to a callback
for every individual page.
This must be able to nest so
On Wed, 30 Jan 2008, Robin Holt wrote:
Well the GRU uses follow_page() instead of get_user_pages. Performance is
a major issue for the GRU.
Worse, the GRU takes its TLB faults from within an interrupt so we
use follow_page to prevent going to sleep. That said, I think we
could
On Wed, Jan 30, 2008 at 06:08:14PM -0800, Christoph Lameter wrote:
hlist_for_each_entry_safe_rcu(mn, n, t,
&mm->mmu_notifier.head, hlist) {
hlist_del_rcu(&mn->hlist);
On Thu, 31 Jan 2008, Andrea Arcangeli wrote:
On Wed, Jan 30, 2008 at 06:08:14PM -0800, Christoph Lameter wrote:
hlist_for_each_entry_safe_rcu(mn, n, t,
&mm->mmu_notifier.head, hlist) {
Well the GRU uses follow_page() instead of get_user_pages. Performance is
a major issue for the GRU.
Worse, the GRU takes its TLB faults from within an interrupt so we
use follow_page to prevent going to sleep. That said, I think we
could probably use follow_page() with FOLL_GET set to
On Mon, Jan 28, 2008 at 12:28:42PM -0800, Christoph Lameter wrote:
Index: linux-2.6/mm/fremap.c
===
--- linux-2.6.orig/mm/fremap.c	2008-01-25 19:31:05.0 -0800
+++ linux-2.6/mm/fremap.c 2008-01-25
Christoph, the below patch should fix the current leak of the pinned
pages. I hope the page-pin that should be dropped by the
invalidate_range op, is enough to prevent the physical page mapped
on that mm+address to change before invalidate_range returns. If
that would ever happen, there would be a
On Tue, Jan 29, 2008 at 11:55:10AM -0800, Christoph Lameter wrote:
I am not sure. AFAICT you wrote that code.
Actually I didn't need to change a single line in do_wp_page because
ptep_clear_flush was already doing everything transparently for
me. This was the memory.c part of my last patch I
On Tue, Jan 29, 2008 at 12:30:06PM -0800, Christoph Lameter wrote:
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
diff --git a/mm/fremap.c b/mm/fremap.c
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -212,8 +212,8 @@ asmlinkage long sys_remap_file_pages(uns
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
diff --git a/mm/fremap.c b/mm/fremap.c
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -212,8 +212,8 @@ asmlinkage long sys_remap_file_pages(uns
spin_unlock(&mapping->i_mmap_lock);
}
+ err = populate_range(mm, vma, start, size,
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
+ mmu_notifier(invalidate_range, mm, address,
+ address + PAGE_SIZE - 1, 0);
page_table = pte_offset_map_lock(mm, pmd, address, ptl);
if (likely(pte_same(*page_table, orig_pte))) {
if (old_page)
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
It seems to be okay to invalidate range if you hold mmap_sem writably. In
that case no additional faults can happen that would create new ptes.
In that place the mmap_sem is taken but in readonly mode. I never rely
on the mmap_sem in the mmu
On Tue, Jan 29, 2008 at 01:53:05PM -0800, Christoph Lameter wrote:
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
We invalidate the range *after* populating it? Isn't it okay to establish
references while populate_range() runs?
It's not ok because that function can very well overwrite
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
We invalidate the range *after* populating it? Isn't it okay to establish
references while populate_range() runs?
It's not ok because that function can very well overwrite existing and
present ptes (it's actually the nonlinear common case fast
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
hmm, there where? When I said it was taken in readonly mode I meant
for the quoted code (it would be at the top if it wasn't cut), so I
quote below again:
+ mmu_notifier(invalidate_range, mm, address,
+ address +
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
But now I think there may be an issue with a third thread that may
show unsafe the removal of invalidate_page from ptep_clear_flush.
A third thread writing to a page through the linux-pte and the guest
VM writing to the same page through the
On Tue, Jan 29, 2008 at 02:55:56PM -0800, Christoph Lameter wrote:
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
But now I think there may be an issue with a third thread that may
show unsafe the removal of invalidate_page from ptep_clear_flush.
A third thread writing to a page through
On Tue, Jan 29, 2008 at 02:39:00PM -0800, Christoph Lameter wrote:
If it does not run in write mode then concurrent faults are permissible
while we remap pages. Weird. Maybe we better handle this like individual
page operations? Put the invalidate_page back into zap_pte. But then there
would
On Wed, 30 Jan 2008, Andrea Arcangeli wrote:
On Wed, Jan 30, 2008 at 01:00:39AM +0100, Andrea Arcangeli wrote:
get_user_pages, regular linux writes don't fault unless it's
explicitly writeprotect, which is mandatory in a few archs, x86 not).
actually get_user_pages doesn't fault either
On Tue, Jan 29, 2008 at 04:20:50PM -0800, Christoph Lameter wrote:
On Wed, 30 Jan 2008, Andrea Arcangeli wrote:
invalidate_range after populate allows access to memory for which ptes
were zapped and the refcount was released.
The last refcount is released by the invalidate_range
On Wed, Jan 30, 2008 at 01:00:39AM +0100, Andrea Arcangeli wrote:
get_user_pages, regular linux writes don't fault unless it's
explicitly writeprotect, which is mandatory in a few archs, x86 not).
actually get_user_pages doesn't fault either but it calls into
set_page_dirty, however
On Wed, 30 Jan 2008, Andrea Arcangeli wrote:
A user space spinlock plays into this??? That is irrelevant to the kernel.
And we are discussing your placement of the invalidate_range not mine.
With my code, invalidate_range wasn't placed there at all, my
modification to ptep_clear_flush
On Tue, 29 Jan 2008, Jack Steiner wrote:
That is true for your implementation and to address Robin's issues. Jack:
Is that true for the GRU?
I'm not sure I understand the question. The GRU never (currently) takes
a reference on a page. It has no mechanism for tracking pages that
were
On Wed, 30 Jan 2008, Andrea Arcangeli wrote:
invalidate_range after populate allows access to memory for which ptes
were zapped and the refcount was released.
The last refcount is released by the invalidate_range itself.
That is true for your implementation and to address Robin's issues.
The invalidation of address ranges in a mm_struct needs to be
performed when pages are removed or permissions etc change.
Most of the VM address space changes can use the range invalidate
callback.
invalidate_range() is generally called with mmap_sem held but
no spinlocks are active. If
75 matches