Re: [RFC 2.6.11-rc2-mm2 7/7] mm: manual page migration -- sys_page_migrate

2005-02-15 Thread Paul Jackson
Good explanation, Robin. Thanks. See y'all on linux-mm. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the line "uns

2005-02-15 Thread Robin Holt
On Wed, Feb 16, 2005 at 08:58:19AM +1100, Peter Chubb wrote: > > "Robin" == Robin Holt <[EMAIL PROTECTED]> writes: > > Robin> On Tue, Feb 15, 2005 at 08:35:29AM -0800, Paul Jackson wrote: > >> What about the suggestion I had that you sort of skipped over, > >> which amounted to changing the sy

2005-02-15 Thread Paul Jackson
Dr Peter Chubb writes: > Can page migration be done lazily, instead of all at once? That might be a useful option. Not my area to comment on. We would also require, at least as an option, to be able to force the migration on demand. Some of our big honkin iron parallel jobs run with a high degr

2005-02-15 Thread Dave Hansen
In the interest of the size of everyone's inboxes, I mentioned to Ray that we might move this discussion to a smaller forum while we resolve some of the outstanding issues. Ray's going to post a followup to linux-mm, and trim the cc list down. So, if you're still interested, keep your eyes on

2005-02-15 Thread Robin Holt
On Tue, Feb 15, 2005 at 08:35:29AM -0800, Paul Jackson wrote: > What about the suggestion I had that you sort of skipped over, which > amounted to changing the system call from a node array to just one > node: > > sys_page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes); > > to: >

2005-02-15 Thread Ray Bryant
Dave Hansen wrote: On Tue, 2005-02-15 at 04:50 -0600, Robin Holt wrote: What is the fundamental opposition to an array of from-to node mappings? They are not that difficult to follow. They make the expensive traversal of ptes the single pass operation. The time to scan the list of from nodes to

2005-02-15 Thread Dave Hansen
On Tue, 2005-02-15 at 04:50 -0600, Robin Holt wrote: > What is the fundamental opposition to an array of from-to node mappings? > They are not that difficult to follow. They make the expensive traversal > of ptes the single pass operation. The time to scan the list of from nodes > to locate the

2005-02-15 Thread Paul Jackson
Robin wrote: > That seems like it is insane! Thank-you, thank-you. What about the suggestion I had that you sort of skipped over, which amounted to changing the system call from a node array to just one node: sys_page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes); to: s

2005-02-15 Thread Robin Holt
On Tue, Feb 15, 2005 at 07:49:06AM -0800, Paul Jackson wrote: > Robin wrote: > > Then how do you handle overlapping nodes. If I am doing a 5->4, 4->3, > > 3->2, 2->1 shift ... > > Then do the shifts in the other order, first 2->1, then 3->2, ... > > So now you ask, what if you are doing a rotati

2005-02-15 Thread Paul Jackson
Robin wrote: > Then how do you handle overlapping nodes. If I am doing a 5->4, 4->3, > 3->2, 2->1 shift ... Then do the shifts in the other order, first 2->1, then 3->2, ... So now you ask, what if you are doing a rotation? Use a temporary node: 2->tmp, 3->2, ..., N->(N-1), tmp->N. So now you

2005-02-15 Thread Paul Jackson
Robin wrote: > Requiring that the process is stopped will somewhat limit the use of > this API outside of the HPC space where so much control can be had over > the processes. Good point. Hopefully we can find a way to design this system call so that it does not require suspension. Some uses of

2005-02-15 Thread Paul Jackson
Robin wrote: > Given that the first user of this may place it onto a 256 node system, > the chances that they use the same node in the source and destination node > array are very good. Am I parsing this sentence correctly when I read it as stating that we need to handle the case where the source

2005-02-15 Thread Robin Holt
On Mon, Feb 14, 2005 at 02:22:54PM -0800, Dave Hansen wrote: > On Mon, 2005-02-14 at 16:01 -0600, Robin Holt wrote: > > On Mon, Feb 14, 2005 at 10:50:42AM -0800, Dave Hansen wrote: > > > On Mon, 2005-02-14 at 07:52 -0600, Robin Holt wrote: > > > > The node mask is a list of allowed. This is intend

2005-02-14 Thread Dave Hansen
On Mon, 2005-02-14 at 16:01 -0600, Robin Holt wrote: > On Mon, Feb 14, 2005 at 10:50:42AM -0800, Dave Hansen wrote: > > On Mon, 2005-02-14 at 07:52 -0600, Robin Holt wrote: > > > The node mask is a list of allowed. This is intended to be as near > > > to a one-to-one migration path as possible. >

2005-02-14 Thread Robin Holt
On Mon, Feb 14, 2005 at 10:50:42AM -0800, Dave Hansen wrote: > On Mon, 2005-02-14 at 07:52 -0600, Robin Holt wrote: > > The node mask is a list of allowed. This is intended to be as near > > to a one-to-one migration path as possible. > > If that's the case, it would make the kernel internals a b

2005-02-14 Thread Dave Hansen
On Mon, 2005-02-14 at 07:52 -0600, Robin Holt wrote: > The node mask is a list of allowed. This is intended to be as near > to a one-to-one migration path as possible. If that's the case, it would make the kernel internals a bit simpler to only take a "from" and "to" node, instead of those maps.

2005-02-14 Thread Robin Holt
On Sat, Feb 12, 2005 at 01:04:22PM -0800, Dave Hansen wrote: > On Fri, 2005-02-11 at 19:26 -0800, Ray Bryant wrote: > > This patch introduces the sys_page_migrate() system call: > > > > sys_page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes); > > > > Its intent is to cause the pages

2005-02-12 Thread Paul Jackson
Dave wrote: > Might it be useful to use nodemasks instead of those arrays? I don't think he can. A nodemask represents an unordered set of nodes. He needs (or wants) to pass a map, mapping the node that each page might be on, to the node to which it should migrate. A bitmask doesn't contain eno

2005-02-12 Thread Dave Hansen
On Fri, 2005-02-11 at 19:26 -0800, Ray Bryant wrote: > This patch introduces the sys_page_migrate() system call: > > sys_page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes); > > Its intent is to cause the pages in the range given that are found on > old_nodes[i] to be moved to new_no

2005-02-12 Thread Paul Jackson
Andi wrote: > They're already exposed through mbind/set_mempolicy/get_mempolicy and sysfs > of course. And soon I hope through cpusets ;).

2005-02-12 Thread Andi Kleen
On Sat, Feb 12, 2005 at 07:34:32AM -0500, Arjan van de Ven wrote: > On Fri, 2005-02-11 at 19:26 -0800, Ray Bryant wrote: > > This patch introduces the sys_page_migrate() system call: > > > > sys_page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes); > > are you really sure you want to

2005-02-12 Thread Arjan van de Ven
On Fri, 2005-02-11 at 19:26 -0800, Ray Bryant wrote: > This patch introduces the sys_page_migrate() system call: > > sys_page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes); are you really sure you want to expose nodes to userspace via an ABI this solid and never changing? To me that

2005-02-12 Thread Paul Jackson
Minor comments ... nothing profound. Ray wrote: > once we agree on what the authority model should be. Are the usual kill-like permissions sufficient? You can migrate the pages of a process if you can kill it. === In the following routine, tighten up some vertical spacing, add { ... } , ... Th