Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-22 Thread Ray Bryant
Andi Kleen wrote: OK, so what is the alternative? Well, if we had a va_start and va_end (or a va_start and length) we could move the shared object once using a call of the form migrate_pages(pid, va_start, va_end, count, old_node_list, new_node_list); with old_node_list = 0 1 2 ... 31
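The paired-list semantics above say that a page found on old_node_list[i] should end up on new_node_list[i], and that pages on unlisted nodes stay where they are. The C below is a minimal sketch of that rule as a per-node translation table. MAX_NODES, build_node_map() and destination_node() are hypothetical names chosen for illustration. They are not taken from the patch set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on node numbers, for illustration only. */
#define MAX_NODES 256

/*
 * Build map[] so that map[old_nodes[i]] == new_nodes[i]; every node that is
 * not listed maps to itself, i.e. its pages are left alone.
 * Returns 0 on success, -1 if a node number is out of range or an old node
 * is listed twice (it would then have two destinations). On failure the
 * contents of map[] are unspecified.
 */
static int build_node_map(int map[MAX_NODES], size_t count,
                          const int *old_nodes, const int *new_nodes)
{
    int seen[MAX_NODES] = {0};

    for (int n = 0; n < MAX_NODES; n++)
        map[n] = n;                         /* default: do not move */

    for (size_t i = 0; i < count; i++) {
        int from = old_nodes[i], to = new_nodes[i];
        if (from < 0 || from >= MAX_NODES || to < 0 || to >= MAX_NODES)
            return -1;
        if (seen[from])
            return -1;                      /* ambiguous destination */
        seen[from] = 1;
        map[from] = to;
    }
    return 0;
}

/* Destination node for a page that currently lives on `node`. */
static int destination_node(const int map[MAX_NODES], int node)
{
    assert(node >= 0 && node < MAX_NODES);
    return map[node];
}
```

Because each page's current node is looked up once in a single table, one call can move a shared object in a single pass, which is the point Ray makes about passing va_start/va_end.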

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-22 Thread Andi Kleen
On Tue, Feb 22, 2005 at 12:45:21PM -0600, Ray Bryant wrote: > Andi Kleen wrote: > > > > >How about you add the va_start, va_end but only accept them > >when pid is 0 (= current process). Otherwise enforce with EINVAL > >that they are both 0. This way you could map the > >shared object into the ba

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-22 Thread Ray Bryant
Andi Kleen wrote: How about you add the va_start, va_end but only accept them when pid is 0 (= current process). Otherwise enforce with EINVAL that they are both 0. This way you could map the shared object into the batch manager, migrate it there, then mark it somehow to not be migrated further, a
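Andi's suggested rule can be written as a small argument check. An address range is only accepted when pid is 0 (the calling process). For any other pid, both bounds must be 0, otherwise the call fails with EINVAL. The function name check_va_range() is hypothetical. The extra rejection of an empty or inverted range, and treating 0/0 as "whole address space", are assumptions that go beyond what the excerpt states.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Returns 0 if the (pid, va_start, va_end) combination is acceptable,
 * -EINVAL otherwise, following the rule suggested in the thread. */
static int check_va_range(pid_t pid, unsigned long va_start,
                          unsigned long va_end)
{
    if (pid != 0)
        /* another process: no address range may be given */
        return (va_start == 0 && va_end == 0) ? 0 : -EINVAL;
    if (va_start == 0 && va_end == 0)
        return 0;           /* assumed: whole address space of the caller */
    if (va_start >= va_end)
        return -EINVAL;     /* assumed: empty or inverted range rejected */
    return 0;
}
```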

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-22 Thread Andi Kleen
On Mon, Feb 21, 2005 at 11:12:14AM -0600, Ray Bryant wrote: > Andi Kleen wrote: > > > > > >I wouldn't bother fixing up VMA policies. > > > > > > How would these policies get changed so that they represent the > reality of the new node location(s) then? Doesn't this have to > happen as part of

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-22 Thread Andi Kleen
> OK, so what is the alternative? Well, if we had a va_start and > va_end (or a va_start and length) we could move the shared object > once using a call of the form > >migrate_pages(pid, va_start, va_end, count, old_node_list, > new_node_list); > > with old_node_list = 0 1 2 ... 31 >

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-21 Thread Ray Bryant
Andi, Oops. It's late. The paragraph below in my previous note confused cpus and nodes. It should have read as follows: Let's suppose that nodes 0-1 of a 64 node [was: CPU] system have graphics pipes. To keep it simple, we will assume that there are 2 cpus per node like an Altix [128 CPUS in thi

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-21 Thread Ray Bryant
Andi, I went back and did some digging on one of the issues that has dropped off the list here: the case where the set of old nodes and new nodes overlap in some way. No one could provide me with a specific example, but the thread was that "This did happen in certain scenarios". Part of these scenari

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-21 Thread Ray Bryant
Andi Kleen wrote: I wouldn't bother fixing up VMA policies. How would these policies get changed so that they represent the reality of the new node location(s) then? Doesn't this have to happen as part of migrate_pages()? -- Best Regards, Ray ---

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-21 Thread Andi Kleen
On Mon, Feb 21, 2005 at 02:42:16AM -0600, Ray Bryant wrote: > All, > > Just an update on the idea of migrating a process without suspending > it. > > The hard part of the problem here is to make sure that the page_migrate() > system call sees all of the pages to migrate. If the process that is >

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-21 Thread Paul Jackson
Ray wrote: > As I understood it, we were converging on the following: > (1) ... > (2) ... > (3) ... > This is different than your reply above, which seems to imply that: > (A) ... > (B) ... Andi reacted to various details of (A) and (B). Any chance, Andi, of you directly stating whethe

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-21 Thread Andi Kleen
On Mon, Feb 21, 2005 at 01:29:41AM -0600, Ray Bryant wrote: > This is different than your reply above, which seems to imply that: > > (A) Step 1 is to migrate mapped files using mbind(). I don't understand > how to do this in general, because: > (a) I don't know how to make a non-racy

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-21 Thread Ray Bryant
All, Just an update on the idea of migrating a process without suspending it. The hard part of the problem here is to make sure that the page_migrate() system call sees all of the pages to migrate. If the process that is being migrated can still allocate pages, then the page_migrate() call may mis

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-20 Thread Ray Bryant
Paul Jackson wrote: You have to walk the full node mapping for each array, but even with hundreds of nodes that should not be that costly. I presume if you knew that the job only had pages on certain nodes, perhaps due to aggressive use of cpusets, that you would only have to walk those nodes, right

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-20 Thread Ray Bryant
Andi Kleen wrote: Do you have any better way to suggest, Andi, for a batch manager to relocate a job? The typical scenario, as Ray explained it to me, is - Give the shared libraries and any other files a suitable policy (by mapping them and applying mbind) - Then execute migrate_pages() for the

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-20 Thread Ray Bryant
Andi Kleen wrote: But we are at least at the level of agreeing that the new system call looks something like the following: migrate_pages(pid, count, old_list, new_list); right? For the external case probably yes. For internal (process does this on its own address space) it should be hooked into mbin

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-20 Thread Paul Jackson
> - Give the shared libraries and any other files a suitable policy > (by mapping them and applying mbind) Ah - I think you've said this before, and I'm being a bit retarded. You're saying that one could horse around with the physical placement of existing files mapped into another task's space b

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-20 Thread Andi Kleen
> Do you have any better way to suggest, Andi, for a batch manager to > relocate a job? The typical scenario, as Ray explained it to me, is - Give the shared libraries and any other files a suitable policy (by mapping them and applying mbind) - Then execute migrate_pages() for the anonymous pag

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-20 Thread Paul Jackson
Andi wrote: > I still think it's fundamentally unclean and racy. External processes > shouldn't mess with virtual addresses of other processes. It's not really messing with (changing) the virtual addresses of another process. It's messing with the physical placement. It's using the virtual addre

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-20 Thread Andi Kleen
> >Perhaps node masks would be better and teaching the kernel to handle > >relative distances inside the masks transparently while migrating? > >Not sure how complicated this would be to implement though. > > > >Supporting interleaving on the new nodes may be also useful, that would > >need a polic
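For comparison with the node lists, the node masks Andi mentions would presumably use the same layout mbind() and set_mempolicy() already accept: an array of unsigned long with one bit per node. The sketch below converts a node list into such a mask. nodes_to_mask() and mask_test_node() are hypothetical helpers. Note the trade-off this illustrates: a mask of old nodes and a mask of new nodes no longer record which old node pairs with which new node, which is why the kernel would have to infer relative distances itself.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Set the bit of every listed node in mask[0..words-1].
 * Returns -1 if a node number does not fit into the mask. */
static int nodes_to_mask(const int *nodes, size_t count,
                         unsigned long *mask, size_t words)
{
    memset(mask, 0, words * sizeof(*mask));
    for (size_t i = 0; i < count; i++) {
        int n = nodes[i];
        if (n < 0 || (size_t)n >= words * BITS_PER_ULONG)
            return -1;
        mask[n / BITS_PER_ULONG] |= 1UL << (n % BITS_PER_ULONG);
    }
    return 0;
}

/* 1 if node n is set in the mask, 0 otherwise. */
static int mask_test_node(const unsigned long *mask, int n)
{
    assert(n >= 0);
    return (mask[n / BITS_PER_ULONG] >> (n % BITS_PER_ULONG)) & 1UL;
}
```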

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-18 Thread Ray Bryant
Andi Kleen wrote: [Enjoy your vacation] [I am, thanks -- or I was -- I go home tomorrow] I assume they would allow marking arbitrary segments with specific policies, so it should be possible. An alternative way to handle shared libraries BTW would be to add the ELF headers Steve did in his patch. And

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-18 Thread Ray Bryant
Andi Kleen wrote: You and Robin mentioned some problems with "double migration" with that, but it's still not completely clear to me what problem you're solving here. Perhaps that needs to be reexamined. There is one other case where Robin and I have talked about double migration. That is the cas

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-18 Thread Ray Bryant
Andi, et al: I see that several messages have been sent in the interim. I apologize for being "out of sync", but today is my last day to go skiing and it is gorgeous outside. I'll try to catch up and digest everything later. -- --- Ray Bryant 512-453-967

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-18 Thread Ray Bryant
Here's an interface proposal that may be a middle ground and should satisfy both small and large system requirements: The system call interface would be: page_migrate(pid, va_start, va_end, count, old_node_list, new_node_list); (i.e., same as before, but please keep reading): The following rest

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-18 Thread Paul Jackson
Andi wrote: > Problem is what happens > when some memory is in some other node due to memory pressure fallbacks. > Your scheme would not migrate this memory at all. The arrays of old and new nodes handle this fine. Include that 'other node' in the array of old nodes, and the corresponding new nod
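Paul's answer, listing the fallback node in the old array, assumes the batch manager can find out where a job's pages actually are. A minimal sketch, assuming some per-node page counts are available from a hypothetical accounting step (the excerpt does not say how they are obtained), is to collect every node with a non-zero count so that no fallback node is left off the old list. occupied_nodes() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Write into out[] every node whose page count is non-zero and return how
 * many were found. Listing all of them as old nodes means pages that fell
 * back to an unexpected node under memory pressure get migrated as well.
 */
static size_t occupied_nodes(const unsigned long *pages_on_node,
                             size_t nr_nodes, int *out)
{
    size_t found = 0;
    for (size_t n = 0; n < nr_nodes; n++)
        if (pages_on_node[n] != 0)
            out[found++] = (int)n;
    return found;
}
```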

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-18 Thread Paul Jackson
Andi wrote: > e.g. job runs threads on nodes 0,1,2,3 and you want it to move > to nodes 4,5,6,7 with all memory staying in the same > distance from the new CPUs as it were from the old CPUs, right? > > It explains why you want old_node, you would do > (assuming node mask arguments) Yu
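Andi's example (threads on nodes 0,1,2,3 moving to 4,5,6,7) can be expressed with paired lists by shifting each old node by a fixed offset, so that each page keeps the same position relative to the job's new CPUs. The sketch assumes node numbering reflects physical distance, which holds only on some topologies; a real batch manager would consult the node distance table instead. shift_node_list() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* new_nodes[i] = old_nodes[i] - old_base + new_base, preserving the job's
 * relative layout. With old {0,1,2,3}, old_base 0 and new_base 4 this
 * yields {4,5,6,7}. */
static void shift_node_list(size_t count, const int *old_nodes,
                            int old_base, int new_base, int *new_nodes)
{
    for (size_t i = 0; i < count; i++) {
        assert(old_nodes[i] >= old_base);
        new_nodes[i] = old_nodes[i] - old_base + new_base;
    }
}
```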

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-18 Thread Paul Jackson
Andi wrote: > I don't like old_node* very much because it's imho unreliable > (because you can usually never fully know on which nodes the old > process was and there can be good reasons to just migrate everything) That's one way that the arrays of old and new nodes pay off. You can list any old

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-18 Thread Paul Jackson
Andi - what does this line mean: + node mask length. I guess it's the names of the parameters in a proposed migration system call. Length of what, mask of what, what's the node mean, huh? -- I won't rest till it's the best ... Programmer, Linux Scalability

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-18 Thread Andi Kleen
[Enjoy your vacation] On Fri, Feb 18, 2005 at 02:38:42AM -0600, Ray Bryant wrote: > > Let's start off with at least one thing we can agree on. If xattrs > are already part of XFS, then it seems reasonable to use an extended > attribute to mark certain files as non-migratable. (Some further > t

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-18 Thread Ray Bryant
Andi Kleen wrote: [Sorry for the late answer.] No problem, remember, I'm supposed to be on vacation, anyway. :-) Let's start off with at least one thing we can agree on. If xattrs are already part of XFS, then it seems reasonable to use an extended attribute to mark certain files as non-migratabl

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-17 Thread Andi Kleen
[Sorry for the late answer.] On Tue, Feb 15, 2005 at 09:44:41PM -0600, Ray Bryant wrote: > > > > > >Sorry, but the only real difference between your API and mbind is that > >yours has a pid argument. > > > > That may be true, but the internals of the implementations have got > to be pretty diffe

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-15 Thread Ray Bryant
Andi Kleen wrote: Making memory migration a subset of page migration is not a general solution. It only works for programs that are using memory policy to control placement. As I've tried to point out multiple times before, most programs that I am aware of use placement based on first-touch. Wh

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-15 Thread Paul Jackson
Thanks Andi for your effort to present your case more completely. I agree that there is some 'talking by each other' going on. Dave Hansen has publicly (and Ray privately) sought to move this discussion to linux-mm (or more specifically, off lkml for now). Any chance, Andi, that you could repos

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-15 Thread Andi Kleen
> Making memory migration a subset of page migration is not a general > solution. It only works for programs that are using memory policy > to control placement. As I've tried to point out multiple times > before, most programs that I am aware of use placement based on > first-touch. When we mi

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-15 Thread Ray Bryant
Andi Kleen wrote: [Sorry, didn't answer to everything in your mail the first time. See previous mail for beginning] On Mon, Feb 14, 2005 at 06:29:45PM -0600, Ray Bryant wrote: migrating, and figure out from that what portions of which pid's address spaces need to be migrated so that we satisfy the c

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-15 Thread Andi Kleen
> I really don't see how that is relevant to the current discussion, which, > as AFAIK, is that the kernel interface should be "migrate an entire process" > versus what I have proposed. What we are trying to avoid here for shared > libraries is two things: (1) don't migrate them needlessly, and (

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-15 Thread Ray Bryant
Andi Kleen wrote: (1) You really don't want to migrate the code pages of shared libraries that are mapped into the process address space. This causes a useless shuffling of pages which really doesn't help system performance. On the other hand, if a shared library is some private

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-15 Thread Ray Bryant
Robin Holt wrote: On Mon, Feb 14, 2005 at 06:29:45PM -0600, Ray Bryant wrote: which is what you are asking for, I think. The library's job (in addition to suspending all of the processes in the list for the duration of the migration operation, plus do some other things that are specific to sn2 har

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-15 Thread Paul Jackson
Ray wrote: > The exact ordering of when a task is moved to a new cpuset and when the > migration occurs doesn't matter, AFAIK, if we accept the notion that > a migrated task is in suspended state until after everything associated > with it (including the new cpuset definition) is done. The existan

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-15 Thread Paul Jackson
Would it work to have the migration system call take exactly two node numbers, the old and the new? Have it migrate all pages in the address space specified that are on the old node to the new node. Leave any other pages alone. For one thing, this avoids passing a long list of nodes, for an N-wa
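One consequence of the two-node-number proposal, tied to the overlap and "double migration" concerns raised elsewhere in the thread: a single-pair call cannot tell pages that were just moved onto its source node from pages that were there originally. So if an earlier call's destination is a later call's source, some pages move twice (and a cycle such as 0->1, 1->0 cannot be ordered safely at all without a spare node). The check below is a minimal sketch of that condition; causes_double_migration() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if executing the single-pair calls old_nodes[i] -> new_nodes[i]
 * in array order would move some pages twice, i.e. the destination of an
 * earlier call is the source of a later one. */
static int causes_double_migration(size_t count, const int *old_nodes,
                                   const int *new_nodes)
{
    for (size_t i = 0; i < count; i++)
        for (size_t j = i + 1; j < count; j++)
            if (new_nodes[i] == old_nodes[j])
                return 1;
    return 0;
}
```

For example, 0->1 followed by 1->2 moves node 0's pages twice, while issuing 1->2 first and then 0->1 does not. A single call that takes both full arrays avoids the ordering problem, because each page's original node is consulted only once.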

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-15 Thread Paul Jackson
Robin wrote: > for the second process and then from node 8 to node 4 for the second. "for the second ... for the second" I couldn't make sense of this statement. Should one of those seconds be a first; what word(s) are elided after the second "second"? -- I won't rest till it

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-15 Thread Robin Holt
On Tue, Feb 15, 2005 at 12:53:03PM +0100, Andi Kleen wrote: > > (2) You really only want to migrate pages once. If a file is mapped > > into several of the pid's that are being migrated, then you want > > to figure this out and issue one call to have it moved wrt one of > > the pid

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

2005-02-15 Thread Andi Kleen
[Sorry, didn't answer to everything in your mail the first time. See previous mail for beginning] On Mon, Feb 14, 2005 at 06:29:45PM -0600, Ray Bryant wrote: > migrating, and figure out from that what portions of which pid's > address spaces need to be migrated so that we satisfy the constraints > g

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-15 Thread Andi Kleen
> (1) You really don't want to migrate the code pages of shared libraries > that are mapped into the process address space. This causes a > useless shuffling of pages which really doesn't help system > performance. On the other hand, if a shared library is some > private thin
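Skipping shared-library code requires classifying each mapping of a process. The sketch below parses one line in the /proc/<pid>/maps format (start-end perms offset dev inode [pathname]). It treats a file-backed, executable, non-writable mapping as library code. That heuristic, and the names classify_maps_line() and enum map_kind, are assumptions for illustration, not the rule used by the patch set.

```c
#include <assert.h>
#include <stdio.h>

enum map_kind { KIND_ANON, KIND_LIB_CODE, KIND_FILE_DATA, KIND_UNPARSED };

static enum map_kind classify_maps_line(const char *line)
{
    unsigned long start, end, inode;
    char perms[5];

    /* start-end perms offset dev inode; offset and dev are skipped */
    if (sscanf(line, "%lx-%lx %4s %*x %*s %lu",
               &start, &end, perms, &inode) != 4)
        return KIND_UNPARSED;
    if (inode == 0)
        return KIND_ANON;       /* heap, stack, anonymous mmap, [vdso] */
    if (perms[2] == 'x' && perms[1] != 'w')
        return KIND_LIB_CODE;   /* shared text: leave in place */
    return KIND_FILE_DATA;      /* file-backed data: candidate to move */
}
```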

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-15 Thread Robin Holt
On Mon, Feb 14, 2005 at 06:29:45PM -0600, Ray Bryant wrote: > which is what you are asking for, I think. The library's job > (in addition to suspending all of the processes in the list for > the duration of the migration operation, plus do some other things > that are specific to sn2 hardware) wou

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-15 Thread Ray Bryant
Paul Jackson wrote: Ray wrote: [Thus the disclaimer in the overview note that we have figured all the interaction with memory policy stuff yet.] Does the same disclaimer apply to cpusets? Unless it causes some undue pain, I would think that page migration should _not_ violate a task's cpuset. I gue

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-14 Thread Paul Jackson
Ray wrote: > [Thus the disclaimer in > the overview note that we have figured all the interaction with > memory policy stuff yet.] Does the same disclaimer apply to cpusets? Unless it causes some undue pain, I would think that page migration should _not_ violate a task's cpuset. I guess this means
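If page migration must not violate a task's cpuset, a natural check is that every destination node lies inside the cpuset's allowed memory nodes. The sketch below assumes both sets are held as node bitmaps (arrays of unsigned long, one bit per node). nodes_within_cpuset() is a hypothetical helper, not an existing cpuset interface.

```c
#include <assert.h>
#include <stddef.h>

/* 1 if every node set in target[] is also set in allowed[];
 * both masks are `words` unsigned longs long, one bit per node. */
static int nodes_within_cpuset(const unsigned long *target,
                               const unsigned long *allowed, size_t words)
{
    for (size_t i = 0; i < words; i++)
        if (target[i] & ~allowed[i])
            return 0;   /* some destination node lies outside the cpuset */
    return 1;
}
```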

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-14 Thread Steve Longerbeam
Andi Kleen wrote: For our use, the batch scheduler will give an intermediary program a list of processes and a series of from-to node pairs. That process would then ensure all the processes are stopped, scan their VMAs to determine what regions are mapped by more than one process, which are mapped
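The intermediary program described above scans the VMAs of all stopped processes to find regions mapped by more than one of them, so each shared object is migrated exactly once. The sketch reduces a mapping to a (pid, inode) pair. That struct mapping, sharers() and migrate_via() are hypothetical, and a real scan would compare device and inode together rather than inode alone.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified view of one mapping (e.g. from /proc/<pid>/maps). */
struct mapping {
    pid_t pid;
    unsigned long inode;    /* 0 for anonymous memory */
};

/* Number of distinct processes that map file `inode`. */
static size_t sharers(const struct mapping *maps, size_t n,
                      unsigned long inode)
{
    assert(inode != 0);     /* anonymous memory is not shared by inode */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (maps[i].inode != inode)
            continue;
        int first = 1;      /* count each pid at its first such mapping */
        for (size_t j = 0; j < i; j++)
            if (maps[j].inode == inode && maps[j].pid == maps[i].pid) {
                first = 0;
                break;
            }
        count += first;
    }
    return count;
}

/* Index of the one mapping through which the file should be migrated
 * (the first one seen), or n if the file is not mapped at all. */
static size_t migrate_via(const struct mapping *maps, size_t n,
                          unsigned long inode)
{
    for (size_t i = 0; i < n; i++)
        if (maps[i].inode == inode)
            return i;
    return n;
}
```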

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-14 Thread Ray Bryant
Andi Kleen wrote: Ray Bryant <[EMAIL PROTECTED]> writes: set of pages associated with a particular process need to be moved. The kernel interface that we are proposing is the following: page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes); [Only commenting on the interface, haven't rea

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-14 Thread Ray Bryant
Andi Kleen wrote: But how do you use mbind() to change the memory placement for an anonymous private mapping used by a vendor provided executable with mbind()? For that you use set_mempolicy. -Andi

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-14 Thread Andi Kleen
> For our use, the batch scheduler will give an intermediary program a > list of processes and a series of from-to node pairs. That process would > then ensure all the processes are stopped, scan their VMAs to determine > what regions are mapped by more than one process, which are mapped > by addi

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-14 Thread Andi Kleen
> But how do you use mbind() to change the memory placement for an anonymous > private mapping used by a vendor provided executable with mbind()? For that you use set_mempolicy. -Andi

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-14 Thread Robin Holt
On Sat, Feb 12, 2005 at 10:29:14PM +0100, Andi Kleen wrote: > On Sat, Feb 12, 2005 at 01:54:26PM -0200, Marcelo Tosatti wrote: > > On Sat, Feb 12, 2005 at 12:17:25PM +0100, Andi Kleen wrote: > > > Ray Bryant <[EMAIL PROTECTED]> writes: > > > > set of pages associated with a particular process need

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-14 Thread Robin Holt
On Sat, Feb 12, 2005 at 12:17:25PM +0100, Andi Kleen wrote: > Ray Bryant <[EMAIL PROTECTED]> writes: > > set of pages associated with a particular process need to be moved. > > The kernel interface that we are proposing is the following: > > > > page_migrate(pid, va_start, va_end, count, old_nodes,

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-12 Thread Andi Kleen
On Sat, Feb 12, 2005 at 01:54:26PM -0200, Marcelo Tosatti wrote: > On Sat, Feb 12, 2005 at 12:17:25PM +0100, Andi Kleen wrote: > > Ray Bryant <[EMAIL PROTECTED]> writes: > > > set of pages associated with a particular process need to be moved. > > > The kernel interface that we are proposing is the

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-12 Thread Marcelo Tosatti
On Sat, Feb 12, 2005 at 01:54:26PM -0200, Marcelo Tosatti wrote: > On Sat, Feb 12, 2005 at 12:17:25PM +0100, Andi Kleen wrote: > > Ray Bryant <[EMAIL PROTECTED]> writes: > > > set of pages associated with a particular process need to be moved. > > > The kernel interface that we are proposing is the

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-12 Thread Marcelo Tosatti
On Sat, Feb 12, 2005 at 12:17:25PM +0100, Andi Kleen wrote: > Ray Bryant <[EMAIL PROTECTED]> writes: > > set of pages associated with a particular process need to be moved. > > The kernel interface that we are proposing is the following: > > > > page_migrate(pid, va_start, va_end, count, old_nodes,

Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-12 Thread Andi Kleen
Ray Bryant <[EMAIL PROTECTED]> writes: > set of pages associated with a particular process need to be moved. > The kernel interface that we are proposing is the following: > > page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes); [Only commenting on the interface, haven't read your pat

[RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

2005-02-11 Thread Ray Bryant
Overview The purpose of this set of patches is to introduce (one part of) the necessary kernel infrastructure to support "manual page migration". That phrase is intended to describe a facility whereby some user program (most likely a batch scheduler) is given the responsibility of managin