Andi Kleen wrote:
OK, so what is the alternative? Well, if we had a va_start and
va_end (or a va_start and length) we could move the shared object
once using a call of the form
migrate_pages(pid, va_start, va_end, count, old_node_list,
new_node_list);
with old_node_list = 0 1 2 ... 31
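As a concrete illustration of the call being proposed here, a minimal user-space sketch follows. It assumes the six-argument variant above were wired up under a placeholder syscall number (it was never merged in this form), and the pid, address range, and new-node numbers are all made up for the example:

/*
 * Sketch only, not a merged kernel interface.  NR_MIGRATE_PAGES_VA is a
 * placeholder, so on a real kernel this simply fails with ENOSYS.  The
 * node lists mirror the text: old nodes 0..31, each mapped to an
 * illustrative new node.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define NR_MIGRATE_PAGES_VA  -1   /* hypothetical syscall number */
#define NNODES               32

int main(void)
{
    pid_t pid = 1234;                         /* pid of the job being moved */
    unsigned long va_start = 0x2000000000UL;  /* placeholder range covering */
    unsigned long va_end   = 0x2000400000UL;  /* the shared object          */
    int old_nodes[NNODES], new_nodes[NNODES];

    for (int i = 0; i < NNODES; i++) {
        old_nodes[i] = i;        /* old_node_list = 0 1 2 ... 31 */
        new_nodes[i] = i + 32;   /* e.g. shift the job up by 32 nodes */
    }

    long rc = syscall(NR_MIGRATE_PAGES_VA, pid, va_start, va_end,
                      NNODES, old_nodes, new_nodes);
    if (rc < 0)
        perror("migrate_pages (proposed va-range form)");
    return rc < 0;
}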
On Tue, Feb 22, 2005 at 12:45:21PM -0600, Ray Bryant wrote:
> Andi Kleen wrote:
>
> >
> >How about you add the va_start, va_end but only accept them
> >when pid is 0 (= current process). Otherwise enforce with EINVAL
> >that they are both 0. This way you could map the
> >shared object into the ba
Andi Kleen wrote:
How about you add the va_start, va_end but only accept them
when pid is 0 (= current process). Otherwise enforce with EINVAL
that they are both 0. This way you could map the
shared object into the batch manager, migrate it there, then
mark it somehow to not be migrated further, a
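The pid == 0 restriction Andi describes amounts to a simple argument check. Roughly, as a sketch rather than actual kernel code:

/*
 * Sketch of the argument policy suggested above: a va range is honoured
 * only when pid == 0 (the calling process); for any other pid both
 * bounds must be zero, otherwise the call fails with -EINVAL.
 */
#include <linux/errno.h>
#include <linux/types.h>

static int check_migrate_range(pid_t pid, unsigned long va_start,
                               unsigned long va_end)
{
    if (pid != 0 && (va_start != 0 || va_end != 0))
        return -EINVAL;    /* external callers may not name a range */
    if (va_start > va_end)
        return -EINVAL;    /* malformed range */
    return 0;              /* ok: whole process, or own-address-space range */
}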
On Mon, Feb 21, 2005 at 11:12:14AM -0600, Ray Bryant wrote:
> Andi Kleen wrote:
>
>
> >
> >I wouldn't bother fixing up VMA policies.
> >
> >
>
> How would these policies get changed so that they represent the
> reality of the new node location(s) then? Doesn't this have to
> happen as part of
> OK, so what is the alternative? Well, if we had a va_start and
> va_end (or a va_start and length) we could move the shared object
> once using a call of the form
>
>migrate_pages(pid, va_start, va_end, count, old_node_list,
> new_node_list);
>
> with old_node_list = 0 1 2 ... 31
>
Andi,
Oops. It's late. The paragraph below in my previous note confused
cpus and nodes. It should have read as follows:
Let's suppose that nodes 0-1 of a 64-node [was: CPU] system have graphics
pipes. To keep it simple, we will assume that there are 2 cpus
per node like an Altix [128 CPUS in thi
Andi,
I went back and did some digging on one of the issues that has dropped
off the list here: the case where the set of old nodes and new
nodes overlap in some way. No one could provide me with a specific
example, but the thread was that "This did happen in certain scenarios".
Part of these scenari
Andi Kleen wrote:
I wouldn't bother fixing up VMA policies.
How would these policies get changed so that they represent the
reality of the new node location(s) then? Doesn't this have to
happen as part of migrate_pages()?
--
Best Regards,
Ray
---
On Mon, Feb 21, 2005 at 02:42:16AM -0600, Ray Bryant wrote:
> All,
>
> Just an update on the idea of migrating a process without suspending
> it.
>
> The hard part of the problem here is to make sure that the page_migrate()
> system call sees all of the pages to migrate. If the process that is
>
Ray wrote:
> As I understood it, we were converging on the following:
> (1) ...
> (2) ...
> (3) ...
> This is different than your reply above, which seems to imply that:
> (A) ...
> (B) ...
Andi reacted to various details of (A) and (B).
Any chance, Andi, of you directly stating whethe
On Mon, Feb 21, 2005 at 01:29:41AM -0600, Ray Bryant wrote:
> This is different than your reply above, which seems to imply that:
>
> (A) Step 1 is to migrate mapped files using mbind(). I don't understand
> how to do this in general, because:
> (a) I don't know how to make a non-racy
All,
Just an update on the idea of migrating a process without suspending
it.
The hard part of the problem here is to make sure that the page_migrate()
system call sees all of the pages to migrate. If the process that is
being migrated can still allocate pages, then the page_migrate() call
may mis
Paul Jackson wrote:
You have to walk the full node mapping for each array, but
even with hundreds of nodes that should not be that costly
I presume if you knew that the job only had pages on certain nodes,
perhaps due to aggressive use of cpusets, that you would only have to
walk those nodes, right
Andi Kleen wrote:
Do you have any better way to suggest, Andi, for a batch manager to
relocate a job? The typical scenario, as Ray explained it to me, is
- Give the shared libraries and any other files a suitable policy
(by mapping them and applying mbind)
- Then execute migrate_pages() for the
Andi Kleen wrote:
But we are at least at the level of agreeing that the new system
call looks something like the following:
migrate_pages(pid, count, old_list, new_list);
right?
For the external case probably yes. For internal (process does this
on its own address space) it should be hooked into mbin
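For reference, a sketch of how the agreed-on four-argument (whole-address-space) form might be invoked from a batch manager. The syscall number is a placeholder since the call was still only a proposal; the internal case mentioned above would instead go through mbind()/set_mempolicy() on the caller's own mappings:

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define NR_MIGRATE_PAGES  -1    /* hypothetical syscall number */

int main(void)
{
    pid_t pid = 1234;                    /* job being relocated externally */
    int old_nodes[] = { 0, 1, 2, 3 };    /* illustrative node lists */
    int new_nodes[] = { 4, 5, 6, 7 };

    /* Whole-address-space move: every page of 'pid' found on old_nodes[i]
     * goes to new_nodes[i]; no va range is involved in this form. */
    return syscall(NR_MIGRATE_PAGES, pid, 4, old_nodes, new_nodes) < 0;
}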
> - Give the shared libraries and any other files a suitable policy
> (by mapping them and applying mbind)
Ah - I think you've said this before, and I'm being a bit retarded.
You're saying that one could horse around with the physical placement of
existing files mapped into another task's space b
> Do you have any better way to suggest, Andi, for a batch manager to
> relocate a job? The typical scenario, as Ray explained it to me, is
- Give the shared libraries and any other files a suitable policy
(by mapping them and applying mbind)
- Then execute migrate_pages() for the anonymous pag
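A sketch of the first step of this recipe using the mbind() API that already existed at the time. The library path and node number are placeholders, and whether already-resident file pages actually move, rather than only future faults, is exactly the open question in this thread. Build against libnuma (-lnuma):

#define _GNU_SOURCE
#include <fcntl.h>
#include <numaif.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *lib = "/lib/libexample.so";   /* placeholder shared object */
    unsigned long nodemask = 1UL << 4;        /* target node 4 (illustrative) */
    struct stat st;

    int fd = open(lib, O_RDONLY);
    if (fd < 0 || fstat(fd, &st) < 0)
        return 1;

    /* Step 1: map the file and give it a policy naming the new node(s). */
    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;
    if (mbind(p, st.st_size, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0))
        perror("mbind");

    /* Step 2 (not shown): invoke the proposed migrate_pages() call for the
     * job's anonymous pages, as sketched earlier in the thread. */
    munmap(p, st.st_size);
    close(fd);
    return 0;
}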
Andi wrote:
> I still think it's fundamentally unclean and racy. External processes
> shouldn't mess with virtual addresses of other processes.
It's not really messing with (changing) the virtual addresses of
another process. It's messing with the physical placement. It's
using the virtual addre
> >Perhaps node masks would be better and teaching the kernel to handle
> >relative distances inside the masks transparently while migrating?
> >Not sure how complicated this would be to implement though.
> >
> >Supporting interleaving on the new nodes may be also useful, that would
> >need a polic
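The interleaving idea is already expressible with mbind() for newly faulted pages; a minimal sketch, assuming the new nodes are 4-7. Re-interleaving pages that are already resident is the policy question left open above. Build with -lnuma:

#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64UL << 20;           /* 64 MB scratch region */
    unsigned long nodemask = 0xf0;     /* nodes 4,5,6,7 (illustrative) */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    /* Interleave future faults in this range across the new nodes. */
    if (mbind(p, len, MPOL_INTERLEAVE, &nodemask, sizeof(nodemask) * 8, 0))
        perror("mbind(MPOL_INTERLEAVE)");
    return 0;
}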
Andi Kleen wrote:
[Enjoy your vacation]
[I am thanks -- or I was -- I go home tomorrow]
I assume they would allow marking arbitrary segments with specific
policies, so it should be possible.
An alternative way to handle shared libraries BTW would be to add the ELF
headers Steve did in his patch. And
Andi Kleen wrote:
You and Robin mentioned some problems with "double migration"
with that, but it's still not completely clear to me what
problem you're solving here. Perhaps that needs to be reexamined.
There is one other case where Robin and I have talked about double
migration. That is the cas
Andi, et al:
I see that several messages have been sent in the interim.
I apologize for being "out of sync", but today is my last
day to go skiing and it is gorgeous outside. I'll try
to catch up and digest everything later.
--
---
Ray Bryant
512-453-967
Here's an interface proposal that may be a middle ground and
should satisfy both small and large system requirements:
The system call interface would be:
page_migrate(pid, va_start, va_end, count, old_node_list, new_node_list);
(i.e., the same as before, but please keep reading):
The following rest
Andi wrote:
> Problem is what happens
> when some memory is in some other node due to memory pressure fallbacks.
> Your scheme would not migrate this memory at all.
The arrays of old and new nodes handle this fine.
Include that 'other node' in the array of old nodes,
and the corresponding new nod
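A tiny illustration of that point, with made-up node numbers: the fallback node simply becomes one more entry in the old/new lists.

#include <stdio.h>

int main(void)
{
    /* A 4-node job that also spilled onto node 8 under memory pressure:
     * pages found on node 8 are sent to node 7 along with node 3's. */
    int old_nodes[] = { 0, 1, 2, 3, 8 };
    int new_nodes[] = { 4, 5, 6, 7, 7 };
    int count = sizeof(old_nodes) / sizeof(old_nodes[0]);

    for (int i = 0; i < count; i++)
        printf("node %d -> node %d\n", old_nodes[i], new_nodes[i]);
    return 0;
}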
Andi wrote:
> e.g. job runs threads on nodes 0,1,2,3 and you want it to move
> to nodes 4,5,6,7 with all memory staying staying in the same
> distance from the new CPUs as it were from the old CPUs, right?
>
> It explains why you want old_node, you would do
> (assuming node mask arguments)
Yu
Andi wrote:
> I don't like old_node* very much because it's imho unreliable
> (because you can usually never fully know on which nodes the old
> process was and there can be good reasons to just migrate everything)
That's one way that the arrays of old and new nodes pay off.
You can list any old
Andi - what does this line mean:
+ node mask length.
I guess it's the names of the parameters in a proposed
migration system call. Length of what, mask of what,
what's the node mean, huh?
--
I won't rest till it's the best ...
Programmer, Linux Scalability
[Enjoy your vacation]
On Fri, Feb 18, 2005 at 02:38:42AM -0600, Ray Bryant wrote:
>
> Let's start off with at least one thing we can agree on. If xattrs
> are already part of XFS, then it seems reasonable to use an extended
> attribute to mark certain files as non-migratable. (Some further
> t
Andi Kleen wrote:
[Sorry for the late answer.]
No problem, remember, I'm supposed to be on vacation, anyway. :-)
Let's start off with at least one thing we can agree on. If xattrs
are already part of XFS, then it seems reasonable to use an extended
attribute to mark certain files as non-migratabl
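A sketch of the xattr idea using the standard setxattr()/getxattr() calls. The attribute name "user.nomigrate" is purely hypothetical, as is any tool or kernel code that would honour it; the path is a placeholder:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(void)
{
    const char *path = "/scratch/libvendor.so";  /* placeholder file */
    const char *val  = "1";

    /* user.* attributes work on any xattr-capable filesystem (e.g. XFS). */
    if (setxattr(path, "user.nomigrate", val, strlen(val), 0) != 0)
        perror("setxattr");

    char buf[8];
    ssize_t n = getxattr(path, "user.nomigrate", buf, sizeof(buf));
    printf("non-migratable: %s\n", n > 0 ? "yes" : "no");
    return 0;
}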
[Sorry for the late answer.]
On Tue, Feb 15, 2005 at 09:44:41PM -0600, Ray Bryant wrote:
> >
> >
> >Sorry, but the only real difference between your API and mbind is that
> >yours has a pid argument.
> >
>
> That may be true, but the internals of the implementations have got
> to be pretty diffe
Andi Kleen wrote:
Making memory migration a subset of page migration is not a general
solution. It only works for programs that are using memory policy
to control placement. As I've tried to point out multiple times
before, most programs that I am aware of use placement based on
first-touch. Wh
Thanks Andi for your effort to present your case more completely.
I agree that there is some 'talking by each other' going on.
Dave Hansen has publicly (and Ray privately) sought to
move this discussion to linux-mm (or more specifically,
off lkml for now).
Any chance, Andi, that you could repos
> Making memory migration a subset of page migration is not a general
> solution. It only works for programs that are using memory policy
> to control placement. As I've tried to point out multiple times
> before, most programs that I am aware of use placement based on
> first-touch. When we mi
Andi Kleen wrote:
[Sorry, didn't answer to everything in your mail the first time.
See previous mail for beginning]
On Mon, Feb 14, 2005 at 06:29:45PM -0600, Ray Bryant wrote:
migrating, and figure out from that what portions of which pid's
address spaces need to be migrated so that we satisfy the c
> I really don't see how that is relevant to the current discussion, which,
> AFAIK, is that the kernel interface should be "migrate an entire process"
> versus what I have proposed. What we are trying to avoid here for shared
> libraries is two things: (1) don't migrate them needlessly, and (
Andi Kleen wrote:
(1) You really don't want to migrate the code pages of shared libraries
that are mapped into the process address space. This causes a
useless shuffling of pages which really doesn't help system
performance. On the other hand, if a shared library is some
private
Robin Holt wrote:
On Mon, Feb 14, 2005 at 06:29:45PM -0600, Ray Bryant wrote:
which is what you are asking for, I think. The library's job
(in addition to suspending all of the processes in the list for
the duration of the migration operation, plus doing some other things
that are specific to sn2 har
Ray wrote:
> The exact ordering of when a task is moved to a new cpuset and when the
> migration occurs doesn't matter, AFAIK, if we accept the notion that
> a migrated task is in suspended state until after everything associated
> with it (including the new cpuset definition) is done.
The existan
Would it work to have the migration system call take exactly two node
numbers, the old and the new? Have it migrate all pages in the address
space specified that are on the old node to the new node. Leave any
other pages alone. For one thing, this avoids passing a long list of
nodes, for an N-wa
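Paul's alternative, sketched as a loop over (old, new) pairs. The two-node call shown here is hypothetical (placeholder syscall number), and the pid and node numbers are made up:

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define NR_MIGRATE_PAIR  -1    /* hypothetical syscall number */

static long migrate_node_pair(pid_t pid, int old_node, int new_node)
{
    /* "Migrate all pages of 'pid' on old_node to new_node; leave the rest." */
    return syscall(NR_MIGRATE_PAIR, pid, old_node, new_node);
}

int main(void)
{
    pid_t pid = 1234;                    /* job being moved */
    int old_nodes[] = { 0, 1, 2, 3 };    /* illustrative 4-way move */
    int new_nodes[] = { 4, 5, 6, 7 };

    for (unsigned i = 0; i < sizeof(old_nodes) / sizeof(old_nodes[0]); i++)
        migrate_node_pair(pid, old_nodes[i], new_nodes[i]);
    return 0;
}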
Robin wrote:
> for the second process and then from node 8 to node 4 for the second.
"for the second ... for the second"
I couldn't make sense of this statement. Should one of those
seconds be a first; what word(s) are elided after the second
"second"?
--
I won't rest till it
On Tue, Feb 15, 2005 at 12:53:03PM +0100, Andi Kleen wrote:
> > (2) You really only want to migrate pages once. If a file is mapped
> > into several of the pid's that are being migrated, then you want
> > to figure this out and issue one call to have it moved wrt one of
> > the pid
[Sorry, didn't answer to everything in your mail the first time.
See previous mail for beginning]
On Mon, Feb 14, 2005 at 06:29:45PM -0600, Ray Bryant wrote:
> migrating, and figure out from that what portions of which pid's
> address spaces need to be migrated so that we satisfy the constraints
> g
> (1) You really don't want to migrate the code pages of shared libraries
> that are mapped into the process address space. This causes a
> useless shuffling of pages which really doesn't help system
> performance. On the other hand, if a shared library is some
> private thin
On Mon, Feb 14, 2005 at 06:29:45PM -0600, Ray Bryant wrote:
> which is what you are asking for, I think. The library's job
> (in addition to suspending all of the processes in the list for
> the duration of the migration operation, plus doing some other things
> that are specific to sn2 hardware) wou
Paul Jackson wrote:
Ray wrote:
[Thus the disclaimer in
the overview note that we have not figured out all the interaction with
memory policy stuff yet.]
Does the same disclaimer apply to cpusets?
Unless it causes some undue pain, I would think that page migration
should _not_ violate a task's cpuset. I gue
Ray wrote:
> [Thus the disclaimer in
> the overview note that we have not figured out all the interaction with
> memory policy stuff yet.]
Does the same disclaimer apply to cpusets?
Unless it causes some undue pain, I would think that page migration
should _not_ violate a task's cpuset. I guess this means
Andi Kleen wrote:
For our use, the batch scheduler will give an intermediary program a
list of processes and a series of from-to node pairs. That process would
then ensure all the processes are stopped, scan their VMAs to determine
what regions are mapped by more than one process, which are mapped
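A sketch of the user-space part of that scan: read /proc/<pid>/maps for each pid in the job and count how many times each file-backed object (keyed by device:inode) is mapped, so shared objects can be migrated only once. It counts mappings rather than distinct pids, which is a simplification, and the pids are placeholders:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define MAX_OBJS 256

struct obj { char key[64]; int mappings; };

static struct obj objs[MAX_OBJS];
static int nobjs;

/* Record one file-backed object (dev:inode) seen in some pid's maps. */
static void note_object(const char *dev, unsigned long ino)
{
    char key[64];
    snprintf(key, sizeof(key), "%s:%lu", dev, ino);

    for (int i = 0; i < nobjs; i++)
        if (strcmp(objs[i].key, key) == 0) {
            objs[i].mappings++;
            return;
        }
    if (nobjs < MAX_OBJS) {
        strcpy(objs[nobjs].key, key);
        objs[nobjs++].mappings = 1;
    }
}

static void scan_pid(pid_t pid)
{
    char path[64], line[512];

    snprintf(path, sizeof(path), "/proc/%d/maps", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return;

    while (fgets(line, sizeof(line), f)) {
        unsigned long start, end, off, ino;
        char perms[8], dev[16];

        /* /proc/<pid>/maps format: start-end perms offset dev inode [path] */
        if (sscanf(line, "%lx-%lx %7s %lx %15s %lu",
                   &start, &end, perms, &off, dev, &ino) == 6 && ino != 0)
            note_object(dev, ino);
    }
    fclose(f);
}

int main(void)
{
    pid_t job[] = { 1234, 1235, 1236 };   /* placeholder pids of the job */

    for (unsigned i = 0; i < sizeof(job) / sizeof(job[0]); i++)
        scan_pid(job[i]);

    for (int i = 0; i < nobjs; i++)
        if (objs[i].mappings > 1)
            printf("object %s mapped %d times across the job\n",
                   objs[i].key, objs[i].mappings);
    return 0;
}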
Andi Kleen wrote:
Ray Bryant <[EMAIL PROTECTED]> writes:
set of pages associated with a particular process need to be moved.
The kernel interface that we are proposing is the following:
page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes);
[Only commenting on the interface, haven't rea
Andi Kleen wrote:
But how do you use mbind() to change the memory placement for an anonymous
private mapping used by a vendor provided executable with mbind()?
For that you use set_mempolicy.
-Andi
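Concretely, the set_mempolicy() route looks roughly like this: set the task policy and then exec the vendor binary, whose anonymous private mappings are then allocated on the chosen nodes, since the policy survives exec. The path and node number are made up; build against libnuma (-lnuma):

#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    unsigned long nodemask = 1UL << 2;    /* allocate on node 2 (illustrative) */

    if (set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask) * 8) != 0) {
        perror("set_mempolicy");
        return 1;
    }

    /* Placeholder vendor executable; its first-touch anonymous memory
     * now follows the MPOL_BIND policy set above. */
    execl("/opt/vendor/app", "app", (char *)NULL);
    perror("execl");
    return 1;
}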
> For our use, the batch scheduler will give an intermediary program a
> list of processes and a series of from-to node pairs. That process would
> then ensure all the processes are stopped, scan their VMAs to determine
> what regions are mapped by more than one process, which are mapped
> by addi
> But how do you use mbind() to change the memory placement for an anonymous
> private mapping used by a vendor provided executable with mbind()?
For that you use set_mempolicy.
-Andi
On Sat, Feb 12, 2005 at 10:29:14PM +0100, Andi Kleen wrote:
> On Sat, Feb 12, 2005 at 01:54:26PM -0200, Marcelo Tosatti wrote:
> > On Sat, Feb 12, 2005 at 12:17:25PM +0100, Andi Kleen wrote:
> > > Ray Bryant <[EMAIL PROTECTED]> writes:
> > > > set of pages associated with a particular process need
On Sat, Feb 12, 2005 at 12:17:25PM +0100, Andi Kleen wrote:
> Ray Bryant <[EMAIL PROTECTED]> writes:
> > set of pages associated with a particular process need to be moved.
> > The kernel interface that we are proposing is the following:
> >
> > page_migrate(pid, va_start, va_end, count, old_nodes,
On Sat, Feb 12, 2005 at 01:54:26PM -0200, Marcelo Tosatti wrote:
> On Sat, Feb 12, 2005 at 12:17:25PM +0100, Andi Kleen wrote:
> > Ray Bryant <[EMAIL PROTECTED]> writes:
> > > set of pages associated with a particular process need to be moved.
> > > The kernel interface that we are proposing is the
Ray Bryant <[EMAIL PROTECTED]> writes:
> set of pages associated with a particular process need to be moved.
> The kernel interface that we are proposing is the following:
>
> page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes);
[Only commenting on the interface, haven't read your pat
Overview
The purpose of this set of patches is to introduce (one part of) the
necessary kernel infrastructure to support "manual page migration".
That phrase is intended to describe a facility whereby some user program
(most likely a batch scheduler) is given the responsibility of managin