Hi,
I'm now working on porting to x86_64 with help from Nauman Rafique.
Here is the preliminary patch. If there is someone who is interested
in x86_64 support, please join.
This patch is to support x86_64 on Oren's checkpoint/restart patchset
(v12 on December 29th). His patchset is well implement
Hi Nikanth,
On Fri, Jan 23, 2009 at 6:56 AM, Nikanth Karthikesan wrote:
>
> From: Nikanth Karthikesan
>
> Cgroup based OOM killer controller
>
> Signed-off-by: Nikanth Karthikesan
>
> ---
>
> This is a container group based approach to override the oom killer selection
> without losing all the
On Thu, Jan 22, 2009 at 5:21 AM, Evgeniy Polyakov wrote:
> Having userspace to decide which task to kill may not work in some cases
> at all (when task is swapped and we need to kill someone to get the mem
> to swap out the task, which will make that decision).
That's true in the case of a global
Quoting Andrew Morton (a...@linux-foundation.org):
> On Fri, 16 Jan 2009 20:03:32 -0600
> "Serge E. Hallyn" wrote:
>
> > Implement multiple mounts of the mqueue file system, and
> > link it to usage of CLONE_NEWIPC.
> >
> > Each ipc ns has a corresponding mqueuefs superblock. When
> > a user do
On Tue, Jan 27, 2009 at 12:41:03PM -0800, David Rientjes (rient...@google.com)
wrote:
> > Having some special application which will monitor /dev/mem_notify and
> > kill processes based on its own heuristics is a good idea, but when it
> > fails to do its work (or does not exist) system has to hav
On Tue, Jan 27, 2009 at 09:10:53PM +0530, Balbir Singh
(bal...@linux.vnet.ibm.com) wrote:
> > Having some special application which will monitor /dev/mem_notify and
> > kill processes based on its own heuristics is a good idea, but when it
> > fails to do its work (or does not exist) system has to
On Tue, Jan 27, 2009 at 12:37:21PM -0800, David Rientjes (rient...@google.com)
wrote:
> > Well, oom-killer can, since it drops unkillable state from the process
> > mask, that may be not enough though, but it tries more than userspace.
> >
>
> The only thing it does is send a SIGKILL and gives t
When seeing a CR_FD_PIPE file type, we create a new pipe and thus
have two file pointers (read- and write- ends). We only use one of
them, depending on which side was checkpointed first. We register the
file pointer of the other end in the hash table, with the 'objref'
given for this pipe from the
The patches can be pulled from the git tree (branch 'ckpt-v13-dev')
git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git
Oren Laadan wrote:
> Adds support for checkpoint-restart of open pipes.
>
> This patchset applies on top of 'ckpt-v13' recent patchset.
>
> A new test program (test4.c in
This patchset adds support for checkpoint-restart of open pipes.
It applies on top of 'ckpt-v13' patch set.
A new test program (test4.c in userspace tools) provides a test
case for this new functionality.
Oren.
___
Containers mailing list
contain...@
While file pointers are shared objects, they may share an underlying
object themselves. For instance, the file pointers of both ends of a pipe
share the same pipe inode. In this case, the shared entity to
handle is the inode that is shared between the two file pointers (e.g. read-
and write- ends). In th
A pipe is essentially a double-headed inode with a buffer attached to
it. We checkpoint the pipe buffer only once, as soon as we hit one
side of the pipe, regardless of whether it is the read- or write- end.
To checkpoint a file descriptor that refers to a pipe (either end), we
first look up the inode in
Adds support for checkpoint-restart of open pipes.
This patchset applies on top of 'ckpt-v13' recent patchset.
A new test program (test4.c in userspace tools) provides a test
case for this new functionality.
Oren.
Quoting Andrew Morton (a...@linux-foundation.org):
> On Fri, 16 Jan 2009 20:02:48 -0600
> "Serge E. Hallyn" wrote:
>
> > IPC namespaces are completely disjoint id->object mappings.
> > A task can pass CLONE_NEWIPC to unshare and clone to get
> > a new, empty, IPC namespace. Until now this has su
On Tue, 27 Jan 2009, Evgeniy Polyakov wrote:
> Having some special application which will monitor /dev/mem_notify and
> kill processes based on its own heuristics is a good idea, but when it
> fails to do its work (or does not exist) system has to have ability to
> make a progress and invoke a mai
On Tue, 27 Jan 2009, Evgeniy Polyakov wrote:
> > There is no additional oom killer limitation imposed here, nor can the oom
> > killer kill a task hung in D state any better than userspace.
>
> Well, oom-killer can, since it drops unkillable state from the process
> mask, that may be not enough
On Tue, 27 Jan 2009, Nikanth Karthikesan wrote:
> > That's certainly idealistic, but cannot be done in an inexpensive way that
> > would scale with the large systems that clients of cpusets typically use.
>
> If we kill only the tasks for which cpuset_mems_allowed_intersects() is true
> on the f
For each VMA, there is a 'struct cr_vma'; if the VMA is file-mapped,
it will be followed by the file name. Then comes the actual contents,
in one or more chunks: each chunk begins with a header that specifies
how many pages it holds, then the virtual addresses of all the dumped
pages in that chunk,
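The chunk layout described above can be sketched in userspace C. This is only an illustration under stated assumptions: 'struct chunk_hdr', chunk_write() and chunk_read() are hypothetical names, the field widths are guesses, and the sketch stops at the address list because the description of what follows it is cut off in the snippet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk header: it carries only the page count that the
 * text mentions. The real header layout is not shown in the snippet. */
struct chunk_hdr {
	uint32_t nr_pages;
};

/* Write one chunk prefix (header, then the virtual addresses of the
 * dumped pages) into buf. Returns the bytes written, or 0 if buf is
 * too small. */
static size_t chunk_write(uint8_t *buf, size_t cap,
			  const uint64_t *vaddrs, uint32_t nr_pages)
{
	struct chunk_hdr hdr = { .nr_pages = nr_pages };
	size_t need = sizeof(hdr) + (size_t)nr_pages * sizeof(uint64_t);

	assert(buf != NULL && vaddrs != NULL);
	if (need > cap)
		return 0;
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), vaddrs, (size_t)nr_pages * sizeof(uint64_t));
	return need;
}

/* Read the chunk prefix back. Returns the bytes consumed, or 0 if the
 * input is short or holds more pages than the caller allows. */
static size_t chunk_read(const uint8_t *buf, size_t len, uint64_t *vaddrs,
			 uint32_t max_pages, uint32_t *nr_pages)
{
	struct chunk_hdr hdr;
	size_t need;

	assert(buf != NULL && vaddrs != NULL && nr_pages != NULL);
	if (len < sizeof(hdr))
		return 0;
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.nr_pages > max_pages)
		return 0;
	need = sizeof(hdr) + (size_t)hdr.nr_pages * sizeof(uint64_t);
	if (len < need)
		return 0;
	memcpy(vaddrs, buf + sizeof(hdr), (size_t)hdr.nr_pages * sizeof(uint64_t));
	*nr_pages = hdr.nr_pages;
	return need;
}
```

Putting the count first lets the reader size its address array before touching the rest of the chunk.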
Covers application checkpoint/restart, overall design, interfaces,
usage, shared objects, and checkpoint image format.
Changelog[v8]:
- Split into multiple files in Documentation/checkpoint/...
- Extend documentation, fix typos and comments from feedback
Signed-off-by: Oren Laadan
Acked-
Add those interfaces, as well as helpers needed to easily manage the
file format. The code is roughly broken out as follows:
checkpoint/sys.c - user/kernel data transfer, as well as setup of the
CR context (a per-checkpoint data structure for housekeeping)
checkpoint/checkpoint.c - output wrappe
Add logic to save and restore architecture specific state, including
thread-specific state, CPU registers and FPU state.
In addition, architecture capabilities are saved in an architecture
specific extension of the header (cr_hdr_head_arch); currently this
includes only FPU capabilities.
Currently
Restarting of multiple processes expects all restarting tasks to call
sys_restart(). Once inside the system call, each task will restart
itself in the same order that they were saved. The internals of the
syscall will take care of in-kernel synchronization between tasks.
This patch does _not_ crea
Dump the files_struct of a task with 'struct cr_hdr_files', followed by
all open file descriptors. Because the 'struct file' corresponding to an
FD can be shared, each is assigned an objref and registered in the
object hash. A reference to the 'file *' is kept for as long as it lives
in the h
Infrastructure to handle objects that may be shared and referenced by
multiple tasks or other objects, e.g. open files, memory address space
etc.
The state of shared objects is saved once. On the first encounter, the
state is dumped and the object is assigned a unique identifier (objref)
and also
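The "save once, reference afterwards" rule above can be sketched in userspace C. This is a minimal sketch, not the patch's implementation: 'struct obj_table' and obj_lookup_or_add() are hypothetical names, and a linear scan over a fixed array stands in for the hash table.

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_TABLE_MAX 64	/* hypothetical fixed capacity for the sketch */

struct obj_entry {
	const void *ptr;	/* address of the shared object */
	int objref;		/* identifier that would go into the image */
};

struct obj_table {
	struct obj_entry entries[OBJ_TABLE_MAX];
	int count;
	int next_objref;
};

static void obj_table_init(struct obj_table *t)
{
	t->count = 0;
	t->next_objref = 1;	/* objrefs start at 1 in this sketch */
}

/* Return the objref for ptr. On the first encounter the object is
 * registered and *first is set to 1, which tells the caller to dump
 * its state; on later encounters *first is 0 and only the objref
 * needs to be written. */
static int obj_lookup_or_add(struct obj_table *t, const void *ptr, int *first)
{
	for (int i = 0; i < t->count; i++) {
		if (t->entries[i].ptr == ptr) {
			*first = 0;
			return t->entries[i].objref;
		}
	}
	assert(t->count < OBJ_TABLE_MAX);
	t->entries[t->count].ptr = ptr;
	t->entries[t->count].objref = t->next_objref;
	t->count++;
	*first = 1;
	return t->next_objref++;
}
```

A caller would dump the object's full state only when *first comes back as 1.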
Restore open file descriptors: for each FD read 'struct cr_hdr_fd_ent'
and look up objref in the hash table; if not found (first occurrence), read
in 'struct cr_hdr_fd_data', create a new FD and register in the hash.
Otherwise attach the file pointer from the hash as an FD.
This patch only handles b
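The restore-side lookup described above can be sketched the same way, in userspace C. All names here are hypothetical ('struct file_stub' stands in for 'struct file', restore_fd() for the per-FD restore step), and a direct-indexed array stands in for the hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OBJS 32	/* hypothetical capacity for the sketch */

struct file_stub {
	int id;		/* stand-in for a restored 'struct file' */
};

struct restore_ctx {
	struct file_stub files[MAX_OBJS];	/* storage for created files */
	struct file_stub *by_objref[MAX_OBJS];	/* objref -> file, NULL if unseen */
	int nr_created;
};

static void restore_ctx_init(struct restore_ctx *ctx)
{
	memset(ctx, 0, sizeof(*ctx));
}

/* Resolve one FD entry by its objref. On the first occurrence a new
 * file is created and registered (the real code would read the file
 * data from the image at this point); on later occurrences the file
 * that is already registered is returned, so the FDs share it. */
static struct file_stub *restore_fd(struct restore_ctx *ctx, int objref)
{
	assert(objref > 0 && objref < MAX_OBJS);
	if (ctx->by_objref[objref] == NULL) {
		struct file_stub *f;

		assert(ctx->nr_created < MAX_OBJS);
		f = &ctx->files[ctx->nr_created];
		f->id = ctx->nr_created++;
		ctx->by_objref[objref] = f;
	}
	return ctx->by_objref[objref];
}
```

Two FD entries that carry the same objref end up pointing at the same file object, which is how sharing is preserved across restart.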
Checkpoint-restart (c/r): a couple of fixes in preparation for 64bit
architectures, and a couple of fixes for bugs (comments from Serge
Hallyn, Sukadev Bhattiprolu and Nathan Lynch). Updated and tested
against v2.6.28.
Aiming for -mm.
The git tree tracking v13, branch 'ckpt-v13' (and older vers
Create trivial sys_checkpoint and sys_restart system calls. They make
it possible to checkpoint and restart an entire container, to and from a
checkpoint image file descriptor.
The syscalls take a file descriptor (for the image file) and flags as
arguments. For sys_checkpoint the first argument identif
Now we can do "external" checkpoint, i.e. act on another task.
sys_checkpoint() now looks up the target pid (in our namespace) and
checkpoints that corresponding task. That task should be the root of
a container.
sys_restart() remains the same, as the restart is always done in the
context of the
These two are used in the next patch when calling vfs_read/write()
---
 fs/read_write.c    | 10 ----------
 include/linux/fs.h | 10 ++++++++++
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 969a6d9..dda4eab 100644
--- a/fs/read_write.c
From: Dave Hansen
Suggested by Ingo.
Checkpoint/restart is going to be a long effort to get things working.
We're going to have a lot of things that we know just don't work for
a long time. That doesn't mean that it will be useless; it just means
that there are some complicated features that we a
Oren Laadan wrote:
> Changelog[v5]:
> - Config is 'def_bool n' by default
That's true by default; it doesn't have to be written/typed.
> Signed-off-by: Oren Laadan
> Acked-by: Serge Hallyn
> Signed-off-by: Dave Hansen
> ---
> arch/x86/include/asm/unistd_32.h |2 +
> arch/x86/kernel/s
Serge E. Hallyn wrote:
> Implement the s390 arch-specific checkpoint/restart helpers. This
Thanks for the patch.
I will assume that the s390 specifics are correct...
> is on top of Oren Laadan's c/r code (which so far was x86_32-only)
> submitted here: http://lkml.org/lkml/2008/12/29/38, plus
Restoring the memory address space begins with nuking the existing one
of the current process, and then reading the VMA state and contents.
Call do_mmap_pgoff() for each VMA and then read in the data.
Changelog[v13]:
- Avoid access to hh->vma_type after the header is freed
- Test for no vma
Checkpointing of multiple processes works by recording the tasks tree
structure below a given task (usually this task is the container init).
For a given task, do a DFS scan of the tasks tree and collect them
into an array (keeping a reference to each task). Using DFS simplifies
the recreation of
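The DFS collection step above can be sketched in userspace C. This is an illustration only: 'struct task' is a hypothetical stand-in for the kernel's task structure, the recursion is for brevity, and storing the pointer stands in for taking a reference on each task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task in the tree: children are linked
 * through first_child / next_sibling. */
struct task {
	int pid;
	struct task *first_child;
	struct task *next_sibling;
};

/* Pre-order DFS: store t, then each child subtree in sibling order.
 * n is the number of slots already used in out[]. Returns the new
 * count, or -1 if out[] (capacity cap) cannot hold the whole tree. */
static int collect_tasks(struct task *t, struct task **out, int cap, int n)
{
	assert(out != NULL && cap >= 0);
	if (t == NULL)
		return n;
	if (n >= cap)
		return -1;
	out[n++] = t;
	for (struct task *c = t->first_child; c != NULL; c = c->next_sibling) {
		n = collect_tasks(c, out, cap, n);
		if (n < 0)
			return -1;
	}
	return n;
}
```

With pre-order DFS every parent lands in the array before its descendants, so a consumer that walks the array front to back never meets a task before its parent.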
Quoting Ralph-Gordon Paul (ralph-gordon.p...@uni-duesseldorf.de):
> Hello,
>
> i'm searching for the right Mailing List for Linux checkpoint / restart.
>
> I'm working for the XtreemOS Project (http://www.xtreemos.eu). We want
> to include the linux native checkpoint / restart, but it seems to
* Evgeniy Polyakov [2009-01-27 16:45:59]:
> Hi.
>
> On Tue, Jan 27, 2009 at 07:40:58PM +0900, KOSAKI Motohiro
> (kosaki.motoh...@jp.fujitsu.com) wrote:
> > I'd like to respect your requirement. but I also would like to know
> > why you like deterministic hierarchy oom than notification.
> >
> >
Hi.
On Tue, Jan 27, 2009 at 07:40:58PM +0900, KOSAKI Motohiro
(kosaki.motoh...@jp.fujitsu.com) wrote:
> I'd like to respect your requirement. but I also would like to know
> why you like deterministic hierarchy oom than notification.
>
> I think one of problem is, current patch description is a b
Hi David.
On Tue, Jan 27, 2009 at 01:37:55AM -0800, David Rientjes (rient...@google.com)
wrote:
> > /dev/mem_notify is a great idea, but please do not limit existing
> > oom-killer in its ability to do the job and do not rely on application's
> > ability to send a SIGKILL which will not kill task
On Tuesday 27 January 2009 16:51:26 David Rientjes wrote:
> On Tue, 27 Jan 2009, Nikanth Karthikesan wrote:
> > > I don't understand what you're arguing for here. Are you suggesting
> > > that we should not prefer tasks that intersect the set of allowable
> > > nodes? That makes no sense if the go
On Tue, 27 Jan 2009, Nikanth Karthikesan wrote:
> > I don't understand what you're arguing for here. Are you suggesting that
> > we should not prefer tasks that intersect the set of allowable nodes?
> > That makes no sense if the goal is to allow for future memory freeing.
> >
>
> No. Actually I
On Tuesday 27 January 2009 16:23:00 David Rientjes wrote:
> On Tue, 27 Jan 2009, Nikanth Karthikesan wrote:
> > > As previously stated, I think the heuristic to penalize tasks for not
> > > having an intersection with the set of allowable nodes of the oom
> > > triggering task could be made slightl
On Tue, 27 Jan 2009, Nikanth Karthikesan wrote:
> > As previously stated, I think the heuristic to penalize tasks for not
> > having an intersection with the set of allowable nodes of the oom
> > triggering task could be made slightly more severe. That's irrelevant to
> > your patch, though.
> >
Hi Evgeniy,
> On Mon, Jan 26, 2009 at 11:51:27PM -0800, David Rientjes
> (rient...@google.com) wrote:
> > Yeah, I proposed /dev/mem_notify being made as a client of cgroups there
> > in http://marc.info/?l=linux-kernel&m=123200623628685
> >
> > How do you replace the oom killer's capability of
On Saturday 24 January 2009 02:14:59 David Rientjes wrote:
> On Fri, 23 Jan 2009, Nikanth Karthikesan wrote:
> > In other instances, It can actually also kill some innocent tasks unless
> > the administrator tunes oom_adj, say something like kvm which would have
> > a huge memory accounted, but mig
On Tue, 27 Jan 2009, Evgeniy Polyakov wrote:
> /dev/mem_notify is a great idea, but please do not limit existing
> oom-killer in its ability to do the job and do not rely on application's
> ability to send a SIGKILL which will not kill tasks in unkillable state
> contrary to oom-killer.
>
You're
On Mon, Jan 26, 2009 at 11:51:27PM -0800, David Rientjes (rient...@google.com)
wrote:
> Yeah, I proposed /dev/mem_notify being made as a client of cgroups there
> in http://marc.info/?l=linux-kernel&m=123200623628685
>
> How do you replace the oom killer's capability of giving a killed task
> a