[Devel] [RFC][PATCH] x86_64 support of checkpoint/restart (Re: Checkpoint / Restart)

2009-01-27 Thread Masahiko Takahashi
Hi, I'm now working on porting to x86_64 with help from Nauman Rafique. Here is the preliminary patch. If there is someone who is interested in x86_64 support, please join. This patch is to support x86_64 on Oren's checkpoint/restart patchset (v12 on December 29th). His patchset is well implement

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Paul Menage
Hi Nikanth, On Fri, Jan 23, 2009 at 6:56 AM, Nikanth Karthikesan wrote: > > From: Nikanth Karthikesan > > Cgroup based OOM killer controller > > Signed-off-by: Nikanth Karthikesan > > --- > > This is a container group based approach to override the oom killer selection > without losing all the

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Paul Menage
On Thu, Jan 22, 2009 at 5:21 AM, Evgeniy Polyakov wrote: > Having userspace to decide which task to kill may not work in some cases > at all (when task is swapped and we need to kill someone to get the mem > to swap out the task, which will make that decision). That's true in the case of a global

[Devel] Re: [PATCH 2/3] ipc namespaces: implement support for posix msqueues

2009-01-27 Thread Serge E. Hallyn
Quoting Andrew Morton (a...@linux-foundation.org): > On Fri, 16 Jan 2009 20:03:32 -0600 > "Serge E. Hallyn" wrote: > > > Implement multiple mounts of the mqueue file system, and > > link it to usage of CLONE_NEWIPC. > > > > Each ipc ns has a corresponding mqueuefs superblock. When > > a user do

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Evgeniy Polyakov
On Tue, Jan 27, 2009 at 12:41:03PM -0800, David Rientjes (rient...@google.com) wrote: > > Having some special application which will monitor /dev/mem_notify and > > kill processes based on its own heuristics is a good idea, but when it > > fails to do its work (or does not exist) system has to hav

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Evgeniy Polyakov
On Tue, Jan 27, 2009 at 09:10:53PM +0530, Balbir Singh (bal...@linux.vnet.ibm.com) wrote: > > Having some special application which will monitor /dev/mem_notify and > > kill processes based on its own heuristics is a good idea, but when it > > fails to do its work (or does not exist) system has to

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Evgeniy Polyakov
On Tue, Jan 27, 2009 at 12:37:21PM -0800, David Rientjes (rient...@google.com) wrote: > > Well, oom-killer can, since it drops unkillable state from the process > > mask, that may be not enough though, but it tries more than userspace. > > > > The only thing it does is send a SIGKILL and gives t

[Devel] [RFC cr-pipe-v13][PATCH 3/3] Restore open pipes

2009-01-27 Thread Oren Laadan
When seeing a CR_FD_PIPE file type, we create a new pipe and thus have two file pointers (read- and write- ends). We only use one of them, depending on which side was checkpointed first. We register the file pointer of the other end in the hash table, with the 'objref' given for this pipe from the

[Devel] Re: [RFC cr-pipe-v13][PATCH 0/3] c/r of open pipes

2009-01-27 Thread Oren Laadan
The patches can be pulled from the git tree (branch 'ckpt-v13-dev'): git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git Oren Laadan wrote: > Adds support for checkpoint-restart of open pipes. > > This patchset applies on top of 'ckpt-v13' recent patchset. > > A new test program (test4.c in

[Devel] [RFC cr-pipe-v13][PATCH 0/3] c/r of open pipes

2009-01-27 Thread Oren Laadan
This patchset adds support for checkpoint-restart of open pipes. It applies on top of the 'ckpt-v13' patch set. A new test program (test4.c in userspace tools) provides a test case for this new functionality. Oren.

[Devel] [RFC cr-pipe-v13][PATCH 1/3] A new file type (CR_FD_OBJREF) for a file descriptor already setup

2009-01-27 Thread Oren Laadan
While file pointers are shared objects, they may share an underlying object themselves. For instance, the file pointers of both ends of a pipe share the same pipe inode. In this case, the shared entity to handle is the inode that is shared among two file pointers (e.g. read- and write- ends). In th

[Devel] [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes

2009-01-27 Thread Oren Laadan
A pipe is essentially a double-headed inode with a buffer attached to it. We checkpoint the pipe buffer only once, as soon as we hit one side of the pipe, regardless of whether it is the read- or write- end. To checkpoint a file descriptor that refers to a pipe (either end), we first look up the inode in

[Devel] [RFC cr-pipe-v13][PATCH 0/3] c/r of open pipes

2009-01-27 Thread Oren Laadan
Adds support for checkpoint-restart of open pipes. This patchset applies on top of the recent 'ckpt-v13' patchset. A new test program (test4.c in userspace tools) provides a test case for this new functionality. Oren.

[Devel] Re: [Patch 0/3] posix mqueue namespace (v14)

2009-01-27 Thread Serge E. Hallyn
Quoting Andrew Morton (a...@linux-foundation.org): > On Fri, 16 Jan 2009 20:02:48 -0600 > "Serge E. Hallyn" wrote: > > > IPC namespaces are completely disjoint id->object mappings. > > A task can pass CLONE_NEWIPC to unshare and clone to get > > a new, empty, IPC namespace. Until now this has su

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread David Rientjes
On Tue, 27 Jan 2009, Evgeniy Polyakov wrote: > Having some special application which will monitor /dev/mem_notify and > kill processes based on its own heuristics is a good idea, but when it > fails to do its work (or does not exist) system has to have ability to > make a progress and invoke a mai

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread David Rientjes
On Tue, 27 Jan 2009, Evgeniy Polyakov wrote: > > There is no additional oom killer limitation imposed here, nor can the oom > > killer kill a task hung in D state any better than userspace. > > Well, oom-killer can, since it drops unkillable state from the process > mask, that may be not enough

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread David Rientjes
On Tue, 27 Jan 2009, Nikanth Karthikesan wrote: > > That's certainly idealistic, but cannot be done in an inexpensive way that > > would scale with the large systems that clients of cpusets typically use. > > If we kill only the tasks for which cpuset_mems_allowed_intersects() is true > on the f

[Devel] [RFC v13][PATCH 06/14] Dump memory address space

2009-01-27 Thread Oren Laadan
For each VMA, there is a 'struct cr_vma'; if the VMA is file-mapped, it will be followed by the file name. Then comes the actual contents, in one or more chunks: each chunk begins with a header that specifies how many pages it holds, then the virtual addresses of all the dumped pages in that chunk,

[Devel] [RFC v13][PATCH 02/14] Checkpoint/restart: initial documentation

2009-01-27 Thread Oren Laadan
Covers application checkpoint/restart, overall design, interfaces, usage, shared objects, and checkpoint image format. Changelog[v8]: - Split into multiple files in Documentation/checkpoint/... - Extend documentation, fix typos and comments from feedback Signed-off-by: Oren Laadan Acked-

[Devel] [RFC v13][PATCH 04/14] General infrastructure for checkpoint restart

2009-01-27 Thread Oren Laadan
Add those interfaces, as well as helpers needed to easily manage the file format. The code is roughly broken out as follows: checkpoint/sys.c - user/kernel data transfer, as well as setup of the CR context (a per-checkpoint data structure for housekeeping) checkpoint/checkpoint.c - output wrappe

[Devel] [RFC v13][PATCH 05/14] x86 support for checkpoint/restart

2009-01-27 Thread Oren Laadan
Add logic to save and restore architecture specific state, including thread-specific state, CPU registers and FPU state. In addition, architecture capabilities are saved in an architecture specific extension of the header (cr_hdr_head_arch). Currently this includes only FPU capabilities. Currently

[Devel] [RFC v13][PATCH 14/14] Restart multiple processes

2009-01-27 Thread Oren Laadan
Restarting of multiple processes expects all restarting tasks to call sys_restart(). Once inside the system call, each task will restart itself in the same order in which the tasks were saved. The internals of the syscall will take care of in-kernel synchronization between tasks. This patch does _not_ crea

[Devel] [RFC v13][PATCH 09/14] Dump open file descriptors

2009-01-27 Thread Oren Laadan
Dump the files_struct of a task with 'struct cr_hdr_files', followed by all open file descriptors. Because the 'struct file' corresponding to an FD can be shared, each is assigned an objref and registered in the object hash. A reference to the 'file *' is kept for as long as it lives in the h

[Devel] [RFC v13][PATCH 08/14] Infrastructure for shared objects

2009-01-27 Thread Oren Laadan
Infrastructure to handle objects that may be shared and referenced by multiple tasks or other objects, e.g. open files, memory address space, etc. The state of shared objects is saved once. On the first encounter, the state is dumped and the object is assigned a unique identifier (objref) and also
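The save-once behaviour described in this snippet can be illustrated with a minimal sketch. It is written in Python rather than kernel C, and it is not the patch's code: `ObjRegistry`, `lookup_or_add` and `checkpoint_objects` are hypothetical names, and the assumption that later encounters write only the objref is inferred from "saved once", since the snippet is cut off.

```python
class ObjRegistry:
    """Hypothetical stand-in for the object hash: maps objects to objrefs."""

    def __init__(self):
        self._refs = {}   # id(obj) -> objref
        self._keep = []   # hold references so id() values stay unique

    def lookup_or_add(self, obj):
        """Return (objref, first_time) for the given object."""
        key = id(obj)
        if key in self._refs:
            return self._refs[key], False
        objref = len(self._refs) + 1
        self._refs[key] = objref
        self._keep.append(obj)
        return objref, True


def checkpoint_objects(objects, dump_state):
    """Dump each distinct object's state once; repeats become references."""
    registry = ObjRegistry()
    image = []
    for obj in objects:
        objref, first = registry.lookup_or_add(obj)
        if first:
            # First encounter: save the state together with its new objref.
            image.append(("state", objref, dump_state(obj)))
        else:
            # Assumed behaviour: later encounters record only the objref.
            image.append(("ref", objref))
    return image
```

Keeping a reference to every registered object mirrors the point made elsewhere in the series that a reference is held for as long as the object lives in the hash.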

[Devel] [RFC v13][PATCH 10/14] Restore open file descriptors

2009-01-27 Thread Oren Laadan
Restore open file descriptors: for each FD read 'struct cr_hdr_fd_ent' and look up objref in the hash table; if not found (first occurrence), read in 'struct cr_hdr_fd_data', create a new FD and register in the hash. Otherwise attach the file pointer from the hash as an FD. This patch only handles b
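The restore loop described in this snippet might look roughly like the following Python sketch. It is only an illustration under stated assumptions: `restore_fds`, the `(fd, objref, data)` tuple layout and the `open_file` callback are hypothetical and stand in for reading 'struct cr_hdr_fd_ent' and 'struct cr_hdr_fd_data' from the image.

```python
def restore_fds(entries, open_file):
    """Rebuild an fd table from (fd, objref, data) entries.

    The first entry carrying a given objref creates the file object;
    later entries with the same objref reuse it, so sharing is preserved.
    """
    table = {}  # objref -> restored file object (the "hash table")
    fds = {}    # fd number -> file object
    for fd, objref, data in entries:
        if objref not in table:
            # First occurrence: create the file and register it.
            table[objref] = open_file(data)
        # Attach the (possibly shared) file object to this descriptor.
        fds[fd] = table[objref]
    return fds
```

With two descriptors that carry the same objref, both slots end up pointing at one file object, which is the property the hash lookup is there to preserve.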

[Devel] [RFC v13][PATCH 00/14] Kernel based checkpoint/restart

2009-01-27 Thread Oren Laadan
Checkpoint-restart (c/r): a couple of fixes in preparation for 64bit architectures, and a couple of fixes for bugs (comments from Serge Hallyn, Sukadev Bhattiprolu and Nathan Lynch). Updated and tested against v2.6.28. Aiming for -mm. The git tree tracking v13, branch 'ckpt-v13' (and older vers

[Devel] [RFC v13][PATCH 01/14] Create syscalls: sys_checkpoint, sys_restart

2009-01-27 Thread Oren Laadan
Create trivial sys_checkpoint and sys_restart system calls. They will make it possible to checkpoint and restart an entire container, to and from a checkpoint image file descriptor. The syscalls take a file descriptor (for the image file) and flags as arguments. For sys_checkpoint the first argument identif

[Devel] [RFC v13][PATCH 11/14] External checkpoint of a task other than ourself

2009-01-27 Thread Oren Laadan
Now we can do "external" checkpoint, i.e. act on another task. sys_checkpoint() now looks up the target pid (in our namespace) and checkpoints that corresponding task. That task should be the root of a container. sys_restart() remains the same, as the restart is always done in the context of the

[Devel] [RFC v13][PATCH 03/14] Make file_pos_read/write() public

2009-01-27 Thread Oren Laadan
These two are used in the next patch when calling vfs_read/write() --- fs/read_write.c| 10 -- include/linux/fs.h | 10 ++ 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index 969a6d9..dda4eab 100644 --- a/fs/read_write.c

[Devel] [RFC v13][PATCH 12/14] Track in-kernel when we expect checkpoint/restart to work

2009-01-27 Thread Oren Laadan
From: Dave Hansen Suggested by Ingo. Checkpoint/restart is going to be a long effort to get things working. We're going to have a lot of things that we know just don't work for a long time. That doesn't mean that it will be useless, it just means that there's some complicated features that we a

[Devel] Re: [RFC v13][PATCH 01/14] Create syscalls: sys_checkpoint, sys_restart

2009-01-27 Thread Randy Dunlap
Oren Laadan wrote: > Changelog[v5]: > - Config is 'def_bool n' by default That's true by default; it doesn't have to be written/typed. > Signed-off-by: Oren Laadan > Acked-by: Serge Hallyn > Signed-off-by: Dave Hansen > --- > arch/x86/include/asm/unistd_32.h |2 + > arch/x86/kernel/s

[Devel] Re: [PATCH] c/r: define s390-specific checkpoint-restart code

2009-01-27 Thread Oren Laadan
Serge E. Hallyn wrote: > Implement the s390 arch-specific checkpoint/restart helpers. This Thanks for the patch. I will assume that the s390 specifics are correct... > is on top of Oren Laadan's c/r code (which so far was x86_32-only) > submitted here: http://lkml.org/lkml/2008/12/29/38, plus

[Devel] [RFC v13][PATCH 07/14] Restore memory address space

2009-01-27 Thread Oren Laadan
Restoring the memory address space begins with nuking the existing one of the current process, and then reading the VMA state and contents. Call do_mmap_pgoffset() for each VMA and then read in the data. Changelog[v13]: - Avoid access to hh->vma_type after the header is freed - Test for no vma

[Devel] [RFC v13][PATCH 13/14] Checkpoint multiple processes

2009-01-27 Thread Oren Laadan
Checkpointing of multiple processes works by recording the tasks tree structure below a given task (usually this task is the container init). For a given task, do a DFS scan of the tasks tree and collect them into an array (keeping a reference to each task). Using DFS simplifies the recreation of
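The DFS collection step described in this snippet can be sketched in a few lines. This is a Python illustration, not the kernel implementation: `collect_tasks` and the `children_of` callback are hypothetical, and reference counting on the tasks is left out.

```python
def collect_tasks(root, children_of):
    """Return the tasks at and below root in depth-first (pre-order) order."""
    tasks = []
    stack = [root]
    while stack:
        task = stack.pop()
        tasks.append(task)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(children_of(task)))
    return tasks
```

In pre-order output every parent appears before its descendants, which is one reason a depth-first layout is convenient when the tree has to be recreated later.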

[Devel] Re: Checkpoint / Restart

2009-01-27 Thread Serge E. Hallyn
Quoting Ralph-Gordon Paul (ralph-gordon.p...@uni-duesseldorf.de): > Hello, > > i'm searching for the right Mailing List for Linux checkpoint / restart. > > I'm working for the XtreemOS Project (http://www.xtreemos.eu). We want > to include the linux native checkpoint / restart, but it seems to

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Balbir Singh
* Evgeniy Polyakov [2009-01-27 16:45:59]: > Hi. > > On Tue, Jan 27, 2009 at 07:40:58PM +0900, KOSAKI Motohiro > (kosaki.motoh...@jp.fujitsu.com) wrote: > > I'd like to respect your requirement. but I also would like to know > > why you like deterministic hierarchy oom than notification. > > > >

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Evgeniy Polyakov
Hi. On Tue, Jan 27, 2009 at 07:40:58PM +0900, KOSAKI Motohiro (kosaki.motoh...@jp.fujitsu.com) wrote: > I'd like to respect your requirement. but I also would like to know > why you like deterministic hierarchy oom than notification. > > I think one of problem is, current patch description is a b

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Evgeniy Polyakov
Hi David. On Tue, Jan 27, 2009 at 01:37:55AM -0800, David Rientjes (rient...@google.com) wrote: > > /dev/mem_notify is a great idea, but please do not limit existing > > oom-killer in its ability to do the job and do not rely on application's > > ability to send a SIGKILL which will not kill task

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Nikanth Karthikesan
On Tuesday 27 January 2009 16:51:26 David Rientjes wrote: > On Tue, 27 Jan 2009, Nikanth Karthikesan wrote: > > > I don't understand what you're arguing for here. Are you suggesting > > > that we should not prefer tasks that intersect the set of allowable > > > nodes? That makes no sense if the go

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread David Rientjes
On Tue, 27 Jan 2009, Nikanth Karthikesan wrote: > > I don't understand what you're arguing for here. Are you suggesting that > > we should not prefer tasks that intersect the set of allowable nodes? > > That makes no sense if the goal is to allow for future memory freeing. > > > > No. Actually I

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Nikanth Karthikesan
On Tuesday 27 January 2009 16:23:00 David Rientjes wrote: > On Tue, 27 Jan 2009, Nikanth Karthikesan wrote: > > > As previously stated, I think the heuristic to penalize tasks for not > > > having an intersection with the set of allowable nodes of the oom > > > triggering task could be made slightl

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread David Rientjes
On Tue, 27 Jan 2009, Nikanth Karthikesan wrote: > > As previously stated, I think the heuristic to penalize tasks for not > > having an intersection with the set of allowable nodes of the oom > > triggering task could be made slightly more severe. That's irrelevant to > > your patch, though. > >

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread KOSAKI Motohiro
Hi Evgeniy, > On Mon, Jan 26, 2009 at 11:51:27PM -0800, David Rientjes > (rient...@google.com) wrote: > > Yeah, I proposed /dev/mem_notify being made as a client of cgroups there > > in http://marc.info/?l=linux-kernel&m=123200623628685 > > > > How do you replace the oom killer's capability of

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Nikanth Karthikesan
On Saturday 24 January 2009 02:14:59 David Rientjes wrote: > On Fri, 23 Jan 2009, Nikanth Karthikesan wrote: > > In other instances, It can actually also kill some innocent tasks unless > > the administrator tunes oom_adj, say something like kvm which would have > > a huge memory accounted, but mig

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread David Rientjes
On Tue, 27 Jan 2009, Evgeniy Polyakov wrote: > /dev/mem_notify is a great idea, but please do not limit existing > oom-killer in its ability to do the job and do not rely on application's > ability to send a SIGKILL which will not kill tasks in unkillable state > contrary to oom-killer. > You're

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread Evgeniy Polyakov
On Mon, Jan 26, 2009 at 11:51:27PM -0800, David Rientjes (rient...@google.com) wrote: > Yeah, I proposed /dev/mem_notify being made as a client of cgroups there > in http://marc.info/?l=linux-kernel&m=123200623628685 > > How do you replace the oom killer's capability of giving a killed task > a