[Devel] Re: [PATCH 0/7][v8] Container-init signal semantics

2009-02-19 Daniel Lezcano
Sukadev Bhattiprolu wrote: Patch 5/7 is new in this set and fixes a bug. Remaining patches are just a forward-port from previous version and I believe they address all comments I have received. Oleg please sign-off/ack if you agree. --- Container-init must behave like global-init to
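
The cover letter above is cut off mid-sentence. Assuming the usual reading of these semantics, a container-init should be immune to unhandled signals sent from inside its own pid namespace (as global init is), yet behave like an ordinary process when the sender lives in an ancestor namespace. The delivery decision can then be sketched as a small predicate. `cinit_drops_signal` and `enum sender_origin` are hypothetical names for illustration, not code from the patch set.

```c
#include <stdbool.h>

/* Hypothetical: where the sender sits relative to the container-init. */
enum sender_origin {
    SENDER_SAME_NS,     /* sender is inside the container's pid namespace */
    SENDER_ANCESTOR_NS  /* sender is in a parent (ancestor) pid namespace */
};

/*
 * Sketch: should a signal aimed at a container-init be dropped?
 * handler_is_default is true when the init has SIG_DFL for the signal.
 * SIGKILL and SIGSTOP can never have a handler, so they always take
 * the handler_is_default == true path.
 */
static bool cinit_drops_signal(bool handler_is_default,
                               enum sender_origin origin)
{
    /* A signal the init chose to handle is always delivered. */
    if (!handler_is_default)
        return false;

    /* Sender inside the container: behave like global init, ignore it. */
    if (origin == SENDER_SAME_NS)
        return true;

    /* Sender in an ancestor namespace: behave like a normal process. */
    return false;
}
```

Under this rule a `SIGKILL` sent from inside the container is ignored, while the same `SIGKILL` sent from the host still terminates the container-init.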

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Daniel Lezcano
suka...@linux.vnet.ibm.com wrote: Enable multiple instances of devpts filesystem so each container can allocate ptys independently. Hi suka, It looks like the /proc/sys/kernel/pty/max and nr are not virtualized. Modifying in the container the max pty, that impacts the init_pty. Same as nr
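
Daniel's concern can be illustrated with a toy model in which every devpts instance consults the same two system-wide values behind `/proc/sys/kernel/pty/max` and `nr`. The structure and function names are hypothetical, and 4096 is assumed as the default limit; the only point is that an allocation or limit change made through one instance is seen by all of them.

```c
#include <stdbool.h>

/* Hypothetical model: one global limit and one global count, standing in
 * for /proc/sys/kernel/pty/max and /proc/sys/kernel/pty/nr. */
static int pty_max = 4096;   /* assumed default limit */
static int pty_nr;           /* ptys allocated across ALL instances */

/* A devpts instance (e.g. one per container) with no limit of its own. */
struct devpts_instance {
    const char *name;
    int allocated;           /* ptys held through this instance */
};

/* Allocation is checked against the global count and global limit. */
static bool pty_alloc(struct devpts_instance *inst)
{
    if (pty_nr >= pty_max)
        return false;
    pty_nr++;
    inst->allocated++;
    return true;
}

/* Writing the limit through any instance changes it for every instance. */
static void set_pty_max(int new_max)
{
    pty_max = new_max;
}
```

In this model a single container can take 4095 of the 4096 slots, leaving the host one pty, and a `set_pty_max()` call made from any container changes the limit that every instance sees.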

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Eric W. Biederman
Sukadev Bhattiprolu suka...@linux.vnet.ibm.com writes: From: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com Date: Wed, 24 Dec 2008 14:14:18 -0800 Subject: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary When sending a signal to a descendant namespace, set ->si_pid to
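
The masquerading rule can be illustrated with a toy version of the per-namespace pid lookup. In the kernel, `pid_nr_ns()` returns 0 when a pid has no number in the given namespace, which is exactly the case for a sender that lives in an ancestor namespace; the sketch below mimics that convention. The `toy_*` structures and `masquerade_si_pid` are hypothetical simplifications, not the patch's code.

```c
#include <stddef.h>

#define MAX_LEVEL 4

/* Toy pid namespace: level 0 is the init namespace. */
struct pid_ns {
    int level;
    struct pid_ns *parent;
};

/* Toy struct pid: the task's number in every namespace from level 0
 * down to the namespace it was created in. */
struct toy_pid {
    int level;                      /* level of the owning namespace */
    int nr[MAX_LEVEL];              /* nr[i]: pid as seen at level i */
    struct pid_ns *ns[MAX_LEVEL];   /* ns[i]: namespace at level i */
};

/* Pid of 'pid' as seen from 'ns', or 0 if it has no number there
 * (the same convention as the kernel's pid_nr_ns()). */
static int toy_pid_nr_ns(const struct toy_pid *pid, const struct pid_ns *ns)
{
    if (ns->level <= pid->level && pid->ns[ns->level] == ns)
        return pid->nr[ns->level];
    return 0;
}

/* si_pid to report to a receiver in 'recv_ns': a sender from an
 * ancestor namespace is not visible there, so it shows up as 0. */
static int masquerade_si_pid(const struct toy_pid *sender,
                             const struct pid_ns *recv_ns)
{
    return toy_pid_nr_ns(sender, recv_ns);
}
```

A sender in the root namespace signalling into a child namespace is reported as pid 0, while a sender inside the child is reported with its child-local number.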

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 H. Peter Anvin
Daniel Lezcano wrote: suka...@linux.vnet.ibm.com wrote: Enable multiple instances of devpts filesystem so each container can allocate ptys independently. Hi suka, It looks like the /proc/sys/kernel/pty/max and nr are not virtualized. Modifying in the container the max pty, that

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Daniel Lezcano
H. Peter Anvin wrote: Daniel Lezcano wrote: suka...@linux.vnet.ibm.com wrote: Enable multiple instances of devpts filesystem so each container can allocate ptys independently. Hi suka, It looks like the /proc/sys/kernel/pty/max and nr are not virtualized. Modifying

[Devel] [RFC][PATCH 5/5] add c/r info to fdinfo

2009-02-19 Dave Hansen
Use the new checkpoint/restart file functions to query and report on each fd in the /proc/$$/fdinfo/X file. This should provide an easy way to examine processes at runtime to see what exactly is causing their inability to checkpoint. Signed-off-by: Dave Hansen d...@linux.vnet.ibm.com ---
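
A minimal sketch of the extra fdinfo line, assuming it uses the `checkpointable: 0(<reason>)` layout visible in Dave Hansen's sample output elsewhere in this listing. `fdinfo_append_ckpt` is a hypothetical helper, and the `checkpointable: 1` form for the supported case is an assumption.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: write the c/r status line for one fd into buf.
 * 'reason' is NULL when the file can be checkpointed.
 * Returns what snprintf returns (characters that would be written).
 */
static int fdinfo_append_ckpt(char *buf, size_t size, const char *reason)
{
    if (reason == NULL)
        return snprintf(buf, size, "checkpointable: 1\n");
    return snprintf(buf, size, "checkpointable: 0(%s)\n", reason);
}
```

Reading the line back with `grep check` across `/proc/*/fdinfo/*` then gives a quick census of which file types are blocking checkpoint.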

[Devel] [RFC][PATCH 4/5] breakout fdinfo sprintf() into its own function

2009-02-19 Dave Hansen
I'll be adding to this in a moment and it is in a bad place to do that cleanly now. Also, increase the buffer size. Most /proc files can output up to a page, so use the same here. Signed-off-by: Dave Hansen d...@linux.vnet.ibm.com --- linux-2.6.git-dave/fs/proc/base.c | 23
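
Moving the formatting into its own function gives later patches a single place to append lines. A sketch, assuming the `pos:`/`flags:` layout that fdinfo files printed at the time and a 4096-byte page; `struct toy_open_file` and `format_fdinfo` are hypothetical names.

```c
#include <stdio.h>
#include <stddef.h>

#define FDINFO_BUF_SIZE 4096   /* assumption: one page, as on x86 */

/* Hypothetical stand-in for the two struct file fields fdinfo reports. */
struct toy_open_file {
    long long f_pos;
    unsigned int f_flags;
};

/*
 * Format the fdinfo text for one file into buf (up to a page).
 * Returns the length snprintf would have written.
 */
static int format_fdinfo(char *buf, size_t size,
                         const struct toy_open_file *f)
{
    return snprintf(buf, size, "pos:\t%lli\nflags:\t0%o\n",
                    f->f_pos, f->f_flags);
}
```

A follow-up patch can then append further lines (such as a checkpointability status) after the returned length without touching the caller.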

[Devel] [RFC][PATCH 2/5] file c/r: expose functions to query fs support

2009-02-19 Dave Hansen
This pair of functions will check to see whether a given 'struct file' can be checkpointed. If it can't be, the explain function can also give a description why. Signed-off-by: Dave Hansen d...@linux.vnet.ibm.com --- linux-2.6.git-dave/checkpoint/ckpt_file.c | 30
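
The pairing can be sketched as one predicate plus one explainer that reuses it. The types and names below are hypothetical; the message text mirrors the `... does not support checkpoint` strings in the sample output elsewhere in this listing.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct file. */
struct toy_cr_file {
    const char *fs_name;   /* name of the backing filesystem */
    bool fs_supported;     /* does that filesystem support c/r? */
};

/* Predicate: can this file be checkpointed? */
static bool toy_file_checkpointable(const struct toy_cr_file *f)
{
    return f->fs_supported;
}

/*
 * Explainer: if the file cannot be checkpointed, write the reason into
 * buf and return true; otherwise leave buf untouched and return false.
 */
static bool toy_file_explain(const struct toy_cr_file *f,
                             char *buf, size_t size)
{
    if (toy_file_checkpointable(f))
        return false;
    snprintf(buf, size, "%s does not support checkpoint", f->fs_name);
    return true;
}
```

Keeping the explainer built on the predicate guarantees the two can never disagree about whether a file is supported.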

[Devel] [RFC][PATCH 3/5] check files for checkpointability

2009-02-19 Dave Hansen
Introduce a files_struct counter to indicate whether a particular file_struct has ever contained a file which can not be checkpointed. This flag is a one-way trip; once it is set, it may not be unset. We assume at allocation that a new files_struct is clean and may be checkpointed. However, as
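
The one-way behaviour can be modelled with a sticky atomic marker on the file table: installing an uncheckpointable file raises it, and nothing ever lowers it, even after that file is closed. This is a user-space sketch with hypothetical `toy_*` names; the patch's actual counter and its locking are not visible in the excerpt.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a files_struct carrying the sticky c/r marker. */
struct toy_files_struct {
    atomic_int uncheckpointable;   /* 0 = clean, the allocation default */
    int open_files;
};

static void toy_files_init(struct toy_files_struct *files)
{
    atomic_init(&files->uncheckpointable, 0);
    files->open_files = 0;
}

/* Installing a file that cannot be checkpointed raises the marker. */
static void toy_fd_install(struct toy_files_struct *files, bool file_ok)
{
    files->open_files++;
    if (!file_ok)
        atomic_store(&files->uncheckpointable, 1);
}

/* Closing a file never clears the marker: it is a one-way trip. */
static void toy_fd_close(struct toy_files_struct *files)
{
    files->open_files--;
}

static bool toy_files_checkpointable(struct toy_files_struct *files)
{
    return atomic_load(&files->uncheckpointable) == 0;
}
```

The trade-off is the one Alexey Dobriyan objects to further down: the marker can report a table as uncheckpointable long after the offending file is gone.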

[Devel] [RFC][PATCH 1/5] create fs flag to mark c/r supported fs's

2009-02-19 Dave Hansen
There are plenty of filesystems that are not supported for c/r at this point. Think of things like hugetlbfs which are externally visible or pipefs which are kernel-internal. This provides a quick way to make the normal filesystems which are currently supported. This is also safe if any new
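
Because the flag is opt-in, a filesystem type that never sets it is treated as unsupported by default. A sketch, where `TOY_FS_CHECKPOINTABLE`, `struct toy_fs_type`, and `examplefs` are hypothetical names rather than the patch's definitions.

```c
#include <stdbool.h>

#define TOY_FS_CHECKPOINTABLE 0x1   /* hypothetical opt-in flag bit */

struct toy_fs_type {
    const char *name;
    int fs_flags;
};

/* A filesystem that has been audited and explicitly opts in. */
static const struct toy_fs_type examplefs = { "examplefs", TOY_FS_CHECKPOINTABLE };
/* Kernel-internal and externally visible filesystems that do not. */
static const struct toy_fs_type pipefs = { "pipefs", 0 };
static const struct toy_fs_type hugetlbfs = { "hugetlbfs" };  /* flags default to 0 */

/* Supported only if the flag was set explicitly. */
static bool toy_fs_checkpointable(const struct toy_fs_type *t)
{
    return (t->fs_flags & TOY_FS_CHECKPOINTABLE) != 0;
}
```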

[Devel] Re: [RFC][PATCH 1/5] create fs flag to mark c/r supported fs's

2009-02-19 Dave Hansen
BTW, after you apply this and turn on the config option, you do get a ton of warnings at runtime.
qemu:~# cat /proc/*/fdinfo/* | grep check | sort | uniq -c | sort -n
      1 checkpointable: 0(proc does not support checkpoint)
      6 checkpointable: 0(pipefs does not support checkpoint)

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Oleg Nesterov
On 02/19, Eric W. Biederman wrote: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com writes: From: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com Date: Wed, 24 Dec 2008 14:14:18 -0800 Subject: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary When sending a

[Devel] Banning checkpoint (was: Re: What can OpenVZ do?)

2009-02-19 Alexey Dobriyan
I think that all these efforts to abort checkpoint intelligently by banning it early are completely misguided. Checkpointable property isn't one-way ticket like tainted flag, so doing it like tainted var isn't right, atomic or not, SMP-safe or not. With filesystems, one has ->f_op field to

[Devel] Re: [RFC][PATCH 1/5] create fs flag to mark c/r supported fs's

2009-02-19 Christoph Hellwig
On Thu, Feb 19, 2009 at 10:20:07AM -0800, Dave Hansen wrote: There are plenty of filesystems that are not supported for c/r at this point. Think of things like hugetlbfs which are externally visible or pipefs which are kernel-internal. This provides a quick way to make the normal

[Devel] Re: Banning checkpoint (was: Re: What can OpenVZ do?)

2009-02-19 Dave Hansen
On Thu, 2009-02-19 at 22:06 +0300, Alexey Dobriyan wrote: Inotify isn't supported yet? You do if (!list_empty(&inode->inotify_watches)) return -E; without hooking into inotify syscalls. ptrace(2) isn't supported -- look at struct task_struct::ptraced and friends.

[Devel] Re: [RFC][PATCH 1/5] create fs flag to mark c/r supported fs's

2009-02-19 Dave Hansen
On Thu, 2009-02-19 at 14:00 -0500, Christoph Hellwig wrote: On Thu, Feb 19, 2009 at 10:20:07AM -0800, Dave Hansen wrote: There are plenty of filesystems that are not supported for c/r at this point. Think of things like hugetlbfs which are externally visible or pipefs which are

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 H. Peter Anvin
Daniel Lezcano wrote: Resource limit partitioning is a much bigger and orthogonal problem. In this case we don't have the pty allocated independently, no ? I mean one container can allocate 4095 pty, making a pty starvation for others containers. Or imagine I am a vilain and I want to

[Devel] Re: [PATCH 5/7][v8] zap_pid_ns_process() should use force_sig()

2009-02-19 Sukadev Bhattiprolu
Oleg Nesterov [o...@redhat.com] wrote: | On 02/18, Sukadev Bhattiprolu wrote: | | read_lock(&tasklist_lock); | nr = next_pidmap(pid_ns, 1); | while (nr > 0) { | - kill_proc_info(SIGKILL, SEND_SIG_PRIV, nr); | + rcu_read_lock(); | + | + /* | +
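
The loop in the quoted diff walks the namespace's pid bitmap starting after pid 1, so the exiting init does not signal itself, and stops when `next_pidmap()` reports no further pid (it returns -1). A user-space sketch of that walk, using a hypothetical `toy_pid_ns` with a tiny fixed pid space; instead of sending `SIGKILL` it only records the pids that would be signalled.

```c
#define TOY_PID_MAX 64   /* assumption: tiny pid space for illustration */

/* Toy pidmap: used[nr] != 0 means pid nr is allocated in this namespace. */
struct toy_pid_ns {
    unsigned char used[TOY_PID_MAX];
};

/* Next allocated pid strictly greater than 'last', or -1 if none. */
static int toy_next_pidmap(const struct toy_pid_ns *ns, int last)
{
    for (int nr = last + 1; nr < TOY_PID_MAX; nr++)
        if (ns->used[nr])
            return nr;
    return -1;
}

/* Collect, in order, the pids the zap loop would signal. Starting the
 * search after pid 1 skips the namespace's own (exiting) init. */
static int toy_zap_targets(const struct toy_pid_ns *ns, int *out, int max)
{
    int n = 0;
    int nr = toy_next_pidmap(ns, 1);
    while (nr > 0 && n < max) {
        out[n++] = nr;   /* the real loop signals pid 'nr' here */
        nr = toy_next_pidmap(ns, nr);
    }
    return n;
}
```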

[Devel] Re: [PATCH 0/7][v8] Container-init signal semantics

2009-02-19 Oleg Nesterov
On 02/18, Sukadev Bhattiprolu wrote: Patch 5/7 is new in this set and fixes a bug. To clarify, the current code is buggy, and the fix doesn't depend on any other patch, afaics. Remaining patches are just a forward-port from previous version and I believe they address all comments I have

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Eric W. Biederman
Oleg Nesterov o...@redhat.com writes: On 02/19, Eric W. Biederman wrote: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com writes: From: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com Date: Wed, 24 Dec 2008 14:14:18 -0800 Subject: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Eric W. Biederman
H. Peter Anvin h...@zytor.com writes: Daniel Lezcano wrote: Resource limit partitioning is a much bigger and orthogonal problem. In this case we don't have the pty allocated independently, no ? I mean one container can allocate 4095 pty, making a pty starvation for others containers.

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Oleg Nesterov
On 02/19, Eric W. Biederman wrote: Oleg Nesterov o...@redhat.com writes: SI_FROMUSER() == T, unless we have more (hopefully not) in-kernel users which send SI_FROMUSER() signals, .si_pid must be valid? So the argument is that while things such as force_sig_info(SIGSEGV) don't have a

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Daniel Lezcano
H. Peter Anvin wrote: Daniel Lezcano wrote: Resource limit partitioning is a much bigger and orthogonal problem. In this case we don't have the pty allocated independently, no ? I mean one container can allocate 4095 pty, making a pty starvation for others containers. Or imagine I am a

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Eric W. Biederman
Oleg Nesterov o...@redhat.com writes: On 02/19, Eric W. Biederman wrote: Oleg Nesterov o...@redhat.com writes: SI_FROMUSER() == T, unless we have more (hopefully not) in-kernel users which send SI_FROMUSER() signals, .si_pid must be valid? So the argument is that while things such as

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Roland McGrath
Suppose I have 3 processes in a process group in three separate pid namespaces. Looking from the init pid namespace I have:
pid  pgrp  ppid
10   10    1
11   10    10
12   10    11
Looking from the pid namespace of pid 11 I have:
pid  pgrp  ppid
0    0     0

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Eric W. Biederman
Daniel Lezcano daniel.lezc...@free.fr writes: But if I am able to create a new instance of devpts for a container and modify the configuration of another devpts from this container, is it acceptable ? Can we convince people to use the containers for security and have anybody able to make a

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Oleg Nesterov
On 02/19, Eric W. Biederman wrote: Oleg Nesterov o...@redhat.com writes: On 02/19, Eric W. Biederman wrote: Oleg Nesterov o...@redhat.com writes: SI_FROMUSER() == T, unless we have more (hopefully not) in-kernel users which send SI_FROMUSER() signals, .si_pid must be valid?

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Eric W. Biederman
Roland McGrath rol...@redhat.com writes: Suppose I have 3 processes in a process group in three separate pid namespaces. Looking from the init pid namespace I have:
pid  pgrp  ppid
10   10    1
11   10    10
12   10    11
Looking from the pid namespace of pid 11 I have:

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Roland McGrath
It is especially useful, and this is a deliberate feature. Ok, I thought that might be so. In practice I don't care about si_pid and I doubt I care about processes sending signals outside of their pid namespace. But I do care about sharing a tty and a session and having job control work.

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Eric W. Biederman
Roland McGrath rol...@redhat.com writes: It is especially useful, and this is a deliberate feature. Ok, I thought that might be so. In practice I don't care about si_pid and I doubt I care about processes sending signals outside of their pid namespace. But I do care about sharing a tty

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Roland McGrath
think it would be best to fully elucidate what we think about desireable semantics for the whole spectrum of cross-NS signal-sending cases before actually choosing the implementation details. ... and then you answered all the questions that are already well settled, and did not address the

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Eric W. Biederman
Roland McGrath rol...@redhat.com writes: think it would be best to fully elucidate what we think about desireable semantics for the whole spectrum of cross-NS signal-sending cases before actually choosing the implementation details. ... and then you answered all the questions that are

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Eric W. Biederman
Oleg Nesterov o...@redhat.com writes: On 02/19, Eric W. Biederman wrote: Oleg Nesterov o...@redhat.com writes: On 02/19, Eric W. Biederman wrote: Oleg Nesterov o...@redhat.com writes: SI_FROMUSER() == T, unless we have more (hopefully not) in-kernel users which send

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 H. Peter Anvin
Eric W. Biederman wrote: Really. You have the same classes of issues with ANY allocatable resource in the system. Period. Furthermore, there are quite a few applications which want one and not the other. Trying to entangle them is broken. Peter they are entangled issues because the