[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Thread H. Peter Anvin
Eric W. Biederman wrote: >> >> Really. You have the same classes of issues with ANY allocatable >> resource in the system. Period. Furthermore, there are quite a few >> applications which want one and not the other. Trying to entangle >> them is broken. > > Peter they are entangled issues beca

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Eric W. Biederman
Oleg Nesterov writes: > On 02/19, Eric W. Biederman wrote: >> >> Oleg Nesterov writes: >> >> > On 02/19, Eric W. Biederman wrote: >> >> >> >> Oleg Nesterov writes: >> >> > >> >> > SI_FROMUSER() == T, unless we have more (hopefully not) in-kernel >> >> > users which send SI_FROMUSER() signals,

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Eric W. Biederman
Roland McGrath writes: >> > think it would be best to fully elucidate what we think about desirable >> > semantics for the whole spectrum of cross-NS signal-sending cases before >> > actually choosing the implementation details. > > ... and then you answered all the questions that are already we

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Roland McGrath
> > think it would be best to fully elucidate what we think about desirable > > semantics for the whole spectrum of cross-NS signal-sending cases before > > actually choosing the implementation details. ... and then you answered all the questions that are already well settled, and did not address

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Eric W. Biederman
Roland McGrath writes: >> It is especially useful, and this is a deliberate feature. > > Ok, I thought that might be so. > >> In practice I don't care about si_pid and I doubt I care about processes >> sending signals outside of their pid namespace. But I do care about >> sharing a tty and a s

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Roland McGrath
> It is especially useful, and this is a deliberate feature. Ok, I thought that might be so. > In practice I don't care about si_pid and I doubt I care about processes > sending signals outside of their pid namespace. But I do care about > sharing a tty and a session and having job control wor

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Eric W. Biederman
Roland McGrath writes: >> Suppose I have 3 processes in a process group in three separate pid >> namespaces. >> >> Looking from the init pid namespace I have: >> pid pgrp ppid >> 10 101 >> 11 1010 >> 12 1011 >> >> Looking from the pid namespace of pid 11 I hav

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Oleg Nesterov
On 02/19, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > On 02/19, Eric W. Biederman wrote: > >> > >> Oleg Nesterov writes: > >> > > >> > SI_FROMUSER() == T, unless we have more (hopefully not) in-kernel > >> > users which send SI_FROMUSER() signals, .si_pid must be valid? > >> > >> So

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Thread Eric W. Biederman
Daniel Lezcano writes: > But if I am able to create a new instance of devpts for a container and modify > the configuration of another devpts from this container, is it acceptable ? > Can > we convince people to use the containers for security and have anybody able to > make a pty starvation fro

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Roland McGrath
> Suppose I have 3 processes in a process group in three separate pid > namespaces. > > Looking from the init pid namespace I have: > pid pgrp ppid > 10 101 > 11 1010 > 12 1011 > > Looking from the pid namespace of pid 11 I have: > pid pgrp ppid > 0

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Eric W. Biederman
Oleg Nesterov writes: > On 02/19, Eric W. Biederman wrote: >> >> Oleg Nesterov writes: >> > >> > SI_FROMUSER() == T, unless we have more (hopefully not) in-kernel >> > users which send SI_FROMUSER() signals, .si_pid must be valid? >> >> So the argument is that while things such as force_sig_info

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Thread H. Peter Anvin
Daniel Lezcano wrote: > > But if I am able to create a new instance of devpts for a container and > modify the configuration of another devpts from this container, is it > acceptable ? Can we convince people to use the containers for security > and have anybody able to make a pty starvation fro

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Thread Daniel Lezcano
H. Peter Anvin wrote: > Daniel Lezcano wrote: >>> >>> Resource limit partitioning is a much bigger and orthogonal problem. >>> >> In this case we don't have the pty allocated independently, no? >> I mean one container can allocate 4095 pty, making a pty starvation >> for other containers. Or

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Oleg Nesterov
On 02/19, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > > SI_FROMUSER() == T, unless we have more (hopefully not) in-kernel > > users which send SI_FROMUSER() signals, .si_pid must be valid? > > So the argument is that while things such as force_sig_info(SIGSEGV) > don't have a si_pid w

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Thread Eric W. Biederman
"H. Peter Anvin" writes: > Daniel Lezcano wrote: >>> >>> Resource limit partitioning is a much bigger and orthogonal problem. >>> >> In this case we don't have the pty allocated independently, no? >> I mean one container can allocate 4095 pty, making a pty starvation for >> other >> contain

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Eric W. Biederman
Oleg Nesterov writes: > On 02/19, Eric W. Biederman wrote: >> >> Sukadev Bhattiprolu writes: >> >> > From: Sukadev Bhattiprolu >> > Date: Wed, 24 Dec 2008 14:14:18 -0800 >> > Subject: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns >> > boundary >> > >> > When sending a signal t

[Devel] Re: [PATCH 0/7][v8] Container-init signal semantics

2009-02-19 Thread Oleg Nesterov
On 02/18, Sukadev Bhattiprolu wrote: > > Patch 5/7 is new in this set and fixes a bug. To clarify, the current code is buggy, and the fix doesn't depend on any other patch, afaics. > Remaining patches are > just a forward-port from previous version and I believe they address > all comments I have

[Devel] Re: [PATCH 5/7][v8] zap_pid_ns_process() should use force_sig()

2009-02-19 Thread Sukadev Bhattiprolu
Oleg Nesterov [o...@redhat.com] wrote:
| On 02/18, Sukadev Bhattiprolu wrote:
| >
| > read_lock(&tasklist_lock);
| > nr = next_pidmap(pid_ns, 1);
| > while (nr > 0) {
| > - kill_proc_info(SIGKILL, SEND_SIG_PRIV, nr);
| > + rcu_read_lock();
| > +
| > + /*
|

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Thread H. Peter Anvin
Daniel Lezcano wrote: >> >> Resource limit partitioning is a much bigger and orthogonal problem. >> > In this case we don't have the pty allocated independently, no? > I mean one container can allocate 4095 pty, making a pty starvation for > other containers. Or imagine I am a villain and I wa

[Devel] Re: [RFC][PATCH 1/5] create fs flag to mark c/r supported fs's

2009-02-19 Thread Dave Hansen
On Thu, 2009-02-19 at 14:00 -0500, Christoph Hellwig wrote: > On Thu, Feb 19, 2009 at 10:20:07AM -0800, Dave Hansen wrote: > > > > There are plenty of filesystems that are not supported for > > c/r at this point. Think of things like hugetlbfs which > > are externally visible or pipefs which are

[Devel] Re: Banning checkpoint (was: Re: What can OpenVZ do?)

2009-02-19 Thread Dave Hansen
On Thu, 2009-02-19 at 22:06 +0300, Alexey Dobriyan wrote:
> Inotify isn't supported yet? You do
>
> 	if (!list_empty(&inode->inotify_watches))
> 		return -E;
>
> without hooking into inotify syscalls.
>
> ptrace(2) isn't supported -- look at struct task_struct::ptraced and
>

[Devel] Re: [RFC][PATCH 1/5] create fs flag to mark c/r supported fs's

2009-02-19 Thread Christoph Hellwig
On Thu, Feb 19, 2009 at 10:20:07AM -0800, Dave Hansen wrote: > > There are plenty of filesystems that are not supported for > c/r at this point. Think of things like hugetlbfs which > are externally visible or pipefs which are kernel-internal. > > This provides a quick way to mark the "normal" f

[Devel] Banning checkpoint (was: Re: What can OpenVZ do?)

2009-02-19 Thread Alexey Dobriyan
I think that all these efforts to abort checkpoint "intelligently" by banning it early are completely misguided. The "checkpointable" property isn't a one-way ticket like the "tainted" flag, so handling it like the tainted variable isn't right, atomic or not, SMP-safe or not. With filesystems, one has ->f_op field to

[Devel] Re: [PATCH 5/7][v8] zap_pid_ns_process() should use force_sig()

2009-02-19 Thread Oleg Nesterov
On 02/18, Sukadev Bhattiprolu wrote:
>
> read_lock(&tasklist_lock);
> nr = next_pidmap(pid_ns, 1);
> while (nr > 0) {
> - kill_proc_info(SIGKILL, SEND_SIG_PRIV, nr);
> + rcu_read_lock();
> +
> + /*
> + * Use force_sig() since it cle

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Oleg Nesterov
On 02/19, Eric W. Biederman wrote: > > Sukadev Bhattiprolu writes: > > > From: Sukadev Bhattiprolu > > Date: Wed, 24 Dec 2008 14:14:18 -0800 > > Subject: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns > > boundary > > > > When sending a signal to a descendant namespace, set ->si_

[Devel] Re: [RFC][PATCH 1/5] create fs flag to mark c/r supported fs's

2009-02-19 Thread Dave Hansen
BTW, after you apply this and turn on the config option, you do get a ton of warnings at runtime.

qemu:~# cat /proc/*/fdinfo/* | grep check | sort | uniq -c | sort -n
      1 checkpointable: 0(proc does not support checkpoint)
      6 checkpointable: 0(pipefs does not support checkpoint)
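The shell pipeline in this message can also be expressed as a small script, which may be handier when the counts need further processing. This is only a sketch: it assumes the `checkpointable:` line format shown in the message (which comes from this RFC series, not from a mainline kernel), and the sample text, including the `pos:` and `flags:` values, is made up for illustration.

```python
from collections import Counter


def tally_checkpointable(fdinfo_text):
    """Count identical lines containing 'check', roughly like
    `grep check | sort | uniq -c | sort -n`."""
    counts = Counter(
        line.strip()
        for line in fdinfo_text.splitlines()
        if "check" in line
    )
    # Ascending by count; ties are ordered by line text so output is stable.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical concatenated fdinfo contents (invented sample data).
sample = """\
pos:\t0
flags:\t02
checkpointable: 0(pipefs does not support checkpoint)
pos:\t0
flags:\t0100002
checkpointable: 0(proc does not support checkpoint)
checkpointable: 0(pipefs does not support checkpoint)
"""

for line, count in tally_checkpointable(sample):
    print(f"{count:7d} {line}")
```

On a kernel carrying the RFC patches, the same function could presumably be fed the concatenated contents of `/proc/*/fdinfo/*` instead of the sample string.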

[Devel] [RFC][PATCH 1/5] create fs flag to mark c/r supported fs's

2009-02-19 Thread Dave Hansen
There are plenty of filesystems that are not supported for c/r at this point. Think of things like hugetlbfs which are externally visible or pipefs which are kernel-internal. This provides a quick way to mark the "normal" filesystems which are currently supported. This is also safe if any new c

[Devel] [RFC][PATCH 3/5] check files for checkpointability

2009-02-19 Thread Dave Hansen
Introduce a files_struct counter to indicate whether a particular files_struct has ever contained a file which cannot be checkpointed. This flag is a one-way trip; once it is set, it may not be unset. We assume at allocation that a new files_struct is clean and may be checkpointed. However, as

[Devel] [RFC][PATCH 2/5] file c/r: expose functions to query fs support

2009-02-19 Thread Dave Hansen
This pair of functions will check to see whether a given 'struct file' can be checkpointed. If it can't be, the "explain" function can also give a description why. Signed-off-by: Dave Hansen --- linux-2.6.git-dave/checkpoint/ckpt_file.c | 30 ++ linux-2.6.git-dav

[Devel] [RFC][PATCH 4/5] breakout fdinfo sprintf() into its own function

2009-02-19 Thread Dave Hansen
I'll be adding to this in a moment and it is in a bad place to do that cleanly now. Also, increase the buffer size. Most /proc files can output up to a page, so use the same here. Signed-off-by: Dave Hansen --- linux-2.6.git-dave/fs/proc/base.c | 23 +++ 1 file changed,

[Devel] [RFC][PATCH 5/5] add c/r info to fdinfo

2009-02-19 Thread Dave Hansen
Use the new checkpoint/restart file functions to query and report on each fd in the /proc/$$/fdinfo/X file. This should provide an easy way to examine processes at runtime to see what exactly is causing their inability to checkpoint. Signed-off-by: Dave Hansen --- linux-2.6.git-dave/fs/proc/b

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Thread Daniel Lezcano
H. Peter Anvin wrote: > Daniel Lezcano wrote: > >> suka...@linux.vnet.ibm.com wrote: >> >>> Enable multiple instances of devpts filesystem so each container can >>> allocate >>> ptys independently. >>> >>> >> Hi suka, >> >> It looks like the /proc/sys/kernel/pty/max and nr are not

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Thread H. Peter Anvin
Daniel Lezcano wrote: > suka...@linux.vnet.ibm.com wrote: >> Enable multiple instances of devpts filesystem so each container can >> allocate >> ptys independently. >> > Hi suka, > > It looks like the /proc/sys/kernel/pty/max and nr are not virtualized. > Modifying in the container the "max" pt

[Devel] Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary

2009-02-19 Thread Eric W. Biederman
Sukadev Bhattiprolu writes: > From: Sukadev Bhattiprolu > Date: Wed, 24 Dec 2008 14:14:18 -0800 > Subject: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns > boundary > > When sending a signal to a descendant namespace, set ->si_pid to 0 since > the sender does not have a pid in t
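The rule quoted from the patch description (report 0 as the sender's pid when the sender has no pid in the namespace that receives the signal) can be illustrated with a toy user-space model. This is not kernel code: the dictionary layout, the namespace names, and the pid numbers are all hypothetical, and the sketch only assumes that a task has a pid number in its own namespace and in each ancestor namespace.

```python
def pid_seen_from(pids_by_ns, ns):
    """Return the task's pid number as seen from namespace `ns`,
    or 0 if the task has no pid in that namespace."""
    return pids_by_ns.get(ns, 0)


# Hypothetical tasks: namespace name -> pid number in that namespace.
sender_in_parent = {"init_ns": 10}               # lives only in the initial namespace
container_init = {"init_ns": 11, "child_ns": 1}  # visible in both namespaces

# Parent -> descendant namespace: the sender has no pid there, so si_pid would be 0.
print(pid_seen_from(sender_in_parent, "child_ns"))  # 0

# Descendant -> parent namespace: the sender is visible under its parent-ns number.
print(pid_seen_from(container_init, "init_ns"))     # 11
```

The model only covers the lookup itself; which signals carry a meaningful si_pid at all is the question the rest of the thread debates.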

[Devel] Re: [PATCH 0/9] Multiple devpts instances

2009-02-19 Thread Daniel Lezcano
suka...@linux.vnet.ibm.com wrote: > Enable multiple instances of devpts filesystem so each container can allocate > ptys independently. > Hi suka, It looks like the /proc/sys/kernel/pty/max and nr are not virtualized. Modifying in the container the "max" pty, that impacts the init_pty. Same as
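To see the effect described here, one could read the two sysctl files from inside and outside a container and compare them. The following is a minimal sketch, assuming the files live under `/proc/sys/kernel/pty` as named in the message; the helper name `pty_usage` and the `free` field are inventions for illustration.

```python
from pathlib import Path


def pty_usage(sysctl_dir="/proc/sys/kernel/pty"):
    """Read the pty limit ('max') and current count ('nr') from sysctl_dir."""
    base = Path(sysctl_dir)
    maximum = int((base / "max").read_text().strip())
    in_use = int((base / "nr").read_text().strip())
    # 'free' is simply max minus nr; it is a convenience value, not a kernel field.
    return {"max": maximum, "nr": in_use, "free": maximum - in_use}


if __name__ == "__main__":
    try:
        print(pty_usage())
    except OSError as exc:
        print(f"pty sysctls not readable: {exc}")
```

If the values are not virtualized, as the message reports, the numbers printed inside a container would match those printed on the host.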

[Devel] Re: [PATCH 0/7][v8] Container-init signal semantics

2009-02-19 Thread Daniel Lezcano
Sukadev Bhattiprolu wrote: > Patch 5/7 is new in this set and fixes a bug. Remaining patches are > just a forward-port from previous version and I believe they address > all comments I have received. > > Oleg please sign-off/ack if you agree. > > --- > > Container-init must behave like global-init