[Devel] Re: [PATCH 2/3] Introduce the res_counter_populate() function

2007-10-03 Thread Pavel Emelyanov
Paul Menage wrote: > On 10/3/07, Pavel Emelyanov <[EMAIL PROTECTED]> wrote: >> + >> + strcpy(files[RES_USAGE].name, "usage_in_bytes"); > > This prevents the resource counters from being used for anything that > doesn't measure its limit/usage in bytes. E.g. number of tasks. > > I like the i

[Devel] Re: [RFC] [PATCH] memory controller statistics

2007-10-03 Thread YAMAMOTO Takashi
> hi, > > i implemented some statistics for your memory controller. > > it's tested with 2.6.23-rc2-mm2 + memory controller v7. > i think it can be applied to 2.6.23-rc4-mm1 as well. > > YAMOMOTO Takshi > > todo: something like nr_active/inactive in /proc/vmstat. here's the version i'm working

[Devel] Re: [PATCH 03/33] task containersv11 add tasks file interface

2007-10-03 Thread Paul Jackson
One more cgroup code review detail ... The following is evidence of some more stale comments in kernel/cpuset.c. Some routines which used to be in that file, but which are now reimplemented in cgroups, are still named in cpuset.c comments: $ grep -E 'cpuset_rmdir|cpuset_exit|cpuset_fork' kernel/

[Devel] Re: [PATCH 03/33] task containersv11 add tasks file interface

2007-10-03 Thread Paul Menage
On 10/3/07, Paul Jackson <[EMAIL PROTECTED]> wrote: > > I can't say for sure, but I suspect that if cgroups had always > been cgroups (short for control groups), then these local 'cont' > variables would have a different name. Oh, absolutely. I just refrained from changing them in the rename since

[Devel] Re: [PATCH 03/33] task containersv11 add tasks file interface

2007-10-03 Thread Paul Jackson
> > - There are many instances of the local variable 'cont', referring > >to a struct cgroup pointer. I presume the spelling 'cont' is a > >holdover from the time when we called these containers. > > Yes, and since cgroup is short for "control group", "cont" still > seemed like a reasona

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-03 Thread Paul Menage
On 10/3/07, Paul Jackson <[EMAIL PROTECTED]> wrote: > > But now (correct me if I'm wrong here) cgroups has a per-cgroup task > list, and the above loop has cost linear in the number of tasks > actually in the cgroup, plus (unfortunate but necessary and tolerable) > the cost of taking a global css_s

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-03 Thread Paul Jackson
Andrew - please kill this patch. Looks like Paul Menage has a better solution that I will be trying out. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-03 Thread Paul Jackson
> What was wrong with my suggestion from a couple of emails back? Adding > the following in cpuset_attach(): > > struct cgroup_iter it; > struct task_struct *p; > while ((p = cgroup_iter_next(cs->css.cgroup, &it))) { >set_cpus_allowed(p, cs->cpus_allowed); > } > cgroup_iter_end(cs->css.cgr

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-03 Thread Paul Menage
On 10/3/07, Paul Jackson <[EMAIL PROTECTED]> wrote: > > So far as I can tell, this patch just removes a minor optimization, It's not minor for any subsystem that has a non-trivial attach() operation. > > Any further suggestions, or embarrassing (;) questions? > What was wrong with my suggestion

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-03 Thread Paul Jackson
Paul M wrote: > Given that later in cpusets.txt you say: > > >If hotplug functionality is used > >to remove all the CPUs that are currently assigned to a cpuset, > >then the kernel will automatically update the cpus_allowed of all > >tasks attached to CPUs in that cpuset to allow all CPUs > > why

[Devel] Re: netns : close all sockets at unshare ?

2007-10-03 Thread Daniel Lezcano
Eric W. Biederman wrote: Daniel Lezcano <[EMAIL PROTECTED]> writes: Yes, it will work. Do we want to be inside a network namespace and to use a socket belonging to another network namespace ? If yes, then my remark is irrelevant. Yes we do. Shall we close all fd sockets when doing an unshar

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-03 Thread Paul Menage
On 10/3/07, Paul Menage <[EMAIL PROTECTED]> wrote: > > What's wrong with, in update_cpumask(), doing a loop across all > members of the cgroup and updating their cpus_allowed fields? i.e. something like: struct cgroup_iter it; struct task_struct *p; while ((p = cgroup_iter_next(cs->css.cgroup, &i

[Devel] Re: [PATCH 03/33] task containersv11 add tasks file interface

2007-10-03 Thread Paul Menage
On 10/3/07, Paul Jackson <[EMAIL PROTECTED]> wrote: > > So after changing the 'cpus' of a cpuset, you (something in > user space) then has to rewrite every pid in that cpusets > tasks file back to that same tasks file, in order to get the > change to be applied to each of those tasks, and get them

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-03 Thread Paul Menage
On 10/3/07, Paul Jackson <[EMAIL PROTECTED]> wrote: > > Yes, something in user space has to do it. It's part of the > kernel-user cpuset API. If you change a cpuset's 'cpus', then > you have to rewrite each pid in its 'tasks' file back to that > 'tasks' file in order to get that 'cpus' change to

[Devel] Re: [PATCH 2/3] Introduce the res_counter_populate() function

2007-10-03 Thread Balbir Singh
Pavel Emelyanov wrote: > > diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h > index 61363ce..8ee8316 100644 > --- a/include/linux/res_counter.h > +++ b/include/linux/res_counter.h > @@ -58,6 +58,9 @@ ssize_t res_counter_write(struct res_cou > const char __user *

[Devel] Re: [PATCH 1/3] Introduce the dummy_pid

2007-10-03 Thread Randy Dunlap
On Wed, 03 Oct 2007 18:19:01 +0400 Pavel Emelyanov wrote: > This is a pid which is attached to tasks when they detach > their pids. This is done in detach_pid() and transfer_pid(). > The pid_alive() check is changed to reflect this fact. > > Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> > >

[Devel] Re: [PATCH 0/3] Consolidate cgroup files creation for resource counters

2007-10-03 Thread Balbir Singh
Pavel Emelyanov wrote: > Right now we have only one controller in -mm tree - the memory > one - and it initializes all its files manually. After I developed > the kernel memory one it turned out, that the names of resource > counter specific files has changed. In the future, if anything in > the re

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-03 Thread Paul Jackson
Paul M wrote: > > The code in kernel/cgroup.c attach_task() which skips the > > attachment of a task to the group it is already in has to be > > removed. Cpusets depends on reattaching a task to its current > > cpuset, in order to trigger updating the cpus_allowed mask in the > > task struct. > >

[Devel] Re: [PATCH 03/33] task containersv11 add tasks file interface

2007-10-03 Thread Paul Jackson
Paul M wrote: > Are there cases when userspace is > required to try to reattach a task to its current cpuset in order to > get a cpu mask change to stick? Yes, there are such cases. If tasks are running in a cpuset, and someone changes the 'cpus' of that cpuset, the tasks already in that cpuset d

[Devel] Re: netns : close all sockets at unshare ?

2007-10-03 Thread Eric W. Biederman
Daniel Lezcano <[EMAIL PROTECTED]> writes: > > Yes, it will work. > > Do we want to be inside a network namespace and to use a socket belonging to > another network namespace ? If yes, then my remark is irrelevant. Yes we do. >>> Shall we close all fd sockets when doing an unshare ? like a close-

[Devel] Re: [PATCH 2/3] Prepare pid_nr() etc functions to work with not-NULL pids

2007-10-03 Thread Matt Mackall
On Wed, Oct 03, 2007 at 06:20:43PM +0400, Pavel Emelyanov wrote: > Just make the __pid_nr() etc functions that expect the argument > to always be not NULL. > > Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> > static inline pid_t pid_nr(struct pid *pid) > { > pid_t nr = 0; > if (

[Devel] Re: [PATCH 2/3] Introduce the res_counter_populate() function

2007-10-03 Thread Paul Menage
On 10/3/07, Pavel Emelyanov <[EMAIL PROTECTED]> wrote: > + > + strcpy(files[RES_USAGE].name, "usage_in_bytes"); This prevents the resource counters from being used for anything that doesn't measure its limit/usage in bytes. E.g. number of tasks. I like the idea of having a unified interface

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-03 Thread Paul Menage
On 10/3/07, Paul Jackson <[EMAIL PROTECTED]> wrote: > From: Paul Jackson <[EMAIL PROTECTED]> > > The code in kernel/cgroup.c attach_task() which skips the > attachment of a task to the group it is already in has to be > removed. Cpusets depends on reattaching a task to its current > cpuset, in ord

[Devel] Re: [patch 0/1][NETNS49] Make af_unix autobind per namespace

2007-10-03 Thread Cedric Le Goater
Eric W. Biederman wrote: > Cedric Le Goater <[EMAIL PROTECTED]> writes: >> my 2 cnts, >> >> when 'restarting' a socket bound to an abstract name, we will have >> a EADDRINUSE if we try to rebind it to an abtract name which is >> already in use by a socket in a another namespace ? > > No. ok. i

[Devel] Re: [PATCH 03/33] task containersv11 add tasks file interface

2007-10-03 Thread Paul Menage
On 10/3/07, Paul Jackson <[EMAIL PROTECTED]> wrote: > > - What are these apparent 'exec notifications' that are provided to >user space that the following mentions - I cannot find any other >mention of them: > > With the ability to classify tasks differently for different >

[Devel] Re: [patch 0/1][NETNS49] Make af_unix autobind per namespace

2007-10-03 Thread Eric W. Biederman
Cedric Le Goater <[EMAIL PROTECTED]> writes: > my 2 cnts, > > when 'restarting' a socket bound to an abstract name, we will have > a EADDRINUSE if we try to rebind it to an abtract name which is > already in use by a socket in a another namespace ? No. > it seems to me that this is an identifi

[Devel] [PATCH] Use task_pid_nr() instead of pid_nr(task_pid())

2007-10-03 Thread Pavel Emelyanov
There are two places that do so - the cgroups subsystem and the autofs code. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/fs/autofs/root.c b/fs/autofs/root.c index 592f640..5efff3c 100644 --- a/fs/autofs/root.c +++ b/fs/autofs/root.c @@ -214,7 +214,7 @@ static struct dent

[Devel] [PATCH 3/3] Use the __pid_nr() calls where appropriate

2007-10-03 Thread Pavel Emelyanov
Since we now have all the tasks have non-zero pids we may call the __pid_nr() instead of pid_nr() when the pid is get from task with task_pid() or similar call. Besides, there are some other places where the check for pid to be not NULL is already done, so change them too. Signed-off-by: Pavel

[Devel] [PATCH 2/3] Prepare pid_nr() etc functions to work with not-NULL pids

2007-10-03 Thread Pavel Emelyanov
Just make the __pid_nr() etc functions that expect the argument to always be not NULL. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/include/linux/pid.h b/include/linux/pid.h index e29a900..50b6899 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -135,21 +135,3

[Devel] [PATCH 1/3] Introduce the dummy_pid

2007-10-03 Thread Pavel Emelyanov
This is a pid which is attached to tasks when they detach their pids. This is done in detach_pid() and transfer_pid(). The pid_alive() check is changed to reflect this fact. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/include/linux/sched.h b/include/linux/sched.h index 4f

[Devel] [PATCH 0/3] Make tasks always have non-zero pids

2007-10-03 Thread Pavel Emelyanov
Some time ago Sukadev noticed that the vmlinux size has grown 5Kb due to merged pid namespaces. One of the big problems with it was fat inline functions. The other thing was noticed by Matt - the checks for task's pid to be not NULL take place and make the kernel grow due to inlining, but these che

[Devel] Re: [patch 0/1][NETNS49] Make af_unix autobind per namespace

2007-10-03 Thread Cedric Le Goater
Denis V. Lunev wrote: > Daniel Lezcano wrote: >> Eric W. Biederman wrote: >>> Daniel Lezcano <[EMAIL PROTECTED]> writes: >>> The following patch change autobind fonction to use the ordernum from the network namespace instead of using the local static variable. >>> Why do we care? >>> Info

[Devel] [PATCH 3/3] Use the res_counter_populate in memory controller

2007-10-03 Thread Pavel Emelyanov
Note, that the controller code dealing with the cftype files for resource counters becomes much shorter and won't have to be changed in the future. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1b8bf24..d1b43bc 100644 --- a/mm/memcon

[Devel] [PATCH 2/3] Introduce the res_counter_populate() function

2007-10-03 Thread Pavel Emelyanov
This one is responsible for initializing the RES_CFT_MAX files properly and register them inside the container. The caller must provide the cgroup, the cgroup_subsys, the RES_CFT_MAX * sizeof(cftype) chunk of uninitialized memory and the read and write callbacks. In the future, if we add more res

[Devel] [PATCH 1/3] Typedefs the read and write functions in cftype

2007-10-03 Thread Pavel Emelyanov
This is just to reduce the code amount in the future. Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]> --- diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 8747932..0635004 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -178,6 +178,15 @@ struct css_set {

[Devel] [PATCH 0/3] Consolidate cgroup files creation for resource counters

2007-10-03 Thread Pavel Emelyanov
Right now we have only one controller in -mm tree - the memory one - and it initializes all its files manually. After I developed the kernel memory one it turned out, that the names of resource counter specific files has changed. In the future, if anything in the resource counters change, we'll hav

[Devel] Re: [PATCH] mark read_crX() asm code as volatile

2007-10-03 Thread Nick Piggin
On Wednesday 03 October 2007 16:18, H. Peter Anvin wrote: > Nick Piggin wrote: > >> This should work because the result gets used before reading again: > >> > >> read_cr3(a); > >> write_cr3(a | 1); > >> read_cr3(a); > >> > >> But this might be reordered so that b gets read before the write: > >> >

[Devel] Re: netns : close all sockets at unshare ?

2007-10-03 Thread Daniel Lezcano
Eric W. Biederman wrote: Daniel Lezcano <[EMAIL PROTECTED]> writes: Hi, I was looking at some cornercases and trying to figure out what happens if someone does: 1 - fd = socket(...) 2 - unshare(CLONE_NEWNET) 3 - bind(fd, ...) / listen(fd, ...) There is here an interaction between two namespa

[Devel] Re: [PATCH] mark read_crX() asm code as volatile

2007-10-03 Thread Andi Kleen
> > How does the compiler know it doesn't depend on memory? When it has no m (or equivalent like g) constrained argument and no memory clobber. > How do you say it depends on memory? You add any of the above. > You really need something as heavy as volatile? You could do a memory clobber,

[Devel] [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-03 Thread Paul Jackson
From: Paul Jackson <[EMAIL PROTECTED]> The code in kernel/cgroup.c attach_task() which skips the attachment of a task to the group it is already in has to be removed. Cpusets depends on reattaching a task to its current cpuset, in order to trigger updating the cpus_allowed mask in the task struct

[Devel] Re: [patch 0/1][NETNS49] Make af_unix autobind per namespace

2007-10-03 Thread Daniel Lezcano
Eric W. Biederman wrote: Daniel Lezcano <[EMAIL PROTECTED]> writes: Eric W. Biederman wrote: Daniel Lezcano <[EMAIL PROTECTED]> writes: The following patch change autobind fonction to use the ordernum from the network namespace instead of using the local static variable. Why do we care? Inf

[Devel] Re: [PATCH] mark read_crX() asm code as volatile

2007-10-03 Thread Kirill Korotaev
Arjan, I can experiment with any constraints if you suggest which one. >From our experiments with gcc, it compares asm strings (sic!!!) to find matches to be merged! Sigh... Below are 2 programs which differ in one space in read_cr3_b() asm statement. The first one compiles incorrectly, while 2nd

[Devel] Re: [patch 0/1][NETNS49] Make af_unix autobind per namespace

2007-10-03 Thread Denis V. Lunev
Daniel Lezcano wrote: > Eric W. Biederman wrote: >> Daniel Lezcano <[EMAIL PROTECTED]> writes: >> >>> The following patch change autobind fonction to use the ordernum >>> from the network namespace instead of using the local static variable. >> >> Why do we care? >> Information leak? >> Some applic

[Devel] Re: [patch 0/1][NETNS49] Make af_unix autobind per namespace

2007-10-03 Thread Denis V. Lunev
Eric W. Biederman wrote: > Daniel Lezcano <[EMAIL PROTECTED]> writes: > >> The following patch change autobind fonction to use the ordernum >> from the network namespace instead of using the local static variable. > > Why do we care? > Information leak? > Some application is expecting a predictab

[Devel] Re: [PATCH 03/33] task containersv11 add tasks file interface

2007-10-03 Thread Paul Jackson
Cgroup (aka container) code review: Except for the very last item below, my other comments are minor. And the last item is pretty easy too - just more important. Overall - nice stuff. I like this generalization of the cpuset hierarchy. Thanks. === Review comments on Documentation/cgro

[Devel] Re: [patch -mm 1/5] mqueue namespace : add struct mq_namespace

2007-10-03 Thread Cedric Le Goater
Eric W. Biederman wrote: > [EMAIL PROTECTED] writes: > >> Cedric Le Goater [EMAIL PROTECTED] wrote: >> | >> | >> however, we have an issue with the signal notification in __do_notify() >> | >> we could kill a process in a different pid namespace. >> | > >> | > So I took a quick look at the code

[Devel] Re: [patch -mm 1/5] mqueue namespace : add struct mq_namespace

2007-10-03 Thread Cedric Le Goater
[ I have big fingers this morning and I managed to send this email while typing it ... see below for the end. I should be awake now :) ] > The really challenging case to handle here is what happens if we are > signaling to someone in a sibling pid namespace. What do we set the > parent pi

[Devel] Re: [patch -mm 1/5] mqueue namespace : add struct mq_namespace

2007-10-03 Thread Cedric Le Goater
[EMAIL PROTECTED] wrote: > Cedric Le Goater [EMAIL PROTECTED] wrote: > | > | >> however, we have an issue with the signal notification in __do_notify() > | >> we could kill a process in a different pid namespace. > | > > | > So I took a quick look at the code as it is (before this patchset) > | >

[Devel] Re: [PATCH 5/5] Account for the slub objects

2007-10-03 Thread Pavel Emelyanov
Christoph Lameter wrote: > On Tue, 2 Oct 2007, Pavel Emelyanov wrote: > >> Christoph Lameter wrote: >>> On Mon, 1 Oct 2007, Pavel Emelyanov wrote: >>> >> + > Quick check, slub_free_notify() and slab_alloc_notify() are called > from serialized contexts, right? Yup. >>> How is it se