Re: [PATCH] memcg: stop warning on memcg_propagate_kmem

2013-02-04 Thread Lord Glauber Costa of Sealand
On 02/04/2013 12:36 PM, Michal Hocko wrote: > On Mon 04-02-13 12:04:06, Glauber Costa wrote: >> On 02/04/2013 11:57 AM, Michal Hocko wrote: >>> On Sun 03-02-13 20:29:01, Hugh Dickins wrote: Whilst I run the risk of a flogging for disloyalty to the Lord of Sealand, I do have

Re: [PATCH] memcg: stop warning on memcg_propagate_kmem

2013-02-04 Thread Lord Glauber Costa of Sealand
On 02/04/2013 11:57 AM, Michal Hocko wrote: > On Sun 03-02-13 20:29:01, Hugh Dickins wrote: >> Whilst I run the risk of a flogging for disloyalty to the Lord of Sealand, >> I do have CONFIG_MEMCG=y CONFIG_MEMCG_KMEM not set, and grow tired of the >> "mm/memcontrol.c:4972:12: warning:

Re: [PATCH] memcg: stop warning on memcg_propagate_kmem

2013-02-03 Thread Lord Glauber Costa of Sealand
On 02/04/2013 08:29 AM, Hugh Dickins wrote: > Whilst I run the risk of a flogging for disloyalty to the Lord of Sealand, > I do have CONFIG_MEMCG=y CONFIG_MEMCG_KMEM not set, and grow tired of the > "mm/memcontrol.c:4972:12: warning: `memcg_propagate_kmem' defined but not > used
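For readers unfamiliar with the warning quoted above: GCC's -Wunused-function fires when a static function is compiled but nothing in the translation unit calls it, which is what happens when the only caller sits behind a config option that is switched off. A minimal sketch of one common remedy follows, assuming hypothetical names (DEMO_KMEM, demo_propagate_kmem, demo_create); it is not necessarily what the patch in this thread does.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_MEMCG_KMEM. Left undefined here to
 * mimic a CONFIG_MEMCG=y build without kmem accounting. */
/* #define DEMO_KMEM 1 */

#ifdef DEMO_KMEM
/* Only compiled when its sole caller is compiled too, so the compiler
 * never sees a static function without a user. */
static int demo_propagate_kmem(int parent_limit)
{
	return parent_limit;
}
#endif

/* Hypothetical caller, loosely modelled on a cgroup-creation path. */
int demo_create(int parent_limit)
{
	assert(parent_limit >= 0);	/* this sketch only handles non-negative limits */
#ifdef DEMO_KMEM
	return demo_propagate_kmem(parent_limit);
#else
	return 0;	/* nothing to propagate without kmem accounting */
#endif
}
```

Moving the static definition outside the #ifdef while leaving the call inside it would bring the warning back under -Wall.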

Re: [PATCHv2 8/9] zswap: add to mm/

2013-01-29 Thread Lord Glauber Costa of Sealand
On 01/28/2013 07:27 PM, Seth Jennings wrote: > Yes, I prototyped a shrinker interface for zswap, but, as we both > figured, it shrinks the zswap compressed pool too aggressively to the > point of being useless. Can't you advertise a smaller number of objects that you actively have? Since the

Re: [PATCH review 3/6] userns: Recommend use of memory control groups.

2013-01-28 Thread Lord Glauber Costa of Sealand
On 01/28/2013 08:19 PM, Eric W. Biederman wrote: > Lord Glauber Costa of Sealand writes: > >> On 01/28/2013 12:14 PM, Eric W. Biederman wrote: >>> Lord Glauber Costa of Sealand writes: >>> >>>> I just saw in a later patch of yours that your concern

[PATCH] cfq: fix lock imbalance with failed allocations

2013-01-28 Thread Lord Glauber Costa of Sealand
From: Glauber Costa While stress-running very-small container scenarios with the Kernel Memory Controller, I've run into a lockdep-detected lock imbalance in cfq-iosched.c. I'll apologize beforehand for not posting a backlog: I didn't anticipate it would be so hard to reproduce, so I didn't

Re: [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap.

2013-01-28 Thread Lord Glauber Costa of Sealand
Hello Mr. Someone. On 01/28/2013 06:28 PM, Aristeu Rozanski wrote: > On Fri, Jan 25, 2013 at 06:21:00PM -0800, Eric W. Biederman wrote: >> When I initially wrote the code for /proc/<pid>/uid_map. I was lazy >> and avoided duplicate mappings by the simple expedient of ensuring the >> first number in a

Re: [PATCH review 3/6] userns: Recommend use of memory control groups.

2013-01-28 Thread Lord Glauber Costa of Sealand
On 01/28/2013 12:14 PM, Eric W. Biederman wrote: > Lord Glauber Costa of Sealand writes: > >> I just saw in a later patch of yours that your concern here seems not >> limited to backed ram by tmpfs, but with things like the internal >> structures for userns , to av

Re: [PATCH review 3/6] userns: Recommend use of memory control groups.

2013-01-27 Thread Lord Glauber Costa of Sealand
On 01/28/2013 11:37 AM, Lord Glauber Costa of Sealand wrote: > On 01/26/2013 06:22 AM, Eric W. Biederman wrote: >> >> In the help text describing user namespaces recommend use of memory >> control groups. In many cases memory control groups are the only >> mechanism

Re: [PATCH review 3/6] userns: Recommend use of memory control groups.

2013-01-27 Thread Lord Glauber Costa of Sealand
On 01/26/2013 06:22 AM, Eric W. Biederman wrote: > > In the help text describing user namespaces recommend use of memory > control groups. In many cases memory control groups are the only > mechanism there is to limit how much memory a user who can create > user namespaces can use. > >

[PATCH v6 06/12] cpuacct: don't actually do anything.

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa All the information we have that is needed for cpuusage (and cpuusage_percpu) is present in schedstats. It is already recorded in a sane hierarchical way. If we have CONFIG_SCHEDSTATS, we don't really need to do any extra work. All former functions become empty inlines.
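The "empty inlines" idea can be sketched as follows. This is an illustration only, under stated assumptions: DEMO_SCHEDSTATS stands in for CONFIG_SCHEDSTATS, and demo_cpuacct_charge, demo_charged_ns, demo_run and demo_selfcheck are hypothetical names, not functions from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_SCHEDSTATS. */
#define DEMO_SCHEDSTATS 1

/* Separate accounting that is only needed without schedstats. */
static unsigned long long demo_charged_ns;

#ifdef DEMO_SCHEDSTATS
/* The data is already recorded elsewhere, so the hook compiles to nothing. */
static inline void demo_cpuacct_charge(unsigned long long delta_ns)
{
	(void)delta_ns;
}
#else
/* Fallback: keep a separate running total. */
static inline void demo_cpuacct_charge(unsigned long long delta_ns)
{
	demo_charged_ns += delta_ns;
}
#endif

/* Call sites stay identical in both configurations. */
unsigned long long demo_run(unsigned long long delta_ns)
{
	demo_cpuacct_charge(delta_ns);
	return demo_charged_ns;
}

/* With DEMO_SCHEDSTATS defined, charging leaves the separate total untouched. */
void demo_selfcheck(void)
{
	assert(demo_run(1000) == 0);
}
```

The point of the stub is that callers need no #ifdef of their own; the compiler drops the empty call.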

[PATCH v6 04/12] cgroup, sched: deprecate cpuacct

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Tejun Heo Now that cpu serves the same files as cpuacct and using cpuacct separately from cpu is deprecated, we can deprecate cpuacct. To avoid disturbing userland which has been co-mounting cpu and cpuacct, implement some hackery in cgroup core so that cpuacct co-mounting still works

[PATCH v6 00/12] per-cgroup cpu-stat

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa Hi all, This is an attempt to provide userspace with enough information to reconstruct per-container version of files like "/proc/stat". In particular, we are interested in knowing the per-cgroup slices of user time, system time, wait time, number of processes, and a variety

[PATCH v6 03/12] cgroup, sched: let cpu serve the same files as cpuacct

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Tejun Heo cpuacct being on a separate hierarchy is one of the main cgroup related complaints from scheduler side and the consensus seems to be * Allowing cpuacct to be a separate controller was a mistake. In general multiple controllers on the same type of resource should be avoided,

[PATCH v6 08/12] sched: account guest time per-cgroup as well.

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa We already track multiple tick statistics per-cgroup, using the task_group_account_field facility. This patch accounts guest_time in that manner as well. Signed-off-by: Glauber Costa CC: Peter Zijlstra CC: Paul Turner --- kernel/sched/cputime.c | 10 -- 1 file

[PATCH v6 07/12] sched: document the cpu cgroup.

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa The CPU cgroup is, so far, undocumented. Although data exists in the Documentation directory about its functioning, it is usually scattered and/or presented in the context of something else. This file consolidates all cgroup-related information about it. Signed-off-by: Glauber

[PATCH v6 05/12] sched: adjust exec_clock to use it as cpu usage metric

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa exec_clock already provides per-group cpu usage metrics, and can be reused by cpuacct in case cpu and cpuacct are comounted. However, it is only provided by tasks in fair class. Doing the same for rt is easy, and can be done in an already existing hierarchy loop. This is an

[PATCH v6 12/12] sched: introduce cgroup file stat_percpu

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa The file cpu.stat_percpu will show various scheduler-related information that is usually available to the top level through other files. For instance, most of the meaningful data in /proc/stat is presented here. Given this file, a container can easily construct a local

[PATCH v6 09/12] sched: Push put_prev_task() into pick_next_task()

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Peter Zijlstra In order to avoid having to do put/set on a whole cgroup hierarchy when we context switch, push the put into pick_next_task() so that both operations are in the same function. Further changes then allow us to possibly optimize away redundant work. [ glom...@parallels.com:

[PATCH v6 11/12] sched: change nr_context_switches calculation.

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa This patch changes the calculation of nr_context_switches. The variable "nr_switches" is now used to account for the number of transitions to the idle task, or stop task. It is removed from the schedule() path. The total calculation can be made using the fact that the

[PATCH v6 02/12] cgroup: implement CFTYPE_NO_PREFIX

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Tejun Heo When cgroup files are created, cgroup core automatically prepends the name of the subsystem as prefix. This patch adds CFTYPE_NO_PREFIX which disables the automatic prefix. This will be used to deprecate cpuacct which will make cpu create and serve the cpuacct files.
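The naming rule described above can be sketched in a few lines. This is a user-space illustration, not the cgroup core code: DEMO_CFTYPE_NO_PREFIX and demo_file_name are hypothetical, and the flag value is made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag value; the real CFTYPE_NO_PREFIX bit is not taken from the patch. */
#define DEMO_CFTYPE_NO_PREFIX (1U << 0)

/* Builds the user-visible file name into buf and returns buf.
 * Default: "<subsys>.<name>". With the flag set: "<name>" as given. */
char *demo_file_name(const char *subsys, const char *name,
		     unsigned int flags, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	assert(strlen(name) > 0);

	if (flags & DEMO_CFTYPE_NO_PREFIX)
		snprintf(buf, len, "%s", name);
	else
		snprintf(buf, len, "%s.%s", subsys, name);
	return buf;
}
```

Under these assumptions, a file named "cpuacct.usage" registered by the cpu subsystem with the flag set keeps that exact name instead of becoming "cpu.cpuacct.usage", which is how cpu could serve the cpuacct files.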

[PATCH v6 10/12] sched: record per-cgroup number of context switches

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa Context switches are, to this moment, a property of the runqueue. When running containers, we would like to be able to present a separate figure for each container (or cgroup, in this context). The chosen way to accomplish this is to increment a per cfs_rq or rt_rq,

[PATCH v6 01/12] don't call cpuacct_charge in stop_task.c

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa Commit 8f618968 changed stop_task to do the same bookkeeping as the other classes. However, the call to cpuacct_charge() doesn't affect the scheduler decisions at all, and doesn't need to be moved over. Moreover, being a kthread, the migration thread won't belong to any
