[PATCH] cfq: fix lock imbalance with failed allocations

2013-01-28 Thread Lord Glauber Costa of Sealand
From: Glauber Costa While stress-running very-small container scenarios with the Kernel Memory Controller, I've run into a lockdep-detected lock imbalance in cfq-iosched.c. I'll apologize beforehand for not posting a backlog: I didn't anticipate it would be so hard to reproduce, so I didn't
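
The bug class is easiest to see in a compilable userspace sketch: an allocation-failure path that returns with a lock still held, which is exactly the kind of imbalance lockdep reports. All names below are illustrative; this is not the actual cfq-iosched.c code.

    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Buggy shape: the allocation-failure path returns with the lock
     * still held, so lock and unlock no longer balance. */
    static int setup_buggy(void)
    {
            pthread_mutex_lock(&lock);
            void *p = malloc(64);
            if (!p)
                    return -1;      /* BUG: leaks the held lock */
            free(p);
            pthread_mutex_unlock(&lock);
            return 0;
    }

    /* Fixed shape: every exit path releases the lock exactly once. */
    static int setup_fixed(void)
    {
            int ret = 0;

            pthread_mutex_lock(&lock);
            void *p = malloc(64);
            if (!p)
                    ret = -1;
            else
                    free(p);
            pthread_mutex_unlock(&lock);
            return ret;
    }

    int main(void)
    {
            return setup_fixed();
    }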

Re: [PATCH review 2/6] userns: Allow any uid or gid mappings that don't overlap.

2013-01-28 Thread Lord Glauber Costa of Sealand
Hello Mr. Someone. On 01/28/2013 06:28 PM, Aristeu Rozanski wrote: > On Fri, Jan 25, 2013 at 06:21:00PM -0800, Eric W. Biederman wrote: >> When I initially wrote the code for /proc/<pid>/uid_map. I was lazy >> and avoided duplicate mappings by the simple expedient of ensuring the >> first number in a new

Re: [PATCH review 3/6] userns: Recommend use of memory control groups.

2013-01-28 Thread Lord Glauber Costa of Sealand
On 01/28/2013 12:14 PM, Eric W. Biederman wrote: > Lord Glauber Costa of Sealand writes: > >> I just saw in a later patch of yours that your concern here seems not >> limited to backed ram by tmpfs, but with things like the internal >> structures for userns, to avoid patterns

Re: [PATCH review 3/6] userns: Recommend use of memory control groups.

2013-01-28 Thread Lord Glauber Costa of Sealand
On 01/28/2013 08:19 PM, Eric W. Biederman wrote: Lord Glauber Costa of Sealand glom...@parallels.com writes: On 01/28/2013 12:14 PM, Eric W. Biederman wrote: Lord Glauber Costa of Sealand glom...@parallels.com writes: I just saw in a later patch of yours that your concern here seems

Re: [PATCH review 3/6] userns: Recommend use of memory control groups.

2013-01-27 Thread Lord Glauber Costa of Sealand
On 01/28/2013 11:37 AM, Lord Glauber Costa of Sealand wrote: > On 01/26/2013 06:22 AM, Eric W. Biederman wrote: >> >> In the help text describing user namespaces recommend use of memory >> control groups. In many cases memory control groups are the only >> mechanism

Re: [PATCH review 3/6] userns: Recommend use of memory control groups.

2013-01-27 Thread Lord Glauber Costa of Sealand
On 01/26/2013 06:22 AM, Eric W. Biederman wrote: > > In the help text describing user namespaces recommend use of memory > control groups. In many cases memory control groups are the only > mechanism there is to limit how much memory a user who can create > user namespaces can use. > >

[PATCH v6 06/12] cpuacct: don't actually do anything.

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa All the information we have that is needed for cpuusage (and cpuusage_percpu) is present in schedstats. It is already recorded in a sane hierarchical way. If we have CONFIG_SCHEDSTATS, we don't really need to do any extra work. All former functions become empty inlines
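
A minimal, compilable analogue of the "empty inlines" idea described here: when the same numbers are already collected elsewhere, the accounting hook compiles down to nothing at the call site. The HAVE_SCHEDSTATS flag and all names are assumptions for illustration, not the patch's code.

    #include <stdio.h>

    /* HAVE_SCHEDSTATS stands in for CONFIG_SCHEDSTATS here. */
    #define HAVE_SCHEDSTATS 1

    #if HAVE_SCHEDSTATS
    /* Stats are already recorded elsewhere, so the hook is an empty
     * inline and vanishes at the call site. */
    static inline void account_charge(unsigned long cputime) { (void)cputime; }
    #else
    static unsigned long total;
    static inline void account_charge(unsigned long cputime) { total += cputime; }
    #endif

    int main(void)
    {
            account_charge(100);
            printf("charged\n");
            return 0;
    }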

[PATCH v6 04/12] cgroup, sched: deprecate cpuacct

2013-01-24 Thread Lord Glauber Costa of Sealand
Signed-off-by: Tejun Heo Cc: Peter Zijlstra Cc: Glauber Costa Cc: Michal Hocko Cc: Kay Sievers Cc: Lennart Poettering Cc: Dave Jones Cc: Ben Hutchings Cc: Paul Turner --- init/Kconfig| 11 ++- kernel/cgroup.c | 47 ++- kernel/sc

[PATCH v6 00/12] per-cgroup cpu-stat

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa Hi all, This is an attempt to provide userspace with enough information to reconstruct per-container version of files like "/proc/stat". In particular, we are interested in knowing the per-cgroup slices of user time, system time, wait time, number of processes, and

[PATCH v6 03/12] cgroup, sched: let cpu serve the same files as cpuacct

2013-01-24 Thread Lord Glauber Costa of Sealand
on top of which cpu can implement proper optimization. [ glommer: don't call *_charge in stop_task.c ] Signed-off-by: Tejun Heo Signed-off-by: Glauber Costa Cc: Peter Zijlstra Cc: Michal Hocko Cc: Kay Sievers Cc: Lennart Poettering Cc: Dave Jones Cc: Ben Hutchings Cc: Paul Turner

[PATCH v6 08/12] sched: account guest time per-cgroup as well.

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa We already track multiple tick statistics per-cgroup, using the task_group_account_field facility. This patch accounts guest_time in that manner as well. Signed-off-by: Glauber Costa CC: Peter Zijlstra CC: Paul Turner --- kernel/sched/cputime.c | 10 -- 1 file

[PATCH v6 07/12] sched: document the cpu cgroup.

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa The CPU cgroup is so far, undocumented. Although data exists in the Documentation directory about its functioning, it is usually spread, and/or presented in the context of something else. This file consolidates all cgroup-related information about it. Signed-off-by: Glauber

[PATCH v6 05/12] sched: adjust exec_clock to use it as cpu usage metric

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa exec_clock already provides per-group cpu usage metrics, and can be reused by cpuacct in case cpu and cpuacct are comounted. However, it is only provided by tasks in fair class. Doing the same for rt is easy, and can be done in an already existing hierarchy loop

[PATCH v6 12/12] sched: introduce cgroup file stat_percpu

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa The file cpu.stat_percpu will show various scheduler related information, that are usually available to the top level through other files. For instance, most of the meaningful data in /proc/stat is presented here. Given this file, a container can easily construct a local

[PATCH v6 09/12] sched: Push put_prev_task() into pick_next_task()

2013-01-24 Thread Lord Glauber Costa of Sealand
[ glom...@parallels.com: incorporated mailing list feedback ] Signed-off-by: Peter Zijlstra Signed-off-by: Glauber Costa --- include/linux/sched.h| 8 +++- kernel/sched/core.c | 20 +++- kernel/sched/fair.c | 6 +- kernel/sched/idle_task.c | 6 +- kernel/sched/rt.c| 27

[PATCH v6 11/12] sched: change nr_context_switches calculation.

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa This patch changes the calculation of nr_context_switches. The variable "nr_switches" is now used to account for the number of transitions to the idle task, or stop task. It is removed from the schedule() path. The total calculation can be made using
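
A toy of the accounting split being described, assuming (per this changelog and the per-class counters of patch 10/12) that fair and rt switches are counted by their classes while nr_switches keeps only idle/stop transitions, so the global figure is the sum. Names are illustrative.

    #include <stdio.h>

    struct counters {
            unsigned long fair_switches;   /* recorded by the fair class */
            unsigned long rt_switches;     /* recorded by the rt class   */
            unsigned long nr_switches;     /* idle/stop transitions only */
    };

    static unsigned long total_context_switches(const struct counters *c)
    {
            return c->fair_switches + c->rt_switches + c->nr_switches;
    }

    int main(void)
    {
            struct counters c = { 100, 7, 3 };

            printf("total: %lu\n", total_context_switches(&c));
            return 0;
    }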

[PATCH v6 02/12] cgroup: implement CFTYPE_NO_PREFIX

2013-01-24 Thread Lord Glauber Costa of Sealand
Signed-off-by: Tejun Heo Cc: Peter Zijlstra Cc: Glauber Costa --- include/linux/cgroup.h | 1 + kernel/cgroup.c| 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 7d73905..7d193f9 100644 --- a/include/linux/cgroup.h +++ b

[PATCH v6 10/12] sched: record per-cgroup number of context switches

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa Context switches are, to this moment, a property of the runqueue. When running containers, we would like to be able to present a separate figure for each container (or cgroup, in this context). The chosen way to accomplish this is to increment a per cfs_rq or rt_rq
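
A compilable userspace analogue of the per-group counting scheme this changelog sketches. The real patch increments a per-cfs_rq/rt_rq counter; the toy below flattens that into a walk up a parent chain so every level keeps its own figure. Illustrative only, not the patch's code.

    #include <stdio.h>

    struct group {
            struct group *parent;
            unsigned long nr_switches;
    };

    /* Charge one context switch to a group and all of its ancestors,
     * so each cgroup level can report its own total. */
    static void charge_switch(struct group *g)
    {
            for (; g; g = g->parent)
                    g->nr_switches++;
    }

    int main(void)
    {
            struct group root  = { NULL, 0 };
            struct group child = { &root, 0 };

            charge_switch(&child);
            printf("root=%lu child=%lu\n", root.nr_switches, child.nr_switches);
            return 0;
    }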

[PATCH v6 01/12] don't call cpuacct_charge in stop_task.c

2013-01-24 Thread Lord Glauber Costa of Sealand
From: Glauber Costa Commit 8f618968 changed stop_task to do the same bookkeeping as the other classes. However, the call to cpuacct_charge() doesn't affect the scheduler decisions at all, and doesn't need to be moved over. Moreover, being a kthread, the migration thread won't belong to any

Re: [RFC, PATCH 00/19] Numa aware LRU lists and shrinkers

2013-01-23 Thread Glauber Costa
On 01/22/2013 03:21 AM, Dave Chinner wrote: > On Mon, Jan 21, 2013 at 08:08:53PM +0400, Glauber Costa wrote: >> On 11/28/2012 03:14 AM, Dave Chinner wrote: >>> [PATCH 09/19] list_lru: per-node list infrastructure >>> >>> This makes the generic LRU

Re: [PATCH v5 11/11] sched: introduce cgroup file stat_percpu

2013-01-23 Thread Glauber Costa
On 01/10/2013 01:27 AM, Glauber Costa wrote: > On 01/10/2013 01:17 AM, Andrew Morton wrote: >> On Thu, 10 Jan 2013 01:10:02 +0400 >> Glauber Costa wrote: >> >>> The main advantage I see in this approach, is that there is way less >>> data to be written

Re: [PATCH v5 11/11] sched: introduce cgroup file stat_percpu

2013-01-23 Thread Glauber Costa
On 01/10/2013 12:42 AM, Andrew Morton wrote: > Also, I'm not seeing any changes to Documentation/ in this patchset. > How do we explain the interface to our users? There is little point in adding any Documentation, since the cpu cgroup itself is not documented. I took the liberty of doing this

Re: [PATCH v5 00/11] per-cgroup cpu-stat

2013-01-23 Thread Glauber Costa
On 01/23/2013 05:53 AM, Colin Cross wrote: > On Tue, Jan 22, 2013 at 5:02 PM, Tejun Heo wrote: >> Hello, >> >> On Mon, Jan 21, 2013 at 04:14:27PM +0400, Glauber Costa wrote: >>>> Android userspace is currently using both cpu and cpuacct, and not >

Re: [RFC, PATCH 00/19] Numa aware LRU lists and shrinkers

2013-01-21 Thread Glauber Costa
On 11/28/2012 03:14 AM, Dave Chinner wrote: > [PATCH 09/19] list_lru: per-node list infrastructure > > This makes the generic LRU list much more scalable by changing it to > a {list,lock,count} tuple per node. There are no external API > changes to this changeover, so is transparent to current
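
The "{list,lock,count} tuple per node" maps naturally onto a structure like the following sketch, with a pthread mutex standing in for the kernel spinlock. All names are assumed, not the series' actual code.

    #include <pthread.h>

    struct list_head { struct list_head *prev, *next; };

    /* One {list,lock,count} tuple per NUMA node: lock contention and
     * counter updates are confined to a single node's list. */
    struct lru_node {
            struct list_head list;      /* items resident on this node    */
            pthread_mutex_t  lock;      /* protects list and nr_items     */
            long             nr_items;  /* cheap per-node occupancy count */
    };

    struct lru_list {
            struct lru_node *node;      /* array with one entry per node */
            int              numnodes;
    };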

Re: [PATCH v5 00/11] per-cgroup cpu-stat

2013-01-21 Thread Glauber Costa
On 01/16/2013 04:33 AM, Colin Cross wrote: > On Wed, Jan 9, 2013 at 3:45 AM, Glauber Costa wrote: >> [ update: I thought I posted this already before leaving for holidays. >> However, >> now that I am checking for replies, I can't find nor replies nor the >> ori

Re: [PATCH 09/19] list_lru: per-node list infrastructure

2013-01-18 Thread Glauber Costa
On 01/18/2013 04:10 PM, Dave Chinner wrote: > On Fri, Jan 18, 2013 at 11:10:00AM -0800, Glauber Costa wrote: >> On 01/18/2013 12:11 AM, Dave Chinner wrote: >>> On Thu, Jan 17, 2013 at 04:14:10PM -0800, Glauber Costa wrote: >>>> On 01/17/2013 04:10 PM, Dave Chinner wro

Re: [PATCH 09/19] list_lru: per-node list infrastructure

2013-01-18 Thread Glauber Costa
On 01/18/2013 12:11 AM, Dave Chinner wrote: > On Thu, Jan 17, 2013 at 04:14:10PM -0800, Glauber Costa wrote: >> On 01/17/2013 04:10 PM, Dave Chinner wrote: >>> And then each object uses: >>> >>> struct lru_item { >>> struct list_head glo

Re: [PATCH 09/19] list_lru: per-node list infrastructure

2013-01-18 Thread Glauber Costa
On 01/18/2013 12:08 AM, Dave Chinner wrote: > On Thu, Jan 17, 2013 at 04:51:03PM -0800, Glauber Costa wrote: >> On 01/17/2013 04:10 PM, Dave Chinner wrote: >>> and we end up with: >>> >>> lru_add(struct lru_list *lru, struct lru_item *item) >>> { &

Re: [PATCH 09/19] list_lru: per-node list infrastructure

2013-01-17 Thread Glauber Costa
On 01/17/2013 04:10 PM, Dave Chinner wrote: > and we end up with: > > lru_add(struct lru_list *lru, struct lru_item *item) > { > node_id = min(object_to_nid(item), lru->numnodes); > > __lru_add(lru, node_id, &item->global_list); > if (memcg) { > memcg_lru =
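
The quoted fragment breaks off at the memcg branch. Below is a guessed, compilable completion of its shape: the dual-list layout follows the struct lru_item quoted in the next message, but the single-list simplification, the elided locking, and every name are assumptions, not Dave Chinner's eventual code.

    #include <stdio.h>

    struct list_head { struct list_head *prev, *next; };

    struct lru_item {
            struct list_head global_list;   /* always linked here          */
            struct list_head memcg_list;    /* linked only when in a memcg */
    };

    struct lru_list {
            struct list_head global;        /* per-node list in the real sketch */
            struct list_head memcg;         /* one per memcg in the real sketch */
            long nr_items;
    };

    static void list_add_head(struct list_head *head, struct list_head *entry)
    {
            entry->next = head->next;
            entry->prev = head;
            head->next->prev = entry;
            head->next = entry;
    }

    /* Add to the global list, and additionally to the memcg list when
     * the item is owned by a memcg. */
    static void lru_add(struct lru_list *lru, struct lru_item *item, int in_memcg)
    {
            list_add_head(&lru->global, &item->global_list);
            if (in_memcg)
                    list_add_head(&lru->memcg, &item->memcg_list);
            lru->nr_items++;
    }

    int main(void)
    {
            struct lru_list lru = { { &lru.global, &lru.global },
                                    { &lru.memcg,  &lru.memcg  }, 0 };
            struct lru_item item;

            lru_add(&lru, &item, 1);
            printf("items on lru: %ld\n", lru.nr_items);
            return 0;
    }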

Re: [PATCH 09/19] list_lru: per-node list infrastructure

2013-01-17 Thread Glauber Costa
On 01/17/2013 04:10 PM, Dave Chinner wrote: > And then each object uses: > > struct lru_item { > struct list_head global_list; > struct list_head memcg_list; > } by objects you mean dentries, inodes, and the such, right? Would it be acceptable to you? We've been of course doing our

Re: [PATCH 09/19] list_lru: per-node list infrastructure

2013-01-17 Thread Glauber Costa
>> Deepest fears: >> >> 1) snakes. > > Snakes are merely poisonous. Drop Bears are far more dangerous :P > fears are irrational anyway... >> 2) It won't surprise you to know that I am adapting your work, which >> provides a very sane and helpful API, to memcg shrinking. >> >> The dumb and

Re: [PATCH 09/19] list_lru: per-node list infrastructure

2013-01-16 Thread Glauber Costa
>> The superblocks only, are present by the dozens even in a small system, >> and I believe the whole goal of this API is to get more users to switch >> to it. This can easily use up a respectable bunch of megs. >> >> Isn't it a bit too much ? > > Maybe, but for active superblocks it only takes

Re: [PATCH 09/19] list_lru: per-node list infrastructure

2013-01-16 Thread Glauber Costa
On 11/27/2012 03:14 PM, Dave Chinner wrote: > From: Dave Chinner > > Now that we have an LRU list API, we can start to enhance the > implementation. This splits the single LRU list into per-node lists > and locks to enhance scalability. Items are placed on lists > according to the node the

Re: [PATCH v5 03/11] cgroup, sched: let cpu serve the same files as cpuacct

2013-01-15 Thread Glauber Costa
On 01/15/2013 02:19 AM, Sha Zhengju wrote: > On Mon, Jan 14, 2013 at 10:55 PM, Glauber Costa wrote: >> On 01/14/2013 12:34 AM, Sha Zhengju wrote: >>>> + struct kernel_cpustat *kcpustat = >>>> this_cpu_ptr(ca->cpustat);

Re: [PATCH v5 03/11] cgroup, sched: let cpu serve the same files as cpuacct

2013-01-14 Thread Glauber Costa
On 01/14/2013 12:34 AM, Sha Zhengju wrote: >> + struct kernel_cpustat *kcpustat = this_cpu_ptr(ca->cpustat); >> + kcpustat = this_cpu_ptr(ca->cpustat); > Is this reassignment unnecessary? No.
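
Why "No": assuming, as the v5 03/11 snippet above suggests, that the surrounding code walks the accounting hierarchy, each iteration visits a different group, so the per-cpu pointer must be re-derived from the new group's cpustat. A compilable userspace analogue of that point, with all names assumed:

    #include <stdio.h>

    #define NCPU 2

    struct acct { struct acct *parent; long cpustat[NCPU]; };

    static void account_field(struct acct *ca, int cpu, long tmp)
    {
            for (; ca; ca = ca->parent) {
                    /* 'ca' changed, so the pointer must be recomputed:
                     * reusing the previous iteration's pointer would
                     * charge the wrong group's statistics. */
                    long *kcpustat = &ca->cpustat[cpu];
                    *kcpustat += tmp;
            }
    }

    int main(void)
    {
            struct acct root  = { NULL,  { 0 } };
            struct acct child = { &root, { 0 } };

            account_field(&child, 0, 10);
            printf("root=%ld child=%ld\n", root.cpustat[0], child.cpustat[0]);
            return 0;
    }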

Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-10 Thread Glauber Costa
> If it's configured as ZONE_NORMAL, you need to pray for offlining memory. > > AFAIK, IBM's ppc? has 16MB section size. So, some of sections can be > offlined > even if they are configured as ZONE_NORMAL. For them, placement of offlined > memory is not important because it's virtualized by LPAR,

Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-09 Thread Glauber Costa
On 01/10/2013 11:31 AM, Kamezawa Hiroyuki wrote: > (2013/01/10 16:14), Glauber Costa wrote: >> On 01/10/2013 06:17 AM, Tang Chen wrote: >>>>> Note: if the memory provided by the memory device is used by the >>>>> kernel, it >>>>> can't be offl

Re: [PATCH 1/2] Add mempressure cgroup

2013-01-09 Thread Glauber Costa
On 01/10/2013 02:06 AM, Anton Vorontsov wrote: > On Wed, Jan 09, 2013 at 01:55:14PM -0800, Tejun Heo wrote: > [...] >>> We can use mempressure w/o memcg, and even then it can (or should :) be >>> useful (for cpuset, for example). >> >> The problem is that you end with, at the very least, duplicate

Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-09 Thread Glauber Costa
On 01/10/2013 06:17 AM, Tang Chen wrote: >>> Note: if the memory provided by the memory device is used by the >>> kernel, it >>> can't be offlined. It is not a bug. >> >> Right. But how often does this happen in testing? In other words, >> please provide an overall description of how well memory

Re: [PATCH v5 11/11] sched: introduce cgroup file stat_percpu

2013-01-09 Thread Glauber Costa
On 01/10/2013 01:17 AM, Andrew Morton wrote: > On Thu, 10 Jan 2013 01:10:02 +0400 > Glauber Costa wrote: > >> The main advantage I see in this approach, is that there is way less >> data to be written using a header. Although your way works, it means we >> will write

Re: [PATCH 1/2] Add mempressure cgroup

2013-01-09 Thread Glauber Costa
On 01/10/2013 12:37 AM, Tejun Heo wrote: > Hello, > > Can you please cc me too when posting further patches? I kinda missed > the whole discussion upto this point. > > On Fri, Jan 04, 2013 at 12:29:11AM -0800, Anton Vorontsov wrote: >> This commit implements David Rientjes' idea of mempressure

Re: [PATCH v5 11/11] sched: introduce cgroup file stat_percpu

2013-01-09 Thread Glauber Costa
On 01/10/2013 12:42 AM, Andrew Morton wrote: > On Wed, 9 Jan 2013 15:45:38 +0400 > Glauber Costa wrote: > >> The file cpu.stat_percpu will show various scheduler related >> information, that are usually available to the top level through other >> file

Re: [PATCH v5 01/14] memory-hotplug: try to offline the memory twice to avoid dependence

2013-01-09 Thread Glauber Costa
On 12/30/2012 09:58 AM, Wen Congyang wrote: > At 12/25/2012 04:35 PM, Glauber Costa Wrote: >> On 12/24/2012 04:09 PM, Tang Chen wrote: >>> From: Wen Congyang >>> >>> memory can't be offlined when CONFIG_MEMCG is selected. >>> For example: there is

Re: [PATCH 1/2] Add mempressure cgroup

2013-01-09 Thread Glauber Costa
On 01/09/2013 01:44 AM, Andrew Morton wrote: > On Fri, 4 Jan 2013 00:29:11 -0800 > Anton Vorontsov wrote: > >> This commit implements David Rientjes' idea of mempressure cgroup. >> >> The main characteristics are the same to what I've tried to add to vmevent >> API; internally, it uses Mel

Re: [PATCH 1/2] Add mempressure cgroup

2013-01-09 Thread Glauber Costa
On 01/09/2013 01:15 PM, Andrew Morton wrote: > On Wed, 9 Jan 2013 12:56:46 +0400 Glauber Costa wrote: > >>> +#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_MEMPRESSURE) >>> +SUBSYS(mpc_cgroup) >>> +#endif >> >> It might be just me, but if one does not know wha

[PATCH v5 04/11] cgroup, sched: deprecate cpuacct

2013-01-09 Thread Glauber Costa
Signed-off-by: Tejun Heo Cc: Peter Zijlstra Cc: Glauber Costa Cc: Michal Hocko Cc: Kay Sievers Cc: Lennart Poettering Cc: Dave Jones Cc: Ben Hutchings Cc: Paul Turner --- init/Kconfig| 11 ++- kernel/cgroup.c | 47 ++- kernel/sc

[PATCH v5 03/11] cgroup, sched: let cpu serve the same files as cpuacct

2013-01-09 Thread Glauber Costa
on top of which cpu can implement proper optimization. [ glommer: don't call *_charge in stop_task.c ] Signed-off-by: Tejun Heo Signed-off-by: Glauber Costa Cc: Peter Zijlstra Cc: Michal Hocko Cc: Kay Sievers Cc: Lennart Poettering Cc: Dave Jones Cc: Ben Hutchings Cc: Paul Turner

[PATCH v5 06/11] cpuacct: don't actually do anything.

2013-01-09 Thread Glauber Costa
All the information we have that is needed for cpuusage (and cpuusage_percpu) is present in schedstats. It is already recorded in a sane hierarchical way. If we have CONFIG_SCHEDSTATS, we don't really need to do any extra work. All former functions become empty inlines. Signed-off-by: Glauber

[PATCH v5 09/11] record per-cgroup number of context switches

2013-01-09 Thread Glauber Costa
not likely, it seems a fair price to pay. 2. Those figures do not include switches from and to the idle or stop task. Those need to be recorded separately, which will happen in a follow up patch. Signed-off-by: Glauber Costa CC: Peter Zijlstra CC: Paul Turner --- kernel/sched/fair.c | 18

[PATCH v5 00/11] per-cgroup cpu-stat

2013-01-09 Thread Glauber Costa
s provided by the cpu controller, resulting in greater simplicity. This also tries to hook into the existing scheduler hierarchy walks instead of providing new ones. Glauber Costa (7): don't call cpuacct_charge in stop_task.c sched: adjust exec_clock to use it as cpu usage metric cpuacct: don'

[PATCH v5 01/11] don't call cpuacct_charge in stop_task.c

2013-01-09 Thread Glauber Costa
this call quite useless. Signed-off-by: Glauber Costa CC: Mike Galbraith CC: Peter Zijlstra CC: Thomas Gleixner --- kernel/sched/stop_task.c | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c index da5eb5b..fda1cbe 100644 --- a/kernel/sched

[PATCH v5 10/11] sched: change nr_context_switches calculation.

2013-01-09 Thread Glauber Costa
fair and rt classes are recorded in the root_task_group. One can easily derive the total figure by adding those quantities together. Signed-off-by: Glauber Costa CC: Peter Zijlstra CC: Paul Turner --- kernel/sched/core.c | 17 +++-- kernel/sched/idle_task.c | 3 +++ kernel/sched/s

[PATCH v5 05/11] sched: adjust exec_clock to use it as cpu usage metric

2013-01-09 Thread Glauber Costa
the independent hierarchy walk executed by cpuacct. Signed-off-by: Glauber Costa CC: Dave Jones CC: Ben Hutchings CC: Peter Zijlstra CC: Paul Turner CC: Lennart Poettering CC: Kay Sievers CC: Tejun Heo --- kernel/sched/rt.c| 1 + kernel/sched/sched.h | 3 +++ 2 files changed, 4 insertions

[PATCH v5 11/11] sched: introduce cgroup file stat_percpu

2013-01-09 Thread Glauber Costa
are cgroup-local versions of their global counterparts. The file includes a header, so fields can come and go if needed. Signed-off-by: Glauber Costa CC: Peter Zijlstra CC: Paul Turner --- kernel/sched/core.c | 97 kernel/sched/fair.c

[PATCH v5 08/11] sched: Push put_prev_task() into pick_next_task()

2013-01-09 Thread Glauber Costa
[ glom...@parallels.com: incorporated mailing list feedback ] Signed-off-by: Peter Zijlstra Signed-off-by: Glauber Costa --- include/linux/sched.h| 8 +++- kernel/sched/core.c | 20 +++- kernel/sched/fair.c | 6 +- kernel/sched/idle_task.c | 6 +- kernel/sched/rt.c| 27

[PATCH v5 07/11] account guest time per-cgroup as well.

2013-01-09 Thread Glauber Costa
We already track multiple tick statistics per-cgroup, using the task_group_account_field facility. This patch accounts guest_time in that manner as well. Signed-off-by: Glauber Costa CC: Peter Zijlstra CC: Paul Turner --- kernel/sched/cputime.c | 10 -- 1 file changed, 4 insertions

[PATCH v5 02/11] cgroup: implement CFTYPE_NO_PREFIX

2013-01-09 Thread Glauber Costa
Signed-off-by: Tejun Heo Cc: Peter Zijlstra Cc: Glauber Costa --- include/linux/cgroup.h | 1 + kernel/cgroup.c| 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 7d73905..7d193f9 100644 --- a/include/linux/cgroup.h +++ b

Re: [PATCHSET] cpuset: decouple cpuset locking from cgroup core, take#2

2013-01-09 Thread Glauber Costa
On 01/04/2013 01:35 AM, Tejun Heo wrote: > Note that this leaves memcg as the only external user of cgroup_mutex. > Michal, Kame, can you guys please convert memcg to use its own locking > too? I've already done this, I just have to rework it according to latest feedback and repost it. It should

Re: [PATCH 1/2] Add mempressure cgroup

2013-01-09 Thread Glauber Costa
Hi. I have a couple of small questions. On 01/04/2013 12:29 PM, Anton Vorontsov wrote: > This commit implements David Rientjes' idea of mempressure cgroup. > > The main characteristics are the same to what I've tried to add to vmevent > API; internally, it uses Mel Gorman's idea of
