[Devel] Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-15 Thread Glauber Costa
On 08/14/2012 07:16 PM, Mel Gorman wrote: On Thu, Aug 09, 2012 at 05:01:15PM +0400, Glauber Costa wrote: When a process tries to allocate a page with the __GFP_KMEMCG flag, the page allocator will call the corresponding memcg functions to validate the allocation. Tasks in the root memcg can

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Glauber Costa
On 08/14/2012 10:58 PM, Greg Thelen wrote: On Mon, Aug 13 2012, Glauber Costa wrote: + WARN_ON(mem_cgroup_is_root(memcg)); + size = (1 << order) << PAGE_SHIFT; + memcg_uncharge_kmem(memcg, size); + mem_cgroup_put(memcg); Why do we need ref-counting here? kmem res_counter cannot work as
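
The quoted uncharge path does two things. It converts an allocation order into bytes, and it drops a memcg reference. Below is a minimal userspace sketch of that arithmetic and bookkeeping. It assumes 4 KiB pages (a PAGE_SHIFT of 12). `struct memcg_ref_sketch`, `kmem_order_bytes()` and `kmem_uncharge_sketch()` are hypothetical stand-ins, not the memcg API. The refcount only mirrors the `mem_cgroup_put()` call in the quoted code; whether that reference is needed at all is the question Greg raises.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical, simplified memcg: a kmem usage counter plus a refcount. */
struct memcg_ref_sketch {
    bool is_root;
    unsigned long kmem_usage; /* bytes currently charged */
    int refcnt;               /* references held on behalf of charged pages */
};

/* Bytes covered by an order-@order allocation: 2^order pages. */
static unsigned long kmem_order_bytes(unsigned int order)
{
    return (1UL << order) << SKETCH_PAGE_SHIFT;
}

/*
 * Mirror of the quoted uncharge path. The root memcg is never charged,
 * so it is never uncharged either: the quoted code only WARN_ON()s in
 * that case, while this sketch simply returns. Otherwise, drop the
 * charge and the reference taken when the page was charged.
 */
static void kmem_uncharge_sketch(struct memcg_ref_sketch *memcg, unsigned int order)
{
    if (memcg->is_root)
        return;
    memcg->kmem_usage -= kmem_order_bytes(order);
    memcg->refcnt--;
}
```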

[Devel] Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-15 Thread Michal Hocko
On Thu 09-08-12 17:01:15, Glauber Costa wrote: [...] diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b956cec..da341dc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, struct page *page = NULL;

[Devel] Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-15 Thread Mel Gorman
On Mon, Aug 13, 2012 at 12:03:38PM +0400, Glauber Costa wrote: On 08/10/2012 09:33 PM, Kamezawa Hiroyuki wrote: (2012/08/09 22:01), Glauber Costa wrote: When a process tries to allocate a page with the __GFP_KMEMCG flag, the page allocator will call the corresponding memcg functions to

[Devel] Re: [PATCH] SUNRPC: check current nsproxy before set of node name on client creation

2012-08-15 Thread Jeff Layton
On Mon, 13 Aug 2012 15:21:56 +0400 Stanislav Kinsbursky skinsbur...@parallels.com wrote: When the child reaper exits, it can destroy the mount namespace it belongs to, and if there are NFS mounts inside, it will try to unmount them. But at this point current->nsproxy is set to NULL and all

[Devel] Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-15 Thread Mel Gorman
On Thu, Aug 09, 2012 at 05:01:15PM +0400, Glauber Costa wrote: When a process tries to allocate a page with the __GFP_KMEMCG flag, the page allocator will call the corresponding memcg functions to validate the allocation. Tasks in the root memcg can always proceed. To avoid adding markers to
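
The flow described here checks the allocation against the memcg before any page exists, and lets the root memcg through unconditionally. Below is a minimal userspace sketch of that ordering. `SKETCH_GFP_KMEMCG` is a hypothetical stand-in for `__GFP_KMEMCG`, `malloc()` stands in for the page allocator, and the memcg is reduced to a usage/limit pair. None of these names are kernel API.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_GFP_KMEMCG 0x1u  /* hypothetical stand-in for __GFP_KMEMCG */
#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

struct memcg_sketch {
    bool is_root;
    unsigned long usage, limit; /* bytes */
};

/* Only flagged allocations outside the root memcg are accounted. */
static bool kmem_needs_charge(const struct memcg_sketch *cg, unsigned int gfp)
{
    return (gfp & SKETCH_GFP_KMEMCG) && !cg->is_root;
}

static void *alloc_pages_sketch(struct memcg_sketch *cg, unsigned int gfp,
                                unsigned int order)
{
    unsigned long size = SKETCH_PAGE_SIZE << order;
    bool charge = kmem_needs_charge(cg, gfp);
    void *p;

    if (charge) {
        if (cg->usage + size > cg->limit)
            return NULL;   /* refused before any page exists */
        cg->usage += size; /* charge first ... */
    }
    p = malloc(size);      /* ... then allocate (stand-in allocator) */
    if (!p && charge)
        cg->usage -= size; /* roll back if the allocation itself failed */
    return p;
}
```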

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa
We always account to both user and kernel resource_counters. This effectively means that an independent kernel limit is in place when the limit is set to a lower value than the user memory limit. An equal or higher value means that the user limit will always hit first, meaning that kmem is
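
Charging kernel memory to both counters means the lower of the two limits governs. Below is a minimal sketch of that rule. `struct counter_sketch` and the helpers are hypothetical, simplified stand-ins for the kernel's res_counter, and the ordering (kmem first, then user, with rollback) is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a memcg resource counter. */
struct counter_sketch {
    uint64_t usage;
    uint64_t limit;
};

/* Add @bytes to @c, or fail without side effects if it would exceed the limit. */
static bool counter_try_charge(struct counter_sketch *c, uint64_t bytes)
{
    if (c->usage + bytes > c->limit)
        return false;
    c->usage += bytes;
    return true;
}

static void counter_uncharge(struct counter_sketch *c, uint64_t bytes)
{
    c->usage -= bytes;
}

/*
 * Kernel memory is charged to BOTH counters. A kmem limit below the user
 * limit therefore acts as an independent kernel cap. Because user usage
 * also includes kmem, an equal or higher kmem limit is reached no earlier
 * than the user limit.
 */
static bool kmem_charge_both(struct counter_sketch *user,
                             struct counter_sketch *kmem, uint64_t bytes)
{
    if (!counter_try_charge(kmem, bytes))
        return false;
    if (!counter_try_charge(user, bytes)) {
        counter_uncharge(kmem, bytes); /* roll back the kmem charge */
        return false;
    }
    return true;
}
```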

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Glauber Costa
+ * memcg_kmem_new_page: verify if a new kmem allocation is allowed. + * @gfp: the gfp allocation flags. + * @handle: a pointer to the memcg this was charged against. + * @order: allocation order. + * + * returns true if the memcg where the current task belongs can hold this + *

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Glauber Costa
On 08/15/2012 01:42 PM, Glauber Costa wrote: Also, as I have mentioned in the other email in this thread. Why should we reclaim just because of kernel allocation when we are not reclaiming any of it because shrink_slab is ignored in the memcg reclaim. Don't get too distracted by the fact

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread James Bottomley
On Wed, 2012-08-15 at 13:33 +0400, Glauber Costa wrote: This can be quite confusing. I am still not sure whether we should mix the two things together. If somebody wants to limit the kernel memory he has to touch the other limit anyway. Do you have a strong reason to mix the user and

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Michal Hocko
On Wed 15-08-12 13:33:55, Glauber Costa wrote: [...] This can be quite confusing. I am still not sure whether we should mix the two things together. If somebody wants to limit the kernel memory he has to touch the other limit anyway. Do you have a strong reason to mix the user and

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa
On 08/15/2012 04:39 PM, Michal Hocko wrote: On Wed 15-08-12 13:33:55, Glauber Costa wrote: [...] This can be quite confusing. I am still not sure whether we should mix the two things together. If somebody wants to limit the kernel memory he has to touch the other limit anyway. Do you have

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Michal Hocko
On Wed 15-08-12 12:12:23, James Bottomley wrote: On Wed, 2012-08-15 at 13:33 +0400, Glauber Costa wrote: This can be quite confusing. I am still not sure whether we should mix the two things together. If somebody wants to limit the kernel memory he has to touch the other limit

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Michal Hocko
On Wed 15-08-12 16:53:40, Glauber Costa wrote: [...] This doesn't check for the hierarchy so kmem_accounted might not be in sync with its parents. mem_cgroup_create (below) needs to copy kmem_accounted down from the parent and the above needs to check if this is a similar dance like

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa
On 08/15/2012 05:02 PM, Michal Hocko wrote: On Wed 15-08-12 16:53:40, Glauber Costa wrote: [...] This doesn't check for the hierarchy so kmem_accounted might not be in sync with its parents. mem_cgroup_create (below) needs to copy kmem_accounted down from the parent and the above needs to
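
Michal's point is that a child created under an accounted parent should start out accounted too. Below is a minimal sketch of copying the flag down at creation time. `struct cg_sketch` and `cg_create_sketch()` are hypothetical. Making the inheritance conditional on `use_hierarchy` is an assumption; the quoted text does not settle it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified memcg node. */
struct cg_sketch {
    struct cg_sketch *parent;
    bool use_hierarchy;
    bool kmem_accounted;
};

/*
 * Initialise @child under @parent (NULL for a top-level group). Under a
 * hierarchical parent the child copies kmem_accounted down, so it is
 * never out of sync with its ancestors.
 */
static void cg_create_sketch(struct cg_sketch *child, struct cg_sketch *parent)
{
    child->parent = parent;
    child->use_hierarchy = parent ? parent->use_hierarchy : false;
    child->kmem_accounted = parent && parent->use_hierarchy &&
                            parent->kmem_accounted;
}
```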

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Michal Hocko
On Wed 15-08-12 13:42:24, Glauber Costa wrote: [...] + + ret = 0; + + if (!memcg) + return ret; + + _memcg = memcg; + ret = __mem_cgroup_try_charge(NULL, gfp, delta / PAGE_SIZE, + &_memcg, may_oom); This is really dangerous because atomic allocation which

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Michal Hocko
On Wed 15-08-12 17:04:31, Glauber Costa wrote: On 08/15/2012 05:02 PM, Michal Hocko wrote: On Wed 15-08-12 16:53:40, Glauber Costa wrote: [...] This doesn't check for the hierarchy so kmem_accounted might not be in sync with its parents. mem_cgroup_create (below) needs to copy

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread James Bottomley
On Wed, 2012-08-15 at 14:55 +0200, Michal Hocko wrote: On Wed 15-08-12 12:12:23, James Bottomley wrote: On Wed, 2012-08-15 at 13:33 +0400, Glauber Costa wrote: This can be quite confusing. I am still not sure whether we should mix the two things together. If somebody wants to limit

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa
On 08/15/2012 05:26 PM, Michal Hocko wrote: On Wed 15-08-12 17:04:31, Glauber Costa wrote: On 08/15/2012 05:02 PM, Michal Hocko wrote: On Wed 15-08-12 16:53:40, Glauber Costa wrote: [...] This doesn't check for the hierarchy so kmem_accounted might not be in sync with its parents.

[Devel] Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-15 Thread Glauber Costa
As for the type, do you think using struct mem_cgroup would be less confusing? Yes and returning the mem_cgroup or NULL instead of bool. Ok. struct mem_cgroup it is. The placeholder is there, but it is later patched to the final thing. With that explained, if you want me to change it

[Devel] Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-15 Thread Glauber Costa
On 08/15/2012 05:22 PM, Mel Gorman wrote: I believe it to be a better and less complicated approach than letting a page appear and then charging it. Besides being consistent with the rest of memcg, it won't create unnecessary disturbance in the page allocator when the allocation is to

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Glauber Costa
On 08/15/2012 05:09 PM, Michal Hocko wrote: On Wed 15-08-12 13:42:24, Glauber Costa wrote: [...] + + ret = 0; + + if (!memcg) + return ret; + + _memcg = memcg; + ret = __mem_cgroup_try_charge(NULL, gfp, delta / PAGE_SIZE, + &_memcg, may_oom); This is really dangerous

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Michal Hocko
On Wed 15-08-12 17:31:24, Glauber Costa wrote: On 08/15/2012 05:26 PM, Michal Hocko wrote: On Wed 15-08-12 17:04:31, Glauber Costa wrote: On 08/15/2012 05:02 PM, Michal Hocko wrote: On Wed 15-08-12 16:53:40, Glauber Costa wrote: [...] This doesn't check for the hierarchy so

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa
OK, I missed an important point: kmem_accounted is not exported to userspace (I thought it would be done later in the series, which is not the case), so nobody actually gets confused by the inconsistency, because it is about RESOURCE_MAX, which they see in both cases. Sorry about the

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Michal Hocko
On Wed 15-08-12 18:01:51, Glauber Costa wrote: On 08/15/2012 05:09 PM, Michal Hocko wrote: On Wed 15-08-12 13:42:24, Glauber Costa wrote: [...] + +ret = 0; + +if (!memcg) +return ret; + +_memcg = memcg; +ret =

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Glauber Costa
I see now, you seem to be right. No, I am not, because it seems that I am really blind these days... We were doing this in mem_cgroup_do_charge for ages: if (!(gfp_mask & __GFP_WAIT)) return CHARGE_WOULDBLOCK; /me goes to hide and get with further feedback with a
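
The check Michal quotes is what keeps allocations that cannot sleep (for example, atomic ones) out of reclaim. Below is a reduced sketch of that decision. The enum and the flag value are hypothetical stand-ins, modelled loosely on the CHARGE_* results named in the quote. The reclaim and OOM handling of the real function are omitted.

```c
#include <stdbool.h>

#define SKETCH_GFP_WAIT 0x10u /* hypothetical stand-in for __GFP_WAIT */

enum charge_result_sketch {
    SKETCH_CHARGE_OK,
    SKETCH_CHARGE_RETRY,      /* caller may reclaim and try again */
    SKETCH_CHARGE_WOULDBLOCK, /* caller cannot sleep, so cannot reclaim */
};

/*
 * Decide the outcome of a charge attempt. When the charge does not fit,
 * an allocation without the wait flag gets WOULDBLOCK instead of being
 * sent into reclaim.
 */
static enum charge_result_sketch do_charge_sketch(unsigned int gfp_mask, bool fits)
{
    if (fits)
        return SKETCH_CHARGE_OK;
    if (!(gfp_mask & SKETCH_GFP_WAIT))
        return SKETCH_CHARGE_WOULDBLOCK;
    return SKETCH_CHARGE_RETRY;
}
```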

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Christoph Lameter
On Wed, 15 Aug 2012, Michal Hocko wrote: That is not what the kernel does, in general. We assume that if he wants that memory and we can serve it, we should. Also, not all kernel memory is unreclaimable. We can shrink the slabs, for instance. Ying Han claims she has patches for that

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa
On 08/15/2012 06:47 PM, Christoph Lameter wrote: On Wed, 15 Aug 2012, Michal Hocko wrote: That is not what the kernel does, in general. We assume that if he wants that memory and we can serve it, we should. Also, not all kernel memory is unreclaimable. We can shrink the slabs, for instance.

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Greg Thelen
On Wed, Aug 15 2012, Christoph Lameter wrote: On Wed, 15 Aug 2012, Michal Hocko wrote: That is not what the kernel does, in general. We assume that if he wants that memory and we can serve it, we should. Also, not all kernel memory is unreclaimable. We can shrink the slabs, for instance.

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Christoph Lameter
On Wed, 15 Aug 2012, Glauber Costa wrote: On 08/15/2012 06:47 PM, Christoph Lameter wrote: On Wed, 15 Aug 2012, Michal Hocko wrote: That is not what the kernel does, in general. We assume that if he wants that memory and we can serve it, we should. Also, not all kernel memory is

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Christoph Lameter
On Wed, 15 Aug 2012, Greg Thelen wrote: You can already shrink the reclaimable slabs (dentries / inodes) via calls to the subsystem specific shrinkers. Did Ying Han do anything to go beyond that? cc: Ying The Google shrinker patches enhance prune_dcache_sb() to limit dentry pressure to

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa
On 08/15/2012 07:34 PM, Christoph Lameter wrote: On Wed, 15 Aug 2012, Glauber Costa wrote: On 08/15/2012 06:47 PM, Christoph Lameter wrote: On Wed, 15 Aug 2012, Michal Hocko wrote: That is not what the kernel does, in general. We assume that if he wants that memory and we can serve it, we

[Devel] [RFC PATCH 0/5] net: socket bind to file descriptor introduced

2012-08-15 Thread Stanislav Kinsbursky
This patch set introduces a new socket operation and a new system call, sys_fbind(), which allows binding a socket to an opened file. The file to bind to can be created by sys_mknod(S_IFSOCK) and opened by open(O_PATH). This system call is especially required for UNIX sockets, which have a name length

[Devel] [RFC PATCH 2/5] net: split unix_bind()

2012-08-15 Thread Stanislav Kinsbursky
This patch moves the UNIX socket insertion into a separate function, because this code will be used for unix_fbind() too. Signed-off-by: Stanislav Kinsbursky skinsbur...@parallels.com --- net/unix/af_unix.c | 52 +--- 1 files changed, 29 insertions(+),

[Devel] [RFC PATCH 1/5] net: cleanup unix_bind() a little

2012-08-15 Thread Stanislav Kinsbursky
This will simplify further changes for unix_fbind(). --- net/unix/af_unix.c | 12 +--- 1 files changed, 5 insertions(+), 7 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 641f2e4..bc90ddb 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -880,10 +880,8

[Devel] [RFC PATCH 3/5] net: new protocol operation fbind() introduced

2012-08-15 Thread Stanislav Kinsbursky
This operation is used to bind a socket to a specified file. Signed-off-by: Stanislav Kinsbursky skinsbur...@parallels.com --- include/linux/net.h | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/include/linux/net.h b/include/linux/net.h index e9ac2df..843cb75 100644 ---

[Devel] [RFC PATCH 4/5] net: fbind() for unix sockets protocol operations introduced

2012-08-15 Thread Stanislav Kinsbursky
The path for unix_address is taken from the passed file. The file inode has to be a socket. Since no sunaddr is present, addr->name is constructed in place. This obviously means that the path name can be truncated if it is longer than UNIX_MAX_PATH. Signed-off-by: Stanislav Kinsbursky skinsbur...@parallels.com
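
The truncation caveat follows from the fixed size of the address buffer. Below is a userspace analogue using `struct sockaddr_un`, whose `sun_path` is a fixed-size array. The kernel-side `struct unix_address` in the patch differs, and `sunaddr_from_path_truncating()` is a hypothetical helper used only for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Build an AF_UNIX address from @path, silently truncating it if it does
 * not fit in sun_path (one byte is kept free for a terminating NUL).
 * Returns the number of path bytes actually kept.
 */
static size_t sunaddr_from_path_truncating(struct sockaddr_un *addr, const char *path)
{
    size_t max = sizeof(addr->sun_path) - 1;
    size_t len = strlen(path);

    if (len > max)
        len = max; /* the tail of a long path is lost here */
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, len);
    return len;
}
```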

[Devel] [RFC PATCH 5/5] syscall: sys_fbind() introduced

2012-08-15 Thread Stanislav Kinsbursky
This syscall allows binding a socket to a specified file descriptor. The descriptor can be obtained by a simple open with the O_PATH flag. The socket node can be created by sys_mknod(). Signed-off-by: Stanislav Kinsbursky skinsbur...@parallels.com --- arch/x86/syscalls/syscall_32.tbl | 1 +

[Devel] Re: [RFC PATCH 5/5] syscall: sys_fbind() introduced

2012-08-15 Thread H. Peter Anvin
On 08/15/2012 09:22 AM, Stanislav Kinsbursky wrote: This syscall allows binding a socket to a specified file descriptor. The descriptor can be obtained by a simple open with the O_PATH flag. The socket node can be created by sys_mknod(). Signed-off-by: Stanislav Kinsbursky skinsbur...@parallels.com ---

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Greg Thelen
On Wed, Aug 15 2012, Glauber Costa wrote: On 08/14/2012 10:58 PM, Greg Thelen wrote: On Mon, Aug 13 2012, Glauber Costa wrote: +WARN_ON(mem_cgroup_is_root(memcg)); +size = (1 << order) << PAGE_SHIFT; +memcg_uncharge_kmem(memcg, size); +mem_cgroup_put(memcg);

[Devel] Re: [RFC PATCH 5/5] syscall: sys_fbind() introduced

2012-08-15 Thread Stanislav Kinsbursky
15.08.2012 20:30, H. Peter Anvin wrote: On 08/15/2012 09:22 AM, Stanislav Kinsbursky wrote: This syscall allows binding a socket to a specified file descriptor. The descriptor can be obtained by a simple open with the O_PATH flag. The socket node can be created by sys_mknod(). Signed-off-by: Stanislav Kinsbursky

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Glauber Costa
On 08/15/2012 08:38 PM, Greg Thelen wrote: On Wed, Aug 15 2012, Glauber Costa wrote: On 08/14/2012 10:58 PM, Greg Thelen wrote: On Mon, Aug 13 2012, Glauber Costa wrote: + WARN_ON(mem_cgroup_is_root(memcg)); + size = (1 << order) << PAGE_SHIFT; + memcg_uncharge_kmem(memcg,

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Greg Thelen
On Wed, Aug 15 2012, Glauber Costa wrote: On 08/15/2012 08:38 PM, Greg Thelen wrote: On Wed, Aug 15 2012, Glauber Costa wrote: On 08/14/2012 10:58 PM, Greg Thelen wrote: On Mon, Aug 13 2012, Glauber Costa wrote: + WARN_ON(mem_cgroup_is_root(memcg)); + size = (1 << order)

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Christoph Lameter
On Wed, 15 Aug 2012, Glauber Costa wrote: Remember we copy over the metadata and create copies of the caches per-memcg. Therefore, a dentry belongs to a memcg if it was allocated from the slab pertaining to that memcg. The dentry could be used by other processes in the system though. F.e.
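
Glauber's ownership rule is that an object belongs to the memcg whose copy of the cache it was allocated from. Below is a minimal sketch of selecting, and lazily creating, a per-memcg copy of a cache. `struct kcache_sketch`, `cache_for_memcg()` and the fixed `SKETCH_MAX_MEMCGS` bound are hypothetical. Only `obj_size` stands in for the copied metadata.

```c
#include <stdlib.h>

#define SKETCH_MAX_MEMCGS 16 /* hypothetical fixed bound on memcg ids */

struct kcache_sketch {
    size_t obj_size;
    int memcg_id; /* -1 for the global (root) cache */
    struct kcache_sketch *children[SKETCH_MAX_MEMCGS];
};

/*
 * Return the per-memcg copy of @root for @memcg_id, creating it on first
 * use by copying the root cache's metadata. An object's owner is then
 * simply the memcg_id of the cache it came from.
 */
static struct kcache_sketch *cache_for_memcg(struct kcache_sketch *root, int memcg_id)
{
    struct kcache_sketch *c;

    if (memcg_id < 0 || memcg_id >= SKETCH_MAX_MEMCGS)
        return root; /* root memcg or unknown id: use the global cache */
    c = root->children[memcg_id];
    if (!c) {
        c = calloc(1, sizeof(*c));
        if (!c)
            return root; /* fall back to the global cache */
        c->obj_size = root->obj_size;
        c->memcg_id = memcg_id;
        root->children[memcg_id] = c;
    }
    return c;
}
```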

[Devel] Re: [RFC PATCH 0/5] net: socket bind to file descriptor introduced

2012-08-15 Thread H. Peter Anvin
On 08/15/2012 09:52 AM, Ben Pfaff wrote: Stanislav Kinsbursky skinsbur...@parallels.com writes: This system call is especially required for UNIX sockets, which have a name length limitation. The worst of the name length limitations can be worked around by opening the directory where the

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Ying Han
On Wed, Aug 15, 2012 at 5:39 AM, Michal Hocko mho...@suse.cz wrote: On Wed 15-08-12 13:33:55, Glauber Costa wrote: [...] This can be quite confusing. I am still not sure whether we should mix the two things together. If somebody wants to limit the kernel memory he has to touch the other

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa
On 08/15/2012 10:01 PM, Ying Han wrote: On Wed, Aug 15, 2012 at 5:39 AM, Michal Hocko mho...@suse.cz wrote: On Wed 15-08-12 13:33:55, Glauber Costa wrote: [...] This can be quite confusing. I am still not sure whether we should mix the two things together. If somebody wants to limit the

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Ying Han
On Wed, Aug 15, 2012 at 8:11 AM, Glauber Costa glom...@parallels.com wrote: On 08/15/2012 06:47 PM, Christoph Lameter wrote: On Wed, 15 Aug 2012, Michal Hocko wrote: That is not what the kernel does, in general. We assume that if he wants that memory and we can serve it, we should. Also, not

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Ying Han
On Wed, Aug 15, 2012 at 8:34 AM, Christoph Lameter c...@linux.com wrote: On Wed, 15 Aug 2012, Glauber Costa wrote: On 08/15/2012 06:47 PM, Christoph Lameter wrote: On Wed, 15 Aug 2012, Michal Hocko wrote: That is not what the kernel does, in general. We assume that if he wants that

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Glauber Costa
On 08/15/2012 10:25 PM, Christoph Lameter wrote: On Wed, 15 Aug 2012, Ying Han wrote: How can you figure out which objects belong to which memcg? The ownership of dentries and inodes is a dubious concept already. I figured it out based on the kernel slab accounting.

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Glauber Costa
On 08/15/2012 09:12 PM, Greg Thelen wrote: On Wed, Aug 15 2012, Glauber Costa wrote: On 08/15/2012 08:38 PM, Greg Thelen wrote: On Wed, Aug 15 2012, Glauber Costa wrote: On 08/14/2012 10:58 PM, Greg Thelen wrote: On Mon, Aug 13 2012, Glauber Costa wrote: +

[Devel] Re: [RFC PATCH 0/5] net: socket bind to file descriptor introduced

2012-08-15 Thread Eric W. Biederman
Stanislav Kinsbursky skinsbur...@parallels.com writes: This patch set introduces a new socket operation and a new system call, sys_fbind(), which allows binding a socket to an opened file. The file to bind to can be created by sys_mknod(S_IFSOCK) and opened by open(O_PATH). This system call is

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-15 Thread Ying Han
On Tue, Aug 14, 2012 at 9:21 AM, Michal Hocko mho...@suse.cz wrote: On Thu 09-08-12 17:01:12, Glauber Costa wrote: This patch adds the basic infrastructure for the accounting of the slab caches. To control that, the following files are created: * memory.kmem.usage_in_bytes *

[Devel] Re: [RFC PATCH 0/5] net: socket bind to file descriptor introduced

2012-08-15 Thread H. Peter Anvin
On 08/15/2012 12:49 PM, Eric W. Biederman wrote: There is also the trick of getting a shorter directory name using /proc/self/fd if you are threaded and can't change the directory. The obvious choices at this point are - Teach bind and connect and af_unix sockets to take longer AF_UNIX

[Devel] Re: [RFC PATCH 0/5] net: socket bind to file descriptor introduced

2012-08-15 Thread Eric W. Biederman
H. Peter Anvin h...@zytor.com writes: On 08/15/2012 12:49 PM, Eric W. Biederman wrote: There is also the trick of getting a shorter directory name using /proc/self/fd if you are threaded and can't change the directory. The obvious choices at this point are - Teach bind and connect and

[Devel] Re: [RFC PATCH 0/5] net: socket bind to file descriptor introduced

2012-08-15 Thread Eric W. Biederman
Stanislav Kinsbursky skinsbur...@parallels.com writes: This patch set introduces a new socket operation and a new system call, sys_fbind(), which allows binding a socket to an opened file. The file to bind to can be created by sys_mknod(S_IFSOCK) and opened by open(O_PATH). This system call is

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-15 Thread Greg Thelen
On Wed, Aug 15 2012, Glauber Costa wrote: On 08/15/2012 09:12 PM, Greg Thelen wrote: On Wed, Aug 15 2012, Glauber Costa wrote: On 08/15/2012 08:38 PM, Greg Thelen wrote: On Wed, Aug 15 2012, Glauber Costa wrote: On 08/14/2012 10:58 PM, Greg Thelen wrote: On Mon, Aug 13 2012, Glauber