[Devel] [RFC PATCH 1/2] unix sockets: add ability for search for peer from passed root

2012-08-10 Thread Stanislav Kinsbursky
This helper is used stream sockets yet. All is simple: if non-NULL struct path was passed to unix_find_other(), then vfs_path_lookup() is called instead of kern_path(). Signed-off-by: Stanislav Kinsbursky skinsbur...@parallels.com --- include/net/af_unix.h |2 ++ net/unix/af_unix.c| 25

[Devel] [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread Stanislav Kinsbursky
Today, there is a problem with connecting local SUNRPC transports. These transports use UNIX sockets, and the connection itself is done by the rpciod workqueue. But UNIX socket lookup is done in the context of the process's file system root. I.e. all local transports are connecting in the rpciod context. This works

[Devel] [RFC PATCH 2/2] SUNRPC: connect local transports with unix_stream_connect_root() helper

2012-08-10 Thread Stanislav Kinsbursky
Today, there is a problem with connecting local SUNRPC transports. These transports use UNIX sockets, and the connection itself is done by the rpciod workqueue. But UNIX socket lookup is done in the context of the process's file system root. I.e. all local transports are connecting in the rpciod context. This works

[Devel] [PATCH v3 00/10] IPC: checkpoint/restore in userspace enhancements

2012-08-10 Thread Stanislav Kinsbursky
v3: 1) Copying messages to user space under a spinlock was replaced by allocating a dummy message before taking the queue lock and then copying the desired message into the dummy one, instead of unlinking it from the queue list. I.e. the message queue copy logic was changed: messages can be retrieved one by one (instead of

[Devel] [PATCH v3 01/10] ipc: remove forced assignment of selected message

2012-08-10 Thread Stanislav Kinsbursky
This is a cleanup patch. The assignment is redundant. --- ipc/msg.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/ipc/msg.c b/ipc/msg.c index 7385de2..f3bfbb8 100644 --- a/ipc/msg.c +++ b/ipc/msg.c @@ -787,7 +787,6 @@ long do_msgrcv(int msqid, long *pmtype, void __user

[Devel] [PATCH v3 02/10] ipc: use key as id functionality for resource get system call introduced

2012-08-10 Thread Stanislav Kinsbursky
This patch introduces a new IPC resource get request flag, IPC_PRESET, which should be interpreted as a request to try to allocate an IPC slot with a number starting from the value represented by key. IOW, the kernel will try to allocate the new segment in the specified slot. If the slot is not empty, then -EEXIST is returned.

[Devel] [PATCH v3 03/10] ipc: segment key change helper introduced

2012-08-10 Thread Stanislav Kinsbursky
This patch introduces infrastructure for changing the key of an existing segment. The new function ipc_update_key() can be used to change a segment's key, cuid and cgid values. It checks that the new key is not in use (except for IPC_PRIVATE) before setting it on the existing segment. To make this possible, added copying of these fields from

[Devel] [PATCH v3 04/10] ipc: add new SHM_SET command for sys_shmctl() call

2012-08-10 Thread Stanislav Kinsbursky
The new SHM_SET command is interpreted exactly as IPC_SET, but also updates the key, cuid and cgid values. IOW, it allows changing an existing key value. Before the update, the new key is checked to be unused; otherwise -EEXIST is returned. Signed-off-by: Stanislav Kinsbursky

[Devel] [PATCH v3 06/10] ipc: add new SEM_SET command for sys_semctl() call

2012-08-10 Thread Stanislav Kinsbursky
The new SEM_SET command is interpreted exactly as IPC_SET, but also updates the key, cuid and cgid values. IOW, it allows changing an existing key value. Before the update, the new key is checked to be unused; otherwise -EEXIST is returned. Signed-off-by: Stanislav Kinsbursky

[Devel] [PATCH v3 07/10] IPC: message queue receive cleanup

2012-08-10 Thread Stanislav Kinsbursky
This patch moves all message related manipulation into one function msg_fill(). Actually, two functions because of the compat one. Signed-off-by: Stanislav Kinsbursky skinsbur...@parallels.com Signed-off-by: Cyrill Gorcunov gorcu...@openvz.org Conflicts: arch/tile/kernel/compat.c

[Devel] [PATCH v3 05/10] ipc: add new MSG_SET command for sys_msgctl() call

2012-08-10 Thread Stanislav Kinsbursky
The new MSG_SET command is interpreted exactly as IPC_SET, but also updates the key, cuid and cgid values. IOW, it allows changing an existing key value. Before the update, the new key is checked to be unused; otherwise -EEXIST is returned. Signed-off-by: Stanislav Kinsbursky

[Devel] [PATCH v3 08/10] IPC: message queue copy feature introduced

2012-08-10 Thread Stanislav Kinsbursky
This patch is required for checkpoint/restore in userspace. IOW, c/r requires some way to get all pending IPC messages without deleting them from the queue (a checkpoint can fail, in which case the tasks will be resumed, so the queue has to remain valid). To achieve this, new operation flag MSG_COPY for

[Devel] [PATCH v3 09/10] ipc: add new MSG_SET_COPY command for sys_msgctl() call

2012-08-10 Thread Stanislav Kinsbursky
The new MSG_SET_COPY command allows setting the copy counter of the specified queue to the passed value. The passed struct msqid_ds *buf is interpreted as a pointer to unsigned int in this case. --- include/linux/msg.h |1 + ipc/compat.c|3 +++ ipc/msg.c | 18 ++ 3 files changed, 22

[Devel] [PATCH v3 10/10] test: IPC message queue migration test

2012-08-10 Thread Stanislav Kinsbursky
This test is a part of the CRIU development test suite. --- tools/testing/selftests/ipc/msgque.c | 151 ++ 1 files changed, 151 insertions(+), 0 deletions(-) create mode 100644 tools/testing/selftests/ipc/msgque.c diff --git a/tools/testing/selftests/ipc/msgque.c

[Devel] Re: [PATCH v2 01/11] memcg: Make it possible to use the stock for more than one page.

2012-08-10 Thread Michal Hocko
On Thu 09-08-12 17:01:09, Glauber Costa wrote: From: Suleiman Souhlal ssouh...@freebsd.org We currently have a percpu stock cache scheme that charges one page at a time from memcg->res, the user counter. When the kernel memory controller comes into play, we'll need to charge more than that.

[Devel] Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Michal Hocko
On Thu 09-08-12 17:01:10, Glauber Costa wrote: [...] @@ -2317,18 +2318,18 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, } else mem_over_limit = mem_cgroup_from_res_counter(fail_res, res); /* - * nr_pages can be either a huge page

[Devel] Re: [PATCH v2 03/11] memcg: change defines to an enum

2012-08-10 Thread Michal Hocko
On Thu 09-08-12 17:01:11, Glauber Costa wrote: This is just a cleanup patch for clarity of expression. In earlier submissions, people asked it to be in a separate patch, so here it is. [ v2: use named enum as type throughout the file as well ] Signed-off-by: Glauber Costa

[Devel] Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/11 0:42), Michal Hocko wrote: On Thu 09-08-12 17:01:10, Glauber Costa wrote: [...] @@ -2317,18 +2318,18 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, } else mem_over_limit = mem_cgroup_from_res_counter(fail_res, res); /* -

[Devel] Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote: This patch adds the basic infrastructure for the accounting of the slab caches. To control that, the following files are created: * memory.kmem.usage_in_bytes * memory.kmem.limit_in_bytes * memory.kmem.failcnt *

[Devel] Re: [PATCH v2 05/11] Add a __GFP_KMEMCG flag

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote: This flag is used to indicate to the callees that this allocation is a kernel allocation in process context, and should be accounted to current's memcg. It takes the numerical place of the recently removed __GFP_NO_KSWAPD. Signed-off-by: Glauber

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote: This patch introduces infrastructure for tracking kernel memory pages to a given memcg. This will happen whenever the caller includes the __GFP_KMEMCG flag and the task belongs to a memcg other than the root. In memcontrol.h those functions are

[Devel] Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Michal Hocko
On Sat 11-08-12 01:49:25, KAMEZAWA Hiroyuki wrote: (2012/08/11 0:42), Michal Hocko wrote: On Thu 09-08-12 17:01:10, Glauber Costa wrote: [...] @@ -2317,18 +2318,18 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, } else mem_over_limit =

[Devel] Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Michal Hocko
On Thu 09-08-12 17:01:10, Glauber Costa wrote: [...] For now retry up to COSTLY_ORDER (as page_alloc.c does) and make sure not to do it if __GFP_NORETRY. Who is using __GFP_NORETRY for user backed memory (except for hugetlb which has its own controller)? -- Michal Hocko SUSE Labs
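The retry rule quoted in this message can be modelled in plain C. What follows is an illustrative userspace sketch, not the code from the patch: SKETCH_COSTLY_ORDER, SKETCH_GFP_NORETRY and sketch_should_retry() are hypothetical names, the flag value is arbitrary, and the order of 3 is assumed to mirror the kernel's PAGE_ALLOC_COSTLY_ORDER.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PAGE_ALLOC_COSTLY_ORDER and __GFP_NORETRY. */
#define SKETCH_COSTLY_ORDER 3u
#define SKETCH_GFP_NORETRY  0x1u

/*
 * Decide whether a failed charge of nr_pages should be retried after
 * reclaim: never when the caller asked for no retries, otherwise only
 * for requests of at most 2^SKETCH_COSTLY_ORDER pages.
 */
static bool sketch_should_retry(unsigned int nr_pages, unsigned int gfp_mask)
{
	assert(nr_pages > 0); /* charging zero pages is meaningless */

	if (gfp_mask & SKETCH_GFP_NORETRY)
		return false;

	return nr_pages <= (1u << SKETCH_COSTLY_ORDER);
}
```

Under these assumptions a charge of up to 8 pages is retried, while a larger one, or any charge carrying the no-retry flag, gives up after the first failed attempt.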

[Devel] Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote: When a process tries to allocate a page with the __GFP_KMEMCG flag, the page allocator will call the corresponding memcg functions to validate the allocation. Tasks in the root memcg can always proceed. To avoid adding markers to the page - and a kmem

[Devel] Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-10 Thread Greg Thelen
On Thu, Aug 09 2012, Glauber Costa wrote: When a process tries to allocate a page with the __GFP_KMEMCG flag, the page allocator will call the corresponding memcg functions to validate the allocation. Tasks in the root memcg can always proceed. To avoid adding markers to the page - and a

[Devel] Re: [PATCH v2 09/11] memcg: propagate kmem limiting information to children

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote: The current memcg slab cache management fails to present satisfactory hierarchical behavior in the following scenario: - /cgroups/memory/A/B/C * kmem limit set at A, * A and B have no tasks, * spawn a new task in C. Because kmem_accounted is a

[Devel] Re: [PATCH v2 11/11] protect architectures where THREAD_SIZE = PAGE_SIZE against fork bombs

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote: Because those architectures will draw their stacks directly from the page allocator, rather than the slab cache, we can directly pass __GFP_KMEMCG flag, and issue the corresponding free_pages. This code path is taken when the architecture doesn't

[Devel] Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/11 2:28), Michal Hocko wrote: On Sat 11-08-12 01:49:25, KAMEZAWA Hiroyuki wrote: (2012/08/11 0:42), Michal Hocko wrote: On Thu 09-08-12 17:01:10, Glauber Costa wrote: [...] @@ -2317,18 +2318,18 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, }

[Devel] Re: [RFC PATCH 1/2] unix sockets: add ability for search for peer from passed root

2012-08-10 Thread J. Bruce Fields
On Fri, Aug 10, 2012 at 04:57:30PM +0400, Stanislav Kinsbursky wrote: This helper is used stream sockets yet. All is simple: if non-NULL struct path was passed to unix_find_other(), then vfs_path_lookup() is called instead of kern_path(). I'm having some trouble parsing the changelog. Maybe

[Devel] Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread H. Peter Anvin
On 08/10/2012 05:57 AM, Stanislav Kinsbursky wrote: Today, there is a problem with connecting local SUNRPC transports. These transports use UNIX sockets, and the connection itself is done by the rpciod workqueue. But UNIX socket lookup is done in the context of the process's file system root. I.e. all local

[Devel] Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread Alan Cox
On that whole subject... Do we need a Unix domain socket equivalent to openat()? I don't think so. The name is just a file system indexing trick, it's not really the socket proper. It's little more than ascii string with permissions attached - indeed we also support an abstract name space
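The abstract name space mentioned in this message can be illustrated from user space. A minimal sketch, assuming Linux (the abstract name space is a Linux extension): a leading NUL byte in sun_path selects it, so no file system lookup and no root directory is involved. The helper names sketch_abstract_addr() and sketch_abstract_roundtrip() are hypothetical and not part of the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill addr with an abstract-namespace name and return the address length. */
static socklen_t sketch_abstract_addr(struct sockaddr_un *addr, const char *name)
{
	size_t len;

	assert(name != NULL);
	len = strlen(name);
	if (len > sizeof(addr->sun_path) - 1)
		len = sizeof(addr->sun_path) - 1; /* truncate overlong names */

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	/* sun_path[0] stays '\0': this marks the abstract name space. */
	memcpy(addr->sun_path + 1, name, len);

	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}

/* Bind a listener to an abstract name and connect to it; 0 on success, -1 on error. */
static int sketch_abstract_roundtrip(const char *name)
{
	struct sockaddr_un addr;
	socklen_t alen = sketch_abstract_addr(&addr, name);
	int srv = socket(AF_UNIX, SOCK_STREAM, 0);
	int cli = socket(AF_UNIX, SOCK_STREAM, 0);
	int ret = -1;

	if (srv >= 0 && cli >= 0 &&
	    bind(srv, (struct sockaddr *)&addr, alen) == 0 &&
	    listen(srv, 1) == 0 &&
	    connect(cli, (struct sockaddr *)&addr, alen) == 0)
		ret = 0;

	/* Closing the listener releases the abstract name. */
	if (cli >= 0)
		close(cli);
	if (srv >= 0)
		close(srv);
	return ret;
}
```

Because the name never touches the file system, the same call works regardless of the caller's root directory, which is the contrast with path-based names that the thread is discussing.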

[Devel] Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread H. Peter Anvin
On 08/10/2012 11:26 AM, Alan Cox wrote: On that whole subject... Do we need a Unix domain socket equivalent to openat()? I don't think so. The name is just a file system indexing trick, it's not really the socket proper. It's little more than ascii string with permissions attached - indeed

[Devel] Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread Alan Cox
AF_UNIX between roots raises some interesting semantic questions when you begin passing file descriptors down them as well. Why is that? A file descriptor carries all that information with it... Things like fchdir(). It's not a machine breaking problem but for containers as opposed to

[Devel] Re: [RFC PATCH 1/2] unix sockets: add ability for search for peer from passed root

2012-08-10 Thread Stanislav Kinsbursky
10.08.2012 22:10, J. Bruce Fields wrote: On Fri, Aug 10, 2012 at 04:57:30PM +0400, Stanislav Kinsbursky wrote: This helper is used stream sockets yet. All is simple: if non-NULL struct path was passed to unix_find_other(), then vfs_path_lookup() is called instead of kern_path(). I'm having

[Devel] Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread H. Peter Anvin
On 08/10/2012 11:40 AM, Alan Cox wrote: Agreed on open() for sockets.. the lack of open is a Berklix derived peculiarity of the interface. It would equally be useful to be able to open /dev/socket/ipv4/1.2.3.4/1135 and the like for scripts and stuff That needs VFS changes however so you can

[Devel] Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread Stanislav Kinsbursky
10.08.2012 22:15, H. Peter Anvin wrote: On 08/10/2012 05:57 AM, Stanislav Kinsbursky wrote: Today, there is a problem with connecting local SUNRPC transports. These transports use UNIX sockets, and the connection itself is done by the rpciod workqueue. But UNIX socket lookup is done in the context of

[Devel] Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Michal Hocko
On Fri 10-08-12 19:30:00, Michal Hocko wrote: On Thu 09-08-12 17:01:10, Glauber Costa wrote: [...] For now retry up to COSTLY_ORDER (as page_alloc.c does) and make sure not to do it if __GFP_NORETRY. Who is using __GFP_NORETRY for user backed memory (except for hugetlb which has its own

[Devel] Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Michal Hocko
On Thu 09-08-12 17:01:10, Glauber Costa wrote: From: Suleiman Souhlal ssouh...@freebsd.org mem_cgroup_do_charge() was written before kmem accounting, and expects three cases: being called for 1 page, being called for a stock of 32 pages, or being called for a hugepage. If we call for 2 or 3

[Devel] Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread J. Bruce Fields
On Fri, Aug 10, 2012 at 07:26:28PM +0100, Alan Cox wrote: On that whole subject... Do we need a Unix domain socket equivalent to openat()? I don't think so. The name is just a file system indexing trick, it's not really the socket proper. It's little more than ascii string with

[Devel] Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread Alan Cox
On Fri, 10 Aug 2012 15:11:50 -0400 J. Bruce Fields bfie...@fieldses.org wrote: On Fri, Aug 10, 2012 at 07:26:28PM +0100, Alan Cox wrote: On that whole subject... Do we need a Unix domain socket equivalent to openat()? I don't think so. The name is just a file system indexing

[Devel] Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread H. Peter Anvin
On 08/10/2012 12:28 PM, Alan Cox wrote: Explicitly for Linux yes - this is not generally true of the AF_UNIX socket domain and even the permissions aspect isn't guaranteed to be supported on some BSD environments! Yes, but let's worry about what the Linux behavior should be. The name is

[Devel] Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-10 Thread Greg Thelen
On Thu, Aug 09 2012, Glauber Costa wrote: This patch introduces infrastructure for tracking kernel memory pages to a given memcg. This will happen whenever the caller includes the __GFP_KMEMCG flag and the task belongs to a memcg other than the root. In memcontrol.h those functions are