Balbir Singh wrote:
>
> Original Message
> Subject: [PATCH -RSS 2/2] Fix limit check after reclaim
> Date: Mon, 04 Jun 2007 21:03:04 +0530
> From: Balbir Singh <[EMAIL PROTECTED]>
> To: Andrew Morton <[EMAIL PROTECTED]>
> CC: Linux Containers <[EMAIL PROTECTED]>,Balbir Si
On Tue, Jun 05, 2007 at 10:11:12AM +0400, Vasily Averin wrote:
> >>return d_splice_alias(inode, dentry);
> >> }
> > Seems reasonable. So this prevents the bad inodes from getting onto the
> > orphan list in the first place?
>
> make_bad_inode() is called from ext3_read_inode() that is called
Eric Sandeen wrote:
> Vasily Averin wrote:
>> A bad inode can live for some time: ext3_unlink() can add it to the orphan
>> list, but ext3_delete_inode() does not delete this inode from the orphan list.
>> As a result we can get orphan list corruption, detected in ext3_destroy_inode().
>
> Ah, I see - so you have c
Eric Sandeen wrote:
> Vasily Averin wrote:
>> Customers complained about ext3-related errors; investigation showed that the
>> ext3 orphan list had been corrupted and held a reference to a non-ext3 inode.
>> The following debug patch helps to understand the cause of this issue.
>
> Vasily, does your customer hav
Andrew Morton wrote:
> On Mon, 04 Jun 2007 09:19:10 +0400 Vasily Averin <[EMAIL PROTECTED]> wrote:
>> diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
>> index 9bb046d..e3ac8c3 100644
>> --- a/fs/ext3/namei.c
>> +++ b/fs/ext3/namei.c
>> @@ -1019,6 +1019,11 @@ static struct dentry *ext3_lookup(struct
Like we discussed earlier and Pavel/others had pointed out,
proc_flush_task() in its current place in release_task() is
useless with the new pid namespace code, because task_pid()
for the task is already NULL before the call to proc_flush_task().
So as a simple change I tried to move proc_flush_t
On Jun 04, 2007 19:03 -0700, Andrew Morton wrote:
> What caused those inodes to be bad, anyway? Memory allocation failures?
This can happen if e.g. NFS has a stale file handle - it will look up
the inode by inum, but ext3_read_inode() will create a bad inode due to
i_nlink = 0.
Cheers, Andreas
On Mon, 04 Jun 2007 09:19:10 +0400 Vasily Averin <[EMAIL PROTECTED]> wrote:
> After ext3 orphan list check has been added into ext3_destroy_inode() (please
> see my previous patch) the following situation has been detected:
> EXT3-fs warning (device sda6): ext3_unlink: Deleting nonexistent file
On Mon, 04 Jun 2007 09:18:55 +0400 Vasily Averin <[EMAIL PROTECTED]> wrote:
> Customers complained about ext3-related errors; investigation showed that the
> ext3 orphan list had been corrupted and held a reference to a non-ext3 inode.
> The following debug patch helps to understand the cause of this issue.
>
Serge wrote:
> Odd, I thought rm -rf used to work in the past,
> but i'm likely wrong.
I'm pretty sure it never worked.
And I've probably tested it myself, every few months,
since the birth of cpusets, when I forget and type it
again, and then stare dumbly at the screen wondering
what all the com
Quoting Paul Menage ([EMAIL PROTECTED]):
> On 6/4/07, Serge E. Hallyn <[EMAIL PROTECTED]> wrote:
> >
> >2. I can't delete containers because of the files they contain, and
> >am not allowed to delete those files by hand.
> >
>
> You should be able to delete a container with rmdir as long as it's
>
Quoting Paul Menage ([EMAIL PROTECTED]):
> On 6/4/07, Serge E. Hallyn <[EMAIL PROTECTED]> wrote:
> >[EMAIL PROTECTED] root]# rm -rf /containers/1
>
> Just use "rmdir /containers/1" here.
Hmm. Ok, that works... Odd, I thought rm -rf used to work in the past,
but i'm likely wrong.
thanks,
-serge
> [EMAIL PROTECTED] root]# rm -rf /containers/1
No - not 'rm -fr'.
'rmdir'
Remove the cpuset directory directly, rather than starting
bottom-up by trying to remove the files first.
The poor 'rm -fr' command doesn't understand the
rather odd nature of cpuset file systems, which
have all files coming and going automag
> Would it then make sense to just
> default to (parent_set - sibling_exclusive_set) for a new sibling's
> value?
Which could well be empty, which in turn puts one back in the position
of dealing with a newborn cpuset that is empty (of cpus or of memory),
or else it introduces a new and odd constr
On 6/4/07, Serge E. Hallyn <[EMAIL PROTECTED]> wrote:
[EMAIL PROTECTED] root]# rm -rf /containers/1
Just use "rmdir /containers/1" here.
Ah, I see the second time I typed 'ls /containers/1/tasks' instead of
cat. When I then used cat, the file was empty, and I got an oops just
like Pavel rep
Quoting Paul Menage ([EMAIL PROTECTED]):
> On 6/4/07, Paul Jackson <[EMAIL PROTECTED]> wrote:
> >
> >Yup - early in the life of cpusets, a created cpuset inherited the cpus
> >and mems of its parent. But that broke the exclusive property big
> >time. You will recall that a cpu_exclusive or mem_ex
Paul M wrote:
> Maybe we could make it a per-cpuset option whether children should
> inherit mems/cpus or not?
I suppose, if those needing inherited mems/cpus need it bad enough.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
On 6/4/07, Serge E. Hallyn <[EMAIL PROTECTED]> wrote:
2. I can't delete containers because of the files they contain, and
am not allowed to delete those files by hand.
You should be able to delete a container with rmdir as long as it's
not in use - its control files will get cleaned up automa
On 6/4/07, Paul Jackson <[EMAIL PROTECTED]> wrote:
Yup - early in the life of cpusets, a created cpuset inherited the cpus
and mems of its parent. But that broke the exclusive property big
time. You will recall that a cpu_exclusive or mem_exclusive cpuset
cannot overlap the cpus or memory, res
From nobody Mon Sep 17 00:00:00 2001
From: Serge Hallyn <[EMAIL PROTECTED]>
Date: Wed, 28 Mar 2007 15:06:47 -0500
Subject: [PATCH 5/6] userns strict: hook ext2
Add a user namespace pointer to the ext2 superblock and inode.
Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]>
---
fs/ext2/acl.c
From nobody Mon Sep 17 00:00:00 2001
From: Cedric Le Goater <[EMAIL PROTECTED]>
Date: Thu, 5 Apr 2007 12:51:51 -0400
Subject: [PATCH 1/6] user namespace : add the framework
Add the user namespace struct and framework
Basically, it will allow a process to unshare its user_struct table, resetting
[ I've been sitting on this for some months, and am just dumping it so
people can talk if they like, maybe even build on the patchset by
adding support for more filesystems or implementing the keyring. Or
tell me how much the approach sucks. ]
First, I point out once more that the base user nam
From nobody Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <[EMAIL PROTECTED]>
Date: Thu, 5 Apr 2007 14:02:09 -0400
Subject: [PATCH 3/6] user ns: add an inode user_ns pointer
Add a user namespace pointer to each inode. One user namespace is said
to own each inode. Each filesystem can fill these
From nobody Mon Sep 17 00:00:00 2001
From: Serge Hallyn <[EMAIL PROTECTED]>
Date: Wed, 28 Mar 2007 13:11:19 -0500
Subject: [PATCH 6/6] userns strict: hook ext3
Add a user namespace pointer to the ext3 superblock and inode.
Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]>
---
fs/ext3/acl.c
Hi Paul,
I've got two problems working with this patchset:
1. A task can't join a cpuset unless 'cpus' and 'mems' are set. These
don't seem to automatically inherit the parent's values. So when I do
mount -t container -o ns,cpuset nsproxy /containers
(unshare a namespace)
the
From nobody Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <[EMAIL PROTECTED]>
Date: Thu, 5 Apr 2007 17:17:23 -0400
Subject: [PATCH 4/6] user ns: hook generic_permission()
Hook generic_permission() to check for user namespaces.
Also define task_ino_capable() which denies a capability
if the subje
From nobody Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <[EMAIL PROTECTED]>
Date: Thu, 5 Apr 2007 13:00:47 -0400
Subject: [PATCH 2/6] user namespace : add unshare
Changelog: Fix !CONFIG_USER_NS clone with CLONE_NEWUSER so it returns -EINVAL
rather than 0, so that userspace knows they d
What you describe, Serge, sounds like semantics carried over from cpusets.
Serge wrote:
> A task can't join a cpuset unless 'cpus' and 'mems' are set.
Yup - don't want to run a task in a cpuset that lacks cpu, or lacks
memory. Hard to run without those.
> These don't seem to automatically inher
Sorry, didn't paste in my comment at the top that this is again not at
all for inclusion, and barely tested, but mainly to get comment i.e. on
the way the naming is done.
thanks,
-serge
Quoting Serge E. Hallyn ([EMAIL PROTECTED]):
> >From 190ea72d213393dd1440643b2b87b5b2128dff87 Mon Sep 17 00:00:
From 190ea72d213393dd1440643b2b87b5b2128dff87 Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <[EMAIL PROTECTED]>
Date: Mon, 4 Jun 2007 14:18:52 -0400
Subject: [PATCH 1/1] containers: implement nsproxy containers subsystem
When a task enters a new namespace via a clone() or unshare(), a new
contai
Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---
Documentation/controller/rss.txt | 165 +++
1 file changed, 165 insertions(+)
diff -puN /dev/null Documentation/controller/rss.txt
--- /dev/null 2007-06-01 20:42:04.0 +0530
+++ linux-2.6.22-rc2-m
Kirill Korotaev wrote:
>> the results were also very reproducible but the profiling was too noisy.
>> we also changed the kernel. the previous pidns patchset was on a 2.6.21-mm2
>> and we ported it on a 2.6.22-rc1-mm1.
>
> If reproducible, then were they the same as Pavel posted?
>
>> but let me
This patch fixes the problem seen when a container goes over its limit:
reclaim is unsuccessful and the application is terminated. The problem
is that all pages are by default added to the active list of the RSS
controller. When __isolate_lru_page() is called, it checks to see if
the list that
This patch modifies the reclaim behaviour so that, before declaring the
container out of memory and calling the container OOM routine, it checks
whether the resident set size of the container decreased as a result of
the reclaim (even though pages might not have been fully reclaimed)
Signe
> the results were also very reproducible but the profiling was too noisy.
> we also changed the kernel. the previous pidns patchset was on a 2.6.21-mm2
> and we ported it on a 2.6.22-rc1-mm1.
If reproducible, then were they the same as Pavel posted?
> but let me remove some debugging options,
Cedric Le Goater wrote:
> Pavel Emelianov wrote:
>> Serge E. Hallyn wrote:
>>> Quoting Kirill Korotaev ([EMAIL PROTECTED]):
Cedric,
just a small note.
imho it is not correct to check performance with enabled debug in memory
allocator
since it can influence cache effic
Pavel Emelianov wrote:
> Serge E. Hallyn wrote:
>> Quoting Kirill Korotaev ([EMAIL PROTECTED]):
>>> Cedric,
>>>
>>> just a small note.
>>> imho it is not correct to check performance with debugging enabled in the
>>> memory allocator,
>>> since it can greatly influence cache efficiency.
>>> In your case it looks
Serge E. Hallyn wrote:
> Quoting Kirill Korotaev ([EMAIL PROTECTED]):
>> Cedric,
>>
>> just a small note.
>> imho it is not correct to check performance with enabled debug in memory
>> allocator
>> since it can influence cache efficiency much.
>> In you case looks like you have DEBUG_SLAB enabled.
Kirill Korotaev wrote:
> Cedric,
>
> just a small note.
> imho it is not correct to check performance with enabled debug in memory
> allocator
> since it can influence cache efficiency much.
> In you case looks like you have DEBUG_SLAB enabled.
you're right. i'll rerun and resend.
> Pavel will
Quoting Kirill Korotaev ([EMAIL PROTECTED]):
> Cedric,
>
> just a small note.
> imho it is not correct to check performance with enabled debug in memory
> allocator
> since it can influence cache efficiency much.
> In you case looks like you have DEBUG_SLAB enabled.
Hm, good point. Cedric, did
Cedric,
just a small note.
imho it is not correct to check performance with enabled debug in memory
allocator
since it can influence cache efficiency much.
In you case looks like you have DEBUG_SLAB enabled.
Pavel will recheck as well what influences on this particular test.
BTW, it is strange..
As described, pages are charged to their first touchers.
The first toucher is determined using pages' _mapcount
manipulations in rmap calls.
A page is charged in two stages:
1. preparation, in which the resource availability is checked.
This stage may lead to page reclamation, thus it is perfo
Implement try_to_free_pages_in_container() to free the
pages in a container that has run out of memory.
The scan_control->isolate_pages() function is set to
isolate_pages_in_container() that isolates the container
pages only. The exported __isolate_lru_page() call
makes things look simpler than in t
When a container is completely out of memory, some task should die.
It is unfair to kill the current task, so the task with the largest
RSS is chosen and killed. The code re-uses the current OOM killer's
select_bad_process() for task selection.
Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]>
---
---
Pavel and all,
I've been profiling the different pidns patchsets to chase the perf
bottlenecks in the pidns patchset. As i was not getting accurate
profiling results with unixbench, I changed the benchmark to use the
nptl perf benchmark ingo used when he introduced the generic pidhash
back in
The core routines for tracking page ownership, the RSS subsystem
registration with the containers framework, and the definition of the
rss_container struct as a container subsystem combined with the
resource counter structure.
To make the whole set look more consistent, the calls to the
reclamation code and oo
The core change is that the isolate_lru_pages() call is
replaced with a struct scan_control->isolate_pages() call.
Other changes include exporting __isolate_lru_page() for
the per-container isolator and handling variable-to-pointer
changes in try_to_free_pages().
This makes it possible to use different i
Naturally, mm_struct determines the resource consumer in memory
accounting, so each mm_struct should have a pointer to the container
it belongs to. When a new task is created, its mm_struct is
assigned to the container this task belongs to.
include/linux/rss_container.h is added in this patch to make
th
Each page is supposed to have an owner - the container
that touched the page first. The owner stays alive during
the page lifetime even if the task that touched the page
dies or moves to another container.
This ownership is the forerunner of the "fair" page-sharing
accounting, in which a page has a
Introduce generic structures and routines for resource accounting.
Each resource-accounting container is supposed to embed it, along with
a container_subsystem_state and its resource-specific members.
Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]>
---
diff -upr linux-2.6.22-rc2-mm1.orig/inclu
Adds RSS accounting and control within a container.
Changes from v3
- comments across the code
- git-bisect safe split
- previously missed places that move the page between active/inactive lists
Ported above Paul's containers V10 with fixes from Balbir.
RSS container includes the per-container RSS accountin
Alexey Dobriyan wrote:
> Wrong pointer was used as kmem_cache pointer.
>
> [Here /proc/slab_allocators appears as empty file, but it's just me, probably]
>
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
Acked-by: Pavel Emelianov <[EMAIL PROTECTED]>
> ---
>
> mm/slab.c |2 +-
> 1 fil
Wrong pointer was used as kmem_cache pointer.
[Here /proc/slab_allocators appears as empty file, but it's just me, probably]
Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---
mm/slab.c |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4401,7 +440
Hi Kirill,
imho there is no reason for not adding the patch to the git repository.
I've tested it for one week now and I'm getting no serious errors.
Mit freundlichen Grüßen / Best Regards
Christian Kaiser
--
IBM Deutschland Entwicklung GmbH
Open Systems Firmware Development
mail: [EMAIL PROTECTE
http://git.openvz.org/?p=linux-2.6.18-openvz;a=commit;h=cb649b7cede6764c00e256578dc3c7ad73c1b24c
Thanks,
Kirill
Christian Kaiser2 wrote:
> Hi Kirill,
>
> imho there is no reason for not adding the patch to the git repository.
> I've tested it for one week now and I'm getting no serious errors.
>