On Fri 08-02-13 10:40:13, KAMEZAWA Hiroyuki wrote:
> (2013/02/07 20:01), Kamezawa Hiroyuki wrote:
[...]
> >Hmm. do we need to increase the "limit" virtually at memcg oom until
> >the oom-killed process dies ?
>
> Here is my naive idea...
and the next step would be
On Fri 08-02-13 17:29:18, Michal Hocko wrote:
[...]
OK, I have checked the allocator slow path and you are right: even
GFP_KERNEL will not fail. This can lead to similar deadlocks - e.g. an
OOM-killed task blocked on down_write(mmap_sem) while the page fault
handler is holding mmap_sem for reading
(2013/02/07 21:31), Michal Hocko wrote:
On Thu 07-02-13 20:01:45, KAMEZAWA Hiroyuki wrote:
(2013/02/06 23:01), Michal Hocko wrote:
On Wed 06-02-13 02:17:21, azurIt wrote:
5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
mentioned in a follow-up email. Here is the full patch:
(2013/02/06 23:01), Michal Hocko wrote:
On Wed 06-02-13 02:17:21, azurIt wrote:
5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
mentioned in a follow-up email. Here is the full patch:
Here is the log where OOM, again, killed MySQL server [search for "(mysqld)"]:
On Wed 06-02-13 02:17:21, azurIt wrote:
5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
mentioned in a follow-up email. Here is the full patch:
Here is the log where OOM, again, killed MySQL server [search for (mysqld)]:
http://www.watchdog.sk/lkml/oom_mysqld6
[...]
azur
--
To unsubscribe from this list: send the
On Tue, Feb 05 2013, Michal Hocko wrote:
> On Tue 05-02-13 15:49:47, azurIt wrote:
> [...]
>> Just to be sure - am i supposed to apply these two patches?
>> http://watchdog.sk/lkml/patches/
>
> 5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
> mentioned in a follow-up email.
>5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
>mentioned in a follow-up email.
ou, it wasn't complete? i used it in my last test.. sorry, i'm a little confused
by all those patches. will try it this night and report back.
On Tue 05-02-13 15:49:47, azurIt wrote:
[...]
> I have another old problem which is maybe also related to this. I
> wasn't connecting it with this before but now i'm not sure. Two of our
> servers, which are affected by this cgroup problem, are also randomly
> freezing completely (few times per
On Tue 05-02-13 15:49:47, azurIt wrote:
[...]
> Just to be sure - am i supposed to apply these two patches?
> http://watchdog.sk/lkml/patches/
5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
mentioned in a follow-up email. Here is the full patch:
---
From
>Sorry to get back to this so late but I was busy as hell since the
>beginning of the year.
Thank you for your time!
>Has the issue repeated since then?
Yes, it's happening all the time but meanwhile i wrote a script which is
monitoring the problem and killing frozen processes when it
On Fri 25-01-13 17:31:30, Michal Hocko wrote:
> On Fri 25-01-13 16:07:23, azurIt wrote:
> > Any news? Thnx!
>
> Sorry, but I didn't get to this one yet.
Sorry to get back to this so late but I was busy as hell since the
beginning of the year.
Has the issue repeated since then?
You said you
Any news? Thnx!
azur
__
From: Michal Hocko mho...@suse.cz
To: azurIt azu...@pobox.sk
Date: 30.12.2012 12:08
Subject: Re: [PATCH for 3.2.34] memcg: do not trigger OOM from
add_to_page_cache_locked
CC: linux-kernel
On Sun 30-12-12 02:09:47, azurIt wrote:
> >which suggests that the patch is incomplete and that I am blind :/
> >mem_cgroup_cache_charge calls __mem_cgroup_try_charge for the page cache
> >and that one doesn't check GFP_MEMCG_NO_OOM. So you need the following
> >follow-up patch on top of the one you already have (which should catch
> >all the
On Mon 24-12-12 14:25:26, azurIt wrote:
> >OK, good to hear and fingers crossed. I will try to get back to the
> >original problem and a better solution sometimes early next year when
> >all the things settle a bit.
>
> Michal, problem, unfortunately, happened again :( twice. When it
> happened first time (two days ago) i didn't want to believe it

On Mon 24-12-12 14:38:50, azurIt wrote:
> >OK, good to hear and fingers crossed. [...]
>
> Btw, i noticed one more thing when problem is happening (=when any
> cgroup is stuck), i forgot to mention it before, sorry :(
On Tue 18-12-12 15:22:23, azurIt wrote:
> >It should mitigate the problem. The real fix shouldn't be that specific
> >(as per the discussion in the other thread). The chance this will get
> >upstream is not big and that means that it will not get to the stable
> >tree either.
>
> OOM is no longer killing processes outside target cgroups, so everything looks
On Mon 17-12-12 19:23:01, azurIt wrote:
> >[Ohh, I am really an idiot. I screwed the first patch]
> >-	bool oom = true;
> >+	bool oom = !(gfp_mask | GFP_MEMCG_NO_OOM);
> >
> >Which obviously doesn't work. It should read !(gfp_mask & GFP_MEMCG_NO_OOM).
> >No idea how I could have missed that. I am really sorry about that.
>
> :D no problem :)
On Mon 17-12-12 02:34:30, azurIt wrote:
> >I would try to limit changes to minimum. So the original kernel you were
> >using + the first patch to prevent OOM from the write path + 2 debugging
> >patches.
>
> It didn't take off the whole system this time (but i was prepared to
> record a video of console ;) ), here it is:
>I would try to limit changes to minimum. So the original kernel you were
>using + the first patch to prevent OOM from the write path + 2 debugging
>patches.
ok.
>But was it at least related to the debugging from the patch or it was
>rather a totally unrelated thing?
I wasn't reading it much
On Mon 10-12-12 02:20:38, azurIt wrote:
[...]
> Michal,

Hi,

> this was printing so many debug messages to console that the whole
> server hangs

Hmm, this is _really_ surprising. The latest patch didn't add any new
logging actually. It just enhanced messages which were already printed
out previously + changed a few functions to be not inlined so they show
up in the traces. So the only explanation is that the workload has
changed or the
>There are no other callers AFAICS so I am getting clueless. Maybe more
>debugging will tell us something (the inlining has been reduced for thp
>paths which can reduce performance in thp page fault heavy workloads but
>this will give us better traces - I hope).
Michal,
this was printing so many debug messages to console that the whole
server hangs
On Thu 06-12-12 11:12:49, azurIt wrote:
> >Dohh. The very same stack mem_cgroup_newpage_charge called from the page
> >fault. The heavy inlining is not particularly helping here... So there
> >must be some other THP charge leaking out.
> >[/me is diving into the code again]
> >
> >* do_huge_pmd_anonymous_page falls back to handle_pte_fault
> >*

On Thu 06-12-12 01:29:24, azurIt wrote:
> >OK, so the ENOMEM seems to be leaking from mem_cgroup_newpage_charge.
> >This can only happen if this was an atomic allocation request
> >(!__GFP_WAIT) or if oom is not allowed which is the case only for
> >transparent huge page allocation.
> >The first case can be excluded (in the clean 3.2 stable kernel)
On Wed 05-12-12 02:36:44, azurIt wrote:
> The following should print the traces when we hand over ENOMEM to the
> caller. It should catch all charge paths (migration is not covered but
> that one is not important here). If we don't see any traces from here
> and there is still global OOM striking then there must be something else
> to trigger
On Fri 30-11-12 17:19:23, Michal Hocko wrote:
[...]
> The important question is why you see VM_FAULT_OOM and whether memcg
> charging failure can trigger that. I do not see how this could happen
> right now because __GFP_NORETRY is not used for user pages (except for
> THP which disables memcg OOM
>The only strange thing I noticed is that some groups have 0 limit. Is
>this intentional?
>grep memory.limit_in_bytes cgroups | grep -v uid | sed 's@.*/@@' | sort | uniq -c
>      3 memory.limit_in_bytes:0
These are users who are not allowed to run anything.
azur
On Fri 30-11-12 17:26:51, azurIt wrote:
> >Could you also post your complete containers configuration, maybe there
> >is something strange in there (basically grep . -r YOUR_CGROUP_MNT
> >except for tasks files which are of no use right now).
>
> Here it is:
> http://www.watchdog.sk/lkml/cgroups.gz
On Fri 30-11-12 16:59:37, azurIt wrote:
> >> Here is the full boot log:
> >> www.watchdog.sk/lkml/kern.log
> >
> >The log is not complete. Could you paste the complete dmesg output? Or
> >even better, do you have logs from the previous run?
>
> What is missing there? All kernel messages are logged into
> /var/log/kern.log (it's the same as dmesg),
On Fri 30-11-12 16:08:11, azurIt wrote:
> >The DMA32 zone usually fills the first 4G unless your HW remaps the rest
> >of the memory above 4G or you have a NUMA machine and the rest of the
> >memory is at another node. Could you post your memory map printed during
> >the boot? (e820: BIOS-provided physical RAM map: and following lines)
>
> Here is
On Fri 30-11-12 16:03:47, Michal Hocko wrote:
[...]
> Anyway, the more interesting thing is gfp_mask is GFP_NOWAIT allocation
> from the page fault? Huh this shouldn't happen - ever.
OK, it starts making sense now. The message came from
pagefault_out_of_memory which doesn't have gfp nor the