r. This allows administrators, for example, to require users in
their own top-level mem cgroup subtree to be accounted for with
hierarchical usage. In other words, they can no longer evade the oom killer
by using other controllers or subcontainers.
Signed-off-by: David Rientjes
---
Documentation/cgr
writing "cgroup" to the root
mem cgroup's memory.oom_policy).
The "all" oom policy cannot be enabled on the root mem cgroup.
Signed-off-by: David Rientjes
---
Documentation/cgroup-v2.txt | 51 ++---
include/linux/memcon
There are three significant concerns about the cgroup aware oom killer as
it is implemented in -mm:
(1) allows users to evade the oom killer by creating subcontainers or
using other controllers since scoring is done per cgroup and not
hierarchically,
(2) does not allow the user to inf
On Mon, 15 Jan 2018, Michal Hocko wrote:
> > No, this isn't how kernel features get introduced. We don't design a new
> > kernel feature with its own API for a highly specialized usecase and then
> > claim we'll fix the problems later. Users will work around the
> > constraints of the new fea
On Mon, 15 Jan 2018, Johannes Weiner wrote:
> > It's quite trivial to allow the root mem cgroup to be compared exactly the
> > same as another cgroup. Please see
> > https://marc.info/?l=linux-kernel&m=151579459920305.
>
> This only says "that will be fixed" and doesn't address why I care.
>
On Sat, 13 Jan 2018, Johannes Weiner wrote:
> You don't have any control and no accounting of the stuff situated
> inside the root cgroup, so it doesn't make sense to leave anything in
> there while also using sophisticated containerization mechanisms like
> this group oom setting.
>
> In fact, t
.
Cgroup v2 is a very clean interface and I think it's the responsibility of
every controller to maintain that. We should not fall into a cgroup v1
mentality which became very difficult to make extensible. Let's make a
feature that is generally useful, complete, and empowers th
On Thu, 11 Jan 2018, Michal Hocko wrote:
> > > I find this problem quite minor, because I haven't seen any practical
> > > problems
> > > caused by accounting of the root cgroup memory.
> > > If it's a serious problem for you, it can be solved without switching to
> > > the
> > > hierarchical ac
On Wed, 10 Jan 2018, Roman Gushchin wrote:
> > 1. The unfair comparison of the root mem cgroup vs leaf mem cgroups
> >
> > The patchset uses two different heuristics to compare root and leaf mem
> > cgroups and scores them based on number of pages. For the root mem
> > cgroup, it totals the /p
On Thu, 30 Nov 2017, Andrew Morton wrote:
> > This patchset makes the OOM killer cgroup-aware.
>
> Thanks, I'll grab these.
>
> There has been controversy over this patchset, to say the least. I
> can't say that I followed it closely! Could those who still have
> reservations please summarise
ment about invalidate_range() always being called
under the ptl spinlock.
Signed-off-by: David Rientjes
---
include/linux/mmu_notifier.h | 16 +---
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
--- a/inc
On Fri, 15 Dec 2017, Michal Hocko wrote:
> > This uses the new annotation to determine if an mm has mmu notifiers with
> > blockable invalidate range callbacks to avoid oom reaping. Otherwise, the
> > callbacks are used around unmap_page_range().
>
> Do you have any example where this helped? KV
This uses the new annotation to determine if an mm has mmu notifiers with
blockable invalidate range callbacks to avoid oom reaping. Otherwise, the
callbacks are used around unmap_page_range().
Signed-off-by: David Rientjes
---
mm/oom_kill.c | 21 +++--
1 file changed, 11
tch adds a "flags" field
to mmu notifier ops that can set a bit to indicate that these callbacks do
not block.
The implementation is steered toward an expensive slowpath, such as after
the oom reaper has grabbed mm->mmap_sem of a still alive oom victim.
Signed-off-by: David Rientjes
---
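A minimal user-space sketch of the annotation this patch describes, with illustrative names (the real kernel flag and ops layout may differ): the ops table carries a "flags" field, and one bit declares that the invalidate callbacks do not block.

```c
#include <assert.h>

/* INVALIDATE_DOES_NOT_BLOCK and notifier_ops are illustrative stand-ins
 * for the kernel's flag bit and mmu_notifier_ops, not the real names. */
#define INVALIDATE_DOES_NOT_BLOCK 0x01u

struct notifier_ops {
	unsigned int flags;	/* capability bits set by the driver */
};

/* Checked only on the expensive slowpath (e.g. after the oom reaper has
 * taken mmap_sem), so a per-notifier bit test is cheap enough. */
static int ops_may_block(const struct notifier_ops *ops)
{
	return !(ops->flags & INVALIDATE_DOES_NOT_BLOCK);
}
```

A registrant that never blocks sets the bit once at registration; everyone else is conservatively treated as blockable.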
On Wed, 13 Dec 2017, Christian König wrote:
> > > > --- a/drivers/misc/sgi-gru/grutlbpurge.c
> > > > +++ b/drivers/misc/sgi-gru/grutlbpurge.c
> > > > @@ -298,6 +298,7 @@ struct gru_mm_struct
> > > > *gru_register_mmu_notifier(void)
> > > > return ERR_PTR(-ENOMEM);
> > > >
On Tue, 12 Dec 2017, Randy Dunlap wrote:
> Sure, but I didn't keep the patch emails.
>
> Acked-by: Randy Dunlap
>
You may have noticed changing functions like is_file_lru() to bool when it
is used to index into an array or as part of an arithmetic operation for
ZVC stats. I'm not sure why y
On Tue, 12 Dec 2017, Dimitri Sivanich wrote:
> > --- a/drivers/misc/sgi-gru/grutlbpurge.c
> > +++ b/drivers/misc/sgi-gru/grutlbpurge.c
> > @@ -298,6 +298,7 @@ struct gru_mm_struct *gru_register_mmu_notifier(void)
> > return ERR_PTR(-ENOMEM);
> > STAT(gms_alloc);
> >
On Mon, 11 Dec 2017, Yaowei Bai wrote:
> This patchset makes some *_is_* like functions return bool because
> these functions only use true or false as their return values.
>
> No functional changes.
>
I think the concern about this type of patchset in the past is that it is
unnecessary churn
On Mon, 11 Dec 2017, Paolo Bonzini wrote:
> > Commit 4d4bbd8526a8 ("mm, oom_reaper: skip mm structs with mmu notifiers")
> > prevented the oom reaper from unmapping private anonymous memory with the
> > oom reaper when the oom victim mm had mmu notifiers registered.
> >
> > The rationale is that
tch adds a "flags" field
for mmu notifiers that can set a bit to indicate that these callbacks do
block.
The implementation is steered toward an expensive slowpath, such as after
the oom reaper has grabbed mm->mmap_sem of a still alive oom victim.
Signed-off-by: David Rientjes
---
arch/po
This uses the new annotation to determine if an mm has mmu notifiers with
blockable invalidate range callbacks to avoid oom reaping. Otherwise, the
callbacks are used around unmap_page_range().
Signed-off-by: David Rientjes
---
mm/oom_kill.c | 21 +++--
1 file changed, 11
On Thu, 7 Dec 2017, Suren Baghdasaryan wrote:
> Slab shrinkers can be quite time consuming and when signal
> is pending they can delay handling of the signal. If fatal
> signal is pending there is no point in shrinking that process
> since it will be killed anyway. This change checks for pending
>
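A user-space sketch of the check being proposed, under the assumption that the shrinker simply bails out early: a long shrink loop stops when a fatal signal is pending, since the process will be killed anyway. The predicate parameter stands in for the kernel's fatal_signal_pending(current).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Shrink up to nr objects, bailing out if a fatal signal arrives.
 * fatal_pending is a stand-in for fatal_signal_pending(current). */
static size_t shrink_objects(size_t nr, bool (*fatal_pending)(void))
{
	size_t freed = 0;

	while (nr--) {
		if (fatal_pending())
			break;	/* process is dying; further work is wasted */
		freed++;	/* free one object */
	}
	return freed;
}

/* Test doubles for the two extremes. */
static bool never_pending(void)  { return false; }
static bool always_pending(void) { return true; }
```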
On Thu, 7 Dec 2017, David Rientjes wrote:
> I'm backporting and testing the following patch against Linus's tree. To
> clarify an earlier point, we don't actually have any change from upstream
> code that allows for free_pgtables() before the
> set_bit(MMF_OOM_
On Thu, 7 Dec 2017, Michal Hocko wrote:
> yes. I will fold the following in if this turned out to really address
> David's issue. But I suspect this will be the case considering the NULL
> pmd in the report which would suggest racing with free_pgtable...
>
I'm backporting and testing the followi
On Thu, 7 Dec 2017, Michal Hocko wrote:
> Very well spotted! It could be any task in fact (e.g. somebody reading
> from /proc/ file which requires mm_struct).
>
> oom_reaper                      oom_victim task
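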
> mmget_not_zero
>
On Wed, 6 Dec 2017, Tetsuo Handa wrote:
> > > One way to solve the issue is to have two mm flags: one to indicate the
> > > mm
> > > is entering unmap_vmas(): set the flag, do down_write(&mm->mmap_sem);
> > > up_write(&mm->mmap_sem), then unmap_vmas(). The oom reaper needs this
> > > flag cle
On Tue, 5 Dec 2017, David Rientjes wrote:
> One way to solve the issue is to have two mm flags: one to indicate the mm
> is entering unmap_vmas(): set the flag, do down_write(&mm->mmap_sem);
> up_write(&mm->mmap_sem), then unmap_vmas(). The oom reaper needs this
> fl
Hi,
I'd like to understand the synchronization between the oom_reaper's
unmap_page_range() and exit_mmap(). The latter does not hold
mm->mmap_sem: it's supposed to be the last thread operating on the mm
before it is destroyed.
If unmap_page_range() races with unmap_vmas(), we trivially call
On Fri, 17 Nov 2017, Yisheng Xie wrote:
> We have already checked whether maxnode is a page worth of bits, by:
> maxnode > PAGE_SIZE*BITS_PER_BYTE
>
> So no need to check it once more.
>
> Acked-by: Vlastimil Babka
> Signed-off-by: Yisheng Xie
Acked-by: David Rientjes
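A sketch of the bound being discussed, assuming the usual shape of the check: a nodemask wider than one page worth of bits is rejected up front, so the later word count needs no second check. 4096 stands in for PAGE_SIZE; the real value is per-arch.

```c
#include <assert.h>

/* SKETCH_PAGE_SIZE is an assumed value for illustration only. */
#define SKETCH_PAGE_SIZE 4096UL
#define BITS_PER_BYTE    8UL

/* The syscall path rejects maxnode > PAGE_SIZE * BITS_PER_BYTE,
 * so anything that passes fits in a single page of mask bits. */
static int maxnode_within_page(unsigned long maxnode)
{
	return maxnode <= SKETCH_PAGE_SIZE * BITS_PER_BYTE;
}
```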
4161536 kB
> DirectMap1G: 6291456 kB
>
> Also, this patch updates corresponding docs to reflect
> Hugetlb entry meaning and difference between Hugetlb and
> HugePages_Total * Hugepagesize.
>
> Signed-off-by: Roman Gushchin
> Cc: Andrew Morton
> Cc: Michal Hocko
>
On Wed, 15 Nov 2017, Michal Hocko wrote:
> > > > if (!hugepages_supported())
> > > > return;
> > > > seq_printf(m,
> > > > @@ -2987,6 +2989,11 @@ void hugetlb_report_meminfo(struct seq_file *m)
> > > > h->resv_huge_pages,
> > > >
> Hugepagesize: 2048 kB
> > Hugetlb: 4194304 kB
> > DirectMap4k: 32632 kB
> > DirectMap2M: 4161536 kB
> > DirectMap1G: 6291456 kB
> >
> > Signed-off-by: Roman Gushchin
> > Cc: Andrew Morton
> > Cc: Michal Hocko
> > C
ge is not synchronously split like it was prior to the thp
refcounting patchset, however.
Acked-by: David Rientjes
On Wed, 1 Nov 2017, Michal Hocko wrote:
> > memory.oom_score_adj would never need to be permanently tuned, just as
> > /proc/pid/oom_score_adj need never be permanently tuned. My response was
> > an answer to Roman's concern that "v8 has its own limitations," but I
> > haven't seen a concrete
On Tue, 31 Oct 2017, Michal Hocko wrote:
> > I'm not ignoring them, I have stated that we need the ability to protect
> > important cgroups on the system without oom disabling all attached
> > processes. If that is implemented as a memory.oom_score_adj with the same
> > semantics as /proc/pid/
On Fri, 27 Oct 2017, Roman Gushchin wrote:
> The thing is that the hierarchical approach (as in v8), which you are pushing,
> has its own limitations, which we've discussed in detail earlier. There are
> reasons why v12 is different, and we can't really simply go back. I mean if
> there are bett
On Thu, 26 Oct 2017, Johannes Weiner wrote:
> > The nack is for three reasons:
> >
> > (1) unfair comparison of root mem cgroup usage to bias against that mem
> > cgroup from oom kill in system oom conditions,
> >
> > (2) the ability of users to completely evade the oom killer by attachi
On Mon, 23 Oct 2017, Michal Hocko wrote:
> On Sun 22-10-17 17:24:51, David Rientjes wrote:
> > On Thu, 19 Oct 2017, Johannes Weiner wrote:
> >
> > > David would have really liked for this patchset to include knobs to
> > > influence how the algorithm pic
On Thu, 19 Oct 2017, Johannes Weiner wrote:
> David would have really liked for this patchset to include knobs to
> influence how the algorithm picks cgroup victims. The rest of us
> agreed that this is beyond the scope of these patches, that the
> patches don't need it to be useful, and that ther
On Wed, 18 Oct 2017, Yang Shi wrote:
> > Yes, this should catch occurrences of "huge unreclaimable slabs", right?
>
> Yes, it sounds so. Although a single "huge" unreclaimable slab might not
> result in excessive slab use as a whole, this would help to filter out
> "small" unreclaimable slabs.
SLAB_RECLAIM_ACCOUNT is a permanent attribute of a slab cache. Set
__GFP_RECLAIMABLE as part of its ->allocflags rather than check the cachep
flag on every page allocation.
Signed-off-by: David Rientjes
---
mm/slab.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a
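A user-space analog of the change just described, with illustrative flag values (not the kernel's): SLAB_RECLAIM_ACCOUNT never changes after cache creation, so the resulting gfp bit is folded into ->allocflags once instead of testing the cache flag on every page allocation.

```c
#include <assert.h>

/* Illustrative flag values; the kernel's bits differ. */
#define SLAB_RECLAIM_ACCOUNT 0x0001u
#define GFP_RECLAIMABLE      0x0010u

struct cache {
	unsigned int flags;      /* permanent cache attributes */
	unsigned int allocflags; /* gfp bits applied to every page allocation */
};

static void cache_init(struct cache *c, unsigned int flags)
{
	c->flags = flags;
	c->allocflags = 0;
	if (flags & SLAB_RECLAIM_ACCOUNT)	/* decided once, at creation */
		c->allocflags |= GFP_RECLAIMABLE;
}

/* Hot path: no per-allocation test of the cache flag, just an OR. */
static unsigned int alloc_gfp(const struct cache *c, unsigned int gfp)
{
	return gfp | c->allocflags;
}
```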
On Wed, 18 Oct 2017, Yang Shi wrote:
> > > > Please simply dump statistics for all slab caches where the memory
> > > > footprint is greater than 5% of system memory.
> > >
> > > Unconditionally? User controlable?
> >
> > Unconditionally, it's a single line of output per slab cache and there
> >
On Tue, 17 Oct 2017, Michal Hocko wrote:
> On Mon 16-10-17 17:15:31, David Rientjes wrote:
> > Please simply dump statistics for all slab caches where the memory
> > footprint is greater than 5% of system memory.
>
> Unconditionally? User controlable?
Unconditionally,
nfo/?l=linux-kernel&m=150695909709711&w=2
>
> Signed-off-by: Yang Shi
Acked-by: David Rientjes
Cool!
On Wed, 11 Oct 2017, Yang Shi wrote:
> @@ -161,6 +162,25 @@ static bool oom_unkillable_task(struct task_struct *p,
> return false;
> }
>
> +/*
> + * Print out unreclaimable slabs info when unreclaimable slabs amount is
> + * greater than all user memory (LRU pages)
> + */
> +static bool
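The heuristic being quoted reduces to a single comparison; a sketch with illustrative names, both quantities in pages:

```c
#include <assert.h>

/* Report unreclaimable slab details only when unreclaimable slab memory
 * exceeds all user memory (LRU pages); names are illustrative. */
static int should_dump_unreclaim_slab(unsigned long slab_unreclaim_pages,
				      unsigned long lru_pages)
{
	return slab_unreclaim_pages > lru_pages;
}
```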
The same is true for compact_node() when explicitly triggering full node
compaction.
Properly initialize cc.alloc_flags on the stack.
Signed-off-by: David Rientjes
---
mm/compaction.c | 8 +---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
--
On Fri, 13 Oct 2017, Roman Gushchin wrote:
> > Think about it in a different way: we currently compare per-process usage
> > and userspace has /proc/pid/oom_score_adj to adjust that usage depending
> > on priorities of that process and still oom kill if there's a memory leak.
> > Your heuristi
On Thu, 12 Oct 2017, Peter Zijlstra wrote:
> > Attaching kernel threads to a non-root cgroup is generally a bad
> > idea. Kernel threads are generally performing the work required
> > to keep the system working and healthy, and applying various
> > resource limits may affect system stability and p
On Wed, 11 Oct 2017, Roman Gushchin wrote:
> > But let's move the discussion forward to fix it. To avoid necessarily
> > accounting memory to the root mem cgroup, have we considered if it is even
> > necessary to address the root mem cgroup? For the users who opt-in to
> > this heuristic, wou
ocesses to child cgroups either purposefully or unpurposefully, and the
> > inability of userspace to effectively control oom victim selection:
> >
> > Nacked-by: David Rientjes
>
> I consider this NACK rather dubious. Evading the heuristic as you
> describe requir
ompletely evade the oom killer by attaching all
> > processes to child cgroups either purposefully or unpurposefully, and the
> > inability of userspace to effectively control oom victim selection:
> >
> > Nacked-by: David Rientjes
>
> So, if we'll sum the oo
mmit, both of these possibilities exist in the
wild and the problem is only a result of the implementation detail of this
patchset.
For these reasons: unfair comparison of root mem cgroup usage to bias
against that mem cgroup from oom kill in system oom conditions, the
ability of users to completely
On Thu, 5 Oct 2017, Roman Gushchin wrote:
> Traditionally, the OOM killer is operating on a process level.
> Under oom conditions, it finds a process with the highest oom score
> and kills it.
>
> This behavior doesn't suit well systems with many running
> containers:
>
> 1) There is no fairn
hich will use this function to iterate over tasks belonging
> to the root memcg.
>
> Signed-off-by: Roman Gushchin
Acked-by: David Rientjes
On Thu, 5 Oct 2017, Roman Gushchin wrote:
> > This patchset exists because overcommit is real, exactly the same as
> > overcommit within memcg hierarchies is real. 99% of the time we don't run
> > into global oom because people aren't using their limits so it just works
> > out. 1% of the tim
On Thu, 5 Oct 2017, Johannes Weiner wrote:
> > It is, because it can quite clearly be a DoS and was prevented with
> > Roman's earlier design of iterating usage up the hierarchy and comparing
> > siblings based on that criteria. I know exactly why he chose that
> > implementation detail early o
On Wed, 4 Oct 2017, Johannes Weiner wrote:
> > By only considering leaf memcgs, does this penalize users if their memcg
> > becomes oc->chosen_memcg purely because it has aggregated all of its
> > processes to be members of that memcg, which would otherwise be the
> > standard behavior?
> >
>
On Wed, 4 Oct 2017, Roman Gushchin wrote:
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b4de17a78dc1..79f30c281185 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2670,6 +2670,178 @@ static inline bool memcg_has_children(struct
> mem_cgroup *memcg)
> return ret;
> }
On Wed, 4 Oct 2017, Roman Gushchin wrote:
> > > @@ -828,6 +828,12 @@ static void __oom_kill_process(struct task_struct
> > > *victim)
> > > struct mm_struct *mm;
> > > bool can_oom_reap = true;
> > >
> > > + if (is_global_init(victim) || (victim->flags & PF_KTHREAD) ||
> > > + victim->s
On Wed, 4 Oct 2017, Roman Gushchin wrote:
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d5f3a62887cf..b4de17a78dc1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -917,7 +917,8 @@ static void invalidate_reclaim_iterators(struct
> mem_cgroup *dead_memcg)
> * value, the fun
On Tue, 26 Sep 2017, Michal Hocko wrote:
> > No, I agree that we shouldn't compare sibling memory cgroups based on
> > different criteria depending on whether group_oom is set or not.
> >
> > I think it would be better to compare siblings based on the same criteria
> > independent of group_oom
On Mon, 25 Sep 2017, Johannes Weiner wrote:
> > True but we want to have the semantic reasonably understandable. And it
> > is quite hard to explain that the oom killer hasn't selected the largest
> > memcg just because it happened to be in a deeper hierarchy which has
> > been configured to cover
On Fri, 22 Sep 2017, Tejun Heo wrote:
> > If you have this low priority maintenance job charging memory to the high
> > priority hierarchy, you're already misconfigured unless you adjust
> > /proc/pid/oom_score_adj because it will oom kill any larger process than
> > itself in today's kernels a
On Thu, 21 Sep 2017, Johannes Weiner wrote:
> > The issue is that if you opt-in to the new feature, then you are forced to
> > change /proc/pid/oom_score_adj of all processes attached to a cgroup that
> > you do not want oom killed based on size to be oom disabled.
>
> You're assuming that most
On Fri, 22 Sep 2017, Tejun Heo wrote:
> > It doesn't have anything to do with my particular usecase, but rather the
> > ability of userspace to influence the decisions of the kernel. Previous
> > to this patchset, when selection is done based on process size, userspace
> > has full control ove
On Thu, 21 Sep 2017, Johannes Weiner wrote:
> That's a ridiculous nak.
>
> The fact that this patch series doesn't solve your particular problem
> is not a technical argument to *reject* somebody else's work to solve
> a different problem. It's not a regression when behavior is completely
> uncha
On Mon, 18 Sep 2017, Roman Gushchin wrote:
> > As said in other email. We can make priorities hierarchical (in the same
> > sense as hard limit or others) so that children cannot override their
> > parent.
>
> You mean they can set the knob to any value, but parent's value is enforced,
> if it's
On Wed, 20 Sep 2017, Roman Gushchin wrote:
> > It's actually much more complex because in our environment we'd need an
> > "activity manager" with CAP_SYS_RESOURCE to control oom priorities of user
> > subcontainers when today it need only be concerned with top-level memory
> > cgroups. Users
On Thu, 21 Sep 2017, Yang Shi wrote:
> The kernel may panic when an oom happens without a killable process;
> sometimes this is caused by huge unreclaimable slabs used by the kernel.
>
> Although kdump could help debug such a problem, kdump is not
> available on all architectures and it might be malfuncti
On Thu, 21 Sep 2017, Yang Shi wrote:
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 99736e0..173c423 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -43,6 +43,7 @@
>
> #include
> #include "internal.h"
> +#include "slab.h"
>
> #define CREATE_TRACE_POINTS
> #include
> @@ -427
ment, NULL, 'L'},
> { "Xtotals", no_argument, NULL, 'X'},
> { "Bytes", no_argument, NULL, 'B'},
> + { "unreclaim", no_argument, NULL, 'U'},
> { NULL, 0, NULL, 0 }
> };
>
Same.
After that:
Acked-by: David Rientjes
Also, you may find it better to remove the "RFC" tag from the patchset's
header email since it's agreed that we want this.
On Wed, 20 Sep 2017, Yang Shi wrote:
> > > --- a/mm/slab_common.c
> > > +++ b/mm/slab_common.c
> > > @@ -35,6 +35,8 @@
> > > static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
> > > slab_caches_to_rcu_destroy_workfn);
> > > +#define K(x) ((x)/1024)
> > > +
> > > /*
>
On Tue, 19 Sep 2017, Yang Shi wrote:
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -35,6 +35,8 @@
> static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
> slab_caches_to_rcu_destroy_workfn);
>
> +#define K(x) ((x)/1024)
> +
> /*
> * Set of flags that will prevent s
On Fri, 15 Sep 2017, Roman Gushchin wrote:
> > > > But then you just enforce a structural restriction on your configuration
> > > > because
> > > >         root
> > > >        /    \
> > > >       A      D
> > > >      / \
> > > >     B   C
> > > >
> > > > is a different thing than
> > > >
On Mon, 18 Sep 2017, Michal Hocko wrote:
> > > > But then you just enforce a structural restriction on your configuration
> > > > because
> > > >         root
> > > >        /    \
> > > >       A      D
> > > >      / \
> > > >     B   C
> > > >
> > > > is a different thing than
> > > >
On Fri, 15 Sep 2017, Roman Gushchin wrote:
> > But then you just enforce a structural restriction on your configuration
> > because
> >         root
> >        /    \
> >       A      D
> >      / \
> >     B   C
> >
> > is a different thing than
> >       root
> >      /  |  \
> >     B   C   D
> >
On Thu, 14 Sep 2017, Michal Hocko wrote:
> > It is certainly possible to add oom priorities on top before it is merged,
> > but I don't see why it isn't part of the patchset.
>
> Because the semantic of the priority for non-leaf memcgs is not fully
> clear and I would rather have the core of the
On Mon, 11 Sep 2017, Roman Gushchin wrote:
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 15af3da5af02..da2b12ea4667 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2661,6 +2661,231 @@ static inline bool memcg_has_children(struct
> mem_cgroup *memcg)
> return ret;
>
On Wed, 13 Sep 2017, Michal Hocko wrote:
> > > This patchset makes the OOM killer cgroup-aware.
> > >
> > > v8:
> > > - Do not kill tasks with OOM_SCORE_ADJ -1000
> > > - Make the whole thing opt-in with cgroup mount option control
> > > - Drop oom_priority for further discussions
> >
> >
On Tue, 12 Sep 2017, Roman Gushchin wrote:
> > I can't imagine that Tejun would be happy with a new mount option,
> > especially when it's not required.
> >
> > OOM behavior does not need to be defined at mount time and for the entire
> > hierarchy. It's possible to very easily implement a tun
On Mon, 11 Sep 2017, Vlastimil Babka wrote:
> > A follow-up change will set the pageblock skip for this memory since it is
> > never useful for either scanner.
> > """
> >
> >> Also there's now a danger that in cases where there's no direct
> >> compaction happening (just kcompactd), nothing wil
On Mon, 11 Sep 2017, Vlastimil Babka wrote:
> > Yes, any page where compound_order(page) == pageblock_order would probably
> > benefit from the same treatment. I haven't encountered such an issue,
> > however, so I thought it was best to restrict it only to hugetlb: hugetlb
> > memory usually
m cgroup. We don't need to print
> the debug information for the each task, as well as play
> with task selection (considering task's children),
> so we can't use the existing oom_kill_process().
>
> Signed-off-by: Roman Gushchin
> Cc: Michal Hocko
> Cc: Vladimi
On Mon, 11 Sep 2017, Roman Gushchin wrote:
> Add a "groupoom" cgroup v2 mount option to enable the cgroup-aware
> OOM killer. If not set, the OOM selection is performed in
> a "traditional" per-process way.
>
> The behavior can be changed dynamically by remounting the cgroupfs.
I can't imagine t
On Mon, 11 Sep 2017, Roman Gushchin wrote:
> This patchset makes the OOM killer cgroup-aware.
>
> v8:
> - Do not kill tasks with OOM_SCORE_ADJ -1000
> - Make the whole thing opt-in with cgroup mount option control
> - Drop oom_priority for further discussions
Nack, we specifically require
On Fri, 1 Sep 2017, Vlastimil Babka wrote:
> The pageblock_skip_persistent() function checks for HugeTLB pages of pageblock
> order. When clearing pageblock skip bits for compaction, the bits are not
> cleared for such pageblocks, because they cannot contain base pages suitable
> for migration, no
On Wed, 23 Aug 2017, Vlastimil Babka wrote:
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -217,6 +217,20 @@ static void reset_cached_positions(struct zone *zone)
> > pageblock_start_pfn(zone_end_pfn(zone) -
On Wed, 23 Aug 2017, Vlastimil Babka wrote:
> On 08/16/2017 01:39 AM, David Rientjes wrote:
> > Kcompactd is needlessly ignoring pageblock skip information. It is doing
> > MIGRATE_SYNC_LIGHT compaction, which is no more powerful than
> > MIGRATE_SYNC compaction.
> >
On Fri, 8 Sep 2017, Christopher Lameter wrote:
> Ok. Certainly there were scalability issues (lots of them) and the sysctl
> may have helped there if set globally. But the ability to kill the
> allocating tasks was primarily used in cpusets for constrained allocation.
>
I remember discussing it
On Thu, 7 Sep 2017, Christopher Lameter wrote:
> > I am not sure this is how things evolved actually. This is way before
> > my time so my git log interpretation might be imprecise. We do have
> > oom_badness heuristic since out_of_memory has been introduced and
> > oom_kill_allocating_task has be
On Thu, 7 Sep 2017, Christopher Lameter wrote:
> > SGI required it when it was introduced simply to avoid the very expensive
> > tasklist scan. Adding Christoph Lameter to the cc since he was involved
> > back then.
>
> Really? From what I know and worked on way back when: The reason was to be
>
with the overall patchset though :)
> To make a first step towards deprecation, let's warn potential
> users about deprecation plans.
>
> Signed-off-by: Roman Gushchin
> Cc: Andrew Morton
> Cc: Michal Hocko
> Cc: Johannes Weiner
> Cc: David Rientjes
> Cc: Vladimi
ll the vfree calls to use kvfree.
>
Hopefully this can make it into 4.13.
Fixes: 54f180d3c181 ("mm, swap: use kvzalloc to allocate some swap data
structures")
Cc: sta...@vger.kernel.org [4.12]
> Found by running generic/357 from xfstests.
>
> Signed-off-by: Darrick J. Won
On Thu, 31 Aug 2017, Roman Gushchin wrote:
> So, it looks to me that we're close to an acceptable version,
> and the only remaining question is the default behavior
> (when oom_group is not set).
>
Nit: without knowledge of the implementation, I still don't think I would
know what an "out of me
On Wed, 30 Aug 2017, Roman Gushchin wrote:
> I've spent some time to implement such a version.
>
> It really became shorter and more existing code was reused,
> however I've met a couple of serious issues:
>
> 1) Simple summing of per-task oom_score doesn't make sense.
>First, we calculate
On Thu, 24 Aug 2017, Roman Gushchin wrote:
> > > Do you have an example, which can't be effectively handled by an approach
> > > I'm suggesting?
> >
> > No, I do not have any which would be _explicitly_ requested but I do
> > envision new requirements will emerge. The most probable one would be
>
h a rare operation.
>
> Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages
> allocated from the PCP")
> Reported-and-tested-by: Wang, Wendy
> Cc: sta...@kernel.org
> Signed-off-by: Mel Gorman
Acked-by: David Rientjes
On Wed, 23 Aug 2017, Roman Gushchin wrote:
> Traditionally, the OOM killer is operating on a process level.
> Under oom conditions, it finds a process with the highest oom score
> and kills it.
>
> This behavior doesn't suit well systems with many running
> containers:
>
> 1) There is no fair
On Wed, 23 Aug 2017, Roman Gushchin wrote:
> > It's better to have newbies consult the documentation once than making
> > everybody deal with long and cumbersome names for the rest of time.
> >
> > Like 'ls' being better than 'read_and_print_directory_contents'.
>
> I don't think it's a good arg
ays reschedule in smaps_pte_range() if necessary since the pagewalk
iteration can be expensive.
Signed-off-by: David Rientjes
---
fs/proc/task_mmu.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c
+++ b/fs/p