Newline per Hillf
Signed-off-by: David Rientjes
---
mm/oom_kill.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1767e50844ac..51c091849dcb 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -408,7 +408,7 @@ static void dump_header(struct
Signed-off-by: David Rientjes
---
mm/oom_kill.c | 16 +---
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -403,12 +403,14 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
Setting
the nodemask to cpuset_current_mems_allowed is redundant and prevents
debugging issues where ac->nodemask is not set properly in the page
allocator.
This provides better debugging output since
cpuset_print_current_mems_allowed() is already provided.
Signed-off-by: David Rientjes
---
mm
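For reference, a minimal sketch of the reporting pattern that changelog describes (a reconstruction, since the hunk above is truncated; the function body is illustrative, not the exact mm/oom_kill.c code). It reports the allocation's own nodemask, even when unset, and leaves the cpuset mask to the existing helper:

/* Sketch only (assumes <linux/oom.h>, <linux/cpuset.h>): print the
 * nodemask the allocation actually carried instead of substituting
 * cpuset_current_mems_allowed, which the cpuset helper below already
 * reports. */
static void dump_header_sketch(struct oom_control *oc)
{
	if (oc->nodemask)
		pr_warn("oom invoked: nodemask=%*pbl\n",
			nodemask_pr_args(oc->nodemask));
	else
		pr_warn("oom invoked: nodemask=(null)\n");
	cpuset_print_current_mems_allowed();
}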
__alloc_pages_nodemask() can end up using a bogus nodemask, which could lead
> e.g. to premature OOM.
>
> Fixes: be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to
> alloc_pages_vma")
> Signed-off-by: Vlastimil Babka
> Cc: stable@vger.kernel.org
> Cc: Aneesh Kuma
Signed-off-by: Daniel Thompson
Acked-by: David Rientjes
On Tue, 17 Jan 2017, Michal Hocko wrote:
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 57dc3c3b53c1..3e35eb04a28a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1912,8 +1912,8 @@ extern void si_meminfo_node(struct sysinfo *val, int nid);
> extern unsigned l
On Tue, 17 Jan 2017, kwon wrote:
> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> index 1dfc209..2d30ace 100644
> >> --- a/mm/slab_common.c
> >> +++ b/mm/slab_common.c
> >> @@ -744,7 +744,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >>bool need_rcu_barrier = false;
> >>in
g task numa
> policy. Add this check to not pollute the output with the pointless
> information.
>
> Acked-by: Mel Gorman
> Acked-by: Johannes Weiner
> Signed-off-by: Michal Hocko
s/fileter/filter/
Acked-by: David Rientjes
On Sat, 14 Jan 2017, Johannes Weiner wrote:
> The OOM killer livelock was the motivation for this patch. With that
> ruled out, what's the point of this patch? Try a bit less hard to move
> charges during task migration?
>
The most important part is to fail ->can_attach() instead of oom killing
pro
& ~__GFP_NORETRY, which is
pointless as written.
Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge
path")
Acked-by: Michal Hocko
Signed-off-by: David Rientjes
---
mm/memcontrol.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/mem
OOM parameter in charge
path")
Signed-off-by: David Rientjes
---
mm/memcontrol.c | 7 +--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4353,9 +4353,12 @@ static int mem_cgroup_do_precharge(unsigned long count)
.
This also restructures mem_cgroup_wait_acct_move() since it is not
possible for mc.moving_task to be current.
Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge
path")
Signed-off-by: David Rientjes
---
mm/memcontrol.c | 32 +++---
S(status) == 0);
> return 0;
> }
>
> Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously to
> the update in gup.c in the original commit. The same pattern exists in
> follow_devmap_pmd. However, we should not be able to reach that check
> with FOLL_COW set, so add WARN_ONCE to make sure we notice if we ever
> do.
>
> Signed-off-by: Keno Fischer
Tested-by: David Rientjes
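For reference, a reconstruction of the pmd-level check that patch adds, mirroring can_follow_write_pte() from the original FOLL_COW commit in gup.c (a sketch, not necessarily the exact hunk):

/* Sketch: a FOLL_FORCE write through a read-only mapping is only
 * honored once the page is a dirtied COW copy, signalled by FOLL_COW. */
static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
{
	return pmd_write(pmd) ||
	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
}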
supports five and triple_flag_store() was getting unnecessarily messy.
Signed-off-by: David Rientjes
---
v2: uses new naming suggested by Vlastimil
(defer+madvise order looks better in
"... defer defer+madvise madvise ...")
v1 was acked by Mel, and it probably could have been pre
On Tue, 10 Jan 2017, Vlastimil Babka wrote:
> > I get very confused by the /sys/kernel/mm/transparent_hugepage/defrag
> > versus enabled flags, and this may be a terrible, even more confusing,
> > idea: but I've been surprised and sad to see defrag with a "defer"
> > option, but poor enabled witho
On Mon, 9 Jan 2017, Vlastimil Babka wrote:
> > Any suggestions for a better name for "background" are more than welcome.
>
> Why not just "madvise+defer"?
>
Seeing no other activity regarding this issue (omg!), I'll wait a day or
so to see if there are any objections to "madvise+defer" or su
On Fri, 6 Jan 2017, Vlastimil Babka wrote:
> Deciding between "defer" and "background" is however confusing, and also
> doesn't indicate that the difference is related to madvise.
>
Any suggestions for a better name for "background" are more than welcome.
> > The kernel implementation takes l
On Thu, 5 Jan 2017, Vlastimil Babka wrote:
> Hmm that's probably why it's hard to understand, because "madvise
> request" is just setting a vma flag, and the THP allocation (and defrag)
> still happens at fault.
>
> I'm not a fan of either name, so I've tried to implement my own
> suggestion. Tur
userspace, was offered:
http://marc.info/?t=14823661273. This additional mode is a
compromise.
This patch also cleans up the helper function for storing to "enabled"
and "defrag" since the former supports three modes while the latter
supports five and triple_flag_st
On Wed, 4 Jan 2017, Vlastimil Babka wrote:
> > Hmm, is there a significant benefit to setting "defer" rather than "never"
> > if you can rely on khugepaged to trigger compaction when it tries to
> > allocate. I suppose if there is nothing to collapse that this won't do
> > compaction, but is t
On Wed, 4 Jan 2017, Mel Gorman wrote:
> There is a slight disconnect. The bug reports I'm aware of predate the
> introduction of "defer" and the current "madvise" semantics for defrag. The
> current semantics have not had enough time in the field to generate
> reports. I expect lag before users ar
On Mon, 2 Jan 2017, Vlastimil Babka wrote:
> I'm late to the thread (I did read it fully though), so instead of
> multiple responses, I'll just list my observations here:
>
> - "defer", e.g. background kswapd+compaction is not a silver bullet, it
> will also affect the system. Mel already mention
On Tue, 3 Jan 2017, Mel Gorman wrote:
> > I sympathize with that, I've dealt with a number of issues that we have
> > encountered where thp defrag was either at fault or wasn't, and there were
> > also suggestions to set defrag to "madvise" to rule it out and that
> > impacted other users.
> >
On Fri, 30 Dec 2016, Mel Gorman wrote:
> Michal is correct in that my intent for defer was to have "never stall"
> as the default behaviour. This was because of the number of severe stalls
> users experienced that lead to recommendations in tuning guides to always
> disable THP. I'd also seen mul
On Wed, 28 Dec 2016, Michal Hocko wrote:
> I do care more about _users_ and their _experience_ than what
> application _writers_ think is the best. This is the whole point
> of giving the defrag tunable. madvise(MADV_HUGEPAGE) is just a hint to
> the system that using transparent hugepages is _pre
On Tue, 27 Dec 2016, Michal Hocko wrote:
> > Important to who?
>
> To all users who want to have THP without stalls experience. This was
> the whole point of 444eb2a449ef ("mm: thp: set THP defrag by default to
> madvise and add a stall-free defrag option").
>
THEY DO NOT STALL. If the applica
On Mon, 26 Dec 2016, Michal Hocko wrote:
> But my primary argument is that if you tweak "defer" value behavior
> then you lose the only "stall free yet allow background compaction"
> option. That option is really important.
Important to who?
What regresses if we kick a background kthread to comp
On Fri, 23 Dec 2016, Michal Hocko wrote:
> > We have no way to compact memory for users who are not using
> > MADV_HUGEPAGE,
>
> yes we have. it is defrag=always. If you do not want direct compaction
> and the resulting allocation stalls then you have to rely on kcompactd
> which is something we
On Fri, 23 Dec 2016, Michal Hocko wrote:
> > The offering of defer breaks backwards compatibility with previous
> > settings of defrag=madvise, where we could set madvise(MADV_HUGEPAGE) on
> > .text segment remap and try to force thp backing if available but not
> > directly reclaim for non VM_
o provided most of the kerneldoc comment.)
>
> Cc: Andrew Morton
> Acked-by: Michal Hocko
> Signed-off-by: Vegard Nossum
Acked-by: David Rientjes
for the series
On Thu, 22 Dec 2016, Michal Hocko wrote:
> > Currently, when defrag is set to "madvise", thp allocations will direct
> > reclaim. However, when defrag is set to "defer", all thp allocations do
> > not attempt reclaim regardless of MADV_HUGEPAGE.
> >
> > This patch always directly reclaims for MA
tion").
In this form, "defer" is a stronger, more heavyweight version of
"madvise".
Signed-off-by: David Rientjes
---
Documentation/vm/transhuge.txt | 7 +--
mm/huge_memory.c | 10 ++
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/Doc
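A small userspace sketch of how the resulting knob is exercised, using the "defer+madvise" value a later revision in this thread settles on (the sysfs path appears elsewhere in the thread; error handling kept minimal):

/* Hypothetical usage sketch: select the THP defrag policy. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/defrag", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	fputs("defer+madvise", f);	/* direct reclaim only for MADV_HUGEPAGE */
	return fclose(f) ? 1 : 0;
}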
mpact_free_scanned" for compatibility.
> >
> > It could be argued that explicitly triggered compaction could also be
> > tracked separately, and that could be added if others find it useful.
> >
> > Signed-off-by: David Rientjes
>
> A bit of downside i
ively.
These values are still accounted for in the general
"compact_migrate_scanned" and "compact_free_scanned" for compatibility.
It could be argued that explicitly triggered compaction could also be
tracked separately, and that could be added if others find it useful.
Signed-off-by: David Rientjes
be inferred by the difference in number of total objects and number
of active objects.
Suggested-by: Joonsoo Kim
Signed-off-by: David Rientjes
---
For -mm because this depends on
mm-slab-faster-active-and-free-stats.patch
mm/slab.c | 70
This avoids active
slab tracking when a slab goes from free to partial or partial to free.
Suggested-by: Joonsoo Kim
Signed-off-by: David Rientjes
---
mm/slab.c | 48 +---
mm/slab.h | 4 ++--
2 files changed, 23 insertions(+), 29 deletions(-)
diff --git a/
total number of free pages. This is exported to userspace as part of a
new /proc/vmstat field.
Signed-off-by: David Rientjes
---
v2: do not track free pages per migratetype since page allocator stress
testing reveals this tracking can impact workloads and there is no
substantial benefit
even
start async compaction in a scenario where free memory cannot be
isolated as a migration target.
This patch does not deem async compaction to be suitable when the
watermark checks using only the amount of free movable memory fails.
Signed-off-by: David Rientjes
---
v2: convert to per-zone
to be precise,
> so we don't need to take the dm-bufio lock.
>
> Signed-off-by: Mikulas Patocka
Acked-by: David Rientjes
On Sun, 20 Nov 2016, Eric Dumazet wrote:
> Another potential issue with CONFIG_VMAP_STACK is that we make no
> attempt to allocate 4 consecutive pages.
>
> Even if we have plenty of memory, 4 calls to alloc_page() are likely to
> give us 4 pages in completely different locations.
>
> Here I prin
On Thu, 17 Nov 2016, Douglas Anderson wrote:
> diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
> index b3ba142e59a4..885ba5482d9f 100644
> --- a/drivers/md/dm-bufio.c
> +++ b/drivers/md/dm-bufio.c
> @@ -89,6 +89,7 @@ struct dm_bufio_client {
>
> struct list_head lru[LIST_SIZE];
t;
Yes, sorry, I'll fix that in v2. I think less than half a kilobyte for
each memory zone is satisfactory for extra tracking, compaction
improvements, and optimized /proc/pagetypeinfo, though.
> > Signed-off-by: David Rientjes
>
> I'd be for this if there are no perfor
-zone metadata at worst by 48 bytes per memory zone (when CONFIG_CMA
and CONFIG_MEMORY_ISOLATION are enabled).
Signed-off-by: David Rientjes
---
include/linux/mmzone.h | 3 ++-
mm/compaction.c | 4 ++--
mm/page_alloc.c | 47 ---
mm/vms
above would easily
trigger earlier when async compaction will become very expensive.
It would also be possible to check zone watermarks in
__compaction_suitable() using the amount of MIGRATE_MOVABLE memory as
an alternative.
Signed-off-by: David Rientjes
---
fs/buffer.c | 2
On Fri, 11 Nov 2016, Joonsoo Kim wrote:
> Hello, David.
>
> Maintaining acitve/free_slab counters looks so complex. And, I think
> that we don't need to maintain these counters for faster slabinfo.
> Key point is to remove iterating n->slabs_partial list.
>
> We can calculate active slab/object
On Tue, 8 Nov 2016, Andrew Morton wrote:
> > Reading /proc/slabinfo or monitoring slabtop(1) can become very expensive
> > if there are many slab caches and if there are very lengthy per-node
> > partial and/or free lists.
> >
> > Commit 07a63c41fa1f ("mm/slab: improve performance of gathering sl
ather than iterating the lists at runtime when reading
/proc/slabinfo.
[rientjes@google.com: changelog]
Signed-off-by: Greg Thelen
Signed-off-by: David Rientjes
---
mm/slab.c | 117 +-
mm/slab.h | 3 +-
2 files changed, 49 inserti
On Wed, 2 Nov 2016, Thomas Garnier wrote:
> >> diff --git a/mm/slab.h b/mm/slab.h
> >> index 9653f2e..58be647 100644
> >> --- a/mm/slab.h
> >> +++ b/mm/slab.h
> >> @@ -144,6 +144,9 @@ static inline unsigned long kmem_cache_flags(unsigned long object_size,
> >>
> >> #define CACHE_CREATE_MASK
mpol_rebind_preferred()) or when just printing
> the mempolicy structure (/proc/PID/numa_maps).
> Isolated tests done.
>
> Signed-off-by: Piotr Kwapulinski
Acked-by: David Rientjes
On Mon, 31 Oct 2016, Thomas Garnier wrote:
> While testing OBJFREELIST_SLAB integration with pagealloc, we found a
> bug where kmem_cache(sys) would be created with both CFLGS_OFF_SLAB &
> CFLGS_OBJFREELIST_SLAB.
>
> The original kmem_cache is created early making OFF_SLAB not possible.
> When km
() branch.
Avoid the unlikely() branch when in a context where pmd is known to be
good for __split_huge_pmd() directly.
Signed-off-by: David Rientjes
---
include/linux/huge_mm.h | 2 ++
mm/memory.c | 4 ++--
mm/mempolicy.c | 2 +-
mm/mprotect.c | 2 +-
4 files
On Tue, 27 Sep 2016, Ben Greear wrote:
>
> I have been running this patch for a while:
>
> ath10k: Use GFP_DMA32 for firmware swap memory.
>
> This fixes OS crash when using QCA 9984 NIC on x86-64 system
> without vt-d enabled.
>
> Also tested on ea8500 with 9980, and x86-64 w
On Thu, 22 Sep 2016, zijun_hu wrote:
> On 2016/9/22 5:21, David Rientjes wrote:
> > On Wed, 21 Sep 2016, zijun_hu wrote:
> >
> >> From: zijun_hu
> >>
> >> correct lazy_max_pages() return value if the number of online
> >> CPUs is power of 2
&
On Thu, 22 Sep 2016, zijun_hu wrote:
> > We don't support inserting when va->va_start == tmp_va->va_end, plain and
> > simple. There's no reason to do so. NACK to the patch.
> >
> i am sorry i disagree with you because
> 1) in almost all context of vmalloc, original logic treat the special cas
On Thu, 22 Sep 2016, zijun_hu wrote:
> >> correct a few logic error for __insert_vmap_area() since the else
> >> if condition is always true and meaningless
> >>
> >> in order to fix this issue, if vmap_area inserted is lower than one
> >> on rbtree then walk around left branch; if higher then rig
On Wed, 21 Sep 2016, zijun_hu wrote:
> From: zijun_hu
>
> correct lazy_max_pages() return value if the number of online
> CPUs is power of 2
>
> Signed-off-by: zijun_hu
> ---
> mm/vmalloc.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.
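The rounding at issue can be demonstrated in userspace (a sketch reconstructing the point from the truncated excerpt above: fls() overshoots ceil(log2(n)) by one exactly when n is a power of two):

#include <stdio.h>

/* fls(n): 1-based index of the most significant set bit, 0 for n == 0,
 * matching the kernel helper used by lazy_max_pages(). */
static int fls_user(unsigned int n)
{
	return n ? 32 - __builtin_clz(n) : 0;
}

int main(void)
{
	for (unsigned int cpus = 2; cpus <= 16; cpus++)
		printf("cpus=%2u fls=%d%s\n", cpus, fls_user(cpus),
		       (cpus & (cpus - 1)) == 0 ? "  <- power of two" : "");
	return 0;
}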
On Wed, 21 Sep 2016, zijun_hu wrote:
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index cc6ecd6..a125ae8 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2576,32 +2576,13 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> static void *s_start(struct seq_file *m, loff_t *pos
On Wed, 21 Sep 2016, zijun_hu wrote:
> From: zijun_hu
>
> correct a few logic error for __insert_vmap_area() since the else
> if condition is always true and meaningless
>
> in order to fix this issue, if vmap_area inserted is lower than one
> on rbtree then walk around left branch; if higher t
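A kernel-style sketch of the corrected walk described above (a reconstruction, not the exact patch; struct names follow mm/vmalloc.c): strictly lower ranges descend left, strictly higher descend right, and anything else is an overlap.

/* Assumes <linux/rbtree.h>; vmap_area as in mm/vmalloc.c. */
static void insert_vmap_area_sketch(struct rb_root *root, struct vmap_area *va)
{
	struct rb_node **p = &root->rb_node, *parent = NULL;

	while (*p) {
		struct vmap_area *tmp = rb_entry(*p, struct vmap_area, rb_node);

		parent = *p;
		if (va->va_end <= tmp->va_start)
			p = &(*p)->rb_left;	/* entirely below: go left */
		else if (va->va_start >= tmp->va_end)
			p = &(*p)->rb_right;	/* entirely above: go right */
		else
			BUG();			/* overlap: must not happen */
	}
	rb_link_node(&va->rb_node, parent, p);
	rb_insert_color(&va->rb_node, root);
}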
On Tue, 20 Sep 2016, Piotr Kwapulinski wrote:
> > There wasn't an MPOL_LOCAL when I introduced either of these flags, it's
> > an oversight to allow them to be passed.
> >
> > Want to try to update set_mempolicy(2) with the procedure outlined in
> > https://www.kernel.org/doc/man-pages/patches.
mpol_rebind_preferred()) or when just printing
> the mempolicy structure (/proc/PID/numa_maps).
> Isolated tests done.
>
> Signed-off-by: Piotr Kwapulinski
Acked-by: David Rientjes
There wasn't an MPOL_LOCAL when I introduced either of these flags, it's
an oversight to allow them to be passed.
W
On Sat, 17 Sep 2016, Anshuman Khandual wrote:
> > I'm questioning if this information can be inferred from information
> > already in /proc/zoneinfo and sysfs. We know the no-fallback zonelist is
> > going to include the local node, and we know the other zonelists are
> > either node ordered o
On Mon, 12 Sep 2016, Anshuman Khandual wrote:
> >> > after memory or node hot[un]plug is desirable. This change adds one
> >> > new sysfs interface (/sys/devices/system/memory/system_zone_details)
> >> > which will fetch and dump this information.
> > Doesn't this violate the "one value per file"
; disabled_cpus.
>
> Signed-off-by: Dou Liyang
Acked-by: David Rientjes
On Wed, 31 Aug 2016, Reza Arbab wrote:
> > Nope, the return value of changing state from online to online was
> > established almost 11 years ago in commit 3947be1969a9.
>
> Fair enough. So if online-to-online is -EINVAL,
online-to-online for state is -EINVAL; it has been since 2005.
> 1. Shou
On Wed, 31 Aug 2016, Reza Arbab wrote:
> > The correct fix is for store_mem_state() to return -EINVAL when
> > device_online() returns non-zero.
>
> Let me put it to you this way--which one of these sysfs operations is behaving
> correctly?
>
> # cd /sys/devices/system/memory/memory0
>
On Tue, 30 Aug 2016, wei.guo.si...@gmail.com wrote:
> From: Simon Guo
>
> This patch adds mlock() test for multiple invocation on
> the same address area, and verify it doesn't mess the
> rlimit mlock limitation.
>
Thanks for expanding mlock testing. I'm wondering if you are interested
in mo
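Something along these lines would be a runnable starting point (a sketch, not the posted selftest): lock the same range twice and check that the repeat is not double-charged against RLIMIT_MEMLOCK.

#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void)
{
	struct rlimit rl;
	size_t len;
	void *p;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) || rl.rlim_cur == 0)
		return 1;
	/* Lock up to the limit, capped for sanity if it is very large. */
	len = rl.rlim_cur > (1UL << 20) ? (1UL << 20) : rl.rlim_cur;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;
	if (mlock(p, len)) {
		perror("first mlock");
		return 1;
	}
	/* If re-locking counted against the rlimit again, a second call
	 * on a near-limit range could fail with ENOMEM. */
	if (mlock(p, len)) {
		perror("second mlock");
		return 1;
	}
	puts("repeated mlock on the same range OK");
	munlock(p, len);
	munmap(p, len);
	return 0;
}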
ning node state.
>
That would mean that when node_reclaim_mode is enabled that we weren't
properly returning NODE_RECLAIM_NOSCAN if a remote node had its own cpus
and PGDAT_RECLAIM_LOCKED wasn't already set, so this seems like it could
result in a performance improvement.
&
On Wed, 31 Aug 2016, Andrew Morton wrote:
> > Attempting to online memory which is already online will cause this:
> >
> > 1. store_mem_state() called with buf="online"
> > 2. device_online() returns 1 because device is already online
> > 3. store_mem_state() returns 1
> > 4. calling code interpr
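A hedged sketch of the fix being described (the surrounding driver code is elided; only the return-value translation matters):

/* Sketch: a positive device_online() return means "already online";
 * a sysfs store must not hand that back as a byte count of 1. */
static ssize_t store_mem_state_sketch(struct device *dev, const char *buf,
				      size_t count)
{
	int ret = device_online(dev);

	if (ret > 0)
		ret = -EINVAL;
	return ret < 0 ? ret : count;
}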
On Thu, 25 Aug 2016, Michal Hocko wrote:
> > I don't believe it has been an issue in the past for any archs that
> > don't use thp.
>
> Well, fragmentation is a real problem and order-0 reclaim will be never
> anywhere close to reliably provide higher order pages. Well, reclaiming
> a lot of memo
On Fri, 19 Aug 2016, akpm@linux-foundation.org wrote:
> From: Vegard Nossum
> Subject: stackdepot: fix mempolicy use-after-free
>
> This patch fixes the following:
>
> BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr
> 88010b48102c
> Read of size 2 by task trin
On Tue, 23 Aug 2016, Michal Hocko wrote:
> From: Michal Hocko
>
> The current wording of the COMPACTION Kconfig help text doesn't
> emphasise that disabling COMPACTION might cripple the page allocator
> which relies on the compaction quite heavily for high order requests and
> an unexpected OOM
; nearest_obj()")
> Signed-off-by: Geert Uytterhoeven
Acked-by: David Rientjes
nt to slowpath just to wake up
> kswapd and then succeed on min watermark
> 2 - try all zones with min watermark before resorting to no watermark
> (if allowed), so we don't needlessly put below min watermark the first
> zone in zonelist, while some later zone would still be above watermark
>
The second point makes sense, thanks!
Acked-by: David Rientjes
On Wed, 20 Jul 2016, Michal Hocko wrote:
> > Any mempool_alloc() user that then takes a contended mutex can do this.
> > An example:
> >
> >   taskA                   taskB                   taskC
> >   -----                   -----                   -----
> >   mempool_alloc(a)
> >                           mutex_lock(b)
> >
On Mon, 18 Jul 2016, Vlastimil Babka wrote:
> Since THP allocations during page faults can be costly, extra decisions are
> employed for them to avoid excessive reclaim and compaction, if the initial
> compaction doesn't look promising. The detection has never been perfect as
> there is no gfp fla
On Mon, 18 Jul 2016, Vlastimil Babka wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 30443804f156..a04a67745927 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3510,7 +3510,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> struct page *page = N
On Mon, 18 Jul 2016, Vlastimil Babka wrote:
> After __alloc_pages_slowpath() sets up new alloc_flags and wakes up kswapd, it
> first tries get_page_from_freelist() with the new alloc_flags, as it may
> succeed e.g. due to using min watermark instead of low watermark. It makes
> sense to do this
ALLOC_NO_WATERMARKS from
> gfp_to_alloc_flags() to gfp_pfmemalloc_allowed(). This means we don't have to
> mask out ALLOC_NO_WATERMARKS in numerous places in __alloc_pages_slowpath()
> anymore. The only two tests for the flag can instead call
> gfp_pfmemalloc_allowed().
>
> Signed-off-by: Vlastimil Babka
that's not what we usually expect, so probably better not to isolate it.
>
> When tested by stress-highalloc from mmtests, this has reduced the number of
> page migrate failures by 60-70%.
>
> Signed-off-by: Hugh Dickins
> Signed-off-by: Vlastimil Babka
> Acked-by: Michal Hocko
Acked-by: David Rientjes
On Tue, 19 Jul 2016, Johannes Weiner wrote:
> Mempool guarantees forward progress by having all necessary memory
> objects for the guaranteed operation in reserve. Think about it this
> way: you should be able to delete the pool->alloc() call entirely and
> still make reliable forward progress. It
On Tue, 19 Jul 2016, SF Markus Elfring wrote:
> > From: Markus Elfring
> > Date: Mon, 16 Nov 2015 08:20:36 +0100
> >
> > The mempool_destroy() function tests whether its argument is NULL
> > and then returns immediately. Thus the test around the calls is not needed.
> >
> > This issue was detec
On Tue, 19 Jul 2016, Wei Yongjun wrote:
> From: Wei Yongjun
>
> Using list_move() instead of list_del() + list_add().
>
... to prevent needlessly poisoning the next and prev values.
> Signed-off-by: Wei Yongjun
Acked-by: David Rientjes
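For illustration, the transformation side by side (a sketch with hypothetical function names; list_move(), list_del(), and list_add() are the <linux/list.h> primitives):

/* Before: list_del() writes LIST_POISON1/2 into entry->next/prev,
 * which list_add() then immediately overwrites. */
static void promote_two_step(struct list_head *entry, struct list_head *head)
{
	list_del(entry);
	list_add(entry, head);
}

/* After: one call, no needless poison stores. */
static void promote(struct list_head *entry, struct list_head *head)
{
	list_move(entry, head);
}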
On Tue, 19 Jul 2016, Xishi Qiu wrote:
> Memory offline could happen on both movable zone and non-movable zone, and we
> can offline the whole node if the zone is movable_zone(the node only has one
> movable_zone), and if the zone is normal_zone, we cannot offline the whole
> node,
> because some
On Mon, 18 Jul 2016, Michal Hocko wrote:
> David Rientjes was objecting that such an approach wouldn't help if the
> oom victim was blocked on a lock held by process doing mempool_alloc. This
> is very similar to other oom deadlock situations and we have oom_reaper
> to deal w
On Mon, 18 Jul 2016, Michal Hocko wrote:
> > There's
> > two fundamental ways to go about it: (1) ensure mempool_alloc() can make
> > forward progress (whether that's by way of gfp flags or access to memory
> > reserves, which may depend on the process context such as PF_MEMALLOC) or
> > (2) r
On Fri, 15 Jul 2016, Mikulas Patocka wrote:
> And what about the oom reaper? It should have freed all victim's pages
> even if the victim is looping in mempool_alloc. Why the oom reaper didn't
> free up memory?
>
Is that possible with mlock or shared memory? Nope. The oom killer does
not ha
On Fri, 15 Jul 2016, Michal Hocko wrote:
> > If PF_MEMALLOC context is allocating too much memory reserves, then I'd
> > argue that is a problem independent of using mempool_alloc() since
> > mempool_alloc() can evolve directly into a call to the page allocator.
> > How does such a process gua
On Fri, 15 Jul 2016, Mikulas Patocka wrote:
> > There is no guarantee that _anything_ can return memory to the mempool,
>
> You misunderstand mempools if you make such claims.
>
> There is in fact guarantee that objects will be returned to mempool. In
> the past I reviewed device mapper thoroug
On Fri, 15 Jul 2016, Mikulas Patocka wrote:
> > Umm, show me an explicit guarantee where the oom reaper will free memory
> > such that other threads may return memory to this process's mempool so it
> > can make forward progress in mempool_alloc() without the need of utilizing
> > memory reserv
On Thu, 14 Jul 2016, Xishi Qiu wrote:
> alloc_migrate_target() is called from migrate_pages(), and the page
> is always from user space, so we can add __GFP_HIGHMEM directly.
>
> Second, when we offline a node, the new page should alloced from other
> nodes instead of the current node, because re
On Fri, 15 Jul 2016, Tetsuo Handa wrote:
> Whether the OOM reaper will free some memory no longer matters. Instead,
> whether the OOM reaper will let the OOM killer select next OOM victim matters.
>
> Are you aware that the OOM reaper will let the OOM killer select next OOM
> victim (currently by
On Thu, 14 Jul 2016, Michal Hocko wrote:
> > It prevents the whole system from livelocking due to an oom killed process
> > stalling forever waiting for mempool_alloc() to return. No other threads
> > may be oom killed while waiting for it to exit.
>
> But it is true that the patch has uninten
On Thu, 14 Jul 2016, Tetsuo Handa wrote:
> David Rientjes wrote:
> > On Wed, 13 Jul 2016, Mikulas Patocka wrote:
> >
> > > What are the real problems that f9054c70d28bc214b2857cf8db8269f4f45a5e23
> > > tries to fix?
> > >
> >
> > It pr
On Thu, 14 Jul 2016, Mikulas Patocka wrote:
> > schedule
> > schedule_timeout
> > io_schedule_timeout
> > mempool_alloc
> > __split_and_process_bio
> > dm_request
> > generic_make_request
> > submit_bio
> > mpage_readpages
> > ext4_readpages
> > __do_page_cache_readahead
> > ra_submit
> > filemap_
On Wed, 13 Jul 2016, Tetsuo Handa wrote:
> I wonder whether commit f9054c70d28bc214 ("mm, mempool: only set
> __GFP_NOMEMALLOC if there are free elements") is doing correct thing.
> It says
>
> If an oom killed thread calls mempool_alloc(), it is possible that it'll
> loop forever if ther
On Wed, 13 Jul 2016, Mikulas Patocka wrote:
> What are the real problems that f9054c70d28bc214b2857cf8db8269f4f45a5e23
> tries to fix?
>
It prevents the whole system from livelocking due to an oom killed process
stalling forever waiting for mempool_alloc() to return. No other threads
may be
valid PFNs. No caller of early_pfn_to_nid
> cares except early_page_uninitialised. This patch has early_pfn_to_nid
> always return a valid node.
>
> Signed-off-by: Mel Gorman
> Cc: stable@vger.kernel.org # 4.2+
Acked-by: David Rientjes
This makes me wonder about meminit_pfn_in_nid(), however, since
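Picking up the early_pfn_to_nid() point above, the shape of such a fix (a hedged reconstruction; the exact hunk is not in this excerpt) is to clamp the miss case to a valid node:

/* Sketch: __early_pfn_to_nid() returns -1 for a PFN outside any
 * node's range; fall back to node 0 instead of passing that through. */
static int early_pfn_to_nid_sketch(unsigned long pfn)
{
	int nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache);

	return nid >= 0 ? nid : 0;
}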
PFN order. This is not guaranteed so this patch adds robustness by always
> checking if the node being checked is online.
>
> Signed-off-by: Mel Gorman
> Cc: stable@vger.kernel.org # 4.2+
Acked-by: David Rientjes
On Thu, 30 Jun 2016, Joonsoo Kim wrote:
> We need to find a root cause of this problem, first.
>
> I guess that this problem would happen when isolate_freepages_block()
> early stops due to the watermark check (if your patch is applied to your
> kernel). If the scanners meet, the cached pfn will be reset and
> cc->free_pfn can go backward, though it would not be a big problem.
> Just leaving isolate_start_pfn as isolate_freepages_block() returns
> would be a proper solution here.
>
I guess, but I don't see what value there is in starting free page
isolation within a pageblock