wever this
> architecture that overrides the task possible mask is unlikely to be
> willing to integrate new development.
>
> Suggested-by: Michal Hocko
> Signed-off-by: Frederic Weisbecker
Thanks, this makes sense to me. Up to scheduler maintainers whether this
makes sense in
On Wed 18-09-24 11:37:42, Frederic Weisbecker wrote:
> Le Tue, Sep 17, 2024 at 01:07:25PM +0200, Michal Hocko a écrit :
[...]
> > I am not objecting to patch per se. I am just not sure this is really
> > needed. It is great to have kernel threads bound to non isolated cpus by
>
On Tue 17-09-24 12:34:51, Frederic Weisbecker wrote:
> Le Tue, Sep 17, 2024 at 08:26:49AM +0200, Michal Hocko a écrit :
> > On Tue 17-09-24 00:49:16, Frederic Weisbecker wrote:
> > > Kthreads attached to a preferred NUMA node for their task structure
> > > allocation
On Tue 17-09-24 09:01:08, Vlastimil Babka wrote:
> On 9/17/24 8:26 AM, Michal Hocko wrote:
> > On Tue 17-09-24 00:49:16, Frederic Weisbecker wrote:
> >> Kthreads attached to a preferred NUMA node for their task structure
> >> allocation can also be assumed to run prefer
s how is that different from
tasksetting a userspace task to a cpu that goes offline? We still do
allow such a task to run, right? We just do not care about affinity
anymore.
--
Michal Hocko
SUSE Labs
util.h | 45 +-
> .../testing/selftests/kvm/include/test_util.h | 18 +
> tools/testing/selftests/kvm/lib/kvm_util.c| 443 +++--
> tools/testing/selftests/kvm/lib/test_util.c | 99 ++
> .../kvm/x86_64/private_mem_conversions_test.c | 158 +-
> .../x86_64/private_mem_conversions_test.sh| 91 +
> .../kvm/x86_64/private_mem_kvm_exits_test.c | 11 +-
> virt/kvm/guest_memfd.c| 1563 -
> virt/kvm/kvm_main.c | 17 +
> virt/kvm/kvm_mm.h | 16 +
> 27 files changed, 3288 insertions(+), 443 deletions(-)
> create mode 100644
> tools/testing/selftests/kvm/guest_memfd_hugetlb_reporting_test.c
> create mode 100644 tools/testing/selftests/kvm/guest_memfd_pin_test.c
> create mode 100644 tools/testing/selftests/kvm/guest_memfd_sharing_test.c
> create mode 100755
> tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.sh
>
> --
> 2.46.0.598.g6f2099f65c-goog
--
Michal Hocko
SUSE Labs
can track the time process has been preempted by other means, no? We
have context switching tracepoints in place. Have you considered that
option?
--
Michal Hocko
SUSE Labs
On Wed 21-02-24 13:30:51, Carlos Galo wrote:
> On Tue, Feb 20, 2024 at 11:55 PM Michal Hocko wrote:
> >
> > Hi,
> > sorry I have missed this before.
> >
> > On Thu 11-01-24 21:05:30, Carlos Galo wrote:
> > > The current implementation of the mark_victim
pid_nr(victim), victim->comm, K(mm->total_vm),
K(get_mm_counter(mm, MM_ANONPAGES)),
K(get_mm_counter(mm, MM_FILEPAGES)),
K(get_mm_counter(mm, MM_SHMEMPAGES)),
from_kuid(&init_user_ns, task_uid(victim)),
mm_pgtabl
Why is an ad-hoc dynamic tracepoint or BPF for a very
special situation not sufficient?
In other words, tell us more about the usecases and why is this
generally useful.
Thanks!
--
Michal Hocko
SUSE Labs
s actually the interesting case at all.
--
Michal Hocko
SUSE Labs
On Tue 20-04-21 15:57:08, Michal Hocko wrote:
[...]
> Usual memory consumption is usually something like LRU pages + Slab
> memory + kernel stack + vmalloc used + pcp.
>
> > But I know that KernelStack is allocated through vmalloc these days,
> > and I don't know wh
Similarly, is Mlocked a subset of Unevictable?
>
> There is some attempt at explaining how these numbers fit together, but
> it's outdated, and doesn't include Mlocked, Unevictable or KernelStack
Agreed there is a lot of tribal knowledge or even misconceptions flying
around and it will take much more work to put everything into shape.
This is only one tiny step forward.
--
Michal Hocko
SUSE Labs
usage.
> >
> > Signed-off-by: Mike Rapoport
>
> Ooops, forgot to add Michal's Ack, sorry.
Let's make it more explicit
Acked-by: Michal Hocko
Thanks!
--
Michal Hocko
SUSE Labs
On Tue 20-04-21 09:25:51, peter.enderb...@sony.com wrote:
> On 4/20/21 11:12 AM, Michal Hocko wrote:
> > On Tue 20-04-21 09:02:57, peter.enderb...@sony.com wrote:
> >>>> But that isn't really system memory at all, it's just allocated device
> >>>>
On Fri 16-04-21 13:24:10, Oscar Salvador wrote:
> Enable x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature.
>
> Signed-off-by: Oscar Salvador
> Reviewed-by: David Hildenbrand
Acked-by: Michal Hocko
> ---
> arch/x86/Kconfig | 3 +++
> 1 file changed, 3 insertions
return -EINVAL;
> + }
> +
> + /*
> + * Let remove_pmd_table->free_hugepage_table do the
> + * right thing if we used vmem_altmap when hot-adding
> + * the range.
> + */
> + mhp_altmap.alloc = nr_vmemmap_pages;
> + altmap = &mhp_altmap;
> + }
> + }
> +
> /* remove memmap entry */
> firmware_map_remove(start, start + size, "System RAM");
I have to say I still dislike this and I would just wrap it inside out
and do the operation from within walk_memory_blocks but I will not
insist.
--
Michal Hocko
SUSE Labs
to modify the number of present pages.
>
> Signed-off-by: David Hildenbrand
> Signed-off-by: Oscar Salvador
> Reviewed-by: Oscar Salvador
Not sure self review counts ;)
Acked-by: Michal Hocko
Btw. I strongly suspect the resize lock is quite pointless here.
Something for a follow up p
hich is a special
case we want to allow."
> Signed-off-by: Oscar Salvador
> Reviewed-by: David Hildenbrand
With the changelog extended and the comment clarification (see below)
feel free to add
Acked-by: Michal Hocko
> ---
> mm/memory_hotplug.c | 18 ++
> 1 f
viewed-by: David Hildenbrand
Acked-by: Michal Hocko
> ---
> drivers/base/memory.c | 33 +
> 1 file changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index f35298425575..f209925a5d4e
Because a single counter without a wider context cannot be put into any
reasonable context. There is no notion of the total amount of device
memory usable for dma-buf. As Christian explained some of it can be RAM
based. So a single number is rather pointless on its own in many cases.
Or let me just ask
to the right
direction.
Acked-by: Michal Hocko
one nit below
> ---
> Documentation/filesystems/proc.rst | 11 +--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.rst
> b/Documentation/filesystems/proc.rst
> index
t is now replaced with dma-buf. ION had some overview metrics that were
> similar.
The discussion around the previous version is still not over and as it
seems your proposed approach is not really viable. So please do not send
new versions until that is sorted out.
Thanks!
--
Michal Hocko
SUSE Labs
On Tue 20-04-21 10:00:07, Christian König wrote:
> Am 20.04.21 um 09:46 schrieb Michal Hocko:
> > On Tue 20-04-21 09:32:14, Christian König wrote:
> > > Am 20.04.21 um 09:04 schrieb Michal Hocko:
> > > > On Mon 19-04-21 18:37:13, Christian König wrote:
> > >
n file makes sense to me as well. If the code is
not conditional (e.g. like swap accounting and some others) then moving
it would make memcontrol.c easier to navigate through.
--
Michal Hocko
SUSE Labs
On Tue 20-04-21 10:20:43, Mike Rapoport wrote:
> On Tue, Apr 20, 2021 at 09:04:51AM +0200, Michal Hocko wrote:
> > On Mon 19-04-21 18:37:13, Christian König wrote:
> > > Am 19.04.21 um 18:11 schrieb Michal Hocko:
> > [...]
> > > > The question is not whethe
On Tue 20-04-21 09:32:14, Christian König wrote:
> Am 20.04.21 um 09:04 schrieb Michal Hocko:
> > On Mon 19-04-21 18:37:13, Christian König wrote:
> > > Am 19.04.21 um 18:11 schrieb Michal Hocko:
[...]
> > What I am trying to bring up with NUMA side is that the same probl
On Mon 19-04-21 18:37:13, Christian König wrote:
> Am 19.04.21 um 18:11 schrieb Michal Hocko:
[...]
> > The question is not whether it is NUMA aware but whether it is useful to
> > know per-numa data for the purpose the counter is supposed to serve.
>
> No, not at all. The
onitor arbitrary metrics and if that can be done without any
> allocations.
A kernel module or eBPF to implement oom decisions has already been
discussed a few years back. But I am afraid this would be hard to wire in
for anything except for the victim selection. I am not sure it is
maintainable to also control when the OOM handling should trigger.
--
Michal Hocko
SUSE Labs
On Mon 19-04-21 17:44:13, Christian König wrote:
> Am 19.04.21 um 17:19 schrieb peter.enderb...@sony.com:
> > On 4/19/21 5:00 PM, Michal Hocko wrote:
> > > On Mon 19-04-21 12:41:58, peter.enderb...@sony.com wrote:
> > > > On 4/19/21 2:16 PM, Michal Hocko wrote:
>
On Mon 19-04-21 12:41:58, peter.enderb...@sony.com wrote:
> On 4/19/21 2:16 PM, Michal Hocko wrote:
> > On Sat 17-04-21 12:40:32, Peter Enderborg wrote:
> >> This adds a total used dma-buf memory. Details
> >> can be found in debugfs, however it is not for everyone
&g
explanation and secondly is this information useful for OOM situations
analysis? If yes then show_mem should dump the value as well.
>From the implementation point of view, is there any reason why this
hasn't used the existing global_node_page_state infrastructure?
--
Michal Hocko
SUSE Labs
On Fri 16-04-21 07:26:43, Dave Hansen wrote:
> On 4/16/21 5:35 AM, Michal Hocko wrote:
> > I have to confess that I haven't grasped the initialization
> > completely. There is a nice comment explaining a 2 socket system with
> > 3 different NUMA nodes attached
There are some more details
but they do not seem that important.
I am still trying to digest the whole thing but at least jamming
node_reclaim logic into kswapd seems strange to me. Need to think more
about that though.
Btw. do you have any numbers from running this with some real world
workload?
-
On Thu 15-04-21 15:31:46, Tim Chen wrote:
>
>
> On 4/9/21 12:24 AM, Michal Hocko wrote:
> > On Thu 08-04-21 13:29:08, Shakeel Butt wrote:
> >> On Thu, Apr 8, 2021 at 11:01 AM Yang Shi wrote:
> > [...]
> >>> The low priority jobs should be able to be res
On Fri 16-04-21 13:14:04, Muchun Song wrote:
> lruvec_holds_page_lru_lock() doesn't check anything about locking and is
> used to check whether the page belongs to the lruvec. So rename it to
> page_matches_lruvec().
>
> Signed-off-by: Muchun Song
Acked-by: M
enbrand
> Acked-by: Mike Kravetz
Acked-by: Michal Hocko
> ---
> mm/page_alloc.c | 6 --
> 1 file changed, 6 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b5a94de3cdde..c5338e912ace 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @
from 6c0371490140
("hugetlb: convert PageHugeFreed to HPageFreed flag"). Previously the
explicit clearing was necessary because compound allocations do not get
this initialization (see prep_compound_page).
> Signed-off-by: Oscar Salvador
with that
Acked-by: Michal Hocko
> ---
> mm/hug
On Thu 15-04-21 11:13:16, Muchun Song wrote:
> On Wed, Apr 14, 2021 at 6:15 PM Michal Hocko wrote:
> >
> > On Wed 14-04-21 18:04:35, Muchun Song wrote:
> > > On Wed, Apr 14, 2021 at 5:24 PM Michal Hocko wrote:
> > > >
> > > > On Tue 13-04-21 14:51:
On Wed 14-04-21 13:49:56, Johannes Weiner wrote:
> On Wed, Apr 14, 2021 at 06:00:42PM +0800, Muchun Song wrote:
> > On Wed, Apr 14, 2021 at 5:44 PM Michal Hocko wrote:
> > >
> > > On Tue 13-04-21 14:51:50, Muchun Song wrote:
> > > > We already have a help
> fs/super.c | 27 +++
> include/linux/fs_context.h | 2 ++
> mm/shmem.c | 1 +
> 5 files changed, 24 insertions(+), 8 deletions(-)
[...]
--
Michal Hocko
SUSE Labs
/* fallback to all nodes */
nodemask = NULL;
}
page = alloc_surplus_huge_page(h, gfp_mask, nodemask);
got_page:
> mpol_cond_put(mpol);
You can have a dedicated gfp mask here if you prefer of course but
calling out MPOL_PREFERRED_MANY explicitly will make the code easier to
read.
> return page;
--
Michal Hocko
SUSE Labs
ow. And alloc_pages_policy doesn't really help I have to
say. I would have expected that a dedicated alloc_pages_preferred and a
general fallback to __alloc_pages_nodemask would have been much easier
to follow.
--
Michal Hocko
SUSE Labs
PREFERRED_MANY's semantic is more like MPOL_PREFERRED
> that it will first try the preferred node/nodes, and fallback to all
> other nodes when first try fails. Thanks to Michal Hocko for suggestions
> on this.
>
> For now, only interleaved policy will be used so there should be no
>
; reuses BIND.
No, this is a big step back. I think we really want to treat this as
PREFERRED. It doesn't have much to do with the BIND semantic at all.
At this stage there should be 2 things remaining - syscalls plumbing and
2 pass allocation request (optimistic preferred nodes restricted a
sier to grep for preferred_nodes than nodes.
--
Michal Hocko
SUSE Labs
ixed typos in commit message. (Ben)
> Merged bits from other patches. (Ben)
> annotate mpol_rebind_preferred_many as unused (Ben)
I am giving up on the rebinding code for now until we clarify that in my
earlier email.
--
Michal Hocko
SUSE Labs
gt; + if (nodes_empty(*nodes))
> + return -EINVAL;
> +
> + tmp = nodemask_of_node(first_node(*nodes));
> + return mpol_new_preferred_many(pol, &tmp);
> + }
> +
> + return mpol_new_preferred_many(pol, NULL);
> +}
> +
> static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes)
> {
> if (nodes_empty(*nodes))
> --
> 2.7.4
--
Michal Hocko
SUSE Labs
plumbing everything in it should really be as simple as node_isset
check.
> default:
> BUG();
Besides that, this should really go!
> @@ -3035,6 +3066,9 @@ void mpol_to_str(char *buffer, int maxlen, struct
> mempolicy *pol)
> switch (mode) {
> case MPOL_DEFAULT:
> break;
> + case MPOL_PREFERRED_MANY:
> + WARN_ON(flags & MPOL_F_LOCAL);
Why WARN_ON here?
> + fallthrough;
> case MPOL_PREFERRED:
> if (flags & MPOL_F_LOCAL)
> mode = MPOL_LOCAL;
> --
> 2.7.4
--
Michal Hocko
SUSE Labs
pol->v.preferred_nodes = tmp;
> pol->w.cpuset_mems_allowed = *nodes;
> }
I have to say that while I really disliked the original code (because it
fiddles with user provided input behind the back) I got lost here
completely. What the heck is going on?
a) why do we even care remaping a hint which is overriden by the cpuset
at the page allocator level and b) why do we need to allocate _two_
potentially large temporary bitmaps for that here?
I haven't spotted anything unexpected in the rest.
--
Michal Hocko
SUSE Labs
gt; Dave Hansen (4):
> mm/mempolicy: convert single preferred_node to full nodemask
> mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes
> mm/mempolicy: allow preferred code to take a nodemask
> mm/mempolicy: refactor rebind code for PREFERRED_MANY
>
> Feng Tang (1):
> mem/mempolicy: unify mpol_new_preferred() and
> mpol_new_preferred_many()
>
> .../admin-guide/mm/numa_memory_policy.rst | 22 +-
> include/linux/mempolicy.h | 6 +-
> include/uapi/linux/mempolicy.h | 6 +-
> mm/hugetlb.c | 26 +-
> mm/mempolicy.c | 272
> ++---
> 5 files changed, 225 insertions(+), 107 deletions(-)
>
> --
> 2.7.4
--
Michal Hocko
SUSE Labs
On Wed 14-04-21 12:49:53, Oscar Salvador wrote:
> On Wed, Apr 14, 2021 at 12:32:58PM +0200, Michal Hocko wrote:
[...]
> > > I checked, and when we get there in __alloc_bootmem_huge_page,
> > > page->private is
> > > still zeroed, so I guess it should be safe to as
On Wed 14-04-21 12:01:47, Oscar Salvador wrote:
> On Wed, Apr 14, 2021 at 10:28:33AM +0200, Michal Hocko wrote:
> > You are right it doesn't do it there. But all struct pages, even those
> > that are allocated by the bootmem allocator should initialize its struct
> > pag
On Wed 14-04-21 18:04:35, Muchun Song wrote:
> On Wed, Apr 14, 2021 at 5:24 PM Michal Hocko wrote:
> >
> > On Tue 13-04-21 14:51:48, Muchun Song wrote:
> > > When mm is NULL, we do not need to hold rcu lock and call css_tryget for
> > > the root memcg. And we
Weiner
> Acked-by: Roman Gushchin
> Reviewed-by: Shakeel Butt
Acked-by: Michal Hocko
> ---
> mm/vmscan.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 64bf07cc20f2..e40b21298d77 100644
nd
> CONFIG_MEMCG.
Neat. While you are at it wouldn't it make sense to rename the function
as well. I do not want to bikeshed but this is really a misnomer. It
doesn't check anything about locking. page_belongs_lruvec?
> Signed-off-by: Muchun Song
> Acked-by: Johannes Weiner
his case it doesn't even
give any advantage for most callers.
Acked-by: Michal Hocko
> ---
> include/linux/memcontrol.h | 10 +-
> mm/compaction.c| 2 +-
> mm/memcontrol.c| 9 +++--
> mm/swap.c | 2 +-
> mm/working
dereference(mm->owner));
> - if (unlikely(!memcg))
> - memcg = root_mem_cgroup;
> - }
> } while (!css_tryget(&memcg->css));
> rcu_read_unlock();
> return memcg;
> --
> 2.11.0
--
Michal Hocko
SUSE Labs
a WARN_ON_ONCE in the page_counter_cancel(). Who knows if it
> will trigger? So it is better to fix it.
>
> Signed-off-by: Muchun Song
> Acked-by: Johannes Weiner
> Reviewed-by: Shakeel Butt
Acked-by: Michal Hocko
> ---
> mm/memcontrol.c | 8 +---
> 1 file chan
On Wed 14-04-21 09:41:32, Oscar Salvador wrote:
> On Wed, Apr 14, 2021 at 08:04:21AM +0200, Michal Hocko wrote:
> > On Tue 13-04-21 14:19:03, Mike Kravetz wrote:
> > > On 4/13/21 6:23 AM, Michal Hocko wrote:
> > > The only place where page->private may no
On Tue 13-04-21 14:19:03, Mike Kravetz wrote:
> On 4/13/21 6:23 AM, Michal Hocko wrote:
> > On Tue 13-04-21 12:47:43, Oscar Salvador wrote:
[...]
> > Or do we need it for giga pages which are not allocated by the page
> > allocator? If yes then moving it to prep_compound_g
>
> In the case above we retry as the window race is quite small and we have high
> chances to succeed next time.
>
> With regard to the allocation, we restrict it to the node the page belongs
> to with __GFP_THISNODE, meaning we do not fa
h->nr_huge_pages_node[nid]++;
> + __prep_account_new_huge_page(h, nid);
> spin_unlock_irq(&hugetlb_lock);
> }
Any reason to decouple the locking from the accounting?
>
> --
> 2.16.3
--
Michal Hocko
SUSE Labs
On Tue 13-04-21 15:24:32, Michal Hocko wrote:
> On Tue 13-04-21 12:47:44, Oscar Salvador wrote:
> [...]
> > +static void prep_new_huge_page(struct hstate *h, struct page *page, int
> > nid)
> > +{
> > + __prep_new_huge_page(page);
> > spin_lock_irq(&
eFreed(page);
> spin_lock_irq(&hugetlb_lock);
> h->nr_huge_pages++;
> h->nr_huge_pages_node[nid]++;
> - ClearHPageFreed(page);
> spin_unlock_irq(&hugetlb_lock);
> }
>
> --
> 2.16.3
--
Michal Hocko
SUSE Labs
Not sure this is worth it TBH. Even an idea of any
pcp access synchronization with memory hotplug makes for a decent headache.
--
Michal Hocko
SUSE Labs
On Fri 09-04-21 16:26:53, Tim Chen wrote:
>
> On 4/8/21 4:52 AM, Michal Hocko wrote:
>
> >> The top tier memory used is reported in
> >>
> >> memory.toptier_usage_in_bytes
> >>
> >> The amount of top tier memory usable by each cgroup wit
islike kmem and LRU pages to be handled
differently so for that reason
Nacked-by: Michal Hocko
If the optimization really can be proven then the patch would require
to be much more invasive.
> Signed-off-by: Chen Xiaoguang
> Signed-off-by: Chen He
> ---
> incl
ry_this_zone:, then
> gets stalled/scheduled out while hotremove rebuilds the zonelist and destroys
> the pcplists, then the first task is resumed and proceeds with
> rmqueue_pcplist().
>
> So that's very rare thus not urgent, and this patch doesn't make it less rare
> so
> not a reason to block it.
Completely agreed here. Not an urgent thing to work on but something to
look into long term.
--
Michal Hocko
SUSE Labs
h that an existing
race was likely never observed.
Acked-by: Michal Hocko
Thanks!
> Signed-off-by: Mel Gorman
> ---
> Resending for email address correction and adding lists
>
> Changelog since v1
> o Minimal fix
>
> mm/page_alloc.c | 4
> 1 file changed, 4 delet
OK. Let's do that for now and I will put a follow up on my todo list.
Thanks!
--
Michal Hocko
SUSE Labs
On Fri 09-04-21 14:42:21, Mel Gorman wrote:
> On Fri, Apr 09, 2021 at 02:48:12PM +0200, Michal Hocko wrote:
> > On Fri 09-04-21 14:42:58, Michal Hocko wrote:
> > > On Fri 09-04-21 13:09:57, Mel Gorman wrote:
> > > > zone_pcp_reset allegedly protects against a race
On Fri 09-04-21 14:42:58, Michal Hocko wrote:
> On Fri 09-04-21 13:09:57, Mel Gorman wrote:
> > zone_pcp_reset allegedly protects against a race with drain_pages
> > using local_irq_save but this is bogus. local_irq_save only operates
> > on the local CPU. If memory hotplug is
reset pcp of an empty
zone at all? The whole point of this exercise seems to be described in
340175b7d14d5. setup_zone_pageset can check for an already allocated pcp
and simply reinitialize it.
--
Michal Hocko
SUSE Labs
o any
memcg.
The behavior of those limits would be quite tricky for OOM situations
as well due to a lack of NUMA aware oom killer.
--
Michal Hocko
SUSE Labs
dering that the system is already botched and counters
cannot be trusted this is definitely better than a potentially
completely unusable memcg. It would be nice to mention that in the above
paragraph as a caveat.
> Signed-off-by: Johannes Weiner
Acked-by: Michal Hocko
> ---
> mm/pa
On Wed 07-04-21 15:33:26, Tim Chen wrote:
>
>
> On 4/6/21 2:08 AM, Michal Hocko wrote:
> > On Mon 05-04-21 10:08:24, Tim Chen wrote:
> > [...]
> >> To make fine grain cgroup based management of the precious top tier
> >> DRAM memory possible, this p
On Wed 07-04-21 19:13:42, Bharata B Rao wrote:
> On Wed, Apr 07, 2021 at 01:54:48PM +0200, Michal Hocko wrote:
> > On Mon 05-04-21 11:18:48, Bharata B Rao wrote:
> > > Hi,
> > >
> > > When running 1 (more-or-less-empty-)containers on a bare-metal Power9
global memory reclaim iterating over
10k memcgs will likely be very visible. I do remember playing with
similar setups a few years back and the overhead was very high.
--
Michal Hocko
SUSE Labs
> changing spin_*lock calls to spin_*lock_irq* calls.
> > - Make subpool lock irq safe in a similar manner.
> > - Revert the !in_task check and workqueue handoff.
> >
> > [1]
> > https://lore.kernel.org/linux-mm/f1c03b05bc43a...@google.com/
> >
or_each_entry_safe(page, next, &page_list, lru) {
> > + update_and_free_page(h, page);
> > + cond_resched();
> > + }
> > + spin_lock(&hugetlb_lock);
>
> Can we get here with an empty list?
An empty page_list? If yes then sure, this can happen but
list_for_each_entry_safe will simply not iterate. Or what do you mean?
--
Michal Hocko
SUSE Labs
On Tue 06-04-21 09:49:13, Mike Kravetz wrote:
> On 4/6/21 2:56 AM, Michal Hocko wrote:
> > On Mon 05-04-21 16:00:39, Mike Kravetz wrote:
[...]
> >> @@ -2298,6 +2312,7 @@ static int alloc_and_dissolve_huge_page(struct
> >> hstat
On Tue 06-04-21 23:12:34, Neil Sun wrote:
>
>
> On 2021/4/6 22:39, Michal Hocko wrote:
> >
> > Have you considered using high limit for the pro-active memory reclaim?
>
> Thanks, Michal, do you mean the procfs interfaces?
> We have set vm.vfs_cache_pressure=1000
On Tue 06-04-21 22:34:02, Neil Sun wrote:
>
>
> On 2021/4/6 19:39, Michal Hocko wrote:
> > On Tue 06-04-21 19:30:22, Neil Sun wrote:
> > > On 2021/4/6 15:21, Michal Hocko wrote:
> > > >
> > > > You are changing semantic of the existing user i
On Tue 06-04-21 19:30:22, Neil Sun wrote:
> On 2021/4/6 15:21, Michal Hocko wrote:
> >
> > You are changing semantic of the existing user interface. This knob has
> > never been memcg aware and it is supposed to have a global impact. I do
> > not think we can simply cha
return;
> + goto out;
> if (PageHighMem(page))
> continue;
> remove_hugetlb_page(h, page, false);
> - update_and_free_page(h, page);
> + list_add(&page->lru, &page_list);
> }
> }
> +
> +out:
> + spin_unlock(&hugetlb_lock);
> + list_for_each_entry_safe(page, next, &page_list, lru) {
> + update_and_free_page(h, page);
> + cond_resched();
> + }
> + spin_lock(&hugetlb_lock);
> }
> #else
> static inline void try_to_free_low(struct hstate *h, unsigned long count,
> --
> 2.30.2
>
--
Michal Hocko
SUSE Labs
false);
> update_and_free_page(h, new_page);
> spin_unlock(&hugetlb_lock);
> if (!isolate_huge_page(old_page, list))
the page is not enqueued anywhere here so remove_hugetlb_page would blow
up when linked list debugging is enabled.
--
Michal Hocko
SUSE Labs
gt;
> Signed-off-by: Mike Kravetz
I believe I have acked the previous version already. Anyway
Acked-by: Michal Hocko
> ---
> mm/cma.c | 18 +-
> mm/cma.h | 2 +-
> mm/cma_debug.c | 8
> 3 files changed, 14 insertions(+), 14 deletions(-)
&g
would be rather alien concept to the existing memcg
infrastructure IMO. It looks like it is fusing borders between memcg and
cputset controllers.
You also seem to be basing the interface on the very specific usecase.
Can we expect that there will be many different tiers requiring their
own balancing?
--
Michal Hocko
SUSE Labs
memcg = mem_cgroup_iter(NULL, NULL, NULL);
> + memcg = mem_cgroup_from_task(current);
> do {
> freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
> } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
> --
> 2.7.4
--
Michal Hocko
SUSE Labs
On Thu 01-04-21 21:59:13, Muchun Song wrote:
> On Thu, Apr 1, 2021 at 6:26 PM Michal Hocko wrote:
[...]
> > Even if the css ref count is not really necessary it shouldn't cause any
> > harm and it makes the code easier to understand. At least a comment
> > explaining
into its own patch. With more explanation why NOIO is
required.
> Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction
> and use it for stat updates")
> Signed-off-by: Muchun Song
For the css part feel free to add
Acked-by: Michal Hocko
Even if the cs
q_start = sysfs_kf_seq_start,
> .seq_show = sysfs_kf_seq_show,
> };
>
> static const struct kernfs_ops sysfs_file_kfops_wo = {
> + .seq_start = sysfs_kf_seq_start,
> .write = sysfs_kf_write,
> };
>
> static const struct kernfs_ops sysfs_file_kfops_rw = {
> + .seq_start = sysfs_kf_seq_start,
> .seq_show = sysfs_kf_seq_show,
> .write = sysfs_kf_write,
> };
> --
> 2.25.1
--
Michal Hocko
SUSE Labs
ros to confirm that they don't enable it.
> > >
> > > I can confirm that it's certainly not enabled on any of the machines I
> > > have, but..
> >
> > Debian has CONFIG_DEVKMEM disabled since 2.6.31.
>
> SLES, too. (but no idea since when exactly)
15-SP2 IIRC
--
Michal Hocko
SUSE Labs
On Mon 22-03-21 14:49:35, Michal Hocko wrote:
> On Mon 22-03-21 15:00:37, Mike Rapoport wrote:
> > On Mon, Mar 22, 2021 at 11:14:37AM +0100, Michal Hocko wrote:
> > > Let's cc Andrea and Mike
> > >
> > > On Fri 19-03-21 22:24:28, Bui Quang Minh wrot
n.
>
> Signed-off-by: Mike Kravetz
Acked-by: Michal Hocko
> ---
> mm/cma.c | 18 +-
> mm/cma.h | 2 +-
> mm/cma_debug.c | 8
> 3 files changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/mm/cma.c b/mm/cma.c
> index b2
EMCG so the below one
is not needed though. It would be great if the changelog mentioned that
so that.
> Signed-off-by: Wan Jiabing
Acked-by: Michal Hocko
> ---
> include/linux/memcontrol.h | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/include/linux/memcontrol.h
On Tue 30-03-21 16:08:36, Muchun Song wrote:
> On Tue, Mar 30, 2021 at 4:01 PM Michal Hocko wrote:
> >
> > On Mon 29-03-21 16:23:55, Mike Kravetz wrote:
> > > Ideally, cma_release could be called from any context. However, that is
> > > not possible because a
accounting effectively
> reverting the commit.
>
> Signed-off-by: Mike Kravetz
Please drop INIT_LIST_HEAD which seems to be a left over from rebasing
to use LIST_HEAD.
Acked-by: Michal Hocko
> ---
> mm/hugetlb.c | 95 +---
>
On Mon 29-03-21 16:23:56, Mike Kravetz wrote:
> Now that cma_release is non-blocking and irq safe, there is no need to
> drop hugetlb_lock before calling.
>
> Signed-off-by: Mike Kravetz
Acked-by: Michal Hocko
> ---
> mm/hugetlb.c | 6 --
> 1 file changed, 6 deletio