Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2024-01-12 Thread Michal Hocko
On Thu 30-11-23 20:04:59, Baoquan He wrote: > On 11/30/23 at 11:16am, Michal Hocko wrote: > > On Thu 30-11-23 11:00:48, Baoquan He wrote: > > [...] > > > Now, we are worried if there's risk if the CMA area is retaken into kdump > > > kernel as system RAM. E.g

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2024-01-12 Thread Michal Hocko
ot this might have negative impact on kernel allocations - userspace memory dumping in the crash kernel is fundamentally incomplete. Just my 2c -- Michal Hocko SUSE Labs ___ kexec mailing list kexec@lists.

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2024-01-12 Thread Michal Hocko
On Fri 08-12-23 09:55:39, Baoquan He wrote: > On 12/07/23 at 12:52pm, Michal Hocko wrote: > > On Thu 07-12-23 12:13:14, Philipp Rudo wrote: [...] > > > Thing is that users don't only want to reduce the memory usage but also > > > the downtime of kdump. In the end I'm

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2024-01-12 Thread Michal Hocko
rongly believe this is something that needs addressing because crash dumps are very often the only tool to investigate complex issues. -- Michal Hocko SUSE Labs ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2024-01-12 Thread Michal Hocko
On Wed 06-12-23 14:49:51, Michal Hocko wrote: > On Wed 06-12-23 12:08:05, Philipp Rudo wrote: [...] > > If I understand Documentation/core-api/pin_user_pages.rst correctly you > > missed case 1 Direct IO. In that case "short term" DMA is allowed for > > pages

Re: [RFC 0/3] kdump: Check mem_map of CMA area in kdump

2023-12-18 Thread Michal Hocko
The value is then checked when the page is allocated. > Please share your thoughts. Having a sanity check on exported cma pages makes some sense to me. The exact check might be more involved with false positives but they shouldn't be a major problem unless there are too many of them. --

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-07 Thread Michal Hocko
On Thu 07-12-23 12:13:14, Philipp Rudo wrote: > On Thu, 7 Dec 2023 09:55:20 +0100 > Michal Hocko wrote: > > > On Thu 07-12-23 12:23:13, Baoquan He wrote: > > [...] > > > We can't guarantee how swift the DMA transfer could be in the cma, case, > > > it

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-06 Thread Michal Hocko
On Wed 06-12-23 12:08:05, Philipp Rudo wrote: > On Fri, 1 Dec 2023 17:59:02 +0100 > Michal Hocko wrote: > > > On Fri 01-12-23 16:51:13, Philipp Rudo wrote: > > > On Fri, 1 Dec 2023 12:55:52 +0100 > > > Michal Hocko wrote: > > > > >

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-01 Thread Michal Hocko
On Fri 01-12-23 16:51:13, Philipp Rudo wrote: > On Fri, 1 Dec 2023 12:55:52 +0100 > Michal Hocko wrote: > > > On Fri 01-12-23 12:33:53, Philipp Rudo wrote: > > [...] > > > And yes, those are all what-if concerns but unfortunately that is all > > > we have r

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-01 Thread Michal Hocko
I follow you here. Are you suggesting once crashkernel=cma is added it would become a user api and therefore impossible to get rid of? -- Michal Hocko SUSE Labs

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-12-01 Thread Michal Hocko
appreciate your carefulness! But I do not really see how such a detection would work and be maintained over time. What exactly is the scope of such a tooling? Should it be limited to RDMA drivers? Should we protect from stray writes in general? Also to make it clear. Are you going to nak the proposed solutio

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-30 Thread Michal Hocko
ose the new method as a default. Only time can tell how safe this really is. It is hard to protect against theoretical issues though. Bugs should be fixed. I believe this option would make kdump configuration much easier and less fragile. > My personal opinion, thanks for sharing your thought. T

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-30 Thread Michal Hocko
On Thu 30-11-23 21:33:04, Pingfan Liu wrote: > On Thu, Nov 30, 2023 at 9:29 PM Michal Hocko wrote: > > > > On Thu 30-11-23 20:04:59, Baoquan He wrote: > > > On 11/30/23 at 11:16am, Michal Hocko wrote: > > > > On Thu 30-11-23 11:00:48, Baoquan He wrote: > &g

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-29 Thread Michal Hocko
n to consume much > more memory than before. We have CI testing cases to watch this. We ever > found one NIC even eat up GB level memory, then this need be > investigated and fixed. How do you simulate all the different HW configuration setups that are in use out there in the wild? -- M

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-28 Thread Michal Hocko
ff. So this is not an easy to maintain solution. CMA backed crash memory can be much more generous while still usable. -- Michal Hocko SUSE Labs

Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

2023-11-28 Thread Michal Hocko
her this approach > is worthwhile, considering the trade-off between benefits and > complexity. No, a zone is definitely not an answer to that because a) userspace would need to be able to use that memory and userspace might pin memory for direct IO and others. So in the end longterm pinning would need to be used anyway. -- Michal Hocko SUSE Labs

Re: [PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 08:57:48, Suren Baghdasaryan wrote: > On Wed, Jan 25, 2023 at 1:38 AM 'Michal Hocko' via kernel-team > wrote: > > > > On Wed 25-01-23 00:38:49, Suren Baghdasaryan wrote: > > > Replace indirect modifications to vma->vm_flags with calls to modif

Re: [PATCH v2 5/6] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

2023-01-25 Thread Michal Hocko
tistics and freeing VMAs */ > mas_set(&mas_detach, start); > remove_mt(mm, &mas_detach); > @@ -2704,7 +2708,7 @@ unsigned long mmap_region(struct file *file, unsigned > long addr, > > /* Undo any partial mapping done by a device driver. */ >

Re: [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-01-25 Thread Michal Hocko
; operations. Introduce modifier functions for vm_flags to be used whenever > flags are updated. This way we can better check and control correct > locking behavior during these updates. > > Signed-off-by: Suren Baghdasaryan Acked-by: Michal Hocko > --- >

Re: [PATCH v2 4/6] mm: replace vma->vm_flags indirect modification in ksm_madvise

2023-01-25 Thread Michal Hocko
cation attempts. Those BUG_ONs scream too much IMHO. KSM is MM-internal code so I guess we should be willing to trust it. > Signed-off-by: Suren Baghdasaryan Acked-by: Michal Hocko -- Michal Hocko SUSE Labs

Re: [PATCH v2 6/6] mm: export dump_mm()

2023-01-25 Thread Michal Hocko
; > Signed-off-by: Suren Baghdasaryan Acked-by: Michal Hocko > --- > mm/debug.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/debug.c b/mm/debug.c > index 9d3d893dc7f4..96d594e16292 100644 > --- a/mm/debug.c > +++ b/mm/debug.c > @@ -215,6 +215

Re: [PATCH v2 2/6] mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK

2023-01-25 Thread Michal Hocko
On Wed 25-01-23 00:38:47, Suren Baghdasaryan wrote: > To simplify the usage of VM_LOCKED_CLEAR_MASK in clear_vm_flags(), > replace it with VM_LOCKED_MASK bitmask and convert all users. > > Signed-off-by: Suren Baghdasaryan Acked-by: Michal Hocko > --- > include/linux/mm.h

Re: [PATCH v2 3/6] mm: replace vma->vm_flags direct modifications with modifier calls

2023-01-25 Thread Michal Hocko
sors which would also prevent any future direct setting of those flags in an uncontrolled way as well. Anyway Acked-by: Michal Hocko -- Michal Hocko SUSE Labs

Re: [PATCH 1/2] mm/memcontrol: Fix OOPS inside mem_cgroup_get_nr_swap_pages()

2020-07-03 Thread Michal Hocko
[Cc Andrew - the patch is http://lkml.kernel.org/r/1593641660-13254-2-git-send-email-bhsha...@redhat.com] On Thu 02-07-20 08:00:27, Michal Hocko wrote: > On Thu 02-07-20 03:44:19, Bhupesh Sharma wrote: > > Prabhakar reported an OOPS inside mem_cgroup_get_nr_swap_pages() > > funct

Re: [PATCH 1/2] mm/memcontrol: Fix OOPS inside mem_cgroup_get_nr_swap_pages()

2020-07-02 Thread Michal Hocko
e: aa1403e3 91106000 97f82a27 1411 (f940c663) > [0.507770] ---[ end trace 9795948475817de4 ]--- > [0.512429] Kernel panic - not syncing: Fatal exception > [0.517705] Rebooting in 10 seconds.. > > Cc: Johannes Weiner > Cc: Michal Hocko > Cc: V

Re: [PATCHv2] mm/sparse: reset section's mem_map when fully deactivated

2020-01-23 Thread Michal Hocko
On Thu 23-01-20 19:10:47, Andrew Morton wrote: > On Mon, 20 Jan 2020 08:29:39 +0100 Michal Hocko wrote: > > > On Mon 20-01-20 10:33:14, Pingfan Liu wrote: > > > After commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug"), > > > when a mem secti

Re: [PATCHv2] mm/sparse: reset section's mem_map when fully deactivated

2020-01-19 Thread Michal Hocko
ery closely to the kernel and occasional breakage is to be expected I still believe that Fixes: ba72b4c8cf60 is due. > [1]: makedumpfile, commit e73016540293 ("[v1.6.7] Update version") > > Signed-off-by: Pingfan Liu > To: linux...@kvack.org > Cc: Andrew Morton > Cc:

Re: [PATCH] mm/sparse: reset section's mem_map when fully deactivated

2020-01-16 Thread Michal Hocko
On Thu 16-01-20 23:14:02, Dan Williams wrote: > On Thu, Jan 16, 2020 at 10:23 PM Pingfan Liu wrote: > > > > On Thu, Jan 16, 2020 at 3:50 PM Michal Hocko wrote: > > > > > > On Thu 16-01-20 11:01:08, Pingfan Liu wrote: > > > > When fully dea

Re: [PATCH] mm/sparse: reset section's mem_map when fully deactivated

2020-01-15 Thread Michal Hocko
sh, and save vmcore by makedumpfile > > Signed-off-by: Pingfan Liu > To: linux...@kvack.org > Cc: Andrew Morton > Cc: David Hildenbrand > Cc: Dan Williams > Cc: Oscar Salvador > Cc: Michal Hocko > Cc: kexec@lists.infradead.org > Cc: Kazuhito Hagio > --- > mm/sparse.

Re: Crash kernel with 256 MB reserved memory runs into OOM condition

2019-08-12 Thread Michal Hocko
ry is below min watermark (node zone DMA has lowmem protection for GFP_KERNEL allocation). [...] > [4.923156] Out of memory and no killable processes... and there is no task existing to be killed so we go and panic. -- Michal Hocko SUSE Labs

Re: [PATCH v1 7/8] PM / Hibernate: use pfn_to_online_page()

2018-11-19 Thread Michal Hocko
ew Morton > Cc: Matthew Wilcox > Cc: Michal Hocko > Cc: "Michael S. Tsirkin" > Suggested-by: Michal Hocko > Signed-off-by: David Hildenbrand I have only a very vague understanding of this specific code but I do not really see any real reason for checking offlined

Re: [PATCH] mm: convert totalram_pages, totalhigh_pages and managed_pages to atomic.

2018-10-25 Thread Michal Hocko
s hard to review manually. -- Michal Hocko SUSE Labs

Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required

2018-07-30 Thread Michal Hocko
On Thu 26-07-18 15:12:42, Michal Hocko wrote: > On Thu 26-07-18 21:09:04, Baoquan He wrote: > > On 07/26/18 at 02:59pm, Michal Hocko wrote: > > > On Wed 25-07-18 14:48:13, Baoquan He wrote: > > > > On 07/23/18 at 04:34pm, Michal Hocko wrote: > > > > &g

Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required

2018-07-30 Thread Michal Hocko
On Thu 26-07-18 21:37:05, Baoquan He wrote: > On 07/26/18 at 03:14pm, Michal Hocko wrote: > > On Thu 26-07-18 15:12:42, Michal Hocko wrote: > > > On Thu 26-07-18 21:09:04, Baoquan He wrote: > > > > On 07/26/18 at 02:59pm, Michal Hocko wrote: > > > > &g

Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required

2018-07-30 Thread Michal Hocko
On Thu 26-07-18 21:09:04, Baoquan He wrote: > On 07/26/18 at 02:59pm, Michal Hocko wrote: > > On Wed 25-07-18 14:48:13, Baoquan He wrote: > > > On 07/23/18 at 04:34pm, Michal Hocko wrote: > > > > On Thu 19-07-18 23:17:53, Baoquan He wrote: > > > > >

Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required

2018-07-30 Thread Michal Hocko
On Wed 25-07-18 14:48:13, Baoquan He wrote: > On 07/23/18 at 04:34pm, Michal Hocko wrote: > > On Thu 19-07-18 23:17:53, Baoquan He wrote: > > > Kexec has been a formal feature in our distro, and customers owning > > > those kind of very large machine can make use of thi

Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required

2018-07-23 Thread Michal Hocko
ot have the full context here but let me note that you should be careful when doing top-down reservation because you can easily get into hotpluggable memory and break the hotremove usecase. We even warn when this is done. See memblock_find_in_range_node --

Re: Capturing crash with 4.6.0 and above kernel does not work

2016-08-25 Thread Michal Hocko
:17 dut4110 kdump: kexec: failed to load kdump kernel > Aug 15 10:41:17 dut4110 kdump: failed to start up > > Note, that same option is able to load kdump service for 4.5.7 kernel. > > I can provide any details needed to help resolve this issue. > > Thanks, > -

Re: [PATCH] Revert "mm: rename _count, field of the struct page, to _refcount"

2016-06-16 Thread Michal Hocko
On Thu 16-06-16 13:22:27, Vitaly Kuznetsov wrote: > Michal Hocko <mho...@kernel.org> writes: > > > On Thu 16-06-16 12:30:16, Vitaly Kuznetsov wrote: > >> Christoph Hellwig <h...@infradead.org> writes: > >> > >> > On Thu, Jun 16, 2016 at 11

Re: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-24 Thread Michal Hocko
CPU 1: > -- -- > nmi_panic(); > > nmi_panic(); > > nmi_panic(); I thought that nmi_panic is called only from the nmi context. If so how c

Re: [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

2015-11-24 Thread Michal Hocko
ation.org> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: Ingo Molnar <mi...@redhat.com> > Cc: "H. Peter Anvin" <h...@zytor.com> > Cc: Peter Zijlstra <pet...@infradead.org> > Cc: Michal Hocko <mho...@kernel.org> I've finally seen testing results f

Re: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

2015-11-24 Thread Michal Hocko
w Morton <a...@linux-foundation.org> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: Ingo Molnar <mi...@redhat.com> > Cc: "H. Peter Anvin" <h...@zytor.com> > Cc: Peter Zijlstra <pet...@infradead.org> > Cc: Eric Biederman <ebied...@xmiss

Re: [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

2015-11-24 Thread Michal Hocko
t > > V2: > - Use atomic_cmpxchg() instead of spin_trylock() on panic_lock > to exclude concurrent accesses > - Don't introduce no-lock version of crash_kexec() > > Signed-off-by: Hidehiro Kawai <hidehiro.kawai...@hitachi.com> > Cc: Eric Biederman <ebied...

Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-08-04 Thread Michal Hocko
On Fri 31-07-15 11:23:00, 河合英宏 / KAWAI,HIDEHIRO wrote: From: Michal Hocko [mailto:mho...@kernel.org] [...] I am saying that watchdog_overflow_callback might trigger on more CPUs and panic from NMI context as well. So this is not reduced to the NMI button sends NMI to more CPUs. I

Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-30 Thread Michal Hocko
On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI,HIDEHIRO wrote: From: Michal Hocko [mailto:mho...@kernel.org] [...] Could you point me to the code which does that, please? Maybe we are missing that in our 3.0 kernel. I was quite surprised to see this behavior as well. Please see the snippet

Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-30 Thread Michal Hocko
On Thu 30-07-15 01:45:35, 河合英宏 / KAWAI,HIDEHIRO wrote: Hi, From: Michal Hocko [mailto:mho...@kernel.org] On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote: [...] #define nmi_panic(fmt, ...)\ do

Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-30 Thread Michal Hocko
On Thu 30-07-15 07:33:15, 河合英宏 / KAWAI,HIDEHIRO wrote: [...] Are you using SGI UV? On that platform, NMIs may be delivered to all cpus because LVT1 of all cpus are not masked as follows: This is Compute Blade 520XB1 from Hitachi with 240 cpus. -- Michal Hocko SUSE Labs

Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread Michal Hocko
On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote: From: Michal Hocko [mailto:mho...@kernel.org] On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote: Hi, From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai

Re: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-29 Thread Michal Hocko
On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote: Hi, From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai (2015/07/27 23:34), Michal Hocko wrote: On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote: [...] The check could

Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-28 Thread Michal Hocko
or do like this: void nmi_panic(const char *msg) { ... panic("%s", msg); } If there is no objection, I'm going to use a macro. Your other patch needs panic_cpu externally visible so the macro should be OK. -- Michal Hocko SUSE Labs
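
The function form quoted above hands the caller's text to a printf-style routine behind a fixed "%s" format. A minimal userspace sketch of why that matters follows; it is not kernel code, and every `sketch_*` name is hypothetical, with `snprintf` standing in for `panic()`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer standing in for wherever the panic text ends up. */
static char sketch_last_msg[64];

/* Hypothetical stand-in for a printf-style reporter such as panic(fmt, ...). */
#define sketch_report(...) \
	snprintf(sketch_last_msg, sizeof(sketch_last_msg), __VA_ARGS__)

/*
 * Function form: the caller's string is passed as data behind a fixed
 * "%s" format, so any '%' inside msg is copied literally instead of
 * being interpreted as a conversion.
 */
static int sketch_nmi_report(const char *msg)
{
	assert(msg != NULL);
	return sketch_report("%s", msg);
}
```

A macro form such as `sketch_report("cpu %d", 3)` forwards the format and its arguments unchanged, which is the convenience the macro variant keeps.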

Re: [V2 PATCH 2/3] kexec: Fix race between panic() and crash_kexec() called directly

2015-07-27 Thread Michal Hocko
) + atomic_set(&panicking_cpu, -1); This would do the opposite of what the comment says, wouldn't it? You should check old_cpu == -1. Also, atomic_set doesn't imply memory barriers, which might be a problem. -- Michal Hocko SUSE Labs
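
The review comment above asks for the "nobody owns the crash path yet" case to be detected by checking old_cpu == -1. A minimal userspace sketch of that pattern with C11 atomics follows; it is not the patch under review, and `sketch_panic_owner`, `sketch_claim_crash_path` and `sketch_release_crash_path` are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_NO_CPU (-1)	/* hypothetical sentinel: no CPU owns the crash path */

/* Hypothetical stand-in for the panicking-CPU variable discussed in the thread. */
static atomic_int sketch_panic_owner = SKETCH_NO_CPU;

/*
 * Try to claim the crash path for 'cpu'. Only the first caller wins:
 * the exchange succeeds only when the previous value was SKETCH_NO_CPU,
 * which is the "old_cpu == -1" condition.
 */
static bool sketch_claim_crash_path(int cpu)
{
	int old_cpu = SKETCH_NO_CPU;

	assert(cpu >= 0);
	return atomic_compare_exchange_strong(&sketch_panic_owner, &old_cpu, cpu);
}

/*
 * Give the crash path back. C11 atomic_store is sequentially consistent
 * by default; the kernel's atomic_set, as the message notes, implies no
 * memory barrier, so the two are not equivalent.
 */
static void sketch_release_crash_path(void)
{
	atomic_store(&sketch_panic_owner, SKETCH_NO_CPU);
}
```

In this sketch a second CPU calling `sketch_claim_crash_path()` gets `false` until the owner releases, so exactly one caller proceeds at a time.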

Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

2015-07-27 Thread Michal Hocko
if the NMI has preempted an ongoing panic and +* allow it to finish +*/ + if (atomic_read(&panic_cpu) == raw_smp_processor_id()) + return; + + panic(); +} +EXPORT_SYMBOL(nmi_panic); struct tnt { u8 bit; -- Michal Hocko SUSE Labs

Re: [PATCH 0/3] x86: Fix panic vs. NMI issues

2015-07-23 Thread Michal Hocko
On Thu 23-07-15 19:11:03, Hidehiro Kawai wrote: Hi, Thanks for the feedback. (2015/07/23 17:25), Michal Hocko wrote: Hi, On Wed 22-07-15 11:14:21, Hidehiro Kawai wrote: When an HA cluster software or administrator detects non-response of a host, they issue an NMI to the host