On Thu 30-11-23 20:04:59, Baoquan He wrote:
> On 11/30/23 at 11:16am, Michal Hocko wrote:
> > On Thu 30-11-23 11:00:48, Baoquan He wrote:
> > [...]
> > > Now, we are worried whether there's a risk if the CMA area is retaken into the
> > > kdump kernel as system RAM. E.g
ot this might have
negative impact on kernel allocations
- userspace memory dumping in the crash kernel is fundamentally
incomplete.
Just my 2c
--
Michal Hocko
SUSE Labs
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
On Fri 08-12-23 09:55:39, Baoquan He wrote:
> On 12/07/23 at 12:52pm, Michal Hocko wrote:
> > On Thu 07-12-23 12:13:14, Philipp Rudo wrote:
[...]
> > > Thing is that users don't only want to reduce the memory usage but also
> > > the downtime of kdump. In the end I'm
rongly believe this is something
that needs addressing because crash dumps are very often the only tool
to investigate complex issues.
--
Michal Hocko
SUSE Labs
On Wed 06-12-23 14:49:51, Michal Hocko wrote:
> On Wed 06-12-23 12:08:05, Philipp Rudo wrote:
[...]
> > If I understand Documentation/core-api/pin_user_pages.rst correctly you
> > missed case 1 Direct IO. In that case "short term" DMA is allowed for
> > pages
The value is then checked when the page is
allocated.
> Please share your thoughts.
Having a sanity check on exported CMA pages makes some sense to me. The
exact check might be more involved and could produce false positives, but
they shouldn't be a major problem unless there are too many of them.
--
On Thu 07-12-23 12:13:14, Philipp Rudo wrote:
> On Thu, 7 Dec 2023 09:55:20 +0100
> Michal Hocko wrote:
>
> > On Thu 07-12-23 12:23:13, Baoquan He wrote:
> > [...]
> > > We can't guarantee how swift the DMA transfer could be in the CMA case,
> > > it
On Wed 06-12-23 12:08:05, Philipp Rudo wrote:
> On Fri, 1 Dec 2023 17:59:02 +0100
> Michal Hocko wrote:
>
> > On Fri 01-12-23 16:51:13, Philipp Rudo wrote:
> > > On Fri, 1 Dec 2023 12:55:52 +0100
> > > Michal Hocko wrote:
> > >
> >
On Fri 01-12-23 16:51:13, Philipp Rudo wrote:
> On Fri, 1 Dec 2023 12:55:52 +0100
> Michal Hocko wrote:
>
> > On Fri 01-12-23 12:33:53, Philipp Rudo wrote:
> > [...]
> > > And yes, those are all what-if concerns but unfortunately that is all
> > > we have r
I follow you here. Are you suggesting once
crashkernel=cma is added it would become a user api and therefore
impossible to get rid of?
--
Michal Hocko
SUSE Labs
appreciate your carefulness! But I do not really see how such a
detection would work and be maintained over time. What exactly is the
scope of such a tooling? Should it be limited to RDMA drivers? Should we
protect from stray writes in general?
Also to make it clear. Are you going to nak the proposed solutio
ose the new method
as a default. Only time can tell how safe this really is. It is hard to
protect against theoretical issues though. Bugs should be fixed.
I believe this option would make configuring kdump much easier and less
fragile.
> My personal opinion, thanks for sharing your thought.
T
On Thu 30-11-23 21:33:04, Pingfan Liu wrote:
> On Thu, Nov 30, 2023 at 9:29 PM Michal Hocko wrote:
> >
> > On Thu 30-11-23 20:04:59, Baoquan He wrote:
> > > On 11/30/23 at 11:16am, Michal Hocko wrote:
> > > > On Thu 30-11-23 11:00:48, Baoquan He wrote:
> &g
n to consume much
> more memory than before. We have CI test cases to watch this. We once
> found one NIC eating up GBs of memory, which then had to be
> investigated and fixed.
How do you simulate all the different HW configuration setups that are in
use out there in the wild?
--
M
ff. So this is not an easy-to-maintain solution.
CMA-backed crash memory can be much more generous while still usable.
--
Michal Hocko
SUSE Labs
her this approach
> is worthwhile, considering the trade-off between benefits and
> complexity.
No, a zone is definitely not an answer to that because a)
userspace would need to be able to use that memory and userspace might
pin memory for direct IO and others. So in the end longterm pinning
would need to be used anyway.
--
Michal Hocko
SUSE Labs
On Wed 25-01-23 08:57:48, Suren Baghdasaryan wrote:
> On Wed, Jan 25, 2023 at 1:38 AM 'Michal Hocko' via kernel-team
> wrote:
> >
> > On Wed 25-01-23 00:38:49, Suren Baghdasaryan wrote:
> > > Replace indirect modifications to vma->vm_flags with calls to modif
tistics and freeing VMAs */
> mas_set(&mas_detach, start);
> remove_mt(mm, &mas_detach);
> @@ -2704,7 +2708,7 @@ unsigned long mmap_region(struct file *file, unsigned
> long addr,
>
> /* Undo any partial mapping done by a device driver. */
>
; operations. Introduce modifier functions for vm_flags to be used whenever
> flags are updated. This way we can better check and control correct
> locking behavior during these updates.
>
> Signed-off-by: Suren Baghdasaryan
Acked-by: Michal Hocko
> ---
>
cation attempts.
Those BUG_ONs scream too much IMHO. KSM is MM internal code so I
guess we should be willing to trust it.
> Signed-off-by: Suren Baghdasaryan
Acked-by: Michal Hocko
--
Michal Hocko
SUSE Labs
;
> Signed-off-by: Suren Baghdasaryan
Acked-by: Michal Hocko
> ---
> mm/debug.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/debug.c b/mm/debug.c
> index 9d3d893dc7f4..96d594e16292 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -215,6 +215
On Wed 25-01-23 00:38:47, Suren Baghdasaryan wrote:
> To simplify the usage of VM_LOCKED_CLEAR_MASK in clear_vm_flags(),
> replace it with VM_LOCKED_MASK bitmask and convert all users.
>
> Signed-off-by: Suren Baghdasaryan
Acked-by: Michal Hocko
> ---
> include/linux/mm.h
sors which would also prevent any future direct setting of those flags
in an uncontrolled way.
Anyway
Acked-by: Michal Hocko
--
Michal Hocko
SUSE Labs
[Cc Andrew - the patch is
http://lkml.kernel.org/r/1593641660-13254-2-git-send-email-bhsha...@redhat.com]
On Thu 02-07-20 08:00:27, Michal Hocko wrote:
> On Thu 02-07-20 03:44:19, Bhupesh Sharma wrote:
> > Prabhakar reported an OOPS inside mem_cgroup_get_nr_swap_pages()
> > funct
e: aa1403e3 91106000 97f82a27 1411 (f940c663)
> [0.507770] ---[ end trace 9795948475817de4 ]---
> [0.512429] Kernel panic - not syncing: Fatal exception
> [0.517705] Rebooting in 10 seconds..
>
> Cc: Johannes Weiner
> Cc: Michal Hocko
> Cc: V
On Thu 23-01-20 19:10:47, Andrew Morton wrote:
> On Mon, 20 Jan 2020 08:29:39 +0100 Michal Hocko wrote:
>
> > On Mon 20-01-20 10:33:14, Pingfan Liu wrote:
> > > After commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug"),
> > > when a mem secti
ery closely to the kernel and occasional
breakage is to be expected, I still believe that the Fixes: ba72b4c8cf60
tag is due.
> [1]: makedumpfile, commit e73016540293 ("[v1.6.7] Update version")
>
> Signed-off-by: Pingfan Liu
> To: linux...@kvack.org
> Cc: Andrew Morton
> Cc:
On Thu 16-01-20 23:14:02, Dan Williams wrote:
> On Thu, Jan 16, 2020 at 10:23 PM Pingfan Liu wrote:
> >
> > On Thu, Jan 16, 2020 at 3:50 PM Michal Hocko wrote:
> > >
> > > On Thu 16-01-20 11:01:08, Pingfan Liu wrote:
> > > > When fully dea
sh, and save vmcore by makedumpfile
>
> Signed-off-by: Pingfan Liu
> To: linux...@kvack.org
> Cc: Andrew Morton
> Cc: David Hildenbrand
> Cc: Dan Williams
> Cc: Oscar Salvador
> Cc: Michal Hocko
> Cc: kexec@lists.infradead.org
> Cc: Kazuhito Hagio
> ---
> mm/sparse.
ry is below min watermark (node zone DMA has
lowmem protection for GFP_KERNEL allocation).
[...]
> [4.923156] Out of memory and no killable processes...
and there is no existing task to be killed, so we go ahead and panic.
--
Michal Hocko
SUSE Labs
ew Morton
> Cc: Matthew Wilcox
> Cc: Michal Hocko
> Cc: "Michael S. Tsirkin"
> Suggested-by: Michal Hocko
> Signed-off-by: David Hildenbrand
I have only a very vague understanding of this specific code but I do
not really see any real reason for checking offlined
s hard to review manually.
--
Michal Hocko
SUSE Labs
On Thu 26-07-18 15:12:42, Michal Hocko wrote:
> On Thu 26-07-18 21:09:04, Baoquan He wrote:
> > On 07/26/18 at 02:59pm, Michal Hocko wrote:
> > > On Wed 25-07-18 14:48:13, Baoquan He wrote:
> > > > On 07/23/18 at 04:34pm, Michal Hocko wrote:
> > > > &g
On Thu 26-07-18 21:37:05, Baoquan He wrote:
> On 07/26/18 at 03:14pm, Michal Hocko wrote:
> > On Thu 26-07-18 15:12:42, Michal Hocko wrote:
> > > On Thu 26-07-18 21:09:04, Baoquan He wrote:
> > > > On 07/26/18 at 02:59pm, Michal Hocko wrote:
> > > > &g
On Thu 26-07-18 21:09:04, Baoquan He wrote:
> On 07/26/18 at 02:59pm, Michal Hocko wrote:
> > On Wed 25-07-18 14:48:13, Baoquan He wrote:
> > > On 07/23/18 at 04:34pm, Michal Hocko wrote:
> > > > On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > > > >
On Wed 25-07-18 14:48:13, Baoquan He wrote:
> On 07/23/18 at 04:34pm, Michal Hocko wrote:
> > On Thu 19-07-18 23:17:53, Baoquan He wrote:
> > > Kexec has been a formal feature in our distro, and customers owning
> > > those kind of very large machine can make use of thi
ot have the full context here but let me note that you should be
careful when doing top-down reservation because you can easily get into
hotpluggable memory and break the hotremove use case. We even warn when
this is done. See memblock_find_in_range_node
--
:17 dut4110 kdump: kexec: failed to load kdump kernel
> Aug 15 10:41:17 dut4110 kdump: failed to start up
>
> Note, that same option is able to load kdump service for 4.5.7 kernel.
>
> I can provide any details needed to help resolve this issue.
>
> Thanks,
> -
On Thu 16-06-16 13:22:27, Vitaly Kuznetsov wrote:
> Michal Hocko <mho...@kernel.org> writes:
>
> > On Thu 16-06-16 12:30:16, Vitaly Kuznetsov wrote:
> >> Christoph Hellwig <h...@infradead.org> writes:
> >>
> >> > On Thu, Jun 16, 2016 at 11
CPU 1:
> -- --
> nmi_panic();
>
> nmi_panic();
>
> nmi_panic();
I thought that nmi_panic is called only from NMI context. If so how
c
ation.org>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: "H. Peter Anvin" <h...@zytor.com>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Michal Hocko <mho...@kernel.org>
I've finally seen testing results f
w Morton <a...@linux-foundation.org>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: "H. Peter Anvin" <h...@zytor.com>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Eric Biederman <ebied...@xmiss
t
>
> V2:
> - Use atomic_cmpxchg() instead of spin_trylock() on panic_lock
> to exclude concurrent accesses
> - Don't introduce no-lock version of crash_kexec()
>
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai...@hitachi.com>
> Cc: Eric Biederman <ebied...
On Fri 31-07-15 11:23:00, 河合英宏 / KAWAI,HIDEHIRO wrote:
From: Michal Hocko [mailto:mho...@kernel.org]
[...]
I am saying that watchdog_overflow_callback might trigger on multiple CPUs
and panic from NMI context as well. So this is not limited to the case where
the NMI button sends an NMI to multiple CPUs.
I
On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI,HIDEHIRO wrote:
From: Michal Hocko [mailto:mho...@kernel.org]
[...]
Could you point me to the code which does that, please? Maybe we are
missing that in our 3.0 kernel. I was quite surprised to see this
behavior as well.
Please see the snippet
On Thu 30-07-15 01:45:35, 河合英宏 / KAWAI,HIDEHIRO wrote:
Hi,
From: Michal Hocko [mailto:mho...@kernel.org]
On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote:
[...]
#define nmi_panic(fmt, ...)\
do
On Thu 30-07-15 07:33:15, 河合英宏 / KAWAI,HIDEHIRO wrote:
[...]
Are you using SGI UV? On that platform, NMIs may be delivered to
all cpus because LVT1 of all cpus are not masked as follows:
This is Compute Blade 520XB1 from Hitachi with 240 cpus.
--
Michal Hocko
SUSE Labs
On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI,HIDEHIRO wrote:
From: Michal Hocko [mailto:mho...@kernel.org]
On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
Hi,
From: linux-kernel-ow...@vger.kernel.org
[mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai
On Wed 29-07-15 05:48:47, 河合英宏 / KAWAI,HIDEHIRO wrote:
Hi,
From: linux-kernel-ow...@vger.kernel.org
[mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Hidehiro Kawai
(2015/07/27 23:34), Michal Hocko wrote:
On Mon 27-07-15 10:58:50, Hidehiro Kawai wrote:
[...]
The check could
or do like this:
void nmi_panic(const char *msg)
{
...
panic("%s", msg);
}
If there is no objection, I'm going to use a macro.
Your other patch needs panic_cpu externally visible so the macro should
be OK.
--
Michal Hocko
SUSE Labs
)
+ atomic_set(&panicking_cpu, -1);
This does the opposite of what the comment says, doesn't it? You should
check old_cpu == -1. Also atomic_set doesn't imply memory barriers, which
might be a problem.
--
Michal Hocko
SUSE Labs
if the NMI has preempted an ongoing panic and
+* allow it to finish
+*/
+ if (atomic_read(&panic_cpu) == raw_smp_processor_id())
+ return;
+
+ panic();
+}
+EXPORT_SYMBOL(nmi_panic);
struct tnt {
u8 bit;
--
Michal Hocko
SUSE Labs
On Thu 23-07-15 19:11:03, Hidehiro Kawai wrote:
Hi,
Thanks for the feedback.
(2015/07/23 17:25), Michal Hocko wrote:
Hi,
On Wed 22-07-15 11:14:21, Hidehiro Kawai wrote:
When an HA cluster software or administrator detects non-response
of a host, they issue an NMI to the host