RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-17 Thread Luck, Tony
> What I'm planning to do in the altera_edac notifier is: > > if (kdump_is_set) > return; Yes. That's what I think should happen. -Tony ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
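As a rough illustration of the early return agreed on above, here is a minimal userspace sketch. It is not the altera_edac driver code: `kdump_is_set` is taken from the quoted pseudocode, while `edac_panic_notifier`, `edac_regs_dumped`, and the local `NOTIFY_*` constants are hypothetical stand-ins that only mirror the kernel's naming.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for illustration only; these are not the kernel's definitions. */
enum { NOTIFY_DONE = 0, NOTIFY_OK = 1 };

/* Assumed flag: true when a crash (kdump) kernel is loaded. */
static bool kdump_is_set;

/* Hypothetical counter standing in for the risky regmap/MMIO/firmware work. */
static int edac_regs_dumped;

/* Sketch of a panic notifier callback that bails out early under kdump. */
static int edac_panic_notifier(void)
{
    if (kdump_is_set)
        return NOTIFY_DONE;   /* skip the risky path so kdump can proceed */

    edac_regs_dumped++;       /* placeholder for the actual error reporting */
    return NOTIFY_OK;
}
```

Keeping the check inside the callback avoids the need for a per-entry "skip" flag on the notifier chain itself.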

RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-17 Thread Luck, Tony
> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else > we run the code as-is? Does that make sense to you? The "skip" option sounds like it needs some special flag associated with an entry on the notifier chain. But there are other notifier chains ... so that sounds messy to m

RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-17 Thread Luck, Tony
> So, my reasoning here is: this notifier should fit the info list, > definitely! But...it's very high risk for kdump. It deep dives into the > regmap API (there are locks in such code) plus there is an (MM)IO write > to the device and an ARM firmware call. So, despite the nature of this > notifier

Re: [PATCH v4] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made

2017-03-06 Thread Luck, Tony
t kernel should > > not do anything except clearing MCG_STATUS. This is useful > > for kdump to let vmcore dumping perform as hard as it can. > > Ok, I went and rewrote the text to make it more succinct, to the point > and correct spelling and format

Re: [PATCH v3] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made

2017-02-22 Thread Luck, Tony
On Wed, Feb 22, 2017 at 12:11:14PM +0800, Xunlei Pang wrote: > + /* > + * Cases to bail out to avoid rendezvous process timeout: > + * 1)If this CPU is offline. > + * 2)If crashing_cpu was set, e.g. entering kdump, > + * we need to skip cpus remaining in 1st kernel. > +

RE: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-21 Thread Luck, Tony
> It's from my understanding, I didn't get the explicit description from the > intel SDM on this point. > If a broadcast SRAO comes on real hardware, will MSR_IA32_MCG_STATUS of each > cpu have MCG_STATUS_RIPV bit set? MCG_STATUS is a per-thread MSR and will contain the status appropriate for th

Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Luck, Tony
On Mon, Jan 23, 2017 at 06:51:30PM +0100, Borislav Petkov wrote: > Hey Tony, > > a "welcome back" is in order? :-) Yes - first day back today. Lots of catching up to do. > And apparently crash knows about poisoned pages and handles them: > > static int __init crash_save_vmcoreinfo_init(void) >

Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Luck, Tony
On Mon, Jan 23, 2017 at 03:50:56PM +0100, Borislav Petkov wrote: > On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote: > > One possible timing sequence would be: > > 1st kernel running on multiple cpus panicked > > then the crash dump code starts > > the crash dump code stops the others cp

RE: [PATCH v2] kdump: Fix crash_kexec - smp_send_stop race in panic

2011-11-02 Thread Luck, Tony
> Instead of introducing the panic lock, as an alternative we could move > smp_send_stop() to the beginning of panic(). Eric told me that the > function is currently "insufficiently reliable" for that, but perhaps we > could make it more reliable. That's tough to do. We are in panic because somet

RE: [Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump

2011-10-11 Thread Luck, Tony
> So, in any case we may not be able to disable machine-check exceptions > (MCEs) only within the context of kexec'ed kernel. Let me know if I've > missed something here. Linux sets the CR4.MCE bit - look for "set_in_cr4(X86_CR4_MCE)" for places where it does so. You can ask it not to do that wit

RE: [Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump

2011-10-11 Thread Luck, Tony
> Frankly, I don't think that it is undefined - you basically should be > able to read DRAM albeit with the corrupted data in it. However, you > probably best disable the whole DRAM error detection first by clearing > a couple of bits in MC4_CTL_MASK (at least on AMD that should work, I > dunno how

RE: [Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump

2011-10-05 Thread Luck, Tony
> > The plan is to pass-down the list of poisoned memory pages to the second > > kernel using an elf-note so that these pages are left untouched during > > dump capture. I'm working on an implementation of the same and should > > have patches soon. > > I would say let us first figure out what happe

RE: [Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump

2011-10-03 Thread Luck, Tony
> It totally doesn't make sense to do this in the kernel when we can > filter this from userspace just fine. Patch 1 is the kernel part that provides the clue for user space tools to do this filtering. The other three parts are patches to tools that see the hint and act on it. Eric: Do you see a

RE: [RFC] Kdump and memory error handling

2011-05-04 Thread Luck, Tony
Your first suggestion of a "slim" dump makes the most sense. The purpose of a crash dump is a research resource to find out why the system crashed - but in the case of a machine check, we already have the reasons for the crash captured by the machine check handler. Perhaps you could include __log_

RE: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic

2011-01-26 Thread Luck, Tony
>- The latest approach (proposed by Linus) is to forget the disk: jump to > real-mode, but display the kernel log in a fancy format (with scroll > ups and downs) instead. A while ago (first Plumbers conference?) someone was talking about using a 2-d barcode to display the tail of the kernel log

RE: [PATCH 0/2][concept RFC] x86: BIOS-save kernel log to disk upon panic

2011-01-26 Thread Luck, Tony
> How is this more useful than a photograph of the backtrace ? You can fit a lot more data into the 2-d barcode than will fit on the screen. You can also automate the recovery of the data (e.g. for posting to kerneloops.org). -Tony

RE: [PATCH][EFI] Run EFI in physical mode

2010-08-13 Thread Luck, Tony
> does this affect ia64 in any way? I remember Eric complaining that set_virtual_address_map() was a one way trap door with no way to get back to physical mode ... and thus this was a big problem to support kexec on ia64. And yet we still call it, and ia64 can do kexec. So some other work around m

RE: kdump broken on Altix 350

2008-09-29 Thread Luck, Tony
Does this make kexec/kdump happier? Bare minimum testing so far (builds and boots on tiger ... didn't try kexec yet). [IA64] Put the space for cpu0 per-cpu area into .data section Initial fix for making sure that we can access percpu variables in all C code commit: 10617bbe84628eb18ab5f723d3ba

RE: kdump broken on Altix 350

2008-09-29 Thread Luck, Tony
Maybe I'm starting to see what happened ... and it could well be my fault. I wanted to allocate the per-cpu memory for cpu0 statically in the vmlinux ... so it would be available in head.S to set up everything before we move to any C code that might try to access per cpu variables. To make life e

RE: kdump broken on Altix 350

2008-08-29 Thread Luck, Tony
> your commit > > commit 10617bbe84628eb18ab5f723d3ba35005adde143 > Author: Tony Luck <[EMAIL PROTECTED]> > Date: Tue Aug 12 10:34:20 2008 -0700 > > [IA64] Ensure cpu0 can access per-cpu variables in early boot code > > broke kdump on our Altix 350. I get following early crash in

RE: [PATCH 0/3] vmcoreinfo support for dump filtering

2007-10-17 Thread Luck, Tony
> This? That does the trick, yes. > (please tell me if you want me to send this to Linus) I've put it in my tree now ... so I'll ask Linus to pull it from there. -Tony

RE: [IA64] [kdump] machveg=dig on hpzx1 platforms

2007-07-11 Thread Luck, Tony
> Here is a patch to do that. We use this internally, but > I had forgotten to post it. Terry, Just got to applying some older patches ... perhaps this one has been sitting too long because I had some apply & build problems with it. Building with arch/ia64/configs/zx1_defconfig and with this pa