> What I'm planning to do in the altera_edac notifier is:
>
> if (kdump_is_set)
> 	return;
Yes. That's what I think should happen.
-Tony
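
A minimal user-space sketch of the early return agreed on above, assuming a flag that says whether a kdump kernel is loaded. All names here (kdump_is_set as a plain variable, sketch_edac_panic_notifier, do_risky_work, the SKETCH_NOTIFY_* codes) are hypothetical stand-ins for illustration, not the actual altera_edac code or kernel API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for notifier return codes (not the kernel's). */
enum { SKETCH_NOTIFY_DONE = 0, SKETCH_NOTIFY_OK = 1 };

/* Assumption: set to true once a crash (kdump) kernel has been loaded. */
bool kdump_is_set;

/* Counts how many times the risky path actually ran. */
int risky_work_calls;

/*
 * Placeholder for the work the thread calls high risk for kdump:
 * regmap access (takes locks), an MMIO write and a firmware call.
 */
void do_risky_work(void)
{
    risky_work_calls++;
}

/* Panic notifier callback: skip everything when kdump is set. */
int sketch_edac_panic_notifier(unsigned long event, void *data)
{
    (void)event;
    (void)data;

    if (kdump_is_set)
        return SKETCH_NOTIFY_DONE;  /* leave the machine untouched for kdump */

    do_risky_work();                /* no kdump: run the code as-is */
    return SKETCH_NOTIFY_OK;
}
```

The check lives inside the callback itself, so nothing on the notifier chain needs a special per-entry "skip" flag.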
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
> we run the code as-is? Does that make sense to you?
The "skip" option sounds like it needs some special flag associated with
an entry on the notifier chain. But there are other notifier chains ... so that
sounds messy to m
> So, my reasoning here is: this notifier should fit the info list,
> definitely! But...it's very high risk for kdump. It deep dives into the
> regmap API (there are locks in such code) plus there is an (MM)IO write
> to the device and an ARM firmware call. So, despite the nature of this
> notifier
t kernel should
> > not do anything except clearing MCG_STATUS. This is useful
> > for kdump to let vmcore dumping perform as hard as it can.
>
> Ok, I went and rewrote the text to make it more succinct, to the point
> and correct spelling and format
On Wed, Feb 22, 2017 at 12:11:14PM +0800, Xunlei Pang wrote:
> + /*
> + * Cases to bail out to avoid rendezvous process timeout:
> + * 1)If this CPU is offline.
> + * 2)If crashing_cpu was set, e.g. entering kdump,
> + * we need to skip cpus remaining in 1st kernel.
> +
> That is my understanding; I could not find an explicit description of
> this point in the Intel SDM.
> If a broadcast SRAO comes on real hardware, will MSR_IA32_MCG_STATUS of each
> cpu have MCG_STATUS_RIPV bit set?
MCG_STATUS is a per-thread MSR and will contain the status appropriate for th
On Mon, Jan 23, 2017 at 06:51:30PM +0100, Borislav Petkov wrote:
> Hey Tony,
>
> a "welcome back" is in order? :-)
Yes - first day back today. Lots of catching up to do.
> And apparently crash knows about poisoned pages and handles them:
>
> static int __init crash_save_vmcoreinfo_init(void)
>
On Mon, Jan 23, 2017 at 03:50:56PM +0100, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
> > One possible timing sequence would be:
> > 1st kernel running on multiple cpus panicked
> > then the crash dump code starts
> > the crash dump code stops the others cp
> Instead of introducing the panic lock, as an alternative we could move
> smp_send_stop() to the beginning of panic(). Eric told me that the
> function is currently "insufficiently reliable" for that, but perhaps we
> could make it more reliable.
That's tough to do. We are in panic because somet
> So, in any case we may not be able to disable machine-check exceptions
> (MCEs) only within the context of kexec'ed kernel. Let me know if I've
> missed something here.
Linux sets the CR4.MCE bit - look for "set_in_cr4(X86_CR4_MCE)" for places
where it does so. You can ask it not to do that wit
> Frankly, I don't think that it is undefined - you basically should be
> able to read DRAM albeit with the corrupted data in it. However, you
> probably best disable the whole DRAM error detection first by clearing
> a couple of bits in MC4_CTL_MASK (at least on AMD that should work, I
> dunno how
> > The plan is to pass-down the list of poisoned memory pages to the second
> > kernel using an elf-note so that these pages are left untouched during
> > dump capture. I'm working on an implementation of the same and should
> > have patches soon.
>
> I would say let us first figure out what happe
> It totally doesn't make sense to do this in the kernel when we can
> filter this from userspace just fine.
Patch 1 is the kernel part that provides the clue for user space
tools to do this filtering. The other three parts are patches to
tools that see the hint and act on it.
Eric: Do you see a
Your first suggestion of a "slim" dump makes the most sense. The
purpose of a crash dump is a research resource to find out why
the system crashed - but in the case of a machine check, we already
have the reasons for the crash captured by the machine check handler.
Perhaps you could include __log_
>- The latest approach (proposed by Linus) is to forget the disk: jump to
> real-mode, but display the kernel log in a fancy format (with scroll
> ups and downs) instead.
A while ago (first Plumbers conference?) someone was talking about
using a 2-d barcode to display the tail of the kernel log
> How is this more useful than a photograph of the backtrace ?
You can fit a lot more data into the 2-d barcode than will fit on
the screen. You can also automate the recovery of the data (e.g.
for posting to kerneloops.org).
-Tony
> does this affect ia64 in any way?
I remember Eric complaining that set_virtual_address_map() was a one
way trap door with no way to get back to physical mode ... and thus
this was a big problem to support kexec on ia64. And yet we still call
it, and ia64 can do kexec. So some other work around m
Does this make kexec/kdump happier? Bare minimum testing so far
(builds and boots on tiger ... didn't try kexec yet).
[IA64] Put the space for cpu0 per-cpu area into .data section
Initial fix for making sure that we can access percpu variables
in all C code commit: 10617bbe84628eb18ab5f723d3ba
Maybe I'm starting to see what happened ... and it could well
be my fault.
I wanted to allocate the per-cpu memory for cpu0 statically
in the vmlinux ... so it would be available in head.S to set
up everything before we move to any C code that might try to
access per cpu variables. To make life e
> your commit
>
> commit 10617bbe84628eb18ab5f723d3ba35005adde143
> Author: Tony Luck <[EMAIL PROTECTED]>
> Date: Tue Aug 12 10:34:20 2008 -0700
>
> [IA64] Ensure cpu0 can access per-cpu variables in early boot code
>
> broke kdump on our Altix 350. I get following early crash in
> This?
That does the trick, yes.
> (please tell me if you want me to send this to Linus)
I've put it in my tree now ... so I'll ask Linus to pull
it from there.
-Tony
> Here is a patch to do that. We use this internally, but
> I had forgotten to post it.
Terry,
Just got to applying some older patches ... perhaps this one has
been sitting too long because I had some apply & build problems with it.
Building with arch/ia64/configs/zx1_defconfig and with this pa
22 matches