>> - While at it, don't forget to:
>>
>>s/ADDR/MISC/SYND
>> /addr/misc/synd
>>
> These are actually acronyms for Address, Miscellaneous and Syndrome registers.
They look like abbreviations, not acronyms to me.
-Tony
> Export microcode version through the tracepoint to prevent ambiguity over
> the active version on the system when the MCE was received.
>
> Signed-off-by: Avadhut Naik
> Reviewed-by: Sohil Mehta
> Reviewed-by: Steven Rostedt (Google)
Reviewed-by: Tony Luck
> Export PPIN through the tracepoint as it provides a unique identifier for
> the system (or socket in case of multi-socket systems) on which the MCE
> has been received.
>
> Signed-off-by: Avadhut Naik
> Reviewed-by: Sohil Mehta
> Reviewed-by: Steven Rostedt (Google)
Reviewed-by: Tony Luck
> But no, that's not the right question to ask.
>
> It is rather: which bits of information are very relevant to an error
> record and which are transient enough so that they cannot be gathered
> from a system by other means or only gathered in a difficult way, and
> should be part of that record.
> > Is it so very different to add this to a trace record so that rasdaemon
> > can have feature parity with mcelog(8)?
>
> I knew you were gonna say that. When someone decides that it is
> a splendid idea to add more stuff to struct mce then said someone would
> want it in the tracepoint too.
>
>
> > You've spent enough time with Ashok and Thomas tweaking the Linux
> > microcode driver to know that going back to the machine the next day
> > to ask about microcode version has a bunch of ways to get a wrong
> > answer.
>
> Huh, what does that have to do with this?
If deployment of a
> > 8 bytes for PPIN, 4 more for microcode.
>
> I know, nothing leads to bloat like 0.01% here, 0.001% there...
12 extra bytes divided by (say) 64GB (a very small server these days, my
laptop has that much)
= 0.0000000175%
We will need 57000 changes like this one before we get to 0.001%
> > The first patch adds PPIN (Protected Processor Inventory Number) field to
> > the tracepoint.
> >
> > The second patch adds the microcode field (Microcode Revision) to the
> > tracepoint.
>
> This is a lot of static information to add to *every* MCE.
8 bytes for PPIN, 4 more for microcode.
> + return in_task() ? task_pid_nr(current) | (smp_processor_id() <<
> CPU_ID_SHIFT) :
There are contexts and CONFIG options around preemption where
smp_processor_id() will throw a warning. Use raw_smp_processor_id().
-Tony
> `strncpy` is deprecated for use on NUL-terminated destination strings [1].
>
> We should prefer more robust and less ambiguous string interfaces.
>
> A suitable replacement is `strscpy_pad` [2] due to the fact that it guarantees
> NUL-termination on the destination buffer whilst maintaining the
> What I'm missing from this text here is, what *is* the mce->misc LSB
> field in human speak? What does that field denote?
The SDM says:
Recoverable Address LSB (bits 5:0): The lowest valid recoverable address bit.
Indicates the position of the least
significant bit (LSB) of the recoverable
> struct mce m;
> + int lsb = PAGE_SHIFT;
Some maintainers like to order local declaration lines from longest to shortest
> + /*
> + * Even if the ->validation_bits are set for address mask,
> + * to be extra safe, check and reject an error radius '0',
> + * and
On Mon, Jul 18, 2022 at 09:11:33PM +, Jane Chu wrote:
> On 7/18/2022 12:22 PM, Luck, Tony wrote:
> >> It appears the kernel is trusting that ->physical_addr_mask is non-zero
> >> in other paths. So this is at least equally broken in the presence of a
> >> broke
> It appears the kernel is trusting that ->physical_addr_mask is non-zero
> in other paths. So this is at least equally broken in the presence of a
> broken BIOS. The impact is potentially larger though with this change,
> so it might be a good follow-on patch to make sure that
>
+ m.misc = (MCI_MISC_ADDR_PHYS << 6) |
__ffs64(mem_err->physical_addr_mask);
Do we want to unconditionally trust the sanity of the BIOS provided
physical_address_mask?
There's a warning comment on the kernel __ffs64() function:
* The result is not defined if no bits are set, so check
On Sat, Jan 22, 2022 at 12:40:18AM +, Jane Chu wrote:
> On 1/21/2022 4:31 PM, Jane Chu wrote:
> > On baremetal Intel platform with DCPMEM installed and configured to
> > provision daxfs, say a poison was consumed by a load from a user thread,
> > and then daxfs takes action and clears the
> I meant in this case (racing to access the same poisoned pages), the
> page mapping should have been removed, and the hwpoison swap pte
> installed, by the winner thread?
My "mutex" patch that Horiguchi-san has included his summary series should
make this happen in most cases. Only problem is
On Mon, Apr 19, 2021 at 07:03:01PM -0700, Jue Wang wrote:
> On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote:
>
> > This patch suggests to do page table walk to find the error virtual
> > address. If we find multiple virtual addresses in walking, we now can't
> > determine which one is
On Mon, Apr 19, 2021 at 06:49:15PM -0700, Jue Wang wrote:
> On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote:
> ...
> > + * This function is intended to handle "Action Required" MCEs on already
> > + * hardware poisoned pages. They could happen, for example, when
> > + * memory_failure()
>> But there are places in the kernel where the code assumes that this
>> EFAULT return was simply because of a page fault. The code takes some
>> action to fix that, and then retries the access. This results in a second
>> machine check.
>
> What about return EHWPOISON instead of EFAULT and
> So what I'm missing with all this fun is, yeah, sure, we have this
> facility out there but who's using it? Is anyone even using it at all?
Even if no applications ever do anything with it, it is still useful to avoid
crashing the whole system and just terminate one application/guest.
> If so,
On Thu, Apr 08, 2021 at 10:49:58AM +0200, Borislav Petkov wrote:
> On Wed, Apr 07, 2021 at 02:43:10PM -0700, Luck, Tony wrote:
> > On Wed, Apr 07, 2021 at 11:18:16PM +0200, Borislav Petkov wrote:
> > > On Thu, Mar 25, 2021 at 05:02:34PM -0700, Tony Luck wrote:
> What I'm still unclear on, does this new version address that
> "mysterious" hang or panic which the validation team triggered or you
> haven't checked yet?
No :-(
They are triggering some case where multiple threads in a process hit the same
poison, and somehow memory_failure() fails to
> I have one scenario, may you take into account:
>
> If a copyin case occurs and write() returns an error from your patch, the
> user process may check the return value; on an error it may exit, then the
> error page will be freed, and the page may be allocated to another
> process or to
On Wed, Apr 07, 2021 at 11:18:16PM +0200, Borislav Petkov wrote:
> On Thu, Mar 25, 2021 at 05:02:34PM -0700, Tony Luck wrote:
> > Andy Lutomirski pointed out that sending SIGBUS to tasks that
> > hit poison in the kernel copying syscall parameters from user
> > address space is not the right
>> Combined with my "mutex" patch (to get rid of races where 2nd process returns
>> early, but first process is still looking for mappings to unmap and tasks
>> to signal) this patch moves forward a bit. But I think it needs an
>> additional change here in kill_me_maybe() to just "return" if there
On Wed, Mar 31, 2021 at 07:25:40PM +0800, Aili Yao wrote:
> When the page is already poisoned, another memory_failure() call on the
> same page now returns 0, meaning OK. For nested memory MCE handling, this
> behavior may lead to one MCE looping. Example:
>
> 1.When LCME is enabled, and there are
- set_bit(MCE_OVERFLOW, (unsigned long *)&mcelog->flags);
+ mcelog->flags |= BIT(MCE_OVERFLOW);
set_bit() is an atomic operation because it might race with the code to
get and clear this bit:
do {
flags = mcelog->flags;
> Yeah, into
>
> fd258dc4442c ("x86/mce: Add Skylake quirk for patrol scrub reported errors")
Thanks ... my memory is failing, and I forgot that the patch had been improved
and
moved from core.c (where I looked for it) into severity.c
-Tony
> What is the justification for making this rate limit per UID and not
> per task, per process or systemwide?
The concern is that a malicious user is running a workload that loops
obtaining the buslock. This brings the whole system to its knees.
Limiting per task doesn't help. The user can
On Fri, Mar 12, 2021 at 11:48:31PM +, Luck, Tony wrote:
> Thanks to the decode of the instruction we do have the virtual address. So we
> just need
> a safe walk of pgd->p4d->pud->pmd->pte (truncated if we hit a huge page) with
> a write
> of a "not-present&
>> Nothing new - just the next spammer attempt.
> But this was a new class of Spam. So far i got only mass mailing... This was
> personalized based on my previous e-Mail (did not include this part in my
> mail)
Somewhat new - combining trawling of public mailing lists for addresses with
a
>> will memory_failure() find it and unmap it? If it succeeds, will the current
>> task be signaled with the correct vaddr and shift?
>
> That's a very good question. I didn't see a SIGBUS when I first wrote this
> code,
> hence all the p->mce_vaddr. But now I'm
> a) not sure why there wasn't a
> Sorry to interrupt as I am really confused here:
> If it's a copyin case, has the page been mapped for the current process?
Yes. The kernel has the current process mapped to the low address range
and code like get_user(), put_user() "simply" reaches down to those addresses
(maybe not so simply,
>> I think the "sapphire_rapids" is the code name for the server platform.
>
> If that's really the case, then:
>
> #define INTEL_FAM6_SAPPHIRERAPIDS_X 0x8F
>
> is wrong, those things should be uarch name, not platform name. Tony?
0x8F is the model number of the CPU that is named Sapphire
> I guess that p->mce_vaddr stores the virtual address of the error here.
> If so, sending SIGBUS with the address looks enough as we do now, so why
> do you walk page table to find the error virtual address?
p->mce_vaddr only has the virtual address for the COPYIN case. In that code
path we
> I think we need to at least fix the existing bug before we add more
> signals. AFAICS the MCE_IN_KERNEL_COPYIN code is busted for kernel
> threads.
Can a kernel thread do get_user() or copy_from_user()? Do we have kernel
threads
that have an associated user address space to copy from?
-Tony
On Tue, Mar 09, 2021 at 08:28:24AM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Mar 09, 2021 at 02:35:34PM +0800, Aili Yao wrote:
> > When the page is already poisoned, another memory_failure() call on the
> > same page now returns 0, meaning OK. For nested memory MCE handling, this
> > behavior
There can be races when multiple CPUs consume poison from the same
page. The first into memory_failure() atomically sets the HWPoison
page flag and begins hunting for tasks that map this page. Eventually
it invalidates those mappings and may send a SIGBUS to the affected
tasks.
But while all that
>> So it should be safe to grab and hold a mutex. See patch below.
>
> The mutex approach looks simpler and safer, so I'm fine with it.
Thanks. Is that an "Acked-by:"?
>> /**
>> * memory_failure - Handle memory failure of a page.
>> * @pfn: Page Number of the corrupted page
>> @@ -1424,12
> Can you point me at that SIGBUS code in a current kernel?
It is in kill_me_maybe(). mce_vaddr is set up when we disassemble whatever
get_user() or copy_from_user() variant was in use in the kernel when the
poison memory was consumed.
if (p->mce_vaddr != (void __user *)-1l) {
This whole page table walking patch is trying to work around the
races caused by multiple calls to memory_failure() for the same
page.
Maybe better to just avoid the races. The comment right above
memory_failure says:
* Must run in process context (e.g. a work queue) with interrupts
* enabled
> From the walk, it seems we have got the virtual address, can we just send a
> SIGBUS with it?
If the walk wins the race and the pte for the poisoned page is still valid,
then yes.
But we could have:
CPU1                                  CPU2
memory_failure sets poison
bit for struct page
rmap
On Thu, Mar 04, 2021 at 02:45:24PM +0800, Aili Yao wrote:
> > > if your method works, should it be like this?
> > >
> > > 1582 pteval =
> > > swp_entry_to_pte(make_hwpoison_entry(subpage));
> > > 1583 if (PageHuge(page)) {
> > > 1584
> For the error address with SIGBUS, I think this is not an issue caused by the
> patch I posted; the issue was already there before my patch.
> I don't see a feasible way to get the correct address, for the same reason ---
> we don't know whether the page mapping is there or not when
> we got to
On Fri, Feb 26, 2021 at 10:59:15AM +0800, Aili Yao wrote:
> Hi Naoya, Tony:
> > >
> > > Idea for what we should do next ... Now that x86 is calling
> > > memory_failure()
> > > from user context ... maybe parallel calls for the same page should
> > > be blocked until the first caller completes
> Some programs may use read(2), write(2), etc as ways to check if
> memory is valid without getting a signal. They might not want
> signals, which means that this feature might need to be configurable.
That sounds like an appalling hack. If users need such a mechanism
we should create some
> Programs that get a signal might expect that the RIP that the signal
> frame points to is the instruction that caused the signal and that the
> instruction faulted without side effects. For SIGSEGV, I would be
> especially nervous about this. Maybe SIGBUS is safer. For SIGSEGV,
> it's
On Fri, Feb 26, 2021 at 10:52:50AM +0800, Aili Yao wrote:
> Hi Naoya, Oscar, David:
> >
> > > We could use some negative value (error code) to report the reported case,
> > > then as you mentioned above, some callers need change to handle the
> > > new case, and the same is true if you use some
On Thu, Feb 25, 2021 at 12:38:06PM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> Thank you for shedding light on this, this race looks worrisome to me.
> We call try_to_unmap() inside memory_failure(), where we find affected
> ptes by page_vma_mapped_walk() and convert into hwpoison entires in
>
On Tue, Feb 23, 2021 at 07:33:46AM -0800, Andy Lutomirski wrote:
>
> > On Feb 23, 2021, at 4:44 AM, Aili Yao wrote:
> >
> > On Fri, 5 Feb 2021 17:01:35 +0800
> > Aili Yao wrote:
> >
> >> When one page is already hwpoisoned by MCE AO action, processes may not
> >> be killed, processes mapping
> What I think is that qemu has no easy way to get the MCE signature from the
> host, and currently no method for this.
> So qemu treats all AR as No RIPV; doing more is better than doing less.
RIPV would be important in the guest in the case where the guest can fix the
problem that caused
the machine check
Would it be possible to teach checkpatch not to warn about
canonical references to earlier commits? E.g.
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per
line)
#7:
commit e80634a75aba ("EDAC, skx: Retrieve and print retry_rd_err_log registers")
Thanks
-Tony
> +#define X86_BUG_NUMA_SHARES_LLC X86_BUG(25) /* CPU may
> enumerate an LLC shared by multiple NUMA nodes */
During internal review I wondered why this is a "BUG" rather than a "FEATURE"
bit.
Apparently, the suggestion for "BUG" came from earlier community discussions.
> Can we use existing backend pstore ram driver (fs/pstore/ram.c) for DDR
> instead of SRAM ?
The expectation for pstore is that the system will go through a reset when it
crashes. Most systems do not preserve DDR contents across reset.
> Was the current driver written only to support
On Mon, Feb 08, 2021 at 02:04:24PM -0500, Liang, Kan wrote:
> On 2/8/2021 12:56 PM, Borislav Petkov wrote:
>
> I think it's good enough for perf, but I'm not sure whether other codes need
> the CPU type information.
>
> Ricardo, do you know?
>
> Maybe we should implement a generic function as
On Fri, Feb 05, 2021 at 01:03:18PM -0700, Shuah Khan wrote:
> On 2/5/21 2:58 AM, Greg KH wrote:
> > On Wed, Feb 03, 2021 at 11:11:57AM -0700, Shuah Khan wrote:
> > > +static inline u32 seqnum32_inc(struct seqnum32 *seq)
> > > +{
> > > + atomic_t val = ATOMIC_INIT(seq->seqnum);
> > > +
> > > +
> So that "system hang or panic" which the validation folks triggered,
> that cannot be reproduced anymore? Did they run the latest version of
> the patch?
I will get the validation folks to run the latest version (and play around with
hyperthreading if they see problems).
-Tony
> And the much more important question is, what is the code supposed to
> do when that overflow *actually* happens in real life? Because IINM,
> an overflow condition on the same page would mean killing the task to
> contain the error and not killing the machine...
Correct. The cases I've
> In any case, this patch needs rebasing on top of my big fault series
Is that series in some GIT tree? Or can you give a lore.kernel.org link?
Thanks
-Tony
On Thu, Jan 28, 2021 at 06:57:35PM +0100, Borislav Petkov wrote:
> Crazy idea: if you still can reproduce on -rc3, you could bisect: i.e.,
> if you apply the patch on -rc3 and it explodes and if you apply the same
> patch on -rc5 and it works, then that could be a start... Yeah, don't
> have a
Thanks for the explanation and test code. I think I see better what
is going on here.
[I took your idea for using madvise(...MADV_HWPOISON) and added a new "-S"
option to my einj_mem_uc test program to use madvise instead of ACPI/EINJ
for injections. Update pushed here:
Stephen,
Yes. Most stuff I do these days goes through the RAS tree I share with Boris.
-Tony
-Original Message-
From: Stephen Rothwell
Sent: Thursday, January 28, 2021 11:53 AM
To: Luck, Tony
Cc: Arnd Bergmann ; Linux Kernel Mailing List
; Linux Next Mailing List
Subject: linux
On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote:
> when one page is already hwpoisoned by an AO action, the process may not be
> killed; the process mapping this page may make a syscall involving this
> page and trigger a VM_FAULT_HWPOISON fault. As it's in kernel
> mode it may be fixed
On Tue, Jan 26, 2021 at 12:03:14PM +0100, Borislav Petkov wrote:
> On Mon, Jan 25, 2021 at 02:55:09PM -0800, Luck, Tony wrote:
> > And now I've changed it back to non-atomic (but keeping the
> > slightly cleaner looking code style that I used for the atomic
> > version).
On Thu, Jan 21, 2021 at 01:09:59PM -0800, Luck, Tony wrote:
> On Wed, Jan 20, 2021 at 01:18:12PM +0100, Borislav Petkov wrote:
> But, on a whim, I changed the type of mce_count from "int" to "atomic_t" and
> fixed up the increment & clear to use atomic_inc_return(
On Wed, Jan 20, 2021 at 01:18:12PM +0100, Borislav Petkov wrote:
> On Tue, Jan 19, 2021 at 03:57:59PM -0800, Luck, Tony wrote:
> > But the real validation folks took my patch and found that it has
> > destabilized cases 1 & 2 (and case 3 also chokes if you repeat a few
>
> Yah, some printk sprinkling might be a good start. But debugging in that
> atomic context is always nasty. ;-\
Some very light printk sprinkling (one message in queue_task_work() in atomic
context, one each in kill_me_now() and kill_me_maybe() to check when task_work
actually called them).
On Tue, Jan 19, 2021 at 11:56:32AM +0100, Borislav Petkov wrote:
> On Fri, Jan 15, 2021 at 03:23:46PM -0800, Luck, Tony wrote:
> > On Fri, Jan 15, 2021 at 12:51:03PM -0800, Luck, Tony wrote:
> > > static void kill_me_now(struct callback_head *ch)
> > >
On Fri, Jan 15, 2021 at 12:51:03PM -0800, Luck, Tony wrote:
> static void kill_me_now(struct callback_head *ch)
> {
> + p->mce_count = 0;
> force_sig(SIGBUS);
> }
Brown paper bag time ... I just pasted that line from kill_me_maybe()
and I thought I did a re-compile
Recovery action when get_user() triggers a machine check uses the fixup
path to make get_user() return -EFAULT. Also queue_task_work() sets up
so that kill_me_maybe() will be called on return to user mode to send a
SIGBUS to the current process.
But there are places in the kernel where the code
On Fri, Jan 15, 2021 at 04:27:54PM +0100, Borislav Petkov wrote:
> On Thu, Jan 14, 2021 at 04:38:17PM -0800, Tony Luck wrote:
> > Add a "mce_busy" counter so that task_work_add() is only called once
> > per faulty page in this task.
>
> Yeah, that sentence can be removed now too.
I will update
On Thu, Jan 14, 2021 at 09:22:13PM +0100, Borislav Petkov wrote:
> On Mon, Jan 11, 2021 at 01:44:50PM -0800, Tony Luck wrote:
> > @@ -1431,8 +1433,11 @@ noinstr void do_machine_check(struct pt_regs *regs)
> > mce_panic("Failed kernel mode recovery", &m,
> > msg);
> >
>> Maybe the other difference in approach between Andy and me is whether to
>> go for a solution that covers all the corner cases, or just make an
>> incremental
>> improvement that allows for recover in some useful subset of remaining fatal
>> cases, but still dies in other cases.
>
> Does that
> I think you guys have veered off into the weeds with this. The current
> solution - modulo error messages not destined for humans :) - is not soo
> bad, considering the whole MCA trainwreck.
>
> Or am I missing something completely unacceptable?
Maybe the other difference in approach between
On Tue, Jan 12, 2021 at 02:04:55PM -0800, Andy Lutomirski wrote:
> > But we know that the fault happend in a get_user() or copy_from_user() call
> > (i.e. an RIP with an extable recovery address). Does context switch
> > access user memory?
>
> No, but NMI can.
>
> The case that would be very
On Tue, Jan 12, 2021 at 10:57:07AM -0800, Andy Lutomirski wrote:
> On Tue, Jan 12, 2021 at 10:24 AM Luck, Tony wrote:
> >
> > On Tue, Jan 12, 2021 at 09:21:21AM -0800, Andy Lutomirski wrote:
> > > Well, we need to do *something* when the first __get_user() trips the
>
On Tue, Jan 12, 2021 at 09:21:21AM -0800, Andy Lutomirski wrote:
> Well, we need to do *something* when the first __get_user() trips the
> #MC. It would be nice if we could actually fix up the page tables
> inside the #MC handler, but, if we're in a pagefault_disable() context
> we might have
On Tue, Jan 12, 2021 at 09:00:14AM -0800, Andy Lutomirski wrote:
> > On Jan 11, 2021, at 2:21 PM, Luck, Tony wrote:
> >
> > On Mon, Jan 11, 2021 at 02:11:56PM -0800, Andy Lutomirski wrote:
> >>
> >>>> On Jan 11, 2021, at 1:45 PM, Tony Luck wrote:
On Mon, Jan 11, 2021 at 02:11:56PM -0800, Andy Lutomirski wrote:
>
> > On Jan 11, 2021, at 1:45 PM, Tony Luck wrote:
> >
> > Recovery action when get_user() triggers a machine check uses the fixup
> > path to make get_user() return -EFAULT. Also queue_task_work() sets up
> > so that
> Yeah, saw that, but that should be trivially fixable I'm thinking.
Trivial, maybe ... but then follows the audit of other get_user() calls:
git grep get_user\( | wc -l
2003
:-(
-Tony
> I think this is horrid; why can't we have it return something different
> than -EFAULT instead?
I did consider this ... but it appears that architectures aren't unified in the
return value from get_user()
Here's another function involved in the futex call chain leading to this:
static int
res there.
But that's a useful improvement (eliminating the other three sockets on this
system).
Tested-by: Tony Luck
-Tony
On Wed, Jan 06, 2021 at 11:17:08AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 06, 2021 at 06:39:30PM +0000, Luck, Tony wrote:
> > > The "Timeout: Not all CPUs entered broadcast exception handler" message
> > > will appear from time to time given enough
> The "Timeout: Not all CPUs entered broadcast exception handler" message
> will appear from time to time given enough systems, but this message does
> not identify which CPUs failed to enter the broadcast exception handler.
> This information would be valuable if available, for example, in order
On Tue, Jan 05, 2021 at 02:17:41PM +0100, Arnd Bergmann wrote:
> On Mon, Jan 4, 2021 at 5:00 PM Luck, Tony wrote:
> >
> > > I have not received any reply from the ia64 maintainers, I assume they
> > > were
> > > both out of office for Christmas.
> I have not received any reply from the ia64 maintainers, I assume they were
> both out of office for Christmas.
I'm back in the office ... but have no working ia64 machines, nor time to look
at patches :-(
Should drop me from the MAINTAINERS file.
-Tony
> ../drivers/edac/igen6_edac.c: In function 'ecclog_nmi_handler':
> ../drivers/edac/igen6_edac.c:525:10: error: 'NMI_DONE' undeclared (first use
> in this function); did you mean 'DMI_NONE'?
>return NMI_DONE;
This driver has a #include
But inside that file it says:
#if
> Of course is this not only an x86 problem. Every architecture which
> supports virtualization has the same issue. ARM(64) has no way to tell
> for sure whether the machine runs bare metal either. No idea about the
> other architectures.
Sounds like a hypervisor problem. If the VMM provides
Booting as a guest under KVM results in error messages about
unchecked MSR access:
[6.814328][T0] unchecked MSR access error: RDMSR from 0x17f at rIP:
0x84483f16 (mce_intel_feature_init+0x156/0x270)
because KVM doesn't provide emulation for random model specific registers.
I still think there is a reasonable case to claim that this usage is right to
check
whether it is running as a guest.
Look at what it is trying to do ... change the behavior of the platform w.r.t.
logging
of memory errors. How does that make any sense for a guest ... that doesn't
even
know
> I wouldn't expect all hypervisors to necessarily set CPUID.01H:ECX[bit
> 31]. Architecturally, on Intel CPUs, that bit is simply defined as
> "not used." There is no documented contract between Intel and
> hypervisor vendors regarding the use of that bit. (AMD, on the other
> hand, *does*
Booting as a guest under KVM results in error messages about
unchecked MSR access:
[6.814328][T0] unchecked MSR access error: RDMSR from 0x17f at rIP:
0x84483f16 (mce_intel_feature_init+0x156/0x270)
because KVM doesn't provide emulation for random model specific registers.
> I thought Linux had long ago gone the route of turning rdmsr/wrmsr
> into rdmsr_safe/wrmsr_safe, so that the guest would ignore the #GPs on
> writes and return zero to the caller for #GPs on reads.
Linux just switched that around for the machine check banks ... if they #GP
fault, then something
What does KVM do with model specific MSRs?
Looks like you let the guest believe it was running on one of Sandy Bridge, Ivy
Bridge, Haswell (Xeon).
So, the core MCE code tried to enable extended error reporting.
If there is a mode to have KVM let the guest think that it read/wrote MSR 0x17F,
On Fri, Oct 30, 2020 at 12:04:03PM -0700, Luck, Tony wrote:
Bah, didn't notice this conversation didn't include LKML.
> The Xeon versions of Sandy Bridge, Ivy Bridge and Haswell support an
> optional additional error logging mode which is enabled by an MSR.
>
> Previously this mode
The following changes since commit f4d51dffc6c01a9e94650d95ce0104964f8ae822:
Linux 5.9-rc4 (2020-09-06 17:11:40 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
tags/ia64_for_5.10
for you to fetch changes up to
>> Machine checks are more serious. Just give up at the point where the
>> main copy loop triggered the #MC and return from the copy code as if
>> the copy succeeded. The machine check handler will use task_work_add() to
>> make sure that the task is sent a SIGBUS.
>
> Isn't that just plain wrong?
On Mon, Oct 05, 2020 at 06:32:47PM +0200, Borislav Petkov wrote:
> On Wed, Sep 30, 2020 at 04:26:10PM -0700, Tony Luck wrote:
> > arch/x86/kernel/cpu/mce/core.c | 33 +
> > include/linux/sched.h | 2 ++
> > 2 files changed, 23 insertions(+), 12
On Sat, Oct 03, 2020 at 03:39:29AM +0200, Thomas Gleixner wrote:
> On Fri, Oct 02 2020 at 13:19, Ricardo Neri wrote:
> > Add support to discover and enumerate CPUs in Intel hybrid parts. A hybrid
> > part has CPUs with more than one type of micro-architecture. Thus, certain
> > features may only