Re: MCE reports errors that can't be verified

2018-10-30 Thread Daniel Aberger - Profihost AG
On 29.10.18 at 19:25, Luck, Tony wrote:
> On Mon, Oct 29, 2018 at 06:51:29PM +0100, Borislav Petkov wrote:
>> On Mon, Oct 29, 2018 at 04:59:44PM +, Luck, Tony wrote:
>>> The EDAC driver printed out those messages,
>>
>> I don't think so - that's __print_mce() in mce.c which dumps the three
>

Re: MCE reports errors that can't be verified

2018-10-29 Thread Luck, Tony
On Mon, Oct 29, 2018 at 06:51:29PM +0100, Borislav Petkov wrote:
> On Mon, Oct 29, 2018 at 04:59:44PM +, Luck, Tony wrote:
> > The EDAC driver printed out those messages,
>
> I don't think so - that's __print_mce() in mce.c which dumps the three
> lines under "... events logged". Which means,

Re: MCE reports errors that can't be verified

2018-10-29 Thread Borislav Petkov
On Mon, Oct 29, 2018 at 04:59:44PM +, Luck, Tony wrote:
> The EDAC driver printed out those messages,

I don't think so - that's __print_mce() in mce.c which dumps the three lines under "... events logged". Which means, that's the lowest prio, fallback notifier which runs when nothing else befo

RE: MCE reports errors that can't be verified

2018-10-29 Thread Luck, Tony
> The fact that you see this, means, the error has reached the last
> notifier. So the EDAC notifier must've run too and handed the error to
> the EDAC driver.
>
> Can you send a full dmesg from that machine, privately to Tony and me is
> fine too.

Some system configuration information would be he
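The notifier ordering discussed in this exchange can be pictured with a small model. This is only a sketch in Python, not the kernel code: `NotifierChain`, `PASS_ON`, `STOP`, `edac_like` and `fallback_print` are hypothetical names invented for illustration. It shows callbacks being called from highest to lowest priority, with the lowest-priority fallback printing the event only if nothing earlier stopped the chain.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical return values: a callback either lets the event travel on
# or stops the chain. These names are illustrative, not kernel constants.
PASS_ON = 0
STOP = 1

Callback = Callable[[Dict[str, int]], int]


class NotifierChain:
    """Calls registered callbacks in descending priority order."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, str, Callback]] = []

    def register(self, priority: int, name: str, callback: Callback) -> None:
        self._entries.append((priority, name, callback))
        # Keep the highest priority first; the lowest one acts as fallback.
        self._entries.sort(key=lambda entry: entry[0], reverse=True)

    def call(self, event: Dict[str, int]) -> List[str]:
        """Run the chain and return the names of the callbacks that ran."""
        ran: List[str] = []
        for _priority, name, callback in self._entries:
            ran.append(name)
            if callback(event) == STOP:
                break
        return ran


def edac_like(event: Dict[str, int]) -> int:
    # Stand-in for a driver that inspects the event and passes it on.
    return PASS_ON


def fallback_print(event: Dict[str, int]) -> int:
    # Stand-in for the lowest-priority notifier that just dumps the event.
    print(f"CPU {event['cpu']}: Bank {event['bank']}: {event['status']:x}")
    return STOP


chain = NotifierChain()
chain.register(0, "fallback_print", fallback_print)
chain.register(10, "edac_like", edac_like)
ran = chain.call({"cpu": 12, "bank": 7, "status": 0xCC027C010091})
```

In this model, seeing the fallback output implies that every higher-priority callback already ran and passed the event on, which is the inference drawn in the quoted text.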

Re: MCE reports errors that can't be verified

2018-10-29 Thread Borislav Petkov
On Mon, Oct 29, 2018 at 11:45:04AM +0100, Daniel Aberger - Profihost AG wrote:
...
> [Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: CPU 15: Machine Check: 0 Bank 7: cc810091
> [Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: TSC 0 ADDR 70fb1b3ec0 MISC 142189886
> [Mi Aug 22 13:54:47

MCE reports errors that can't be verified

2018-10-29 Thread Daniel Aberger - Profihost AG
Hello,

We currently have several servers reporting faulty memory through MCE. Example dmesg output:

[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: Machine check events logged
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: CPU 12: Machine Check: 0 Bank 7: cc027c010091
[Mi Aug 22 13:54:47
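For readers who want to pick such a line apart, here is a rough Python sketch. It assumes the line layout shown in the dmesg excerpt above and the architectural flag positions of the x86 IA32_MCi_STATUS register (bit 63 VAL down to bit 57 PCC, MCA error code in bits 15:0). The helper names `parse_mce_line` and `decode_status` are made up for this example, and `EXAMPLE_STATUS` is a hypothetical 64-bit value, not one taken from the affected servers, because the values as quoted here may not be full width.

```python
import re
from typing import Dict, List, Optional

# Matches the "CPU n: Machine Check: x Bank n: status" part of the line.
MCE_LINE = re.compile(
    r"CPU (?P<cpu>\d+): Machine Check: (?P<mcgstatus>[0-9a-f]+) "
    r"Bank (?P<bank>\d+): (?P<status>[0-9a-f]+)"
)

# Architectural flag bits of IA32_MCi_STATUS, highest bit first.
STATUS_FLAGS = [
    (63, "VAL"),    # register contains a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MISC register is valid
    (58, "ADDRV"),  # ADDR register is valid
    (57, "PCC"),    # processor context corrupt
]


def parse_mce_line(line: str) -> Optional[Dict[str, int]]:
    """Extract CPU, bank and the hex fields from one mce dmesg line."""
    match = MCE_LINE.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match["cpu"]),
        "bank": int(match["bank"]),
        "mcgstatus": int(match["mcgstatus"], 16),
        "status": int(match["status"], 16),
    }


def decode_status(status: int) -> Dict[str, object]:
    """Return the set flag names and the 16-bit MCA error code."""
    flags: List[str] = [name for bit, name in STATUS_FLAGS if status >> bit & 1]
    return {"flags": flags, "mca_code": status & 0xFFFF}


line = ("[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: "
        "CPU 12: Machine Check: 0 Bank 7: cc027c010091")
parsed = parse_mce_line(line)

# Hypothetical full-width status value, for illustration only.
EXAMPLE_STATUS = 0xCC00008000010091
decoded = decode_status(EXAMPLE_STATUS)
```

With the hypothetical value, UC is clear, which is how a corrected error would look. Whether that applies to the servers in this report depends on the full register values in the original logs.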