Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2014-06-29 Thread Xie XiuQi
On 2014/6/28 6:10, Luck, Tony wrote:
>>> Not all machine checks are fatal - it would be bad for us to go into
>>> an infinite spin instead of executing the recovery code.
>>
>> Then for the time being extlog shouldn't hook into the decoder chain
>> but into mce_process_work, i.e. the last should ca

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2014-06-27 Thread Borislav Petkov
On Fri, Jun 27, 2014 at 10:10:48PM +0000, Luck, Tony wrote:
> I spoke too quickly. The only MCE for which we have recovery code are
> those that hit in application code. So the processor that is trying to
> do the printk() can't possibly be holding the locks. Other processors
> might have held the

RE: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2014-06-27 Thread Luck, Tony
>> Not all machine checks are fatal - it would be bad for us to go into
>> an infinite spin instead of executing the recovery code.
>
> Then for the time being extlog shouldn't hook into the decoder chain
> but into mce_process_work, i.e. the last should call it. Or maybe add
> another notifier whi

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2014-06-27 Thread Borislav Petkov
On Fri, Jun 27, 2014 at 08:43:14PM +0000, Luck, Tony wrote:
> Not all machine checks are fatal - it would be bad for us to go into
> an infinite spin instead of executing the recovery code.
Then for the time being extlog shouldn't hook into the decoder chain but into mce_process_work, i.e. the las

RE: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2014-06-27 Thread Luck, Tony
>> There's a logbuf_lock in printk. If logbuf_lock is held by another CPU,
>> it'll lead to an infinite spin here, won't it?
>
> Yes, but we want to take the risk and print something out before the
> machine dies instead of waiting to get into printk-safe context first
> and maybe corrupt state.
Not

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2014-06-27 Thread Borislav Petkov
On Fri, Jun 27, 2014 at 01:34:45PM +0800, Xie XiuQi wrote:
> The call graph is like this,
> do_machine_check
> -> mce_log
> -> atomic_notifier_call_chain(&x86_mce_decoder_chain ...)
> -> ...
> -> extlog_print
> -> print_extlog_rcd
> -> __print_extlog_rcd
> -> printk
>
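To make the call graph above concrete, here is a minimal userspace sketch of a priority-ordered notifier chain. It is not the kernel code: chain_register(), chain_call(), extlog_like_print(), generic_decoder() and demo_run() are hypothetical names for illustration, and the rule that the extlog-like handler stops the chain only when it has an extended record is an assumption of the sketch. What it does show is that every callback runs synchronously in the caller's context, which is why a printk() at the end of the chain executes in whatever context invoked the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Return codes modelled on the kernel's notifier API (values are illustrative). */
#define NOTIFY_DONE 0
#define NOTIFY_STOP 1

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long event, void *data);
	struct notifier_block *next;
	int priority;
};

/* Hypothetical helper: insert nb so the list stays sorted by descending priority. */
static void chain_register(struct notifier_block **head, struct notifier_block *nb)
{
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/*
 * Hypothetical helper: call each handler in order, in the caller's context,
 * until one returns NOTIFY_STOP. Returns how many handlers ran (a choice made
 * for this sketch so the behaviour is easy to check).
 */
static int chain_call(struct notifier_block *head, unsigned long event, void *data)
{
	int calls = 0;

	for (; head; head = head->next) {
		calls++;
		if (head->notifier_call(head, event, data) == NOTIFY_STOP)
			break;
	}
	return calls;
}

/* Stand-in for an extlog-style handler; data points to a "record present" flag. */
static int extlog_like_print(struct notifier_block *nb, unsigned long event, void *data)
{
	int has_record = *(int *)data;

	(void)nb;
	if (!has_record)
		return NOTIFY_DONE;	/* nothing to print, let the next handler run */
	printf("extended record for event %lu\n", event);
	return NOTIFY_STOP;		/* assumption: handled, stop the chain */
}

/* Stand-in for a lower-priority generic decoder. */
static int generic_decoder(struct notifier_block *nb, unsigned long event, void *data)
{
	(void)nb;
	(void)data;
	printf("generic decode of event %lu\n", event);
	return NOTIFY_DONE;
}

/* Build a two-entry chain and report how many handlers were invoked. */
static int demo_run(int has_record)
{
	struct notifier_block *chain = NULL;
	struct notifier_block decoder = { generic_decoder, NULL, 0 };
	struct notifier_block extlog = { extlog_like_print, NULL, 1 };

	chain_register(&chain, &decoder);
	chain_register(&chain, &extlog);
	return chain_call(chain, 0, &has_record);
}

#ifdef NOTIFIER_DEMO_MAIN
int main(void)
{
	assert(demo_run(1) == 1);	/* extlog-like handler stops the chain */
	assert(demo_run(0) == 2);	/* falls through to the generic decoder */
	return 0;
}
#endif
```

Building with -DNOTIFIER_DEMO_MAIN runs both cases. Because nothing in the chain defers work, any lock taken by the final print is taken in the context of whoever called chain_call(), which is the concern raised in this subthread.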

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2014-06-26 Thread Xie XiuQi
On 2013/10/18 20:37, Naveen N. Rao wrote:
> On 10/18/2013 01:53 PM, Chen, Gong wrote:
>> This H/W error log driver (a.k.a eMCA driver) is implemented based on
>> http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
>>
>> After errors are captured,

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-22 Thread Naveen N. Rao
On 10/22/2013 12:33 AM, Luck, Tony wrote:
>> But yes, this is possible and it would make it all even cleaner and simpler by simply not needing the reg/dereg interfaces for mce_ext_err_print but adding it to the chain.
> So this is on top of the 9 patch series (using the V4 that Chen Gong posted for

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-22 Thread Borislav Petkov
On Mon, Oct 21, 2013 at 03:39:20PM -0700, Tony Luck wrote:
> I folded that back into the series. Also switched out the test on
> whether to print the "No further action is required" message to only
> do so for corrected errors. Cleaned up some of the commit messages,
>
> The result is sitting at:
>

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-21 Thread Tony Luck
On Mon, Oct 21, 2013 at 12:03 PM, Luck, Tony wrote:
> So this is on top of the 9 patch series (using the V4 that Chen Gong
> posted for part 4/9 and V3 for all the others). Obviously it should
> be folded back into the series if we go this way.
>
> It's a bit simplistic right now - the registered

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-21 Thread Luck, Tony
> But yes, this is possible and it would make it all even cleaner
> and simpler by simply not needing the reg/dereg interfaces for
> mce_ext_err_print but adding it to the chain.
So this is on top of the 9 patch series (using the V4 that Chen Gong posted for part 4/9 and V3 for all the others). O

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-21 Thread Naveen N. Rao
On 10/20/2013 01:51 PM, Borislav Petkov wrote:
> On Sun, Oct 20, 2013 at 03:06:15AM -0400, Chen Gong wrote:
>> Oh, yes it is. Furthermore, it reminds me of something I have wondered since I wrote this patch series: where is the best place to put cper.c? CPER really doesn't depend on APEI or even ACPI. Maybe lib/ is an option. I can up

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-20 Thread Borislav Petkov
Btw, your mailer is generating that Mail-Followup-To header which removes you from the To: list and puts everyone else on To: instead. And of course, the patches you've sent with git-send-email don't have that header and replying to all there is fine. And Tony's replies don't have it so replying

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-20 Thread Chen Gong
[...]
> >diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> >index 22327e6..c67ec61 100644
> >--- a/drivers/acpi/Kconfig
> >+++ b/drivers/acpi/Kconfig
> >@@ -372,4 +372,24 @@ config ACPI_BGRT
> >
> > source "drivers/acpi/apei/Kconfig"
> >
> >+config ACPI_EXTLOG
> >+	tristate "Extended

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-19 Thread Chen Gong
[...]
> >+
> >+MODULE_AUTHOR(

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-19 Thread Borislav Petkov
On Fri, Oct 18, 2013 at 10:22:26PM +0000, Luck, Tony wrote:
> @@ -154,6 +154,10 @@ void mce_log(struct mce *mce)
>  	/* Emit the trace record: */
>  	trace_mce_record(mce);
>
> +	if (mce_ext_err_print)
> +		if (mce_ext_err_print(NULL, m.extcpu, i))
> +			r

RE: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-18 Thread Luck, Tony
@@ -154,6 +154,10 @@ void mce_log(struct mce *mce)
 	/* Emit the trace record: */
 	trace_mce_record(mce);
+	if (mce_ext_err_print)
+		if (mce_ext_err_print(NULL, m.extcpu, i))
+			return;
+
 	ret = atomic_notifier_call_chain(&x86_mce_decod

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-18 Thread Borislav Petkov
On Fri, Oct 18, 2013 at 08:57:22PM +0000, Luck, Tony wrote:
> Long term ... I'd be happy to see mce_log() go away. But we need to
> have a robust, well tested replacement in place for some time before
> such a move is up for discussion.
Basically a userspace daemon consuming the tracepoint or plur

RE: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-18 Thread Luck, Tony
> Hmm, that's a good question you raise: but the more important question
> is, do you guys - Gong and Tony - want to replace the logging we're
> already doing, i.e. mce_log() with extlog or not.
Long term ... I'd be happy to see mce_log() go away. But we need to have a robust, well tested replace

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-18 Thread Borislav Petkov
On Fri, Oct 18, 2013 at 06:07:56PM +0530, Naveen N. Rao wrote:
> >@@ -624,6 +641,9 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
> > 		    (m.status & (mca_cfg.ser ? MCI_STATUS_S : MCI_STATUS_UC)))
> > 			continue;
> >
> >+		if (mce_ext_er

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-18 Thread Naveen N. Rao
On 10/18/2013 01:53 PM, Chen, Gong wrote:
> This H/W error log driver (a.k.a eMCA driver) is implemented based on
> http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
>
> After errors are captured, more valuable information can be got via this new enh