On 2014/6/28 6:10, Luck, Tony wrote:
>>> Not all machine checks are fatal - it would be bad for us to go into
>>> an infinite spin instead of executing the recovery code.
>>
>> Then for the time being extlog shouldn't hook into the decoder chain
>> but into mce_process_work, i.e. the last should ca
On Fri, Jun 27, 2014 at 10:10:48PM +, Luck, Tony wrote:
> I spoke too quickly. The only MCE for which we have recovery code are
> those that hit in application code. So the processor that is trying to
> do the printk() can't possibly be holding the locks. Other processors
> might have held the
>> Not all machine checks are fatal - it would be bad for us to go into
>> an infinite spin instead of executing the recovery code.
>
> Then for the time being extlog shouldn't hook into the decoder chain
> but into mce_process_work, i.e. the last should call it. Or maybe add
> another notifier whi
On Fri, Jun 27, 2014 at 08:43:14PM +, Luck, Tony wrote:
> Not all machine checks are fatal - it would be bad for us to go into
> an infinite spin instead of executing the recovery code.
Then for the time being extlog shouldn't hook into the decoder chain
but into mce_process_work, i.e. the las
>> There's a logbuf_lock in printk. If logbuf_lock is held by another CPU,
>> it'll lead to an infinite spin here, won't it?
>
> Yes, but we want to take the risk and print something out before the
> machine dies instead of waiting to get into printk-safe context first
> and maybe corrupt state.
Not
On Fri, Jun 27, 2014 at 01:34:45PM +0800, Xie XiuQi wrote:
> The call graph is like this,
> do_machine_check
>   -> mce_log
>     -> atomic_notifier_call_chain(&x86_mce_decoder_chain ...)
>       -> ...
>         -> extlog_print
>           -> print_extlog_rcd
>             -> __print_extlog_rcd
>               -> printk
>
>
On 2013/10/18 20:37, Naveen N. Rao wrote:
> On 10/18/2013 01:53 PM, Chen, Gong wrote:
>> This H/W error log driver (a.k.a eMCA driver) is implemented based on
>> http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
>>
>> After errors are captured,
On 10/22/2013 12:33 AM, Luck, Tony wrote:
But yes, this is possible and it would make it all even cleaner
and simpler by simply not needing the reg/dereg interfaces for
mce_ext_err_print but adding it to the chain.
So this is on top of the 9 patch series (using the V4 that Chen Gong
posted for
On Mon, Oct 21, 2013 at 03:39:20PM -0700, Tony Luck wrote:
> I folded that back into the series. Also switched out the test on
> whether to print the "No further action is required" message to only
> do so for corrected errors. Cleaned up some of the commit messages,
>
> The result is sitting at:
>
On Mon, Oct 21, 2013 at 12:03 PM, Luck, Tony wrote:
> So this is on top of the 9 patch series (using the V4 that Chen Gong
> posted for part 4/9 and V3 for all the others). Obviously it should
> be folded back into the series if we go this way.
>
> It's a bit simplistic right now - the registered
> But yes, this is possible and it would make it all even cleaner
> and simpler by simply not needing the reg/dereg interfaces for
> mce_ext_err_print but adding it to the chain.
So this is on top of the 9 patch series (using the V4 that Chen Gong
posted for part 4/9 and V3 for all the others). O
On 10/20/2013 01:51 PM, Borislav Petkov wrote:
On Sun, Oct 20, 2013 at 03:06:15AM -0400, Chen Gong wrote:
Oh, yes it is. Furthermore, it reminds me of a question I have had since
I wrote this patch series: where is the best place to put cper.c? CPER
really doesn't depend on APEI, or even ACPI. Maybe lib/ is an option. I can up
Btw, your mailer is generating that Mail-Followup-To header which
removes you from the To: list and puts everyone else on To: instead.
And of course, the patches you've sent with git-send-email don't have
that header and replying to all there is fine.
And Tony's replies don't have it so replying
[...]
> >diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> >index 22327e6..c67ec61 100644
> >--- a/drivers/acpi/Kconfig
> >+++ b/drivers/acpi/Kconfig
> >@@ -372,4 +372,24 @@ config ACPI_BGRT
> >
> > source "drivers/acpi/apei/Kconfig"
> >
> >+config ACPI_EXTLOG
> >+tristate "Extended
..@vger.kernel.org,
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86
> platform
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101
> Thunderbird/24.0
>
[...]
> >+
> >+MODULE_AUTHOR(
On Fri, Oct 18, 2013 at 10:22:26PM +, Luck, Tony wrote:
> @@ -154,6 +154,10 @@ void mce_log(struct mce *mce)
>  	/* Emit the trace record: */
>  	trace_mce_record(mce);
>
> +	if (mce_ext_err_print)
> +		if (mce_ext_err_print(NULL, m.extcpu, i))
> +			r
@@ -154,6 +154,10 @@ void mce_log(struct mce *mce)
 	/* Emit the trace record: */
 	trace_mce_record(mce);
+	if (mce_ext_err_print)
+		if (mce_ext_err_print(NULL, m.extcpu, i))
+			return;
+
 	ret = atomic_notifier_call_chain(&x86_mce_decod
On Fri, Oct 18, 2013 at 08:57:22PM +, Luck, Tony wrote:
> Long term ... I'd be happy to see mce_log() go away. But we need to
> have a robust, well tested replacement in place for some time before
> such a move is up for discussion.
Basically a userspace daemon consuming the tracepoint or plur
> Hmm, that's a good question you raise: but the more important question
> is, do you guys - Gong and Tony - want to replace the logging we're
> already doing, i.e. mce_log() with extlog or not.
Long term ... I'd be happy to see mce_log() go away. But we need to have
a robust, well tested replace
On Fri, Oct 18, 2013 at 06:07:56PM +0530, Naveen N. Rao wrote:
> >@@ -624,6 +641,9 @@ void machine_check_poll(enum mcp_flags flags,
> >mce_banks_t *b)
> > (m.status & (mca_cfg.ser ? MCI_STATUS_S : MCI_STATUS_UC)))
> > continue;
> >
> >+if (mce_ext_er
On 10/18/2013 01:53 PM, Chen, Gong wrote:
This H/W error log driver (a.k.a eMCA driver) is implemented based on
http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
After errors are captured, more valuable information can be
got via this new enh