[Xen-API] kernel bug report: panic on MCE on recoverable ECC error

George Shuklin Wed, 21 Aug 2013 13:22:08 -0700

Good day.

Today we hit a very unfunny bug.

A server with multi-bit ECC memory caught a recoverable error. It was logged to the IPMI SEL (System Event Log) in hardware and passed to the OS on that host.

Usually such events are simply printed to dmesg, and admins lazily evacuate the host and replace the memory (or even ignore a single error). These errors are non-fatal.

But instead of a harmless dmesg message, we got a panic with a trace in arch/x86/kernel/cpu/mcheck/mce_dom0.c:39 (convert_log), followed by a reboot.

The trace is below, but my question is: where should I send bug reports for the Citrix kernel?

Aug 21 18:07:14 10.1.3.44 [ 7162.416812] ------------[ cut here ]------------
Aug 21 18:07:14 10.1.3.44 [ 7162.420590] WARNING: at arch/x86/kernel/cpu/mcheck/mce_dom0.c:39 convert_log+0x199/0x1b0()
Aug 21 18:07:14 10.1.3.44 [ 7162.421332] Hardware name: X9DRT-HF+
(module list skipped)
Aug 21 18:07:14 10.1.3.44 [last unloaded: microcode]
Aug 21 18:07:14 10.1.3.44
Aug 21 18:07:14 10.1.3.44 [ 7162.520280] Pid: 0, comm: swapper Not tainted 2.6.32.43-0.4.1.xs1.6.10.741.170752xen #1
Aug 21 18:07:14 10.1.3.44 [ 7162.521387] Call Trace:
Aug 21 18:07:14 10.1.3.44 [ 7162.522144]  [<c0110169>] ? convert_log+0x199/0x1b0
Aug 21 18:07:14 10.1.3.44 [ 7162.522882]  [<c01343f1>] warn_slowpath_common+0x81/0xa0
Aug 21 18:07:14 10.1.3.44 [ 7162.524701]  [<c0110169>] ? convert_log+0x199/0x1b0
Aug 21 18:07:14 10.1.3.44 [ 7162.525433]  [<c013442a>] warn_slowpath_null+0x1a/0x20
Aug 21 18:07:14 10.1.3.44 [ 7162.525813]  [<c0110169>] convert_log+0x199/0x1b0
Aug 21 18:07:14 10.1.3.44 [ 7162.526177]  [<c0110223>] mce_dom0_interrupt+0xa3/0x120
Aug 21 18:07:14 10.1.3.44 [ 7162.526211]  [<c016a7c5>] handle_IRQ_event+0x55/0x180
Aug 21 18:07:14 10.1.3.44 [ 7162.526592]  [<c016a7c5>] ? handle_IRQ_event+0x55/0x180
Aug 21 18:07:14 10.1.3.44 [ 7162.528148]  [<c016cc4a>] handle_level_irq+0x8a/0x130
Aug 21 18:07:14 10.1.3.44 [ 7162.528547]  [<c0105ec9>] handle_irq+0x39/0x60
Aug 21 18:07:14 10.1.3.44 [ 7162.528939]  [<c03d9645>] evtchn_do_upcall+0x135/0x326
Aug 21 18:07:14 10.1.3.44 [ 7162.529671]  [<c03d2ed5>] ? schedule+0x375/0xae0
Aug 21 18:07:14 10.1.3.44 [ 7162.529703]  [<c010477f>] hypervisor_callback+0x43/0x4b
Aug 21 18:07:14 10.1.3.44 [ 7162.532047]  [<c0106b05>] ? xen_safe_halt+0xb5/0x150
Aug 21 18:07:14 10.1.3.44 [ 7162.532840]  [<c010a6ce>] xen_idle+0x2e/0x80
Aug 21 18:07:14 10.1.3.44 [ 7162.533231]  [<c0102acf>] cpu_idle+0x3f/0x70
Aug 21 18:07:14 10.1.3.44 [ 7162.533985]  [<c03c29d2>] rest_init+0x62/0x70
Aug 21 18:07:14 10.1.3.44 [ 7162.535448]  [<c056bd05>] start_kernel+0x2a5/0x340
Aug 21 18:07:14 10.1.3.44 [ 7162.536189]  [<c056b5f0>] ? unknown_bootoption+0x0/0x1f0
Aug 21 18:07:14 10.1.3.44 [ 7162.536914]  [<c056b07c>] i386_start_kernel+0x7c/0x90
Aug 21 18:07:14 10.1.3.44 [ 7162.537654] ---[ end trace 76553ff173258821 ]---


_______________________________________________
Xen-api mailing list
[email protected]
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
