** Changed in: linux (Ubuntu)
     Assignee: Chris J Arges (arges) => (unassigned)

** Changed in: linux (Ubuntu Vivid)
     Assignee: (unassigned) => Chris J Arges (arges)

** Changed in: linux (Ubuntu Vivid)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Vivid)
       Status: New => In Progress

** Changed in: linux (Ubuntu)
       Status: In Progress => New

** Changed in: linux (Ubuntu)
   Importance: High => Undecided

** Description changed:

+ SRU Justification:
+ [Impact]
+ Users of 3.19 kernel with power8 machines get a kernel crash on boot.
+
+ [Test Case]
+ Boot system.
+
+ [Fix]
+ commit 792f96e9a769b799a2944e9369e4ea1e467135b2 needed to be backported in addition to d7cf83fcaf1b1668201eae4cdd6e6fe7a2448654. Our 3.19 kernel had a partial backport of the first patch.
+
+ --
+
  ---Problem Description---
  Installed Ubuntu 14.04.3 LTS on Palmetto and its crashing after booting to login. This happens every time I boot Ubuntu 14.04.3 LTS. I've reinstalled Ubuntu and replaced the hard disk as well and re-installed. Still crashing.
-
+
  ---uname output---
  Linux paul40 3.19.0-26-generic #28~14.04.1-Ubuntu SMP Wed Aug 12 14:10:52 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
-
- Machine Type = Palmetto
-
+
+ Machine Type = Palmetto
+
  ---System Hang---
- Ubuntu OS crashes and cannot access host. Must reboot system
-
+ Ubuntu OS crashes and cannot access host. Must reboot system
+
  ---Steps to Reproduce---
- Boot system
-
+ Boot system
+
  Oops output:
- [ 33.132376] Unable to handle kernel paging request for data at address 0x200000000000000
- [ 33.132565] Faulting instruction address: 0xc0000000000dbc60
- [ 33.133422] Oops: Kernel access of bad area, sig: 11 [#1]
- [ 33.134410] SMP NR_CPUS=2048 NUMA PowerNV
- [ 33.134478] Modules linked in: ast ttm drm_kms_helper joydev mac_hid drm hid_generic usbhid hid syscopyarea sysfillrect sysimgblt i2c_algo_bit ofpart cmdlinepart at24 uio_pdrv_genirq powernv_flash mtd ipmi_powernv powernv_rng opal_prd ipmi_msghandler uio uas usb_storage ahci libahci
- [ 33.139112] CPU: 24 PID: 0 Comm: swapper/24 Not tainted 3.19.0-26-generic #28~14.04.1-Ubuntu
- [ 33.139943] task: c0000000013cccb0 ti: c000000fff700000 task.ti: c000000001448000
- [ 33.141642] NIP: c0000000000dbc60 LR: c0000000000dbd94 CTR: 0000000000000000
- [ 33.142605] REGS: c000000fff703980 TRAP: 0300 Not tainted (3.19.0-26-generic)
- [ 33.143417] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28002888 XER: 00000000
- [ 33.144244] CFAR: c000000000008468 DAR: 0200000000000000 DSISR: 40000000 SOFTE: 0
- GPR00: c0000000000dbd94 c000000fff703c00 c00000000144cc00 c0000000015f03c0
- GPR04: 0000000000000007 c0000000015f03b8 ffffffffffffffff 0000000000000000
- GPR08: 0000000000000000 0200000000000000 c00000000006c394 9000000000001003
- GPR12: 0000000000002200 c00000000fb8d800 0000000000000058 0000000000000000
- GPR16: c000000001448000 c000000001448000 c000000001448080 c000000000e9a880
- GPR20: c000000001448080 0000000000000001 0000000000000002 0000000000000012
- GPR24: c000000f1e432200 0000000000000000 0000000000000000 c0000000015f03b8
- GPR28: 0000000000000007 0000000000000000 c0000000015f03c0 ffffffffffffffff
- [ 33.157013] NIP [c0000000000dbc60] notifier_call_chain+0x70/0x100
- [ 33.157818] LR [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
- [ 33.162090] Call Trace:
- [ 33.162845] [c000000fff703c00] [0000000000000008] 0x8 (unreliable)
- [ 33.163644] [c000000fff703c50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
- [ 33.164647] [c000000fff703c90] [c00000000006f2a8] opal_message_notify+0xa8/0x100
- [ 33.165476] [c000000fff703d00] [c0000000000dbc88] notifier_call_chain+0x98/0x100
- [ 33.167007] [c000000fff703d50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
- [ 33.167816] [c000000fff703d90] [c00000000006f654] opal_do_notifier.part.5+0x74/0xa0
- [ 33.172166] [c000000fff703dd0] [c00000000006f6d8] opal_interrupt+0x58/0x70
- [ 33.172997] [c000000fff703e10] [c0000000001273d0] handle_irq_event_percpu+0x90/0x2b0
- [ 33.174507] [c000000fff703ed0] [c000000000127658] handle_irq_event+0x68/0xd0
- [ 33.175312] [c000000fff703f00] [c00000000012baf4] handle_fasteoi_irq+0xe4/0x240
- [ 33.176124] [c000000fff703f30] [c0000000001265c8] generic_handle_irq+0x58/0x90
- [ 33.176936] [c000000fff703f60] [c000000000010f10] __do_irq+0x80/0x190
- [ 33.182406] [c000000fff703f90] [c00000000002476c] call_do_irq+0x14/0x24
- [ 33.183258] [c00000000144ba30] [c0000000000110c0] do_IRQ+0xa0/0x120
- [ 33.184072] [c00000000144ba90] [c0000000000025d8] hardware_interrupt_common+0x158/0x180
- [ 33.184907] --- interrupt: 501 at arch_local_irq_restore+0x5c/0x90
- [ 33.184907] LR = arch_local_irq_restore+0x40/0x90
- [ 33.186473] [c00000000144bd80] [c000000f2ae19808] 0xc000000f2ae19808 (unreliable)
- [ 33.188024] [c00000000144bda0] [c00000000085d5d8] cpuidle_enter_state+0xa8/0x260
- [ 33.192695] [c00000000144be00] [c000000000108be8] cpu_startup_entry+0x488/0x4e0
- [ 33.193543] [c00000000144bee0] [c00000000000bdb4] rest_init+0xa4/0xc0
- [ 33.194327] [c00000000144bf00] [c000000000da3e80] start_kernel+0x53c/0x558
- [ 33.195084] [c00000000144bf90] [c000000000008c6c] start_here_common+0x20/0xa8
- [ 33.196569] Instruction dump:
- [ 33.196619] 7cfd3b78 60000000 60000000 e93e0000 2fa90000 419e00a4 2fbf0000 419e009c
- [ 33.197605] 2e3d0000 60000000 60000000 60420000 <e9490000> ebc90008 7d234b78 7f84e378
- [ 33.202763] ---[ end trace 71076895a9f126ba ]---
- [ 33.202836]
- [ 35.203605] Kernel panic - not syncing: Fatal exception in interrupt
- [ 35.203727] drm_kms_helper: panic occurred, switching back to text console
- [ 35.204692] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
-
- Ah! This is due to notifier chain array overflow while handling opal message. The upstream commit 792f96e fixes this issue.. But what I see is the commit 792f96e has been partially applied to ubuntu 14.04.3 kernel sources. And hence you are seeing this issue.
+ [ 33.132376] Unable to handle kernel paging request for data at address 0x200000000000000
+ [ 33.132565] Faulting instruction address: 0xc0000000000dbc60
+ [ 33.133422] Oops: Kernel access of bad area, sig: 11 [#1]
+ [ 33.134410] SMP NR_CPUS=2048 NUMA PowerNV
+ [ 33.134478] Modules linked in: ast ttm drm_kms_helper joydev mac_hid drm hid_generic usbhid hid syscopyarea sysfillrect sysimgblt i2c_algo_bit ofpart cmdlinepart at24 uio_pdrv_genirq powernv_flash mtd ipmi_powernv powernv_rng opal_prd ipmi_msghandler uio uas usb_storage ahci libahci
+ [ 33.139112] CPU: 24 PID: 0 Comm: swapper/24 Not tainted 3.19.0-26-generic #28~14.04.1-Ubuntu
+ [ 33.139943] task: c0000000013cccb0 ti: c000000fff700000 task.ti: c000000001448000
+ [ 33.141642] NIP: c0000000000dbc60 LR: c0000000000dbd94 CTR: 0000000000000000
+ [ 33.142605] REGS: c000000fff703980 TRAP: 0300 Not tainted (3.19.0-26-generic)
+ [ 33.143417] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28002888 XER: 00000000
+ [ 33.144244] CFAR: c000000000008468 DAR: 0200000000000000 DSISR: 40000000 SOFTE: 0
+ GPR00: c0000000000dbd94 c000000fff703c00 c00000000144cc00 c0000000015f03c0
+ GPR04: 0000000000000007 c0000000015f03b8 ffffffffffffffff 0000000000000000
+ GPR08: 0000000000000000 0200000000000000 c00000000006c394 9000000000001003
+ GPR12: 0000000000002200 c00000000fb8d800 0000000000000058 0000000000000000
+ GPR16: c000000001448000 c000000001448000 c000000001448080 c000000000e9a880
+ GPR20: c000000001448080 0000000000000001 0000000000000002 0000000000000012
+ GPR24: c000000f1e432200 0000000000000000 0000000000000000 c0000000015f03b8
+ GPR28: 0000000000000007 0000000000000000 c0000000015f03c0 ffffffffffffffff
+ [ 33.157013] NIP [c0000000000dbc60] notifier_call_chain+0x70/0x100
+ [ 33.157818] LR [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
+ [ 33.162090] Call Trace:
+ [ 33.162845] [c000000fff703c00] [0000000000000008] 0x8 (unreliable)
+ [ 33.163644] [c000000fff703c50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
+ [ 33.164647] [c000000fff703c90] [c00000000006f2a8] opal_message_notify+0xa8/0x100
+ [ 33.165476] [c000000fff703d00] [c0000000000dbc88] notifier_call_chain+0x98/0x100
+ [ 33.167007] [c000000fff703d50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
+ [ 33.167816] [c000000fff703d90] [c00000000006f654] opal_do_notifier.part.5+0x74/0xa0
+ [ 33.172166] [c000000fff703dd0] [c00000000006f6d8] opal_interrupt+0x58/0x70
+ [ 33.172997] [c000000fff703e10] [c0000000001273d0] handle_irq_event_percpu+0x90/0x2b0
+ [ 33.174507] [c000000fff703ed0] [c000000000127658] handle_irq_event+0x68/0xd0
+ [ 33.175312] [c000000fff703f00] [c00000000012baf4] handle_fasteoi_irq+0xe4/0x240
+ [ 33.176124] [c000000fff703f30] [c0000000001265c8] generic_handle_irq+0x58/0x90
+ [ 33.176936] [c000000fff703f60] [c000000000010f10] __do_irq+0x80/0x190
+ [ 33.182406] [c000000fff703f90] [c00000000002476c] call_do_irq+0x14/0x24
+ [ 33.183258] [c00000000144ba30] [c0000000000110c0] do_IRQ+0xa0/0x120
+ [ 33.184072] [c00000000144ba90] [c0000000000025d8] hardware_interrupt_common+0x158/0x180
+ [ 33.184907] --- interrupt: 501 at arch_local_irq_restore+0x5c/0x90
+ [ 33.184907] LR = arch_local_irq_restore+0x40/0x90
+ [ 33.186473] [c00000000144bd80] [c000000f2ae19808] 0xc000000f2ae19808 (unreliable)
+ [ 33.188024] [c00000000144bda0] [c00000000085d5d8] cpuidle_enter_state+0xa8/0x260
+ [ 33.192695] [c00000000144be00] [c000000000108be8] cpu_startup_entry+0x488/0x4e0
+ [ 33.193543] [c00000000144bee0] [c00000000000bdb4] rest_init+0xa4/0xc0
+ [ 33.194327] [c00000000144bf00] [c000000000da3e80] start_kernel+0x53c/0x558
+ [ 33.195084] [c00000000144bf90] [c000000000008c6c] start_here_common+0x20/0xa8
+ [ 33.196569] Instruction dump:
+ [ 33.196619] 7cfd3b78 60000000 60000000 e93e0000 2fa90000 419e00a4 2fbf0000 419e009c
+ [ 33.197605] 2e3d0000 60000000 60000000 60420000 <e9490000> ebc90008 7d234b78 7f84e378
+ [ 33.202763] ---[ end trace 71076895a9f126ba ]---
+ [ 33.202836]
+ [ 35.203605] Kernel panic - not syncing: Fatal exception in interrupt
+ [ 35.203727] drm_kms_helper: panic occurred, switching back to text console
+ [ 35.204692] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
+
+ Ah! This is due to notifier chain array overflow while handling opal
+ message. The upstream commit 792f96e fixes this issue.. But what I see
+ is the commit 792f96e has been partially applied to ubuntu 14.04.3
+ kernel sources. And hence you are seeing this issue.

  commit 792f96e9a769b799a2944e9369e4ea1e467135b2
  Author: Neelesh Gupta <neele...@linux.vnet.ibm.com>
  Date: Wed Feb 11 11:57:06 2015 +0530

- powerpc/powernv: Fix the overflow of OPAL message notifiers head array
-
- Fixes the condition check of incoming message type which can
- otherwise shoot beyond the message notifiers head array.
-
- Signed-off-by: Neelesh Gupta <neele...@linux.vnet.ibm.com>
- Reviewed-by: Vasant Hegde <hegdevas...@linux.vnet.ibm.com>
- Reviewed-by: Anshuman Khandual <khand...@linux.vnet.ibm.com>
- Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
+ powerpc/powernv: Fix the overflow of OPAL message notifiers head
+ array
+
+ Fixes the condition check of incoming message type which can
+ otherwise shoot beyond the message notifiers head array.
+
+ Signed-off-by: Neelesh Gupta <neele...@linux.vnet.ibm.com>
+ Reviewed-by: Vasant Hegde <hegdevas...@linux.vnet.ibm.com>
+ Reviewed-by: Anshuman Khandual <khand...@linux.vnet.ibm.com>
+ Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>

  Below is the hunk from above commit, which is missing from ubuntu 14.04.3:
  ------------------------------------------------
  @@ -354,7 +350,7 @@ static void opal_handle_message(void)
-         type = be32_to_cpu(msg.msg_type);
-
-         /* Sanity check */
+         type = be32_to_cpu(msg.msg_type);
+
+         /* Sanity check */
  -       if (type > OPAL_MSG_TYPE_MAX) {
  +       if (type >= OPAL_MSG_TYPE_MAX) {
-                 pr_warning("%s: Unknown message type: %u\n", __func__, type);
-                 return;
-         }
+                 pr_warning("%s: Unknown message type: %u\n", __func__, type);
+                 return;
+         }
  ------------------------------------------------

  I just checked. The above hunk can be cleanly applied to ubuntu 14.04.3 kernel sources. We should mirror this bug to ubuntu and ask them to apply above hunk.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1487085

Title:
  Ubuntu 14.04.3 LTS Crash in notifier_call_chain after boot

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1487085/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs