** Changed in: linux (Ubuntu)
     Assignee: Chris J Arges (arges) => (unassigned)

** Changed in: linux (Ubuntu Vivid)
     Assignee: (unassigned) => Chris J Arges (arges)

** Changed in: linux (Ubuntu Vivid)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Vivid)
       Status: New => In Progress

** Changed in: linux (Ubuntu)
       Status: In Progress => New

** Changed in: linux (Ubuntu)
   Importance: High => Undecided

** Description changed:

+ SRU Justification:
+ [Impact]
+ Users of the 3.19 kernel on POWER8 machines get a kernel crash on boot.
+ 
+ [Test Case]
+ Boot system.
+ 
+ [Fix]
+ commit 792f96e9a769b799a2944e9369e4ea1e467135b2 needed to be backported in 
addition to d7cf83fcaf1b1668201eae4cdd6e6fe7a2448654. Our 3.19 kernel had a 
partial backport of the first patch.
+ 
+ --
+ 
+ 
  ---Problem Description---
  Installed Ubuntu 14.04.3 LTS on Palmetto and it's crashing after booting to 
login.
  This happens every time I boot Ubuntu 14.04.3 LTS.  I've reinstalled Ubuntu 
and replaced the hard disk as well and re-installed.  Still crashing.
-  
+ 
  ---uname output---
  Linux paul40 3.19.0-26-generic #28~14.04.1-Ubuntu SMP Wed Aug 12 14:10:52 UTC 
2015 ppc64le ppc64le ppc64le GNU/Linux
-  
- Machine Type = Palmetto 
-  
+ 
+ Machine Type = Palmetto
+ 
  ---System Hang---
-  Ubuntu OS crashes and cannot access host. Must reboot system
-   
+  Ubuntu OS crashes and cannot access host. Must reboot system
+ 
  ---Steps to Reproduce---
-  Boot system
-   
+  Boot system
+ 
  Oops output:
-  [   33.132376] Unable to handle kernel paging request for data at address 
0x200000000000000
-     [   33.132565] Faulting instruction address: 0xc0000000000dbc60
-     [   33.133422] Oops: Kernel access of bad area, sig: 11 [#1]
-     [   33.134410] SMP NR_CPUS=2048 NUMA PowerNV
-     [   33.134478] Modules linked in: ast ttm drm_kms_helper joydev mac_hid 
drm hid_generic usbhid hid syscopyarea sysfillrect sysimgblt i2c_algo_bit 
ofpart cmdlinepart at24 uio_pdrv_genirq powernv_flash mtd ipmi_powernv 
powernv_rng opal_prd ipmi_msghandler uio uas usb_storage ahci libahci
-     [   33.139112] CPU: 24 PID: 0 Comm: swapper/24 Not tainted 
3.19.0-26-generic #28~14.04.1-Ubuntu
-     [   33.139943] task: c0000000013cccb0 ti: c000000fff700000 task.ti: 
c000000001448000
-     [   33.141642] NIP: c0000000000dbc60 LR: c0000000000dbd94 CTR: 
0000000000000000
-     [   33.142605] REGS: c000000fff703980 TRAP: 0300   Not tainted  
(3.19.0-26-generic)
-     [   33.143417] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 
28002888  XER: 00000000
-     [   33.144244] CFAR: c000000000008468 DAR: 0200000000000000 DSISR: 
40000000 SOFTE: 0 
-     GPR00: c0000000000dbd94 c000000fff703c00 c00000000144cc00 
c0000000015f03c0 
-     GPR04: 0000000000000007 c0000000015f03b8 ffffffffffffffff 
0000000000000000 
-     GPR08: 0000000000000000 0200000000000000 c00000000006c394 
9000000000001003 
-     GPR12: 0000000000002200 c00000000fb8d800 0000000000000058 
0000000000000000 
-     GPR16: c000000001448000 c000000001448000 c000000001448080 
c000000000e9a880 
-     GPR20: c000000001448080 0000000000000001 0000000000000002 
0000000000000012 
-     GPR24: c000000f1e432200 0000000000000000 0000000000000000 
c0000000015f03b8 
-     GPR28: 0000000000000007 0000000000000000 c0000000015f03c0 
ffffffffffffffff 
-     [   33.157013] NIP [c0000000000dbc60] notifier_call_chain+0x70/0x100
-     [   33.157818] LR [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
-     [   33.162090] Call Trace:
-     [   33.162845] [c000000fff703c00] [0000000000000008] 0x8 (unreliable)
-     [   33.163644] [c000000fff703c50] [c0000000000dbd94] 
atomic_notifier_call_chain+0x44/0x60
-     [   33.164647] [c000000fff703c90] [c00000000006f2a8] 
opal_message_notify+0xa8/0x100
-     [   33.165476] [c000000fff703d00] [c0000000000dbc88] 
notifier_call_chain+0x98/0x100
-     [   33.167007] [c000000fff703d50] [c0000000000dbd94] 
atomic_notifier_call_chain+0x44/0x60
-     [   33.167816] [c000000fff703d90] [c00000000006f654] 
opal_do_notifier.part.5+0x74/0xa0
-     [   33.172166] [c000000fff703dd0] [c00000000006f6d8] 
opal_interrupt+0x58/0x70
-     [   33.172997] [c000000fff703e10] [c0000000001273d0] 
handle_irq_event_percpu+0x90/0x2b0
-     [   33.174507] [c000000fff703ed0] [c000000000127658] 
handle_irq_event+0x68/0xd0
-     [   33.175312] [c000000fff703f00] [c00000000012baf4] 
handle_fasteoi_irq+0xe4/0x240
-     [   33.176124] [c000000fff703f30] [c0000000001265c8] 
generic_handle_irq+0x58/0x90
-     [   33.176936] [c000000fff703f60] [c000000000010f10] __do_irq+0x80/0x190
-     [   33.182406] [c000000fff703f90] [c00000000002476c] call_do_irq+0x14/0x24
-     [   33.183258] [c00000000144ba30] [c0000000000110c0] do_IRQ+0xa0/0x120
-     [   33.184072] [c00000000144ba90] [c0000000000025d8] 
hardware_interrupt_common+0x158/0x180
-     [   33.184907] --- interrupt: 501 at arch_local_irq_restore+0x5c/0x90
-     [   33.184907]     LR = arch_local_irq_restore+0x40/0x90
-     [   33.186473] [c00000000144bd80] [c000000f2ae19808] 0xc000000f2ae19808 
(unreliable)
-     [   33.188024] [c00000000144bda0] [c00000000085d5d8] 
cpuidle_enter_state+0xa8/0x260
-     [   33.192695] [c00000000144be00] [c000000000108be8] 
cpu_startup_entry+0x488/0x4e0
-     [   33.193543] [c00000000144bee0] [c00000000000bdb4] rest_init+0xa4/0xc0
-     [   33.194327] [c00000000144bf00] [c000000000da3e80] 
start_kernel+0x53c/0x558
-     [   33.195084] [c00000000144bf90] [c000000000008c6c] 
start_here_common+0x20/0xa8
-     [   33.196569] Instruction dump:
-     [   33.196619] 7cfd3b78 60000000 60000000 e93e0000 2fa90000 419e00a4 
2fbf0000 419e009c 
-     [   33.197605] 2e3d0000 60000000 60000000 60420000 <e9490000> ebc90008 
7d234b78 7f84e378 
-     [   33.202763] ---[ end trace 71076895a9f126ba ]---
-     [   33.202836] 
-     [   35.203605] Kernel panic - not syncing: Fatal exception in interrupt
-     [   35.203727] drm_kms_helper: panic occurred, switching back to text 
console
-     [   35.204692] ---[ end Kernel panic - not syncing: Fatal exception in 
interrupt
-  
- Ah! This is due to notifier chain array overflow while handling opal message. 
The upstream commit 792f96e fixes this issue.. But what I see is the commit 
792f96e has been partially applied to ubuntu 14.04.3 kernel sources. And hence 
you are seeing this issue. 
+  [   33.132376] Unable to handle kernel paging request for data at address 
0x200000000000000
+     [   33.132565] Faulting instruction address: 0xc0000000000dbc60
+     [   33.133422] Oops: Kernel access of bad area, sig: 11 [#1]
+     [   33.134410] SMP NR_CPUS=2048 NUMA PowerNV
+     [   33.134478] Modules linked in: ast ttm drm_kms_helper joydev mac_hid 
drm hid_generic usbhid hid syscopyarea sysfillrect sysimgblt i2c_algo_bit 
ofpart cmdlinepart at24 uio_pdrv_genirq powernv_flash mtd ipmi_powernv 
powernv_rng opal_prd ipmi_msghandler uio uas usb_storage ahci libahci
+     [   33.139112] CPU: 24 PID: 0 Comm: swapper/24 Not tainted 
3.19.0-26-generic #28~14.04.1-Ubuntu
+     [   33.139943] task: c0000000013cccb0 ti: c000000fff700000 task.ti: 
c000000001448000
+     [   33.141642] NIP: c0000000000dbc60 LR: c0000000000dbd94 CTR: 
0000000000000000
+     [   33.142605] REGS: c000000fff703980 TRAP: 0300   Not tainted  
(3.19.0-26-generic)
+     [   33.143417] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 
28002888  XER: 00000000
+     [   33.144244] CFAR: c000000000008468 DAR: 0200000000000000 DSISR: 
40000000 SOFTE: 0
+     GPR00: c0000000000dbd94 c000000fff703c00 c00000000144cc00 c0000000015f03c0
+     GPR04: 0000000000000007 c0000000015f03b8 ffffffffffffffff 0000000000000000
+     GPR08: 0000000000000000 0200000000000000 c00000000006c394 9000000000001003
+     GPR12: 0000000000002200 c00000000fb8d800 0000000000000058 0000000000000000
+     GPR16: c000000001448000 c000000001448000 c000000001448080 c000000000e9a880
+     GPR20: c000000001448080 0000000000000001 0000000000000002 0000000000000012
+     GPR24: c000000f1e432200 0000000000000000 0000000000000000 c0000000015f03b8
+     GPR28: 0000000000000007 0000000000000000 c0000000015f03c0 ffffffffffffffff
+     [   33.157013] NIP [c0000000000dbc60] notifier_call_chain+0x70/0x100
+     [   33.157818] LR [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
+     [   33.162090] Call Trace:
+     [   33.162845] [c000000fff703c00] [0000000000000008] 0x8 (unreliable)
+     [   33.163644] [c000000fff703c50] [c0000000000dbd94] 
atomic_notifier_call_chain+0x44/0x60
+     [   33.164647] [c000000fff703c90] [c00000000006f2a8] 
opal_message_notify+0xa8/0x100
+     [   33.165476] [c000000fff703d00] [c0000000000dbc88] 
notifier_call_chain+0x98/0x100
+     [   33.167007] [c000000fff703d50] [c0000000000dbd94] 
atomic_notifier_call_chain+0x44/0x60
+     [   33.167816] [c000000fff703d90] [c00000000006f654] 
opal_do_notifier.part.5+0x74/0xa0
+     [   33.172166] [c000000fff703dd0] [c00000000006f6d8] 
opal_interrupt+0x58/0x70
+     [   33.172997] [c000000fff703e10] [c0000000001273d0] 
handle_irq_event_percpu+0x90/0x2b0
+     [   33.174507] [c000000fff703ed0] [c000000000127658] 
handle_irq_event+0x68/0xd0
+     [   33.175312] [c000000fff703f00] [c00000000012baf4] 
handle_fasteoi_irq+0xe4/0x240
+     [   33.176124] [c000000fff703f30] [c0000000001265c8] 
generic_handle_irq+0x58/0x90
+     [   33.176936] [c000000fff703f60] [c000000000010f10] __do_irq+0x80/0x190
+     [   33.182406] [c000000fff703f90] [c00000000002476c] call_do_irq+0x14/0x24
+     [   33.183258] [c00000000144ba30] [c0000000000110c0] do_IRQ+0xa0/0x120
+     [   33.184072] [c00000000144ba90] [c0000000000025d8] 
hardware_interrupt_common+0x158/0x180
+     [   33.184907] --- interrupt: 501 at arch_local_irq_restore+0x5c/0x90
+     [   33.184907]     LR = arch_local_irq_restore+0x40/0x90
+     [   33.186473] [c00000000144bd80] [c000000f2ae19808] 0xc000000f2ae19808 
(unreliable)
+     [   33.188024] [c00000000144bda0] [c00000000085d5d8] 
cpuidle_enter_state+0xa8/0x260
+     [   33.192695] [c00000000144be00] [c000000000108be8] 
cpu_startup_entry+0x488/0x4e0
+     [   33.193543] [c00000000144bee0] [c00000000000bdb4] rest_init+0xa4/0xc0
+     [   33.194327] [c00000000144bf00] [c000000000da3e80] 
start_kernel+0x53c/0x558
+     [   33.195084] [c00000000144bf90] [c000000000008c6c] 
start_here_common+0x20/0xa8
+     [   33.196569] Instruction dump:
+     [   33.196619] 7cfd3b78 60000000 60000000 e93e0000 2fa90000 419e00a4 
2fbf0000 419e009c
+     [   33.197605] 2e3d0000 60000000 60000000 60420000 <e9490000> ebc90008 
7d234b78 7f84e378
+     [   33.202763] ---[ end trace 71076895a9f126ba ]---
+     [   33.202836]
+     [   35.203605] Kernel panic - not syncing: Fatal exception in interrupt
+     [   35.203727] drm_kms_helper: panic occurred, switching back to text 
console
+     [   35.204692] ---[ end Kernel panic - not syncing: Fatal exception in 
interrupt
+ 
+ Ah! This is due to a notifier chain array overflow while handling an
+ OPAL message. The upstream commit 792f96e fixes this issue, but it has
+ been only partially applied to the Ubuntu 14.04.3 kernel sources, and
+ hence you are seeing this issue.
  
  commit 792f96e9a769b799a2944e9369e4ea1e467135b2
  Author: Neelesh Gupta <neele...@linux.vnet.ibm.com>
  Date:   Wed Feb 11 11:57:06 2015 +0530
  
-     powerpc/powernv: Fix the overflow of OPAL message notifiers head array
-     
-     Fixes the condition check of incoming message type which can
-     otherwise shoot beyond the message notifiers head array.
-     
-     Signed-off-by: Neelesh Gupta <neele...@linux.vnet.ibm.com>
-     Reviewed-by: Vasant Hegde <hegdevas...@linux.vnet.ibm.com>
-     Reviewed-by: Anshuman Khandual <khand...@linux.vnet.ibm.com>
-     Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
+     powerpc/powernv: Fix the overflow of OPAL message notifiers head
+ array
+ 
+     Fixes the condition check of incoming message type which can
+     otherwise shoot beyond the message notifiers head array.
+ 
+     Signed-off-by: Neelesh Gupta <neele...@linux.vnet.ibm.com>
+     Reviewed-by: Vasant Hegde <hegdevas...@linux.vnet.ibm.com>
+     Reviewed-by: Anshuman Khandual <khand...@linux.vnet.ibm.com>
+     Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
  
  Below is the hunk from above commit, which is missing from ubuntu 14.04.3:
  ------------------------------------------------
  @@ -354,7 +350,7 @@ static void opal_handle_message(void)
-         type = be32_to_cpu(msg.msg_type);
-  
-         /* Sanity check */
+         type = be32_to_cpu(msg.msg_type);
+ 
+         /* Sanity check */
  -       if (type > OPAL_MSG_TYPE_MAX) {
  +       if (type >= OPAL_MSG_TYPE_MAX) {
-                 pr_warning("%s: Unknown message type: %u\n", __func__, type);
-                 return;
-         }
+                 pr_warning("%s: Unknown message type: %u\n", __func__, type);
+                 return;
+         }
  ------------------------------------------------
  
  I just checked. The above hunk can be cleanly applied to ubuntu 14.04.3
  kernel sources.  We should mirror this bug to ubuntu and ask them to
  apply above hunk.
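
  To illustrate why the one-character change in the hunk above matters, here
  is a minimal standalone sketch of the off-by-one. It is not the kernel
  code: the enum values, the notifier_head array, and the helper names
  (type_ok_buggy, type_ok_fixed, dispatch) are hypothetical stand-ins, on the
  assumption that the notifier head array has one slot per message type, so
  valid indices run from 0 to MAX - 1.

```c
#include <stdint.h>

/* Hypothetical stand-in for the OPAL message-type enum; the real names
 * and values in the kernel headers may differ. */
enum { MSG_ASYNC_COMP = 0, MSG_MEM_ERR, MSG_EPOW, MSG_TYPE_MAX };

/* One slot per valid type: indices 0 .. MSG_TYPE_MAX - 1. */
int notifier_head[MSG_TYPE_MAX];

/* Old check: only rejects type > MAX, so type == MAX is accepted and
 * would index one element past the end of the array. */
int type_ok_buggy(uint32_t type)
{
        return !(type > MSG_TYPE_MAX);
}

/* Check from the missing hunk: type == MAX is rejected as well. */
int type_ok_fixed(uint32_t type)
{
        return !(type >= MSG_TYPE_MAX);
}

/* Count a message against its slot only when the type is in range.
 * Returns 0 on success, -1 for an unknown type. */
int dispatch(uint32_t type)
{
        if (!type_ok_fixed(type))
                return -1;
        notifier_head[type]++;
        return 0;
}
```

  With the old check, a message whose type equals the maximum passes the
  sanity check and the notifier head is read from beyond the array, which
  would be consistent with the bad pointer dereferenced in
  notifier_call_chain in the oops above.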

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1487085

Title:
  Ubuntu 14.04.3 LTS Crash in notifier_call_chain after boot

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1487085/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
