SRU Request Submitted: https://lists.ubuntu.com/archives/kernel-team/2018-June/093483.html
** Description changed:

+ == SRU Justification ==
+ IBM is seeing kernel traces during testing.  This is due to a missing
+ backport of some kernel fixes in the RTC driver, which is commit
+ 682e6b4da5cb.  Commit 682e6b4da5cb was also cc'd to upstream stable, but
+ it has not landed in Bionic as of yet.  It is also a fix to upstream
+ commit 628daa8d5abf.
+
+ Commit 34dd25de9fe3 is also needed as a prereq to define
+ OPAL_BUSY_DELAY_MS.
+
+ == Fixes ==
+ 34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
+ 682e6b4da5cb ("rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops")
+
+ == Regression Potential ==
+ Low.  Limited to powerpc.  Fixes a current regression.
+
+ == Test Case ==
+ A test kernel was built with these patches and tested by the original bug reporter.
+ The bug reporter states the test kernel resolved the bug.
+
+
  == Comment: #0 - PAVAMAN SUBRAMANIYAM <pavsu...@in.ibm.com> - 2018-05-23 01:15:30 ==
  Install a P9 Open Power Hardware with the latest OP920 Firmware images provided in the following link:

  http://pfd.austin.ibm.com/releasenotes/openpower9/OP920/OP920_1808A/OP920_1808N_RelNote_Main.html

  root@witherspoon:~# cat /etc/os-release
  ID="openbmc-phosphor"
  NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"
  VERSION="ibm-v2.1"
  VERSION_ID="ibm-v2.1-438-g0030304-r12-0-g5ee4fb0"
  PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) ibm-v2.1"
  BUILD_ID="ibm-v2.1-438-g0030304-r12"

  root@witherspoon:~# cat /var/lib/phosphor-software-manager/pnor/ro/VERSION
  IBM-witherspoon-ibm-OP9-v2.0-2.14
- op-build-v2.0-11-gb248194-dirty
- buildroot-2018.02.1-6-ga8d1126
- skiboot-v6.0.1
- hostboot-8ab6717d-pfc036fa
- occ-77bb5e6
- linux-4.16.8-openpower2-pb532d68
- petitboot-v1.7.1-p1188545
- machine-xml-7cd20a6
- hostboot-binaries-276bb70
- capp-ucode-p9-dd2-v4
- sbe-a596975
- hcode-b8173e8
+ op-build-v2.0-11-gb248194-dirty
+ buildroot-2018.02.1-6-ga8d1126
+ skiboot-v6.0.1
+ hostboot-8ab6717d-pfc036fa
+ occ-77bb5e6
+ linux-4.16.8-openpower2-pb532d68
+ petitboot-v1.7.1-p1188545
+ machine-xml-7cd20a6
+ hostboot-binaries-276bb70
+ capp-ucode-p9-dd2-v4
+ sbe-a596975
+ hcode-b8173e8

  Seeing the following messages in the dmesg logs.

  [ 16.377405] ipmi_si: Unable to find any System Interface(s)
  [ 17.384118] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
  [ 1372.711730] INFO: rcu_sched self-detected stall on CPU
  [ 1372.711787] 32-....: (5249 ticks this GP) idle=182/140000000000001/0 softirq=1093/1093 fqs=2623
  [ 1372.711863] (t=5250 jiffies g=22430 c=22429 q=953)
  [ 1372.711921] Task dump for CPU 32:
  [ 1372.711922] kworker/32:1 R running task 0 1123 2 0x00000804
  [ 1372.711930] Workqueue: events rtc_timer_do_work
  [ 1372.711931] Call Trace:
  [ 1372.711934] [c000003fd2b97350] [c00000000014a8f8] sched_show_task.part.16+0xd8/0x110 (unreliable)
  [ 1372.711939] [c000003fd2b973c0] [c0000000001aa8bc] rcu_dump_cpu_stacks+0xd4/0x138
  [ 1372.711942] [c000003fd2b97410] [c0000000001a9988] rcu_check_callbacks+0x8e8/0xb40
  [ 1372.711945] [c000003fd2b97540] [c0000000001b7c28] update_process_times+0x48/0x90
  [ 1372.711948] [c000003fd2b97570] [c0000000001cf974] tick_sched_handle.isra.5+0x34/0xd0
  [ 1372.711950] [c000003fd2b975a0] [c0000000001cfa70] tick_sched_timer+0x60/0xe0
  [ 1372.711953] [c000003fd2b975e0] [c0000000001b87d4] __hrtimer_run_queues+0x144/0x370
  [ 1372.711956] [c000003fd2b97660] [c0000000001b972c] hrtimer_interrupt+0xfc/0x350
  [ 1372.711959] [c000003fd2b97730] [c0000000000249f0] __timer_interrupt+0x90/0x260
  [ 1372.711962] [c000003fd2b97780] [c000000000024e08] timer_interrupt+0x98/0xe0
  [ 1372.711965] [c000003fd2b977b0] [c000000000009054] decrementer_common+0x114/0x120
  [ 1372.711970] --- interrupt: 901 at opal_get_rtc_time+0x98/0x110
-     LR = opal_return+0x14/0x48
+     LR = opal_return+0x14/0x48
  [ 1372.711972] [c000003fd2b97aa0] [c000000000a457b8] opal_get_rtc_time+0x98/0x110 (unreliable)
  [ 1372.711975] [c000003fd2b97ae0] [c000000000a3f98c] __rtc_read_time+0x7c/0x180
  [ 1372.711977] [c000003fd2b97b60] [c000000000a41738] rtc_timer_do_work+0x78/0x250
  [ 1372.711980] [c000003fd2b97c90] [c000000000134378] process_one_work+0x298/0x5a0
  [ 1372.711982] [c000003fd2b97d20] [c000000000134718] worker_thread+0x98/0x630
  [ 1372.711985] [c000003fd2b97dc0] [c00000000013d348] kthread+0x1a8/0x1b0
  [ 1372.711988] [c000003fd2b97e30] [c00000000000b658] ret_from_kernel_thread+0x5c/0x84

  == Comment: #1 - PAVAMAN SUBRAMANIYAM <pavsu...@in.ibm.com> - 2018-05-23 01:31:06 ==
-

  == Comment: #2 - Application Cdeadmin <cdead...@us.ibm.com> - 2018-05-23 01:33:40 ==
  cde00 (cdead...@us.ibm.com) added native attachment /tmp/AIXOS07311082/dmesg.txt on 2018-05-23 01:33:33

  == Comment: #3 - Application Cdeadmin <cdead...@us.ibm.com> - 2018-05-24 16:45:41 ==
  ==== State: Open by: jayeshp on 24 May 2018 16:42:57 ====

  #=#=# 2018-05-24 16:42:54 (CDT) #=#=#
  New Fix_Potential = [P920.10W]
  #=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#

  == Comment: #4 - Stewart Smith <sesm...@au1.ibm.com> - 2018-05-30 21:15:15 ==
  This'll be a missing backport of some kernel fixes in the RTC driver.

  It's at least this commit:
  commit 682e6b4da5cbe8e9a53f979a58c2a9d7dc997175
  Author: Nicholas Piggin <npig...@gmail.com>
  Date: Tue Apr 10 21:49:32 2018 +1000

-     rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops
-
-     The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
-     OPAL_BUSY_EVENT from firmware, which causes large scheduling
-     latencies, up to 50 seconds have been observed here when RTC stops
-     responding (BMC reboot can do it).
-
-     Fix this by converting it to the standard form OPAL_BUSY loop that
-     sleeps.
-
-     Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
-     Cc: sta...@vger.kernel.org # v3.2+
-     Signed-off-by: Nicholas Piggin <npig...@gmail.com>
-     Acked-by: Alexandre Belloni <alexandre.bell...@bootlin.com>
-     Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
+     rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops
+
+     The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
+     OPAL_BUSY_EVENT from firmware, which causes large scheduling
+     latencies, up to 50 seconds have been observed here when RTC stops
+     responding (BMC reboot can do it).
+
+     Fix this by converting it to the standard form OPAL_BUSY loop that
+     sleeps.
+
+     Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
+     Cc: sta...@vger.kernel.org # v3.2+
+     Signed-off-by: Nicholas Piggin <npig...@gmail.com>
+     Acked-by: Alexandre Belloni <alexandre.bell...@bootlin.com>
+     Signed-off-by: Michael Ellerman <m...@ellerman.id.au>

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1777857

Title:
  [LTCTest][OPAL][OP920] INFO: rcu_sched self-detected stall on CPU

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1777857/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
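
Editorial illustration of the fix quoted in Comment #4: the commit message describes converting the RTC read into "the standard form OPAL_BUSY loop that sleeps". The following is a minimal user-space sketch of that loop shape, not the kernel patch itself. Every sim_* name is hypothetical, the return-code values and the 10 ms delay are assumptions standing in for the kernel's OPAL_* constants and OPAL_BUSY_DELAY_MS, and the sleep stand-in only records the requested delay instead of yielding the CPU as a real sleep would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the firmware return codes and the retry delay. */
enum { SIM_OPAL_SUCCESS = 0, SIM_OPAL_BUSY = -2, SIM_OPAL_BUSY_EVENT = -12 };
#define SIM_BUSY_DELAY_MS 10u

static int sim_busy_replies = 3;   /* busy answers the fake firmware gives first */
static unsigned sim_slept_ms = 0;  /* total delay requested by the loop */
static int sim_poll_calls = 0;     /* times pending events were polled */

/* Hypothetical sleep: records the delay; a driver would really sleep here. */
static void sim_msleep(unsigned ms) { sim_slept_ms += ms; }

/* Hypothetical event poll, needed only after a BUSY_EVENT reply. */
static void sim_poll_events(void) { sim_poll_calls++; }

/* Fake firmware call: alternates the two busy codes, then succeeds. */
static int64_t sim_rtc_read(uint32_t *ymd, uint64_t *hms)
{
    if (sim_busy_replies > 0)
        return (sim_busy_replies-- % 2) ? SIM_OPAL_BUSY : SIM_OPAL_BUSY_EVENT;
    *ymd = 0x20180523u;  /* placeholder payload, not a real RTC encoding */
    *hms = 0;
    return SIM_OPAL_SUCCESS;
}

/* Busy loop that sleeps between retries instead of spinning. */
static int64_t sim_get_rtc_time(uint32_t *ymd, uint64_t *hms)
{
    int64_t rc = SIM_OPAL_BUSY;

    assert(ymd != NULL && hms != NULL);
    while (rc == SIM_OPAL_BUSY || rc == SIM_OPAL_BUSY_EVENT) {
        rc = sim_rtc_read(ymd, hms);
        if (rc == SIM_OPAL_BUSY_EVENT) {
            sim_msleep(SIM_BUSY_DELAY_MS);
            sim_poll_events();
        } else if (rc == SIM_OPAL_BUSY) {
            sim_msleep(SIM_BUSY_DELAY_MS);
        }
    }
    return rc;
}
```

The point of the shape is that each busy reply gives the scheduler a chance to run other work, which is what the stalled kworker in the trace above was not doing while stuck in opal_get_rtc_time.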