[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
** Changed in: ubuntu-power-systems Status: In Progress => Fix Committed -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/2077722 Title: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu-power-systems/+bug/2077722/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
The patch has been applied to the Noble tree. It will be released in the next SRU cycle.
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
** Changed in: linux (Ubuntu Noble) Status: In Progress => Fix Committed
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
We received one ACK from the Canonical kernel team:
https://lists.ubuntu.com/archives/kernel-team/2025-January/156676.html
to get the fix/patch picked up for noble.

** Changed in: linux (Ubuntu Noble)
     Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)

** Changed in: linux (Ubuntu Noble)
   Importance: Undecided => High

** Also affects: linux (Ubuntu Oracular)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Plucky)
   Importance: High
     Assignee: Canonical Kernel Team (canonical-kernel-team)
       Status: In Progress

** Changed in: linux (Ubuntu Oracular)
       Status: New => Fix Released

** Changed in: linux (Ubuntu Plucky)
       Status: In Progress => Fix Released

** Changed in: linux (Ubuntu Oracular)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Plucky)
     Assignee: Canonical Kernel Team (canonical-kernel-team) => (unassigned)
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
** Tags removed: targetmilestone-inin---
** Tags added: targetmilestone-inin2404
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Noble)
       Status: New => In Progress
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
Patch was submitted to the Ubuntu kernel team mailing list for noble:
https://lists.ubuntu.com/archives/kernel-team/2025-January/thread.html#156470
Changing status to 'In Progress'.

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Changed in: ubuntu-power-systems
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
     Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) => Canonical Kernel Team (canonical-kernel-team)
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
Successful kernel test builds are available in this PPA:
https://launchpad.net/~fheimes/+archive/ubuntu/lp2077722
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
** Description changed:

+ SRU Justification:
+ ==
+
+ [Impact]
+
+  * L2 guest(s) (nested virtualization) running stress-ng getting stuck
+    at booting after triggering crash.
+
+  * When for example having two Ubuntu 24.04 guests and running
+    stress-ng (90% load) on both and triggering crash simultaneously,
+    1st guest gets stuck and does not boot up.
+
+  * In one of the attempts, both the guests got stuck on booting with
+    console hang.
+
+ [Fix]
+
+  * a373830f96db a373830f96db288a3eb43a8692b6bcd0bd88dfe1
+    "KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts"
+
+ [Test Plan]
+
+  * An Ubuntu Server 24.04 LPAR installation, acting as KVM host,
+    on IBM Power 10 hardware (with nested KVM capable FW1060 or newer) is needed.
+
+  * On top two (or more) KVM guests (now nested), again running 24.04,
+    need to be set up.
+
+  * Run the attached stress-ng.sh script on both KVM guests.
+
+  * Trigger crash(es) on both KVM guests at the same time:
+    echo c >/proc/sysrq-trigger
+
+  * At least one KVM guest (sometimes both) is now stuck while rebooting,
+    without the above patch in place.
+
+ [Where problems could occur]
+
+  * The changes are in arch/powerpc/kvm/book3s_hv.c only,
+    hence are ppc specific and do not affect any other architecture.
+
+  * The net changes are more or less only two effective code lines;
+    an additional else case and the explicit masking off the 'MER' bit.
+
+  * Wrong assumptions may have a different impact on KVM guests (L0),
+    or interfere with any other virtualization level.
+
+  * But the commit is an upstream accepted fix
+    [for ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")]
+    that landed in kernel 6.12 and was also accepted as stable update
+    for kernels v6.8+.
+
+ [Other Info]
+
+  * This fix/commit discussed here will be part of the planned
+    target kernel for plucky, hence plucky/25.04 is not affected.
+
+  * The fix/commit is already included in oracular master-next
+    as 08cbc81b9a61 and included starting with kernel Ubuntu-6.11.0-17.17.
+
+  * With that only noble needs to be fixed (since this nested virtualization
+    scenario is not supported by Ubuntu prior to noble).
+
+  * Since the fix is upstream marked as stable update,
+    it would usually be picked up by the kernel team automatically.
+
+  * But to not lose sight of the 24.04.2 window I was asked
+    to submit this patch separately.
+
+ __
+
  Problem:
- While bringing up 2 Ubuntu 24.04 guests and running stress-ng (90% load) on both and triggering crash simultaneously, 1st guest gets stuck and does not boot up. In one of the attempts, both the guests got stuck on booting with console hang.
+ While bringing up 2 Ubuntu 24.04 guests and running stress-ng (90% load) on both and triggering crash simultaneously, 1st guest gets stuck and does not boot up. In one of the attempts, both the guests got stuck on booting with console hang.

  Attempts: Reproducible 3/3 consecutive times
- Run 1: L2-1 guest got stuck
+ Run 1: L2-1 guest got stuck
  Run 2: L2-1 guest got stuck
  Run 3: L2-1 and L2-2 guest got stuck
-
  =
  L1 Host:
  1. PowerVM
  2. OS: Ubuntu 24.04
  3. Kernel: 6.8.0-31-generic
  4. Mem (free -mh): 47Gi
  5. cpus: 40

  Guest L2-1:
  1. OS: Ubuntu 24.04
  2. Kernel: 6.8.0-31-generic
  3. Mem (free -mh): 9.5Gi
  4. cpus: 8
  5. Stress: stress-ng - 90% load
  6. XML configuration:
-16
-10971520
-
+ 16
+ 10971520
+
  Guest L2-2:
  1. OS: Ubuntu 24.04
  2. Kernel: 6.8.0-31-generic
  3. Mem (free -mh): 9.5Gi
  4. cpus: 8
  5. Stress: stress-ng - 90% load
  6. XML configuration:
-16
-10971520
-
-
+ 16
+ 10971520
+
  =
  Steps to reproduce:
  1. Bring up 2 Ubuntu 24.04 L2 guests with configuration mentioned as above
  2. Run the attached stress-ng.sh script on both L2 guests
  3. Trigger crash: echo c >/proc/sysrq-trigger on both L2 guests at the same time

  After triggering the crash, 1 or both guest consoles will get stuck.
  And then, we will not be able to enter the guest or shut it down.
  In order to boot into the guest, virsh destroy of the guest will be required.
-
  =
  Run1: Console.log Error message of L2-1
- Booting `Ubuntu'
+ Booting `Ubuntu'
  Loading Linux 6.8.0-31-generic ...
  Loading initial ramdisk ...
  OF stdout device is: /vdevice/vty@3000
  Preparing to boot Linux version 6.8.0-31-generic (buildd@bos02-ppc64el-018) (powerpc64le-linux-gnu-gcc-13 (Ubuntu 13.2.0-23ubuntu4) 13.2.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #31-Ubuntu SMP Sat Apr 20 00:05:55 UTC 2024 (Ubuntu 6.8.0-31.31-generic 6.8.1)
  Detected machine
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
This was not picked up by the kernel team yet - I just checked the noble master-next tree.
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
Commit a373830f96db "KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts" meanwhile landed in v6.12-rc7.

commit a373830f96db288a3eb43a8692b6bcd0bd88dfe1
Author: Gautam Menghani
Date:   Mon Oct 28 14:34:09 2024 +0530

    KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts

    Running a L2 vCPU (see [1] for terminology) with LPCR_MER bit set and no
    pending interrupts results in that L2 vCPU getting an infinite flood of
    spurious interrupts. The 'if check' in kvmhv_run_single_vcpu() sets the
    LPCR_MER bit if there are pending interrupts.

    The spurious flood problem can be observed in 2 cases:
    1. Crashing the guest while interrupt heavy workload is running
      a. Start a L2 guest and run an interrupt heavy workload (eg: ipistorm)
      b. While the workload is running, crash the guest (make sure kdump
         is configured)
      c. Any one of the vCPUs of the guest will start getting an infinite
         flood of spurious interrupts.

    2. Running LTP stress tests in multiple guests at the same time
      a. Start 4 L2 guests.
      b. Start running LTP stress tests on all 4 guests at same time.
      c. In some time, any one/more of the vCPUs of any of the guests will
         start getting an infinite flood of spurious interrupts.

    The root cause of both the above issues is the same:
    1. A NMI is sent to a running vCPU that has LPCR_MER bit set.
    2. In the NMI path, all registers are refreshed, i.e, H_GUEST_GET_STATE
       is called for all the registers.
    3. When H_GUEST_GET_STATE is called for LPCR, the vcpu->arch.vcore->lpcr
       of that vCPU at L1 level gets updated with LPCR_MER set to 1, and this
       new value is always used whenever that vCPU runs, regardless of whether
       there was a pending interrupt.
    4. Since LPCR_MER is set, the vCPU in L2 always jumps to the external
       interrupt handler, and this cycle never ends.

    Fix the spurious flood by masking off the LPCR_MER bit before running a
    L2 vCPU to ensure that it is not set if there are no pending interrupts.

    [1] Terminology:
    1. L0 : PAPR hypervisor running in HV mode
    2. L1 : Linux guest (logical partition) running on top of L0
    3. L2 : KVM guest running on top of L1

    Fixes: ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")
    Cc: sta...@vger.kernel.org # v6.8+
    Signed-off-by: Gautam Menghani
    Signed-off-by: Madhavan Srinivasan

Since it is properly tagged upstream as a stable update, we are waiting for the Canonical kernel team to pick this up.
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
Many thanks Gautam! I've noticed that the commit was tagged upstream as a stable update for 6.8+:
"Fixes: ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")
Cc: sta...@vger.kernel.org # v6.8+"
which is perfect, since with that it will automatically be picked up by the Canonical kernel team for all kernels 6.8 (as in noble/24.04) and newer.

** Changed in: ubuntu-power-systems
       Status: Incomplete => In Progress

** Changed in: linux (Ubuntu)
       Status: Incomplete => In Progress
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
Okay, thx Gautam for the update of the plan. So I'm leaving the status of this bug at Incomplete for now. Please share the commit(s) of the fix once they have been sent upstream, so that we can track their inclusion into the Ubuntu kernels (6.8 for Ubuntu 24.04 and 6.11 for Ubuntu 24.10) here. Thank you!
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
Hello Gautam, thanks for the explanation on why the system is getting stuck in this particular case.

Well, you've mentioned in the bug description that reverting the following two commits:
df938a5576f3 KVM: PPC: Book3S HV nestedv2: Do not inject certain interrupts
ec0f6639fa88 KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0
prevents an L2 system that is (k)dumped from hanging.

But is this also the upstream fix for this particular case? I'm wondering because I don't see these two commits reverted upstream yet, and I would expect to see that, with the revert ideally also tagged upstream as a stable update, since it fixes this situation.
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
Looks like it is not yet clear if xive is the problem. So isn't it too early to revert the patches? Is it really safe to do so? I see they got introduced with kernel 6.8, but are still in the later kernels. I don't see any upstream revert (ideally as a "stable update"), which would be the right approach, I think.

** Changed in: ubuntu-power-systems
       Status: New => Incomplete

** Changed in: linux (Ubuntu)
       Status: New => Incomplete
[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash
** Package changed: kernel-package (Ubuntu) => linux (Ubuntu)

** Also affects: ubuntu-power-systems
   Importance: Undecided
       Status: New

** Changed in: ubuntu-power-systems
     Assignee: (unassigned) => Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)