[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2025-02-21 Thread Frank Heimes
** Changed in: ubuntu-power-systems
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2077722

Title:
  [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck
  at booting after triggering crash

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/2077722/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2025-02-20 Thread Massimiliano Pellizzer
The patch has been applied to the Noble tree. It will be released in the
next SRU cycle.

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2025-02-20 Thread Koichiro Den
** Changed in: linux (Ubuntu Noble)
   Status: In Progress => Fix Committed

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2025-02-06 Thread Frank Heimes
We received one ACK from the Canonical kernel team to get the fix/patch
picked up for noble:
https://lists.ubuntu.com/archives/kernel-team/2025-January/156676.html

** Changed in: linux (Ubuntu Noble)
 Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)

** Changed in: linux (Ubuntu Noble)
   Importance: Undecided => High

** Also affects: linux (Ubuntu Oracular)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Plucky)
   Importance: High
 Assignee: Canonical Kernel Team (canonical-kernel-team)
   Status: In Progress

** Changed in: linux (Ubuntu Oracular)
   Status: New => Fix Released

** Changed in: linux (Ubuntu Plucky)
   Status: In Progress => Fix Released

** Changed in: linux (Ubuntu Oracular)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Plucky)
 Assignee: Canonical Kernel Team (canonical-kernel-team) => (unassigned)

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2025-01-30 Thread bugproxy
** Tags removed: targetmilestone-inin---
** Tags added: targetmilestone-inin2404

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2025-01-22 Thread Kleber Sacilotto de Souza
** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Noble)
   Status: New => In Progress

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2025-01-22 Thread Frank Heimes
Patch was submitted to the Ubuntu kernel team mailing list for noble:
https://lists.ubuntu.com/archives/kernel-team/2025-January/thread.html#156470
Changing status to 'In Progress'.

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Changed in: ubuntu-power-systems
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
 Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) => Canonical Kernel Team (canonical-kernel-team)

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2025-01-22 Thread Frank Heimes
Successful kernel test builds are available in this PPA:
https://launchpad.net/~fheimes/+archive/ubuntu/lp2077722

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2025-01-22 Thread Frank Heimes
** Description changed:

+ SRU Justification:
+ ==================
+ 
+ [Impact]
+ 
+  * L2 guest(s) (nested virtualization) running stress-ng get stuck
+    at boot after a crash is triggered.
+ 
+  * For example, with two Ubuntu 24.04 guests running stress-ng
+    (90% load) and a crash triggered on both simultaneously,
+    the 1st guest gets stuck and does not boot up.
+ 
+  * In one of the attempts, both guests got stuck on booting with
+    a console hang.
+ 
+ [Fix]
+ 
+  * a373830f96db a373830f96db288a3eb43a8692b6bcd0bd88dfe1
+    "KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts"
+ 
+ [Test Plan]
+ 
+  * An Ubuntu Server 24.04 LPAR installation, acting as KVM host,
+    on IBM Power 10 hardware (with nested KVM capable FW1060 or newer) is needed.
+ 
+  * On top of it, two (or more) KVM guests (now nested), again running 24.04,
+    need to be set up.
+ 
+  * Run the attached stress-ng.sh script on both KVM guests.
+ 
+  * Trigger crash(es) on both KVM guests at the same time:
+    echo c >/proc/sysrq-trigger
+ 
+  * Without the above patch in place, at least one KVM guest
+    (sometimes both) is now stuck while rebooting.
+ 
+ [Where problems could occur]
+ 
+  * The changes are in arch/powerpc/kvm/book3s_hv.c only,
+    hence are ppc specific and do not affect any other architecture.
+ 
+  * The net change is more or less only two effective code lines:
+    an additional else case and the explicit masking off of the 'MER' bit.
+ 
+  * Wrong assumptions may have a different impact on KVM guests (L0),
+    or interfere with any other virtualization level.
+ 
+  * But the commit is an upstream accepted fix
+    [for ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")]
+    that landed in kernel 6.12 and was also accepted as a stable update
+    for kernels v6.8+.
+ 
+ [Other Info]
+ 
+  * The fix/commit discussed here will be part of the planned
+    target kernel for plucky, hence plucky/25.04 is not affected.
+ 
+  * The fix/commit is already included in oracular master-next
+    as 08cbc81b9a61 and included starting with kernel Ubuntu-6.11.0-17.17.
+ 
+  * With that, only noble needs to be fixed (since this nested virtualization
+    scenario is not supported by Ubuntu prior to noble).
+ 
+  * Since the fix is marked upstream as a stable update,
+    it would usually be picked up by the kernel team automatically.
+ 
+  * But in order not to lose sight of the 24.04.2 window, I was asked
+    to submit this patch separately.
+ __
+ 
  Problem:
- While bringing up 2 Ubuntu 24.04 guests and running stress-ng (90% load) on 
both and triggering crash simultaneously, 1st guest gets stuck and does not 
boot up. In one of the attempts, both the guests got stuck on booting with 
console hang. 
+ While bringing up 2 Ubuntu 24.04 guests and running stress-ng (90% load) on 
both and triggering crash simultaneously, 1st guest gets stuck and does not 
boot up. In one of the attempts, both the guests got stuck on booting with 
console hang.
  
  Attempts:
  Reproducible 3/3 consecutive times
- Run 1: L2-1 guest got stuck 
+ Run 1: L2-1 guest got stuck
  Run 2: L2-1 guest got stuck
  Run 3: L2-1 and L2-2 guest got stuck
- 
  
  =
  L1 Host:
  1. PowerVM
  2. OS: Ubuntu 24.04
  3. Kernel: 6.8.0-31-generic
  4. Mem (free -mh): 47Gi
  5. cpus: 40
  
  Guest L2-1:
  1. OS: Ubuntu 24.04
  2. Kernel: 6.8.0-31-generic
  3. Mem (free -mh): 9.5Gi
  4. cpus: 8
  5. Stress: stress-ng - 90% load
  6. XML configuration:
-16
-10971520
-
+    16
+    10971520
+    
  
  Guest L2-2:
  1. OS: Ubuntu 24.04
  2. Kernel: 6.8.0-31-generic
  3. Mem (free -mh): 9.5Gi
  4. cpus: 8
  5. Stress: stress-ng - 90% load
  6. XML configuration:
-16
-10971520
-
- 
+    16
+    10971520
+    
  
  =
  Steps to reproduce:
  1. Bring up 2 Ubuntu 24.04 L2 guests with configuration mentioned as above
  2. Run the attached stress-ng.sh script on both L2 guests
  3. Trigger crash: echo c >/proc/sysrq-trigger on both L2 guests at the same 
time
  
  After triggering the crash, 1 or both guest consoles will get stuck. And
  then, we will not be able to enter the guest nor shut it down. In
  order to boot into the guest, virsh destroy of the guest will be
  required.
  
- 
  =
  Run1: Console.log Error message of L2-1
-   Booting `Ubuntu'
+   Booting `Ubuntu'
  
  Loading Linux 6.8.0-31-generic ...
  Loading initial ramdisk ...
  OF stdout device is: /vdevice/vty@3000
  Preparing to boot Linux version 6.8.0-31-generic (buildd@bos02-ppc64el-018) 
(powerpc64le-linux-gnu-gcc-13 (Ubuntu 13.2.0-23ubuntu4) 13.2.0, GNU ld (GNU 
Binutils for Ubuntu) 2.42) #31-Ubuntu SMP Sat Apr 20 00:05:55 UTC 2024 (Ubuntu 
6.8.0-31.31-generic 6.8.1)
  Detected machine 

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2025-01-21 Thread Frank Heimes
This was not picked up by the kernel team yet - I just checked the noble
master-next tree.

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2024-11-20 Thread Frank Heimes
Commit a373830f96db "KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU
before running it to avoid spurious interrupts" meanwhile landed in
v6.12-rc7.

commit a373830f96db288a3eb43a8692b6bcd0bd88dfe1
Author: Gautam Menghani 
Date:   Mon Oct 28 14:34:09 2024 +0530

KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts

Running a L2 vCPU (see [1] for terminology) with LPCR_MER bit set and no
pending interrupts results in that L2 vCPU getting an infinite flood of
spurious interrupts. The 'if check' in kvmhv_run_single_vcpu() sets the
LPCR_MER bit if there are pending interrupts.

The spurious flood problem can be observed in 2 cases:
1. Crashing the guest while interrupt heavy workload is running
  a. Start a L2 guest and run an interrupt heavy workload (eg: ipistorm)
  b. While the workload is running, crash the guest (make sure kdump
 is configured)
  c. Any one of the vCPUs of the guest will start getting an infinite
 flood of spurious interrupts.

2. Running LTP stress tests in multiple guests at the same time
   a. Start 4 L2 guests.
   b. Start running LTP stress tests on all 4 guests at same time.
   c. In some time, any one/more of the vCPUs of any of the guests will
  start getting an infinite flood of spurious interrupts.

The root cause of both the above issues is the same:
1. A NMI is sent to a running vCPU that has LPCR_MER bit set.
2. In the NMI path, all registers are refreshed, i.e, H_GUEST_GET_STATE
   is called for all the registers.
3. When H_GUEST_GET_STATE is called for LPCR, the vcpu->arch.vcore->lpcr
   of that vCPU at L1 level gets updated with LPCR_MER set to 1, and this
   new value is always used whenever that vCPU runs, regardless of whether
   there was a pending interrupt.
4. Since LPCR_MER is set, the vCPU in L2 always jumps to the external
   interrupt handler, and this cycle never ends.

Fix the spurious flood by masking off the LPCR_MER bit before running a
L2 vCPU to ensure that it is not set if there are no pending interrupts.

[1] Terminology:
1. L0 : PAPR hypervisor running in HV mode
2. L1 : Linux guest (logical partition) running on top of L0
3. L2 : KVM guest running on top of L1

Fixes: ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")
Cc: sta...@vger.kernel.org # v6.8+
Signed-off-by: Gautam Menghani 
Signed-off-by: Madhavan Srinivasan 

Since it is properly tagged upstream as a stable update, we are waiting on
the Canonical Kernel team to pick this up.

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2024-11-08 Thread Frank Heimes
Many thanks Gautam!
I've noticed that the commit was tagged upstream as a stable update for 6.8+:
"Fixes: ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")
 Cc: sta...@vger.kernel.org # v6.8+"
which is perfect, since with that it will automatically be picked up by the
Canonical kernel team for all kernels 6.8 (as in noble/24.04) and newer.


** Changed in: ubuntu-power-systems
   Status: Incomplete => In Progress

** Changed in: linux (Ubuntu)
   Status: Incomplete => In Progress

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2024-10-07 Thread Frank Heimes
Okay, thx Gautam for the update of the plan.

So I'm leaving the status of this bug at 'Incomplete' for now.
Please share the commit(s) of the fix once they have been sent upstream, so
that we can track the inclusion into the Ubuntu kernels (6.8 for Ubuntu 24.04
and 6.11 for Ubuntu 24.10) here.

Thank you!

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2024-10-07 Thread Frank Heimes
Hello Gautam, thanks for the explanation on why the system is getting
stuck in this particular case.

Well, you've mentioned in the bug description that reverting the following two
commits:
df938a5576f3 KVM: PPC: Book3S HV nestedv2: Do not inject certain interrupts
ec0f6639fa88 KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0
prevents an L2 system that is (k)dumped from hanging.

But is this also the upstream fix for this particular case?
I'm wondering because I don't see these two commits reverted upstream yet,
but I would expect to see that, with the revert ideally also tagged upstream
as a stable update, since it fixes this situation.

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2024-09-03 Thread Frank Heimes
Looks like it is not yet clear if xive is the problem.

So isn't it too early to revert the patches? Is it really safe to do so?
I see they were introduced with kernel 6.8, but they are still present in
later kernels. I don't see any upstream revert (ideally as a "stable update"),
which I think would be the right approach.

** Changed in: ubuntu-power-systems
   Status: New => Incomplete

** Changed in: linux (Ubuntu)
   Status: New => Incomplete

[Bug 2077722] Re: [Ubuntu 24.04] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash

2024-08-26 Thread Frank Heimes
** Package changed: kernel-package (Ubuntu) => linux (Ubuntu)

** Also affects: ubuntu-power-systems
   Importance: Undecided
   Status: New

** Changed in: ubuntu-power-systems
 Assignee: (unassigned) => Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
