Public bug reported:

BugLink: https://bugs.launchpad.net/bugs/2155222

[Impact]

Jammy VMs running on "Gen2" v6 instance types on Azure fail to collect a kdump
with both the 5.15 and 6.8 HWE kernels, yet kdump succeeds with 6.8 and later
kernels on noble onward. Even stranger, on jammy it succeeds with secureboot
enabled and fails with secureboot disabled.

The difference between jammy and noble onward is explained by the userspace
tools. On jammy, kdump-tools uses -c (--kexec-syscall) by default, and switches
to -s (--kexec-file-syscall) when secureboot is enabled. Noble onward works
because it uses -a (--kexec-syscall-auto) by default, which tries -s first.
Noble also fails when -c is used instead.
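
The selection behaviour described above can be sketched as a small shell
function. This is only an illustration of the description in this report, not
the actual kdump-tools code: the function name pick_kexec_flag and its two
arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: which kexec syscall flag ends up being used,
# following the behaviour described in this bug report.
#   $1: series ("jammy", or anything else for noble onward)
#   $2: Secure Boot state ("enabled" or "disabled")
pick_kexec_flag() {
    series=$1
    secureboot=$2
    case "$series" in
        jammy)
            # jammy: -c by default, -s when Secure Boot is enabled
            if [ "$secureboot" = "enabled" ]; then
                echo "-s"
            else
                echo "-c"
            fi
            ;;
        *)
            # noble onward: -a, which tries KEXEC_FILE_LOAD first
            echo "-a"
            ;;
    esac
}

# The failing combination on v6 instance types:
pick_kexec_flag jammy disabled
```

Only the jammy / secureboot-disabled combination ends up on the KEXEC_LOAD
path by default, which matches the failure pattern seen here.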

From man kexec:

-s (--kexec-file-syscall)
      Specify that the new KEXEC_FILE_LOAD syscall should be used exclusively.

-c (--kexec-syscall)
      Specify that the old KEXEC_LOAD syscall should be used exclusively (the
      default).

-a (--kexec-syscall-auto)
      Try the new KEXEC_FILE_LOAD syscall first and when it is not supported or
      the kernel does not understand the supplied image fall back to the old
      KEXEC_LOAD interface.

      There is no one single interface that always works.

      KEXEC_FILE_LOAD is required on systems that use locked-down secure boot
      to verify the kernel signature. KEXEC_LOAD may be also disabled in the
      kernel configuration.

      KEXEC_LOAD is required for some kernel image formats and on
      architectures that do not implement KEXEC_FILE_LOAD.

Regardless of which userspace default is in effect, the root cause is a bug in
the hyperv subsystem of the kernel.

When the kexec / kdump kernel boots, vmbus_reserve_fb() fails to reserve the
framebuffer MMIO range because screen.lfb_base is zero on a Gen2 VM. This
causes an MMIO conflict between hyperv-drm and pci-hyperv: when pci-hyperv's
hv_allocate_config_window() calls vmbus_allocate_mmio() to get an MMIO range,
it usually gets a 32-bit MMIO range that overlaps with the framebuffer MMIO
range. Later, hv_pci_enter_d0() fails with the error message
"PCI Pass-through VSP failed D0 Entry with status", since the host thinks that
PCI devices must not use MMIO space that the host has assigned to the
framebuffer.
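
To make the conflict concrete, here is a minimal sketch of the overlap check
that the host is effectively enforcing. The helper name ranges_overlap and all
of the addresses are made up for illustration; they are not read from a real
VM or taken from the kernel source.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the inclusive ranges [$1, $2] and [$3, $4]
# share at least one address. Hex values such as 0xf8000000 are accepted.
ranges_overlap() {
    a_start=$(( $1 )); a_end=$(( $2 ))
    b_start=$(( $3 )); b_end=$(( $4 ))
    [ "$a_start" -le "$b_end" ] && [ "$b_start" -le "$a_end" ]
}

# Made-up example values, NOT taken from a real VM:
FB_START=0xf8000000;  FB_END=0xf87fffff      # framebuffer MMIO range
CFG_START=0xf8000000; CFG_END=0xf8001fff     # range handed to pci-hyperv

if ranges_overlap "$FB_START" "$FB_END" "$CFG_START" "$CFG_END"; then
    echo "conflict: PCI config window overlaps the framebuffer"
else
    echo "ok: ranges are disjoint"
fi
```

When the framebuffer range is never reserved, nothing stops the allocator from
handing out an overlapping range like the one in this example.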

This is especially an issue if pci-hyperv is built-in and hyperv-drm is built as
a module. Consequently, the kdump/kexec kernel fails to detect PCI devices via
pci-hyperv, and may fail to mount the root file system, which may reside on an
NVMe disk.

The end result is that capturing a kdump fails when -c (--kexec-syscall) is
used, which is the default on jammy.

[Fix]

The fix is currently queued up in the hyperv maintainer tree, in the
hyperv-fixes branch:

commit 016a25e4b0df4d77e7c258edee4aaf982e4ee809 hyperv
From: Dexuan Cui <[email protected]>
Date: Thu, 7 May 2026 14:28:38 -0700
Subject: Drivers: hv: vmbus: Improve the logic of reserving fb_mmio on Gen2 VMs
Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/commit/?h=hyperv-fixes&id=016a25e4b0df4d77e7c258edee4aaf982e4ee809

This is expected to make the 7.2 merge window.

This fix is required for hyperv users. It is mostly relevant to -azure users,
but I am also requesting it for -generic to ensure that anyone running -generic
on Azure can still kexec, and to make it easier to bisect -generic on Azure in
the future.

[Testcase]

This needs to be tested on Azure on both v5 and v6 instance types. The issue
occurs with v6 instance types, but we need to ensure we do not cause a
regression with v5 instance types.

For each series you are testing, create a VM with the following instance types:
- Standard_D4ads_v5
- Standard_D4ads_v6

For the image type, you need to select "Gen2" images:
- "Ubuntu Server 22.04 LTS - x64 Gen2"
- "Ubuntu Server 24.04 LTS - x64 Gen2"
- "Ubuntu Server 25.10 - x64 Gen 2"
- "Ubuntu Server 26.04 LTS - x64 Gen 2"

If you are going to test with -c (--kexec-syscall), secureboot needs to be
disabled. You can do this as follows:
- Under Security type, select "Configure security features".
- Uncheck "Enable Secure Boot". Save.

Create the VM.

Log in, and install kdump-tools:

$ sudo apt update
$ sudo apt install kdump-tools

Say yes to each prompt.

$ sudo vim /etc/default/grub.d/kdump-tools.cfg
Change crashkernel=512M-:192M to crashkernel=512M-:1G, save, exit.

$ sudo vim /etc/kernel/postinst.d/kdump-tools
Change dep to most, save, exit.
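
If you prefer not to edit the files by hand, the two changes can be sketched
with sed. The helper names below are hypothetical, and the second helper
assumes the postinst script contains the literal string MODULES=dep, so check
the resulting files before rebooting.

```shell
#!/bin/sh
# Hypothetical helpers; both rely on GNU sed's in-place (-i) option.

# Raise the crash kernel reservation from 192M to 1G in the given file.
set_crashkernel_1g() {
    sed -i 's/crashkernel=512M-:192M/crashkernel=512M-:1G/' "$1"
}

# Switch the kdump initramfs from MODULES=dep to MODULES=most in the given
# file. Assumes the script contains the literal string "MODULES=dep".
set_modules_most() {
    sed -i 's/MODULES=dep/MODULES=most/g' "$1"
}

# Intended usage on the test VM (run as root, commented out here):
#   set_crashkernel_1g /etc/default/grub.d/kdump-tools.cfg
#   set_modules_most   /etc/kernel/postinst.d/kdump-tools
```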

$ sudo update-grub
$ sudo reboot

Verify that the cmdline has crashkernel set to 1G memory:
$ cat /proc/cmdline
$ kdump-config show
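
As a quick way to pick the crashkernel setting out of the command line, a
small helper along these lines may be handy. The function name
crashkernel_param is hypothetical, and the sample command line is illustrative
rather than copied from a real VM.

```shell
#!/bin/sh
# Print the value of every crashkernel= parameter in the given command line.
crashkernel_param() (
    set -f                      # do not glob-expand words of the command line
    for word in $1; do
        case "$word" in
            crashkernel=*) echo "${word#crashkernel=}" ;;
        esac
    done
)

# Illustrative command line, not copied from a real VM:
crashkernel_param "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro crashkernel=512M-:1G"
# On the VM itself: crashkernel_param "$(cat /proc/cmdline)"
```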

On the Azure Web Interface, select "Serial Console" for the VM, and watch the
serial console.

$ sudo sysctl -w kernel.sysrq=1
$ echo c | sudo tee /proc/sysrq-trigger

Watch the kernel panic and reboot into the crash kernel.

On failure:

The kexec kernel gets stuck, and writes these messages to dmesg.

[    1.157729] hv_pci 7ad35d50-c05b-47ab-b3a0-56a9a845852b: PCI VMBus probing: Using version 0x10004
[    1.167427] hv_pci 7ad35d50-c05b-47ab-b3a0-56a9a845852b: Retrying D0 Entry
[    1.173231] hv_pci 7ad35d50-c05b-47ab-b3a0-56a9a845852b: PCI Pass-through VSP failed D0 Entry with status c000000d
[    1.181091] hv_vmbus: probe failed for device 7ad35d50-c05b-47ab-b3a0-56a9a845852b (-71)
[    1.186890] hv_pci: probe of 7ad35d50-c05b-47ab-b3a0-56a9a845852b failed with error -71
[    1.194422] hv_pci 00000001-7870-47b5-b203-907d12ca697e: PCI VMBus probing: Using version 0x10004
[    1.202172] hv_pci 00000001-7870-47b5-b203-907d12ca697e: Retrying D0 Entry
[    1.207877] hv_pci 00000001-7870-47b5-b203-907d12ca697e: PCI Pass-through VSP failed D0 Entry with status c000000d

The kexec kernel gives up, and reboots. No kdump is generated. /var/crash will
be empty.
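
If the serial console output has been saved to a file, the failure signature
can be checked for mechanically. This is a minimal sketch: the function name
d0_entry_failed and the file name console.log are assumptions, and the pattern
is simply the error message quoted in this report.

```shell
#!/bin/sh
# Succeeds when the log read from stdin contains the D0 entry failure
# message quoted in this report.
d0_entry_failed() {
    grep -q 'PCI Pass-through VSP failed D0 Entry with status'
}

# Usage, assuming the serial console output was saved to console.log
# (a hypothetical file name):
#   d0_entry_failed < console.log && echo "kdump kernel hit the MMIO conflict"
```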

On success:

The kdump is collected, and saved to /var/crash, and will be present on next
boot.

There are test kernels available in the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/sf425760-test

If you install the test kernel and reboot, kdump will work correctly on v6
instance types.

[Where problems could occur]

This changes how vmbus_reserve_fb() reserves MMIO space for the framebuffer,
and if a regression were to occur, it could prevent the pci-hyperv and
hyperv-drm drivers from claiming the correct MMIO ranges.

This could show as instances failing to start or failing to kexec / collect a
kdump with the crashkernel.

This fix works on both amd64 and arm64 instance types, as well as with 32-bit
and 64-bit PCI buses.

[Other info]

Upstream mailing list threads:

Abandoned Patch:
V1: https://lore.kernel.org/linux-hyperv/[email protected]/
V2: https://lore.kernel.org/linux-hyperv/[email protected]/

Current Patch:
V1: https://lore.kernel.org/linux-hyperv/[email protected]/
V2: https://lore.kernel.org/linux-hyperv/[email protected]/
V3: https://lore.kernel.org/linux-hyperv/[email protected]/

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Fix Committed

** Affects: linux (Ubuntu Jammy)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress

** Affects: linux (Ubuntu Noble)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress

** Affects: linux (Ubuntu Questing)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress

** Affects: linux (Ubuntu Resolute)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress


** Tags: sts

** Also affects: linux (Ubuntu Resolute)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Questing)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Jammy)
       Status: New => In Progress

** Changed in: linux (Ubuntu Noble)
       Status: New => In Progress

** Changed in: linux (Ubuntu Questing)
       Status: New => In Progress

** Changed in: linux (Ubuntu Resolute)
       Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Noble)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Questing)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Jammy)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: linux (Ubuntu Noble)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: linux (Ubuntu Resolute)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: linux (Ubuntu Resolute)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Questing)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: linux (Ubuntu)
       Status: New => Fix Committed

** Tags added: sts

Title:
  [hyperv] Ensure MMIO Mapping is Correct for Kexec / kdump kernel on
  Azure v6 Instance Types
