Public bug reported:

We have a custom build of the kernel based on the
Ubuntu-hwe-5.15-5.15.0-91.101_20.04.1 tag.  It includes a small number of
patches, but nothing in the area of the early boot code.  Xen is based on
the upstream 4.15.5 stable branch with all patches up to and including
XSA-444.  In approximately 1% of PV guest boots we get the following
crash, which looks like it involves the entry_64.S code.  We have seen
this on different hardware models, but only with Intel processors; we
don't have any AMD-based systems to test on.  The problem was also
observed with the 5.15.0-85 tag.

I have looked through the mainline kernel branch for arch/x86/entry
changes, but I can't obviously connect this problem to anything there
based on the commit messages.  I don't know the code well enough to
tell whether any of those changes is actually relevant.


```
[    0.303715] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    0.303727] Spectre V2 : Mitigation: Enhanced IBRS
[    0.303733] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    0.303740] Spectre V2 : Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT
[    0.303746] RETBleed: Mitigation: Enhanced IBRS
[    0.303752] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[    0.303760] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[    0.303771] MMIO Stale Data: Mitigation: Clear CPU buffers
[    0.303777] GDS: Unknown: Dependent on hypervisor status
[    0.303827] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.303835] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.303840] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.303846] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[    0.303851] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[    0.303857] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[    0.303865] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.303871] x86/fpu: xstate_offset[5]: 1088, xstate_sizes[5]:   64
[    0.303877] x86/fpu: xstate_offset[6]: 1152, xstate_sizes[6]:  512
[    0.303882] x86/fpu: xstate_offset[7]: 1664, xstate_sizes[7]: 1024
[    0.303888] x86/fpu: Enabled xstate features 0xe7, context size is 2688 bytes, using 'standard' format.
[    0.327588] segment-related general protection fault: e030 [#1] SMP NOPTI
[    0.327604] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-91-generic #101~20.04.1custom1
[    0.327614] RIP: e030:native_irq_return_iret+0x0/0x2
[    0.327627] Code: 5b 41 5b 41 5a 41 59 41 58 58 59 5a 5e 5f 48 83 c4 08 eb 0f 0f 1f 00 90 66 66 2e 0f 1f 84 00 00 00 00 00 f6 44 24 20 04 75 02 <48> cf 57 0f 01 f8 eb 12 0f 20 df 90 90 90 90 90 48 81 e7 ff e7 ff
[    0.327640] RSP: e02b:ffffffff82e03bc8 EFLAGS: 00010046
[    0.327647] RAX: 0000000000000000 RBX: ffffffff82e03c30 RCX: ffffffff81e01101
[    0.327653] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001f
[    0.327660] RBP: ffffffff82e03bf8 R08: ffffffff81e011ef R09: 0000000000000005
[    0.327666] R10: 0000000000000006 R11: e8ae0feb75ccff49 R12: ffffffff81e011ef
[    0.327672] R13: 0000000000000006 R14: ffffffff81e011f1 R15: 0000000000000002
[    0.327684] FS:  0000000000000000(0000) GS:ffff888015a00000(0000) knlGS:0000000000000000
[    0.327691] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.327696] CR2: 0000000000000000 CR3: 0000000002e10000 CR4: 0000000000050660
[    0.327705] Call Trace:
[    0.327709]  <TASK>
[    0.327713]  ? show_trace_log_lvl+0x1d6/0x2ea
[    0.327723]  ? show_trace_log_lvl+0x1d6/0x2ea
[    0.327729]  ? insn_decode+0xec/0x100
[    0.327738]  ? show_regs.part.0+0x23/0x29
[    0.327743]  ? __die_body.cold+0x8/0xd
[    0.327748]  ? die_addr+0x3e/0x60
[    0.327756]  ? exc_general_protection+0x1c1/0x350
[    0.327766]  ? asm_exc_general_protection+0x27/0x30
[    0.327772]  ? restore_regs_and_return_to_kernel+0x1d/0x2c
[    0.327778]  ? restore_regs_and_return_to_kernel+0x1b/0x2c
[    0.327784]  ? restore_regs_and_return_to_kernel+0x1b/0x2c
[    0.327789]  ? asm_sysvec_xen_hvm_callback+0x11/0x20
[    0.327796]  ? native_iret+0x7/0x7
[    0.327801]  ? insn_get_displacement+0x4d/0x110
[    0.327807]  insn_decode+0xec/0x100
[    0.327813]  optimize_nops+0x68/0x150
[    0.327819]  ? restore_regs_and_return_to_kernel+0x1d/0x2c
[    0.327825]  ? restore_regs_and_return_to_kernel+0x2c/0x2c
[    0.327830]  ? restore_regs_and_return_to_kernel+0x20/0x2c
[    0.327837]  apply_alternatives+0x181/0x3a0
[    0.327843]  ? restore_regs_and_return_to_kernel+0x1b/0x2c
[    0.327848]  ? fb_is_primary_device+0x25/0x73
[    0.327855]  ? restore_regs_and_return_to_kernel+0x1b/0x2c
[    0.327861]  ? apply_alternatives+0x8/0x3a0
[    0.327867]  ? fb_is_primary_device+0x6e/0x73
[    0.327872]  ? apply_returns+0xfc/0x180
[    0.327878]  ? fb_is_primary_device+0x6e/0x73
[    0.327883]  ? sanitize_boot_params.constprop.0+0xa/0xef
[    0.327889]  ? fb_is_primary_device+0x73/0x73
[    0.327895]  alternative_instructions+0xa9/0x173
[    0.327904]  arch_cpu_finalize_init+0x2c/0x51
[    0.327909]  start_kernel+0x425/0x4ce
[    0.327916]  x86_64_start_reservations+0x24/0x2a
[    0.327922]  xen_start_kernel+0x41e/0x429
[    0.327928]  startup_xen+0x3e/0x3e
[    0.327934]  </TASK>
[    0.327937] Modules linked in:
[    0.327943] ---[ end trace c275641b4f1eba81 ]---
[    0.327948] RIP: e030:native_irq_return_iret+0x0/0x2
[    0.327954] Code: 5b 41 5b 41 5a 41 59 41 58 58 59 5a 5e 5f 48 83 c4 08 eb 0f 0f 1f 00 90 66 66 2e 0f 1f 84 00 00 00 00 00 f6 44 24 20 04 75 02 <48> cf 57 0f 01 f8 eb 12 0f 20 df 90 90 90 90 90 48 81 e7 ff e7 ff
[    0.327967] RSP: e02b:ffffffff82e03bc8 EFLAGS: 00010046
[    0.327972] RAX: 0000000000000000 RBX: ffffffff82e03c30 RCX: ffffffff81e01101
[    0.327978] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001f
[    0.327984] RBP: ffffffff82e03bf8 R08: ffffffff81e011ef R09: 0000000000000005
[    0.327990] R10: 0000000000000006 R11: e8ae0feb75ccff49 R12: ffffffff81e011ef
[    0.327996] R13: 0000000000000006 R14: ffffffff81e011f1 R15: 0000000000000002
[    0.328006] FS:  0000000000000000(0000) GS:ffff888015a00000(0000) knlGS:0000000000000000
[    0.328012] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.328018] CR2: 0000000000000000 CR3: 0000000002e10000 CR4: 0000000000050660
[    0.328027] Kernel panic - not syncing: Attempted to kill the idle task!
```
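
As a reading aid for the trace above, here is a minimal sketch that splits the `e030` value from the "segment-related general protection fault" line into its fields, assuming it follows the standard x86 selector error-code layout (EXT, IDT, TI, index), and pulls the `<..>`-marked byte out of a `Code:` line. The function names are hypothetical helpers for illustration, not part of any kernel tooling.

```python
def decode_selector_error_code(code):
    """Split an x86 selector-format exception error code into its fields."""
    return {
        "ext": bool(code & 0x1),        # bit 0: event external to the program
        "idt": bool(code & 0x2),        # bit 1: index refers to the IDT
        "ti": bool(code & 0x4),         # bit 2: 0 = GDT, 1 = LDT (when idt is 0)
        "index": (code >> 3) & 0x1FFF,  # bits 3-15: descriptor table index
    }


def marked_byte(code_line):
    """Return (offset, byte) for the <..>-marked byte in an oops Code: line."""
    for offset, token in enumerate(code_line.split()):
        if token.startswith("<") and token.endswith(">"):
            return offset, token[1:-1]
    raise ValueError("no <..> marker found in Code: line")


if __name__ == "__main__":
    fields = decode_selector_error_code(0xE030)
    print(fields, hex(fields["index"]))
    # Tail of the Code: line from the trace; the marker sits on the faulting byte.
    print(marked_byte("f6 44 24 20 04 75 02 <48> cf 57 0f 01 f8"))
```

Under that assumption the error code names a GDT entry, and it is the same selector value shown in `RIP: e030:`. The marked bytes `48 cf` are the encoding of `iretq`, which matches the `native_irq_return_iret+0x0` symbol in the trace.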


# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal


# uname -a
Linux hostname 5.15.0-91-generic #101~20.04.1custom1 SMP Thu Nov 23 12:37:35 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux


# cat /proc/version_signature 
Ubuntu 5.15.0-91.101~20.04.1custom1-generic 5.15.131


# xl info
host                   : hostname
release                : 5.15.0-91-generic
version                : #101~20.04.1custom1 SMP Thu Nov 23 12:37:35 UTC 2023
machine                : x86_64
nr_cpus                : 80
max_cpu_id             : 79
nr_nodes               : 2
cores_per_socket       : 20
threads_per_core       : 2
cpu_mhz                : 2294.609
hw_caps                : bfebfbff:77fef3ff:2c100800:00000121:0000000f:f3bfbfff:00405f4e:00000100
virt_caps              : pv hvm hvm_directio pv_directio hap shadow iommu_hap_pt_share vmtrace
total_memory           : 130523
free_memory            : 79395
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 15
xen_extra              : .5
xen_version            : 4.15.5
xen_caps               : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit2
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : Mon Nov 20 09:36:08 2023 +0000 git:0196200b35-dirty
xen_commandline        : placeholder console=vga,com2 com2=115200,8n1 dom0_max_vcpus=4-8 dom0_mem=min:6144,max:65536m iommu=on,required,intpost,verbose,debug x2apic=off sched=credit2 flask=enforcing gnttab_max_frames=128 xpti=off smt=on cpufreq=xen:performance spec-ctrl=gds-mit=0
cc_compiler            : gcc (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
cc_compile_by          : 
cc_compile_domain      : 
cc_compile_date        : Mon Nov 20 09:37:08 UTC 2023
build_id               : 986e88b638105b0dfc4ecf5c9bbb9743a61b2677
xend_config_format     : 4

** Affects: linux-meta (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: focal kernel-bug

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta in Ubuntu.
https://bugs.launchpad.net/bugs/2045248

Title:
  focal: 5.15.0-91 crashes on boot as Xen PV guest

Status in linux-meta package in Ubuntu:
  New
