Public bug reported:

Hi, I have now hit the following crash twice in a row while running the same
test, so it seems to be somewhat reproducible:

[ 1444.399448] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 1444.431172] #PF: supervisor write access in kernel mode
[ 1444.454715] #PF: error_code(0x0002) - not-present page
[ 1444.478052] PGD 0 P4D 0 
[ 1444.489448] Oops: 0002 [#1] SMP PTI
[ 1444.505120] CPU: 6 PID: 26233 Comm: chcpu Tainted: P        W  O      5.13.0-27-generic #29~20.04.1-Ubuntu
[ 1444.549884] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 01/22/2018
[ 1444.587322] RIP: 0010:blk_mq_hctx_notify_dead+0xc7/0x190
[ 1444.611352] Code: 04 49 8d 54 05 08 4c 01 e8 48 8b 48 08 48 39 ca 74 66 48 8b 48 08 48 39 ca 74 21 48 8b 3a 48 8b 4d c8 48 8b 72 08 48 89 7d c8 <4c> 89 77 08 48 89 0e 48 89 71 08 48 89 12 48 89 50 10 41 0f b7 84
[ 1444.696490] RSP: 0018:ffffbf5d818dbbf0 EFLAGS: 00010282
[ 1444.720510] RAX: ffffdf5d7fb788c0 RBX: 0000000000000000 RCX: ffffbf5d818dbbf0
[ 1444.752719] RDX: ffffdf5d7fb788c8 RSI: 0000000000000000 RDI: 0000000000000000
[ 1444.784978] RBP: ffffbf5d818dbc28 R08: 0000000000000000 R09: ffffbf5d818dbae8
[ 1444.816712] R10: 0000000000000001 R11: 0000000000000001 R12: ffff983d939b0000
[ 1444.848844] R13: ffffdf5d7fb788c0 R14: ffffbf5d818dbbf0 R15: 0000000000000005
[ 1444.881389] FS:  00007f1c8fe88580(0000) GS:ffff9844dfb80000(0000) knlGS:0000000000000000
[ 1444.918201] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1444.944633] CR2: 0000000000000008 CR3: 000000064ddf6006 CR4: 00000000001706e0
[ 1444.977001] Call Trace:
[ 1444.988071]  ? blk_mq_exit_hctx+0x160/0x160
[ 1445.007037]  cpuhp_invoke_callback+0x179/0x430
[ 1445.027179]  cpuhp_invoke_callback_range+0x44/0x80
[ 1445.048737]  _cpu_down+0x109/0x310
[ 1445.064062]  cpu_down+0x36/0x60
[ 1445.077882]  cpu_device_down+0x16/0x20
[ 1445.094741]  cpu_subsys_offline+0xe/0x10
[ 1445.112439]  device_offline+0x8e/0xc0
[ 1445.129064]  online_store+0x4c/0x90
[ 1445.144835]  dev_attr_store+0x17/0x30
[ 1445.161307]  sysfs_kf_write+0x3e/0x50
[ 1445.177856]  kernfs_fop_write_iter+0x138/0x1c0
[ 1445.198036]  new_sync_write+0x117/0x1b0
[ 1445.215386]  vfs_write+0x185/0x250
[ 1445.230649]  ksys_write+0x67/0xe0
[ 1445.245565]  __x64_sys_write+0x1a/0x20
[ 1445.262448]  do_syscall_64+0x61/0xb0
[ 1445.278585]  ? do_syscall_64+0x6e/0xb0
[ 1445.295940]  ? asm_exc_page_fault+0x8/0x30
[ 1445.314969]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1445.338356] RIP: 0033:0x7f1c8fda30a7
[ 1445.355062] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 1445.440161] RSP: 002b:00007fffed1c4418 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1445.474829] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007f1c8fda30a7
[ 1445.507219] RDX: 0000000000000001 RSI: 0000559369f25869 RDI: 0000000000000004
[ 1445.539438] RBP: 00007f1c8fe88500 R08: 0000000000000000 R09: 00007fffed1c43c0
[ 1445.572547] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ 1445.604842] R13: 00007fffed1c4420 R14: 0000559369f25869 R15: 0000000000000001
[ 1445.636897] Modules linked in: vhost_net tap ebtable_filter ebtables veth 
nbd xt_comment zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) 
zcommon(PO) znvpair(PO) spl(O) vhost_vsock vmw_vsock_virtio_transport_common 
vhost vhost_iotlb vsock xt_MASQUERADE xt_conntrack xt_CHECKSUM ipt_REJECT 
nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle 
iptable_nat nf_tables ip6table_filter ip6_tables iptable_filter bpfilter bridge 
stp llc nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 uio_pci_generic uio nls_iso8859_1 
dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_rapl_msr 
intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp 
rpcrdma kvm_intel sunrpc kvm rdma_ucm ib_iser libiscsi scsi_transport_iscsi 
rapl ib_umad rdma_cm ib_ipoib intel_cstate efi_pstore iw_cm ib_cm hpilo ioatdma 
acpi_ipmi acpi_tad ipmi_si mac_hid acpi_power_meter sch_fq_codel ipmi_devintf 
ipmi_msghandler msr
[ 1445.636951]  ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core ses 
enclosure mgag200 i2c_algo_bit drm_kms_helper crct10dif_pclmul syscopyarea 
crc32_pclmul sysfillrect ghash_clmulni_intel sysimgblt fb_sys_fops aesni_intel 
mlx5_core cec ixgbe pci_hyperv_intf crypto_simd xfrm_algo psample nvme cryptd 
rc_core hpsa mlxfw i2c_i801 dca xhci_pci drm i2c_smbus lpc_ich tg3 tls 
xhci_pci_renesas nvme_core mdio scsi_transport_sas wmi
[ 1446.267521] CR2: 0000000000000008
[ 1446.282506] ---[ end trace 99f81ab62ed1f929 ]---
[ 1446.308857] RIP: 0010:blk_mq_hctx_notify_dead+0xc7/0x190
[ 1446.311595] ixgbe 0000:04:00.1 eno50: NIC Link is Up 10 Gbps, Flow Control: None
[ 1446.332973] Code: 04 49 8d 54 05 08 4c 01 e8 48 8b 48 08 48 39 ca 74 66 48 8b 48 08 48 39 ca 74 21 48 8b 3a 48 8b 4d c8 48 8b 72 08 48 89 7d c8 <4c> 89 77 08 48 89 0e 48 89 71 08 48 89 12 48 89 50 10 41 0f b7 84
[ 1446.332975] RSP: 0018:ffffbf5d818dbbf0 EFLAGS: 00010282
[ 1446.332977] RAX: ffffdf5d7fb788c0 RBX: 0000000000000000 RCX: ffffbf5d818dbbf0
[ 1446.332978] RDX: ffffdf5d7fb788c8 RSI: 0000000000000000 RDI: 0000000000000000
[ 1446.332978] RBP: ffffbf5d818dbc28 R08: 0000000000000000 R09: ffffbf5d818dbae8
[ 1446.332979] R10: 0000000000000001 R11: 0000000000000001 R12: ffff983d939b0000
[ 1446.332980] R13: ffffdf5d7fb788c0 R14: ffffbf5d818dbbf0 R15: 0000000000000005
[ 1446.332981] FS:  00007f1c8fe88580(0000) GS:ffff9844dfb80000(0000) knlGS:0000000000000000
[ 1446.332982] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1446.332983] CR2: 0000000000000008 CR3: 000000064ddf6006 CR4: 00000000001706e0
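
For reference, the page-fault error code printed in the oops can be decoded
with a small sketch like the one below. It only looks at the three low bits
of the x86 error code (present, write, user); the function name decode_pf is
made up for this illustration and is not part of any tool.

```shell
#!/bin/sh
# Decode the three low bits of an x86 page-fault error code.
# decode_pf is a hypothetical helper name used only for this sketch.
decode_pf() {
    code=$(( $1 ))
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $(( code & 1 )) -ne 0 ]; then p="protection violation"; else p="not-present page"; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $(( code & 2 )) -ne 0 ]; then w="write"; else w="read"; fi
    # bit 2: 0 = supervisor access, 1 = user access
    if [ $(( code & 4 )) -ne 0 ]; then u="user"; else u="supervisor"; fi
    echo "$u $w access, $p"
}

# error_code(0x0002) from the trace above
decode_pf 0x0002
```

For 0x0002 this gives a supervisor write to a not-present page, which agrees
with the two "#PF:" lines in the log.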

The system is somewhat stuck afterwards.
I can no longer use libvirt (neither restarting the service nor spawning a new
guest works) or openvswitch (ovs-vsctl show); all of those calls hang, while
other things still somewhat work. A new ssh login also hangs, so debugging
after the crash is very limited.

The test does things in this order:
1. set up a simple openvswitch
2. start the libvirt network for this OVS instance
3. disable cpus 5-11 (as I want the test to only have 0-4)
4. start a VM guest using that OVS network
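
The steps above could be scripted roughly as follows. This is only a sketch:
the exact ovs-vsctl and virsh invocations used by the test are not shown in
this report and are assumptions here. The bridge, port and network names are
taken from the OVS and libvirt output further down, chcpu is the command named
in the trace, and the uvt-kvm call is the one from the details section. By
default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the test order. DRY_RUN defaults to 1, so commands are only
# printed; set DRY_RUN=0 (as root, on a test machine) to really run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_steps() {
    # 1. set up a simple openvswitch (assumed commands, names from this report)
    run ovs-vsctl add-br ovsbr0
    run ovs-vsctl add-port ovsbr0 eno49
    # 2. start the libvirt network for this OVS instance (assumed command)
    run virsh net-start ovsbr0
    # 3. disable cpus 5-11
    run chcpu -d 5-11
    # 4. start a VM guest using that OVS network (command from the details)
    run uvt-kvm create --run-script-once=init-ens4-dhcp.sh --memory 2048 \
        --template guest-openvswitch-1.xml --machine-type ubuntu --unsafe-caching \
        --cpu 4 --ssh-public-key-file /home/ubuntu/.ssh/id_rsa.pub \
        --password=ubuntu guest-openvswitch-1 release=focal arch=amd64 label=daily
}

repro_steps
```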

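Based on log timing, the crash seems to happen either at the chcpu call or at
guest start. The trace itself shows Comm: chcpu and the CPU offline path
(_cpu_down), which points towards the chcpu step.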


--- details ---

I start the guest via uvtool:
 $ uvt-kvm create --run-script-once=init-ens4-dhcp.sh --memory 2048
   --template guest-openvswitch-1.xml --machine-type ubuntu --unsafe-caching
   --cpu 4 --ssh-public-key-file /home/ubuntu/.ssh/id_rsa.pub
   --password=ubuntu guest-openvswitch-1 release=focal arch=amd64 label=daily

Differences from a "normal" guest are:
- using an openvswitch connection
- using huge pages

The template used looks like this:

<domain type='kvm'>
        <os>
                <type>hvm</type>
                <boot dev='hd'/>
        </os>
<!-- we pass as much as possible -->
<cpu mode='host-passthrough'>
        <numa>
                <cell id='0' cpus='0-3' memory='2097152' unit='KiB' memAccess='shared'/>
        </numa>
</cpu>
<currentMemory unit='KiB'>2097152</currentMemory>
<memoryBacking>
        <hugepages>
                <page size="2" unit="M" nodeset="0"/>
        </hugepages>
</memoryBacking>
<vcpu placement='static'>4</vcpu>
<features>
        <acpi/>
        <apic/>
        <pae/>
</features>
<devices>
        <interface type='network'>
                <source network='default'/>
                <model type='virtio'/>
                <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
        </interface>
        <interface type='network'>
                <source network='ovsbr0'/>
                <model type='virtio'/>
                <driver name='vhost' queues='4'/>
                <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
        </interface>
        <serial type='pty'>
                <source path='/dev/pts/3'/>
                <target port='0'/>
        </serial>
        <video/>
</devices>
</domain>
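
As a side note on the hugepages part of this template: the guest memory of
2097152 KiB backed by 2 MiB pages needs 1024 huge pages on the host. A minimal
sketch of that arithmetic, plus a look at what the host currently has free, is
below. The helper name pages_needed is made up for this illustration, and the
/proc/meminfo check assumes that 2 MiB is the host's default huge page size.

```shell
#!/bin/sh
# Number of huge pages needed to back a guest: guest memory divided by the
# page size (both in KiB), rounded up. pages_needed is a hypothetical helper.
pages_needed() {
    mem_kib="$1"
    page_kib="$2"
    echo $(( (mem_kib + page_kib - 1) / page_kib ))
}

# Values from the template: 2097152 KiB of memory, 2 MiB (2048 KiB) pages.
need="$(pages_needed 2097152 2048)"
echo "2M huge pages needed: $need"

# Compare with what the host currently reports as free (Linux only;
# HugePages_Free counts pages of the default huge page size).
if [ -r /proc/meminfo ]; then
    free="$(awk '/^HugePages_Free:/ {print $2}' /proc/meminfo)"
    echo "HugePages_Free on this host: ${free:-unknown}"
fi
```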

The OVS setup is rather simple: one internal bridge and one upstream port,
nothing "too special". It looks like this:

+ ovs-vsctl show
8dfc2067-7b9b-48d7-a50a-df17bbd3cb6c
    Bridge ovsbr0
        Port eno49
            Interface eno49
        Port ovsbr0
            Interface ovsbr0
                type: internal
    ovs_version: "2.13.5"


Libvirt knows about that OVS bridge and has a network defined to use it:

<network>
    <name>ovsbr0</name>
    <forward mode='bridge'/>
    <bridge name='ovsbr0'/>
    <virtualport type='openvswitch'/>
</network>
--- 
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116,  1 Mar 29 07:06 seq
 crw-rw---- 1 root audio 116, 33 Mar 29 07:06 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant DL360 Gen9
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:
 
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.13.0-27-generic root=UUID=c941b173-e6b5-485a-a02b-8d966b8d3c73 ro --- console=ttyS1,115200
ProcVersionSignature: Ubuntu 5.13.0-27.29~20.04.1-generic 5.13.19
RelatedPackageVersions:
 linux-restricted-modules-5.13.0-27-generic N/A
 linux-backports-modules-5.13.0-27-generic  N/A
 linux-firmware                             1.187.29
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags:  focal uec-images
Uname: Linux 5.13.0-27-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: kvm libvirt
_MarkForUpload: True
dmi.bios.date: 01/22/2018
dmi.bios.release: 2.56
dmi.bios.vendor: HP
dmi.bios.version: P89
dmi.board.name: ProLiant DL360 Gen9
dmi.board.vendor: HP
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.ec.firmware.release: 2.60
dmi.modalias: dmi:bvnHP:bvrP89:bd01/22/2018:br2.56:efr2.60:svnHP:pnProLiantDL360Gen9:pvr:rvnHP:rnProLiantDL360Gen9:rvr:cvnHP:ct23:cvr:sku780018-S01:
dmi.product.family: ProLiant
dmi.product.name: ProLiant DL360 Gen9
dmi.product.sku: 780018-S01
dmi.sys.vendor: HP

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: apport-collected focal uec-images

** Tags added: apport-collected focal uec-images

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1966870

Title:
  Focal 20.04.4 crashing when using openvswitch/hugepages

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1966870/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp