Public bug reported:

The setup consists of multiple compute nodes in an OpenStack deployment, all
connected to SAN storage over Fibre Channel (FC).

Env specs:
Ubuntu-Server 20.04.3 LTS
Kernel: 5.4.0-89-generic
CPU: AMD EPYC 7H12

At random times the nodes lock up: system load keeps rising, no action can be
taken, and the server has to be rebooted to recover.
There is no discernible pattern, and stress testing the servers does not
reproduce the problem.
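
In the trace below, the oops occurs inside a read() that cadvisor issued on a block-cgroup statistics file. The path is blkg_print_stat_bytes -> __blkg_prfill_rwstat -> seq_printf. A generic stress test may never exercise that path. A more targeted attempt is to poll those files continuously, the way a metrics collector does.

The following is a minimal Python sketch of such a poller. It rests on two assumptions, neither taken from the report:
- the host uses cgroup v1 with the blkio controller mounted at /sys/fs/cgroup/blkio;
- the file served by blkg_print_stat_bytes is blkio.throttle.io_service_bytes.

It has not been confirmed to trigger the fault.

```python
import os
import time

# ASSUMPTIONS (not from the bug report): cgroup v1 blkio mount point and the
# stat file name backed by blkg_print_stat_bytes(). Adjust for the host.
DEFAULT_ROOT = "/sys/fs/cgroup/blkio"
STAT_FILE = "blkio.throttle.io_service_bytes"


def find_stat_files(root=DEFAULT_ROOT, name=STAT_FILE):
    """Return every path under `root` whose basename is `name`."""
    paths = []
    # os.walk silently yields nothing if `root` does not exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths


def read_all_once(paths):
    """Read each stat file once, as a metrics collector would.

    Returns how many files were read successfully; cgroups that disappear
    between the directory walk and the read are skipped.
    """
    ok = 0
    for path in paths:
        try:
            with open(path, "rb") as f:
                f.read()
            ok += 1
        except OSError:
            pass
    return ok


def poll(root=DEFAULT_ROOT, iterations=1000, interval=0.0):
    """Rescan the cgroup tree and read every stat file, `iterations` times.

    Returns the total number of successful reads.
    """
    total = 0
    for _ in range(iterations):
        total += read_all_once(find_stat_files(root))
        if interval:
            time.sleep(interval)
    return total
```

To use it, run something like poll(iterations=100000) as root alongside the normal workload. Rescanning the tree on every iteration is deliberate: cgroups created or removed by the cloud workload are then picked up, much as a long-running collector would see them.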

Log snippet:
[1673239.174269] general protection fault: 0000 [#1] SMP NOPTI
[1673239.183446] CPU: 97 PID: 1224718 Comm: cadvisor Not tainted 5.4.0-89-generic #100-Ubuntu
[1673239.192622] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.3.6 07/06/2021
[1673239.203336] RIP: 0010:string_nocheck+0x38/0x60
[1673239.212811] Code: 66 85 c0 74 3e 83 e8 01 4c 8d 5c 07 01 31 c0 eb 19 49 39 fa 76 03 44 88 07 48 83 c7 01 41 8d 71 01 48 83 c0 01 4c 39 df 74 0f <44> 0f b6 04 02 41 89 c1 89 c6 45 84 c0 75 d8 4c 89 d2 e8 11 ff ff
[1673239.232904] RSP: 0018:ffffa25f3199fba0 EFLAGS: 00010046
[1673239.244331] RAX: 0000000000000000 RBX: ffffa25f3199fc58 RCX: ffff0a00ffffff04
[1673239.256551] RDX: d969688991a5a25c RSI: ffff8de32b560000 RDI: ffff8de32b5400c6
[1673239.269226] RBP: ffffa25f3199fba0 R08: ffffffff9c445a00 R09: 0000000000ffff0a
[1673239.279111] R10: ffff8de32b560000 R11: ffff8de42b5400c5 R12: ffff8de32b560000
[1673239.289855] R13: d969688991a5a25c R14: ffff0a00ffffff04 R15: ffff8de32b5400c6
[1673239.299447] FS:  00007f925a7fc700(0000) GS:ffff8df87f440000(0000) knlGS:0000000000000000
[1673239.308670] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1673239.317988] CR2: 00007f9012efcfb8 CR3: 0000007fa1abe000 CR4: 0000000000340ee0
[1673239.327377] Call Trace:
[1673239.337796]  string+0x4a/0x60
[1673239.347948]  vsnprintf+0x26f/0x4e0
[1673239.356909]  seq_vprintf+0x35/0x50
[1673239.365819]  seq_printf+0x53/0x70
[1673239.374919]  __blkg_prfill_rwstat+0x5d/0xb0
[1673239.383362]  blkg_prfill_rwstat_field+0x97/0xc0
[1673239.391580]  blkcg_print_blkgs+0xba/0xf0
[1673239.399891]  ? blkg_prfill_rwstat+0xc0/0xc0
[1673239.408266]  blkg_print_stat_bytes+0x45/0x50
[1673239.416378]  cgroup_seqfile_show+0x56/0xc0
[1673239.424336]  kernfs_seq_show+0x27/0x30
[1673239.432208]  seq_read+0xdc/0x490
[1673239.440065]  kernfs_fop_read+0x35/0x1b0
[1673239.448107]  __vfs_read+0x1b/0x40
[1673239.456329]  vfs_read+0xab/0x160
[1673239.465529]  ksys_read+0x67/0xe0
[1673239.473872]  __x64_sys_read+0x1a/0x20
[1673239.481269]  do_syscall_64+0x57/0x190
[1673239.488798]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1673239.497445] RIP: 0033:0x4cc910
[1673239.504694] Code: 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 49 c7 c2 00 00 00 00 49 c7 c0 00 00 00 00 49 c7 c1 00 00 00 00 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[1673239.519380] RSP: 002b:000000c01dd1e7a0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
[1673239.526527] RAX: ffffffffffffffda RBX: 000000c000046f00 RCX: 00000000004cc910
[1673239.534890] RDX: 0000000000001000 RSI: 000000c00c2cd000 RDI: 000000000000000e
[1673239.542958] RBP: 000000c01dd1e7f0 R08: 0000000000000000 R09: 0000000000000000
[1673239.549915] R10: 0000000000000000 R11: 0000000000000202 R12: ffffffffffffffff
[1673239.558350] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000002
[1673239.565178] Modules linked in: veth vhost_net nf_conntrack_netlink vhost 
tap dm_queue_length cls_u32 sch_cbq xsk_diag udp_diag raw_diag unix_diag 
af_packet_diag tcp_diag inet_diag netlink_diag ebtable_filter ebtables 
sch_ingress geneve ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout nfnetlink aufs 
overlay rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache bonding 
ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 
xt_tcpudp xt_comment xt_state xt_conntrack iptable_filter bpfilter 
nls_iso8859_1 scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif amd64_edac_mod 
edac_mce_amd dell_smbios kvm_amd dcdbas kvm wmi_bmof dell_wmi_descriptor ccp 
k10temp ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid 
sch_fq_codel tcp_bbr openvswitch nsh nf_conncount nf_nat nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 msr dm_multipath br_netfilter bridge stp llc 
sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 
async_raid6_recov async_memcpy async_pq
[1673239.565265]  async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 
multipath linear mlx5_ib lpfc drm_vram_helper i2c_algo_bit ib_uverbs nvmet_fc 
crct10dif_pclmul ib_core crc32_pclmul ttm ghash_clmulni_intel drm_kms_helper 
aesni_intel nvmet syscopyarea crypto_simd sysfillrect cryptd nvme_fc 
glue_helper sysimgblt nvme_fabrics ahci fb_sys_fops mlx5_core tg3 libahci 
nvme_core pci_hyperv_intf drm tls scsi_transport_fc mlxfw megaraid_sas 
i2c_piix4 wmi
[1673239.664974] ---[ end trace b20e1996a1c8240d ]---
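
One detail in the register dump is worth checking: the address the kernel tried to read.
- The faulting bytes `<44> 0f b6 04 02` decode to `movzx r8d, byte ptr [rdx+rax]`. This is string_nocheck() loading one byte of the string being formatted, presumably the %s argument passed by __blkg_prfill_rwstat.
- With RAX = 0, the address read is RDX = d969688991a5a25c, which is not a canonical x86-64 address.
- Dereferencing a non-canonical address raises #GP rather than a page fault. That matches the "general protection fault" line and suggests the string pointer value itself was corrupt.

The Python sketch below performs that check using register values copied from the oops. It assumes 4-level paging (48-bit virtual addresses). That assumption is consistent with the LA57 bit (CR4 bit 12) being clear in CR4 = 0000000000340ee0.

```python
# Values copied from the kernel-mode register dump in the oops above.
RDX = 0xD969688991A5A25C  # base register of the faulting movzx
RAX = 0x0000000000000000  # index register of the faulting movzx
RSI = 0xFFFF8DE32B560000  # an ordinary kernel pointer, for comparison
CR4 = 0x0000000000340EE0

CR4_LA57 = 1 << 12  # set when 5-level paging (57-bit addresses) is enabled
MASK64 = 0xFFFFFFFFFFFFFFFF


def va_bits_from_cr4(cr4):
    """Virtual-address width implied by the CR4.LA57 bit."""
    return 57 if cr4 & CR4_LA57 else 48


def is_canonical(addr, va_bits=48):
    """Return True if `addr` is a canonical x86-64 virtual address.

    Bits 63..(va_bits - 1) must be all zeros or all ones; a load from any
    other address raises a general protection fault (#GP).
    """
    top = (addr & MASK64) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


bits = va_bits_from_cr4(CR4)
fault_addr = (RDX + RAX) & MASK64
print(f"VA width: {bits} bits")  # prints: VA width: 48 bits
# prints: fault address 0xd969688991a5a25c canonical: False
print(f"fault address {fault_addr:#018x} canonical: {is_canonical(fault_addr, bits)}")
# prints: RSI 0xffff8de32b560000 canonical: True
print(f"RSI {RSI:#018x} canonical: {is_canonical(RSI, bits)}")
```

The same check applies to any other suspicious value in the dump. For example, R13 holds the same d969688991a5a25c value.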

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Confirmed


** Tags: apport-collected focal uec-images

** Tags added: apport-collected focal uec-images

** Description changed:

+ --- 
+ ProblemType: Bug
+ AlsaDevices:
+  total 0
+  crw-rw---- 1 root audio 116,  1 Dec 14 11:20 seq
+  crw-rw---- 1 root audio 116, 33 Dec 14 11:20 timer
+ AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
+ ApportVersion: 2.20.11-0ubuntu27.21
+ Architecture: amd64
+ ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
+ AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
+ CasperMD5CheckResult: pass
+ DistroRelease: Ubuntu 20.04
+ InstallationDate: Installed on 2021-10-13 (62 days ago)
+ InstallationMedia: Ubuntu-Server 20.04.3 LTS "Focal Fossa" - Release amd64 (20210824)
+ IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
+ MachineType: Dell Inc. PowerEdge R7525
+ Package: linux (not installed)
+ PciMultimedia:
+  
+ ProcFB: 0 EFI VGA
+ ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-89-generic root=/dev/mapper/linux-root ro crashkernel=auto nomodeset consoleblank=0 amd_iommu=on iommu=pt
+ ProcVersionSignature: Ubuntu 5.4.0-89.100-generic 5.4.143
+ RelatedPackageVersions:
+  linux-restricted-modules-5.4.0-89-generic N/A
+  linux-backports-modules-5.4.0-89-generic  N/A
+  linux-firmware                            1.187.19
+ RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
+ Tags:  focal uec-images
+ Uname: Linux 5.4.0-89-generic x86_64
+ UnreportableReason: This report is about a package that is not installed.
+ UpgradeStatus: No upgrade log present (probably fresh install)
+ UserGroups: N/A
+ _MarkForUpload: False
+ dmi.bios.date: 07/06/2021
+ dmi.bios.vendor: Dell Inc.
+ dmi.bios.version: 2.3.6
+ dmi.board.name: 0590KW
+ dmi.board.vendor: Dell Inc.
+ dmi.board.version: A01
+ dmi.chassis.type: 23
+ dmi.chassis.vendor: Dell Inc.
+ dmi.modalias: dmi:bvnDellInc.:bvr2.3.6:bd07/06/2021:svnDellInc.:pnPowerEdgeR7525:pvr:rvnDellInc.:rn0590KW:rvrA01:cvnDellInc.:ct23:cvr:
+ dmi.product.family: PowerEdge
+ dmi.product.name: PowerEdge R7525
+ dmi.product.sku: SKU=NotProvided;ModelName=PowerEdge R7525
+ dmi.sys.vendor: Dell Inc.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1954924

Title:
  Kernel 5.4 - general protection fault SMP NOPTI

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1954924/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
