Hi Mike,

Thanks for the confirmation on that. After that last crash I was getting them daily until I downgraded back to kernel 4.4 and Xen 4.6, after which stability returned. There have been zero crashes since 24th December.
@Paul, Wei: can we get this investigated? It appears to be a stability blocker for Xen releases on newer kernels.

Best regards,

Alex

-----Original Message-----
From: Michael Collins [mailto:m...@ark-net.org]
Sent: Friday, 29 December 2017 5:05 AM
To: Alex Braunegg; 'Juergen Gross'; xen-devel@lists.xenproject.org
Cc: 'Paul Durrant'; 'Wei Liu'
Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

Alex,

I saw this same issue when running kernel 4.13+. I switched back to 4.11 and the problem has not resurfaced. I would like to understand the root cause of this issue.

Mike

On 12/22/2017 3:35 PM, Alex Braunegg wrote:
> Hi all,
>
> Another crash this morning:
>
> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3a
> ------------[ cut here ]------------
> kernel BUG at drivers/net/xen-netback/netback.c:430!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
> xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
> ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE)
> icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E)
> i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
> sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E)
> libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
> CPU: 0 PID: 14238 Comm: vif2.0-q0-deall Tainted: P OE
> 4.14.6-1.el6.x86_64 #1
> Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
> task: ffff880059e255c0 task.stack: ffffc90001f64000
> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
> RSP: e02b:ffffc90001f67c68 EFLAGS: 00010292
> RAX: 0000000000000045 RBX: ffffc90001f55000 RCX: 0000000000000000
> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
> RBP: ffffc90001f67e98 R08: 0000000000000372 R09: 0000000000000373
> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90001f5e730
> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
> FS: 00007f92865d29a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffff600400 CR3: 000000006209c000 CR4: 0000000000000660
> Call Trace:
> ? _raw_spin_unlock_irqrestore+0x11/0x20
> ? error_exit+0x5/0x20
> ? __update_load_avg_cfs_rq+0x176/0x180
> ? xen_mc_flush+0x87/0x120
> ? xen_load_sp0+0x84/0xa0
> ? __switch_to+0x1c1/0x360
> ? finish_task_switch+0x78/0x240
> ? __schedule+0x192/0x496
> ? _raw_spin_lock_irqsave+0x1a/0x3c
> ? _raw_spin_lock_irqsave+0x1a/0x3c
> ? _raw_spin_unlock_irqrestore+0x11/0x20
> xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
> ? do_wait_intr+0x80/0x80
> ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
> kthread+0x106/0x140
> ? kthread_destroy_worker+0x60/0x60
> ret_from_fork+0x25/0x30
> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7
> c6 10 2b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 c9 06 e1 <0f> 0b 0f 0b 48 8b
> 53 20 89 c1 48 c7 c6 48 2b 55 a0 31 c0 45 31
> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: ffffc90001f67c68
> ---[ end trace 130de0b7e39d0eea ]---
>
> Best regards,
>
> Alex
>
>
>
> -----Original Message-----
> From: Juergen Gross [mailto:jgr...@suse.com]
> Sent: Friday, 22 December 2017 5:47 PM
> To: Alex Braunegg; xen-devel@lists.xenproject.org
> Cc: Wei Liu; Paul Durrant
> Subject: Re: [Xen-devel] [BUG] kernel bug encountered at
> drivers/net/xen-netback/netback.c:430!
>
> On 22/12/17 07:40, Alex Braunegg wrote:
>> Hi all,
>>
>> Experienced the same issue again today:
> Ccing the maintainers.
>
>
> Juergen
>
>> =====================================================================================
>>
>> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x2f
>> ------------[ cut here ]------------
>> kernel BUG at drivers/net/xen-netback/netback.c:430!
>> invalid opcode: 0000 [#1] SMP
>> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
>> xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
>> ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE)
>> icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E)
>> i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
>> sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E)
>> libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
>> CPU: 0 PID: 12636 Comm: vif2.0-q0-deall Tainted: P OE
>> 4.14.6-1.el6.x86_64 #1
>> Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
>> task: ffff880062518000 task.stack: ffffc90004f88000
>> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
>> RSP: e02b:ffffc90004f8bc68 EFLAGS: 00010292
>> RAX: 0000000000000045 RBX: ffffc90000fcd000 RCX: 0000000000000000
>> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
>> RBP: ffffc90004f8be98 R08: 000000000000037d R09: 000000000000037e
>> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000fd6730
>> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
>> FS: 00007f40c63639a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
>> CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffffffff600400 CR3: 000000006375f000 CR4: 0000000000000660
>> Call Trace:
>> ? error_exit+0x5/0x20
>> ? __update_load_avg_cfs_rq+0x176/0x180
>> ? xen_mc_flush+0x87/0x120
>> ? xen_load_sp0+0x84/0xa0
>> ? __switch_to+0x1c1/0x360
>> ? finish_task_switch+0x78/0x240
>> ? __schedule+0x192/0x496
>> ? _raw_spin_lock_irqsave+0x1a/0x3c
>> ? _raw_spin_lock_irqsave+0x1a/0x3c
>> ? _raw_spin_unlock_irqrestore+0x11/0x20
>> xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>> ? do_wait_intr+0x80/0x80
>> ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>> kthread+0x106/0x140
>> ? kthread_destroy_worker+0x60/0x60
>> ? kthread_destroy_worker+0x60/0x60
>> ret_from_fork+0x25/0x30
>> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48
>> c7 c6 10 5b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 99 06 e1 <0f> 0b 0f 0b 48
>> 8b 53 20 89 c1 48 c7 c6 48 5b 55 a0 31 c0 45 31
>> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP:
>> ffffc90004f8bc68
>> ---[ end trace 010682c76619a1bd ]---
>>
>> =====================================================================================
>>
>> Best regards,
>>
>> Alex
>>
>> -----Original Message-----
>> From: Alex Braunegg [mailto:alex.braun...@gmail.com]
>> Sent: Thursday, 21 December 2017 8:04 AM
>> To: 'xen-devel@lists.xenproject.org'
>> Subject: [BUG] kernel bug encountered at
>> drivers/net/xen-netback/netback.c:430!
>>
>> Hi all,
>>
>> I experienced the following bug whilst using a Xen VM. This morning a
>> single Xen VM suddenly terminated without cause, and the following was
>> logged in dmesg.
>>
>> Only 1 of the 2 running VMs experienced an issue; the other remained up
>> and fully functional until I attempted to restart the crashed VM which
>> triggered the kernel bug.
>>
>> Kernel: 4.14.6
>> Xen: 4.8.2
>>
>> =====================================================================================
>>
>> vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3f
>> ------------[ cut here ]------------
>> kernel BUG at drivers/net/xen-netback/netback.c:430!
>> invalid opcode: 0000 [#1] SMP
>> Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
>> xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
>> ipmi_si(E) ipmi_msghandler(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE)
>> spl(OE) zavl(POE) zunicode(POE) k10temp(E) tpm_infineon(E) sp5100_tco(E)
>> i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
>> sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E)
>> libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
>> CPU: 0 PID: 13163 Comm: vif2.0-q0-deall Tainted: P OE
>> 4.14.6-1.el6.x86_64 #1
>> Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
>> task: ffff8800595cc980 task.stack: ffffc900028e0000
>> RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
>> RSP: e02b:ffffc900028e3c68 EFLAGS: 00010292
>> RAX: 0000000000000045 RBX: ffffc90002969000 RCX: 0000000000000000
>> RDX: ffff88007f4146e8 RSI: ffff88007f40db38 RDI: ffff88007f40db38
>> RBP: ffffc900028e3e98 R08: 000000000000037b R09: 000000000000037c
>> R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90002972730
>> R13: 0000160000000000 R14: aaaaaaaaaaaaaaab R15: ffffc9000099bbe8
>> FS: 00007fee260ff9a0(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
>> CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffffffff600400 CR3: 0000000062815000 CR4: 0000000000000660
>> Call Trace:
>> ? error_exit+0x5/0x20
>> ? __update_load_avg_cfs_rq+0x176/0x180
>> ? xen_mc_flush+0x87/0x120
>> ? xen_load_sp0+0x84/0xa0
>> ? __switch_to+0x1c1/0x360
>> ? finish_task_switch+0x78/0x240
>> ? __schedule+0x192/0x496
>> ? _raw_spin_lock_irqsave+0x1a/0x3c
>> ? _raw_spin_lock_irqsave+0x1a/0x3c
>> ? _raw_spin_unlock_irqrestore+0x11/0x20
>> xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
>> ? do_wait_intr+0x80/0x80
>> ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
>> kthread+0x106/0x140
>> ? kthread_destroy_worker+0x60/0x60
>> ? kthread_destroy_worker+0x60/0x60
>> ret_from_fork+0x25/0x30
>> Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48
>> c7 c6 10 3b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 b9 06 e1 <0f> 0b 0f 0b 48
>> 8b 53 20 89 c1 48 c7 c6 48 3b 55 a0 31 c0 45 31
>> RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP:
>> ffffc900028e3c68
>> ---[ end trace 7d827dae67002ffc ]---
>>
>> =====================================================================================
>>
>> The relevant section of kernel code is:
>>
>> =====================================================================================
>>
>> static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
>>                                              u16 pending_idx)
>> {
>>         if (unlikely(queue->grant_tx_handle[pending_idx] ==
>>                      NETBACK_INVALID_HANDLE)) {
>>                 netdev_err(queue->vif->dev,
>>                            "Trying to unmap invalid handle! pending_idx: 0x%x\n",
>>                            pending_idx);
>>                 BUG();
>>         }
>>         queue->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
>> }
>>
>> =====================================================================================
>>
>> In an attempt to recover from this situation, I restarted / destroyed the
>> VM (xl restart <vmname> / xl destroy <vmname>) to recover its state, and
>> the following error messages were logged at the console:
>>
>> =====================================================================================
>>
>> libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus:
>> /etc/xen/scripts/block remove [25271] died due to fatal signal Segmentation
>> fault
>> libxl: error: libxl_device.c:1080:device_backend_callback: unable to remove
>> device with path /local/domain/0/backend/vif/2/0
>> libxl: error: libxl.c:1647:devices_destroy_cb: libxl__devices_destroy failed
>> for 2
>>
>> =====================================================================================
>>
>> The physical system then hung and subsequently restarted with nothing else
>> logged; everything came back OK and operational, including the VM that
>> crashed.
>>
>> Further details (xl dmesg, xl info) attached.
>>
>> Best regards,
>>
>> Alex Braunegg
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xenproject.org
>> https://lists.xenproject.org/mailman/listinfo/xen-devel
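For readers following the failure: the check in xenvif_grant_handle_reset(), quoted in the 21 December report, enforces a per-slot invariant. Each pending_idx slot holds either a live grant handle or NETBACK_INVALID_HANDLE, so resetting a slot that is already invalid means it is being unmapped a second time, or was never mapped. Below is a minimal userspace model of that invariant only; it is not netback code and says nothing about the root cause. The names model_queue, MODEL_PENDING_SLOTS and MODEL_INVALID_HANDLE, their values, and the model_grant_handle_set() mapping step are hypothetical stand-ins. Where the kernel calls BUG(), the model prints the same message and returns -1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for MAX_PENDING_REQS and NETBACK_INVALID_HANDLE;
 * the values are illustrative, not taken from the kernel headers. */
#define MODEL_PENDING_SLOTS 256
#define MODEL_INVALID_HANDLE UINT32_MAX

/* Models only the grant_tx_handle array of struct xenvif_queue. */
struct model_queue {
	uint32_t grant_tx_handle[MODEL_PENDING_SLOTS];
};

/* Every slot starts out unmapped. */
static void model_queue_init(struct model_queue *q)
{
	for (int i = 0; i < MODEL_PENDING_SLOTS; i++)
		q->grant_tx_handle[i] = MODEL_INVALID_HANDLE;
}

/* Hypothetical mapping step: record a live handle in an empty slot.
 * Returns -1 if the slot already holds a live handle. */
static int model_grant_handle_set(struct model_queue *q, uint16_t idx,
				  uint32_t handle)
{
	assert(idx < MODEL_PENDING_SLOTS);
	assert(handle != MODEL_INVALID_HANDLE);
	if (q->grant_tx_handle[idx] != MODEL_INVALID_HANDLE)
		return -1;
	q->grant_tx_handle[idx] = handle;
	return 0;
}

/* Same test as xenvif_grant_handle_reset(): unmapping a slot whose handle
 * is already invalid is a double unmap (or an unmap of a slot that was
 * never mapped). The kernel calls BUG() here; the model returns -1. */
static int model_grant_handle_reset(struct model_queue *q, uint16_t idx)
{
	assert(idx < MODEL_PENDING_SLOTS);
	if (q->grant_tx_handle[idx] == MODEL_INVALID_HANDLE) {
		fprintf(stderr,
			"Trying to unmap invalid handle! pending_idx: 0x%x\n",
			(unsigned int)idx);
		return -1;
	}
	q->grant_tx_handle[idx] = MODEL_INVALID_HANDLE;
	return 0;
}
```

For example, after model_queue_init(), mapping slot 0x3a and resetting it once succeeds. A second reset of 0x3a, or a reset of a slot that was never mapped, takes the error path. That path corresponds to the condition reported at netback.c:430 in the traces in this thread.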