Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2018-01-10 Thread Alex Braunegg

> Actually no need... The underlying issue was really a bug and has
> been fixed in 4.14.11.

Thanks for tracking this down and spending time looking at this, Paul.

Best regards,

Alex




___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2018-01-10 Thread 'Christoph Moench-Tegeder'
## Paul Durrant (paul.durr...@citrix.com):

> Actually no need... The underlying issue was really a bug and has
> been fixed in 4.14.11.

Oh. That explains why reverting the other patch "fixed" the problem -
I had skipped 4.14.10 and 4.14.11 - and the problem has gone away
independently of that.
Cool, I'll try vanilla 4.14.13 really soon now (once I'm home...)

Thanks for the investigation,
Christoph

-- 
Spare Space.


Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2018-01-10 Thread Paul Durrant
> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Paul Durrant
> Sent: 10 January 2018 12:52
> To: 'Christoph Moench-Tegeder' <c...@burggraben.net>
> Cc: 'Michael Collins' <m...@ark-net.org>; 'Juergen Gross'
> <jgr...@suse.com>; Wei Liu <wei.l...@citrix.com>; 'Alex Braunegg'
> <alex.braun...@gmail.com>; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-
> netback/netback.c:430!
> 
> I have tracked down the problem to multiple calls to the zerocopy callback for
> the same ubuf_info. I am not sure exactly which patch introduced the issue
> but my suspicion is that it was one of the patches in the MSG_ZEROCOPY series (see
> https://marc.info/?l=linux-netdev&m=149807997726733&w=2).
> I have a candidate patch to netback to make use of the ubuf_info ref count
> to handle the multiple callbacks and that certainly fixes the issue for me.
> I'll post this shortly recommending a backport to stable.
> 

Actually no need... The underlying issue was really a bug and has been fixed in
4.14.11. See
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.14.13&id=17155ea827b2fd81330a442ed56d0edafd9969e1
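
For anyone wanting to check a Dom0 against the 4.14.11 cut-off mentioned above, here is a minimal sketch in C. The helper name demo_release_at_least is hypothetical and not part of any kernel or Xen tooling; it only compares the numeric prefix of a release string (as printed by "uname -r") against a given version.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of any kernel or Xen tooling): returns true
 * if a release string such as "4.14.13" or "4.14.6-1.el6.x86_64" is at or
 * above major.minor.patch. Anything after the numeric prefix is ignored.
 */
static bool demo_release_at_least(const char *release,
				  int major, int minor, int patch)
{
	int maj = 0, min = 0, pat = 0;

	/* Require at least "major.minor"; a missing patch level counts as 0. */
	if (sscanf(release, "%d.%d.%d", &maj, &min, &pat) < 2)
		return false;

	if (maj != major)
		return maj > major;
	if (min != minor)
		return min > minor;
	return pat >= patch;
}
```

Calling demo_release_at_least("4.14.6-1.el6.x86_64", 4, 14, 11) gives false, while "4.14.13" gives true. Treat the result as a hint only: a distribution kernel may carry the fix as a backport under a lower version number, and kernels that predate the regression also compare lower.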

  Paul

Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2018-01-10 Thread Paul Durrant
I have tracked down the problem to multiple calls to the zerocopy callback for
the same ubuf_info. I am not sure exactly which patch introduced the issue but
my suspicion is that it was one of the patches in the MSG_ZEROCOPY series (see
https://marc.info/?l=linux-netdev&m=149807997726733&w=2).
I have a candidate patch to netback to make use of the ubuf_info ref count to
handle the multiple callbacks and that certainly fixes the issue for me. I'll
post this shortly recommending a backport to stable.
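
To illustrate the idea (this is not the candidate patch, which does not appear in this thread), here is a minimal user-space sketch in C of a completion callback guarded by a reference count. Every name in it (demo_ubuf, demo_ubuf_get, demo_zerocopy_callback) is hypothetical, and the plain int counter is an assumption for brevity; real kernel code would need atomic reference counting.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel's ubuf_info; only what this sketch
 * needs is modelled.
 */
struct demo_ubuf {
	int refcnt;   /* outstanding users of the buffer */
	int releases; /* how many times the slot was actually released */
};

/* Take a reference whenever another user starts sharing the buffer. */
static void demo_ubuf_get(struct demo_ubuf *ubuf)
{
	ubuf->refcnt++;
}

/*
 * Completion callback that may run once per user. Only the call that drops
 * the last reference releases the slot. Returns true if this call did the
 * release.
 */
static bool demo_zerocopy_callback(struct demo_ubuf *ubuf)
{
	assert(ubuf->refcnt > 0);
	if (--ubuf->refcnt > 0)
		return false;
	ubuf->releases++;
	return true;
}
```

With one initial reference plus one demo_ubuf_get(), the first callback returns false and the second returns true, so the slot is released exactly once however many callbacks arrive.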

  Paul

Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2018-01-09 Thread Paul Durrant
I finally have a reliable repro, and it's trivial...

Just try to copy a large file out of a Windows VM to an SMB share (using PV 
drivers in the VM). Dom0 goes bang pretty much immediately. I get another BUG 
too on another CPU...

[ 1062.422497] [ cut here ]
[ 1062.422510] kernel BUG at drivers/net/xen-netback/netback.c:1225!
[ 1062.422518] invalid opcode:  [#2] SMP
[ 1062.422522] Modules linked in: xt_physdev br_netfilter iptable_filter tun 
nfsv3 nfs_acl rpcsec_gss_krbl
[ 1062.422618]  ahci libahci ehci_pci libata ehci_hcd tg3 megaraid_sas ptp 
usbcore pps_core scsi_mod libpy
[ 1062.422636] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G  D W   
4.14.0-rc5+ #13
[ 1062.422642] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 
10/002/2015
[ 1062.422649] task: 81c10480 task.stack: 81c0
[ 1062.422659] RIP: 1e030:xenvif_zerocopy_callback+0x7e/0xc0 [xen_netback]
[ 1062.422666] RSP: e02b:88200e403d28 EFLAGS: 00010012
[ 1062.422672] RAX: 0240 RBX: c90048a5a260 RCX: 0100
[ 1062.422678] RDX: 0540 RSI: c90048a58420 RDI: 0039
[ 1062.422684] RBP: 88200e403d48 R08:  R09: 
[ 1062.422691] R10: 0040 R11: 881feea1b268 R12: c90048a63810
[ 1062.422697] R13: 0001 R14: c90048a578e0 R15: 882002da4900
[ 1062.422714] FS:  () GS:88200e40() 
knlGS:
[ 1062.422721] CS:  e033 DS:  ES:  CR0: 80050033
[ 1062.422726] CR2: 5615b0a0e000 CR3: 001fc96c4000 CR4: 00042660
[ 1062.422734] Call Trace:
[ 1062.422738]  
[ 1062.422746]  skb_release_data+0xe4/0x110
[ 1062.422753]  skb_release_all+0x24/0x30
[ 1062.422758]  consume_skb+0x2c/0x90
[ 1062.422765]  __dev_kfree_skb_any+0x2f/0x40
[ 1062.422776]  tg3_poll_work+0x265/0xf20 [tg3]
[ 1062.422783]  ? xenvif_tx_action+0x758/0x8e0 [xen_netback]
[ 1062.422791]  ? __enqueue_entity+0x5c/0x60
[ 1062.422797]  ? enqueue_entity+0x113/0x7b0
[ 1062.422806]  ? tg3_msi_1shot+0x52/0x60 [tg3]
[ 1062.422814]  tg3_poll+0x7e/0x420 [tg3]
[ 1062.422821]  net_rx_action+0x268/0x3e0
[ 1062.422829]  __do_softirq+0x104/0x28f
[ 1062.422837]  irq_exit+0xb6/0xc0
[ 1062.422843]  xen_evtchn_do_upcall+0x30/0x40
[ 1062.422850]  xen_do_hypervisor_callback+0x29/0x40
[ 1062.422855]  

So, I can now start to investigate.

Cheers,

  Paul

Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2018-01-08 Thread Paul Durrant
> -----Original Message-----
> From: 'Christoph Moench-Tegeder' [mailto:c...@burggraben.net]
> Sent: 07 January 2018 22:19
> To: Paul Durrant <paul.durr...@citrix.com>
> Cc: 'Michael Collins' <m...@ark-net.org>; 'Juergen Gross'
> <jgr...@suse.com>; Wei Liu <wei.l...@citrix.com>; 'Alex Braunegg'
> <alex.braun...@gmail.com>; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-
> netback/netback.c:430!
> 
> ## Paul Durrant (paul.durr...@citrix.com):
> 
> > > I could try a new kernel (KPTI, yay!) with that "mildly suspicious" commit
> > > cc8737a5fe9051b7fa052b08c57ddb9f539c389a reverted on the weekend and
> > > report back (just to rule that out - like you, I don't really believe
> > > that this is the cause).
> > > For the record, I'm still running 4.13.16 on the Dom0 (that's the last
> > > working Dom0 kernel).
> >
> > Thanks. Well, that's the only netback commit that's in master but not in
> > 4.13.16 so it would be useful to conclusively rule that out as a cause.
> 
> Funny thing: with that commit reverted, I'm running 4.14.12 on my Dom0.
> That's holding much longer than any 4.4 kernel on that host before.
> That's interesting, as the crashing code looks more correct (at least
> for me and some compiler...), and the change is rather small.
> 

Yes, that is very strange. Thanks for the info.

Cheers,

  Paul

> Regards,
> Christoph
> 
> --
> Spare Space

Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2018-01-07 Thread 'Christoph Moench-Tegeder'
## Paul Durrant (paul.durr...@citrix.com):

> > I could try a new kernel (KPTI, yay!) with that "mildly suspicious" commit
> > cc8737a5fe9051b7fa052b08c57ddb9f539c389a reverted on the weekend and
> > report back (just to rule that out - like you, I don't really believe
> > that this is the cause).
> > For the record, I'm still running 4.13.16 on the Dom0 (that's the last
> > working Dom0 kernel).
> 
> Thanks. Well, that's the only netback commit that's in master but not in
> 4.13.16 so it would be useful to conclusively rule that out as a cause.

Funny thing: with that commit reverted, I'm running 4.14.12 on my Dom0.
That's holding much longer than any 4.4 kernel on that host before.
That's interesting, as the crashing code looks more correct (at least
for me and some compiler...), and the change is rather small.

Regards,
Christoph

-- 
Spare Space


Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2018-01-04 Thread Paul Durrant
> -----Original Message-----
> From: Christoph Moench-Tegeder [mailto:c...@burggraben.net]
> Sent: 03 January 2018 20:34
> To: Paul Durrant <paul.durr...@citrix.com>
> Cc: 'Alex Braunegg' <alex.braun...@gmail.com>; 'Michael Collins'
> <m...@ark-net.org>; 'Juergen Gross' <jgr...@suse.com>; xen-
> de...@lists.xenproject.org; Wei Liu <wei.l...@citrix.com>
> Subject: Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-
> netback/netback.c:430!
> 
> ## Paul Durrant (paul.durr...@citrix.com):
> 
> > How easy is it to trigger this? I'm assuming, from the original
> > description, that I can probably trigger it by forcibly terminating
> > a running domain and then trying to restart it.
> 
> As Alex said: in the "common cases" (like his and mine) it seems to
> be enough to run a few DomUs and just wait a little (no special
> load required) - with my 10 domains, the bug triggers in a few minutes
> ( https://lists.xenproject.org/archives/html/xen-devel/2017-12/msg01516.html
> is my report of the issue - I didn't spot Alex' report).
> The order of events here is:
> - boot Dom0
> - xl create a few DomUs (all recent Linux, all builder=hvm in my setup,
>   each VM has exactly one virtual network interface, all bridged onto
>   the one ethernet interface on the Dom0 which carries all traffic
>   to the Dom0 and the DomUs)
> - after a few minutes, the Dom0 kernel logs the BUG() in question
> - shortly after (not immediately! - may take even some more minutes)
>   the DomU behind the vif reported in the BUG becomes unresponsive
>   (no network traffic, no reaction on the virtual console, no message
>   in syslog).
> - trying to xl destroy the unresponsive domain (or trying to do a
>   normal shutdown on one of the other domains) results in the corrupted
>   state documented in my earlier report (see link).
> 
> In my case this "cannot" be an issue with an old gcc - Debian 9 ships
> with "gcc (Debian 6.3.0-18) 6.3.0 20170516" (but beware of new bugs,
> who knows?).
> 
> I could try a new kernel (KPTI, yay!) with that "mildly suspicious" commit
> cc8737a5fe9051b7fa052b08c57ddb9f539c389a reverted on the weekend and
> report back (just to rule that out - like you, I don't really believe
> that this is the cause).
> For the record, I'm still running 4.13.16 on the Dom0 (that's the last
> working Dom0 kernel).

Thanks. Well, that's the only netback commit that's in master but not in 
4.13.16 so it would be useful to conclusively rule that out as a cause.

  Cheers,

Paul

> 
> Regards,
> Christoph
> 
> --
> Spare Space

Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2018-01-03 Thread Christoph Moench-Tegeder
## Paul Durrant (paul.durr...@citrix.com):

> How easy is it to trigger this? I'm assuming, from the original
> description, that I can probably trigger it by forcibly terminating
> a running domain and then trying to restart it.

As Alex said: in the "common cases" (like his and mine) it seems to
be enough to run a few DomUs and just wait a little (no special
load required) - with my 10 domains, the bug triggers in a few minutes
( https://lists.xenproject.org/archives/html/xen-devel/2017-12/msg01516.html
is my report of the issue - I didn't spot Alex' report).
The order of events here is:
- boot Dom0
- xl create a few DomUs (all recent Linux, all builder=hvm in my setup,
  each VM has exactly one virtual network interface, all bridged onto
  the one ethernet interface on the Dom0 which carries all traffic
  to the Dom0 and the DomUs)
- after a few minutes, the Dom0 kernel logs the BUG() in question
- shortly after (not immediately! - may take even some more minutes)
  the DomU behind the vif reported in the BUG becomes unresponsive
  (no network traffic, no reaction on the virtual console, no message
  in syslog).
- trying to xl destroy the unresponsive domain (or trying to do a
  normal shutdown on one of the other domains) results in the corrupted
  state documented in my earlier report (see link).

In my case this "cannot" be an issue with an old gcc - Debian 9 ships
with "gcc (Debian 6.3.0-18) 6.3.0 20170516" (but beware of new bugs,
who knows?).

I could try a new kernel (KPTI, yay!) with that "mildly suspicious" commit
cc8737a5fe9051b7fa052b08c57ddb9f539c389a reverted on the weekend and
report back (just to rule that out - like you, I don't really believe
that this is the cause).
For the record, I'm still running 4.13.16 on the Dom0 (that's the last
working Dom0 kernel).

Regards,
Christoph

-- 
Spare Space


Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2018-01-03 Thread Alex Braunegg
> How easy is it to trigger this? I'm assuming, from the original description, 
> that I can probably trigger it by forcibly terminating a running domain and 
> then trying to restart it.

For me the trigger was just having 2 VMs running and then within 24 hours one
would crash with the debug data sent to console / dmesg. I didn’t have to do
anything special to trigger it - nor did I try / attempt to trigger it.

When attempting to restart the crashed VM (using xl) - that’s when I got the 
additional xl messages & the server rebooted.

> This breaks compilation of xen-netback with older compilers.
> From kbuild bot with gcc-4.4.7:

My Xen version (and all other packages, including the kernel) are built
/ rebuilt using gcc 4.6.2 so I don’t think I am hitting the gcc issue that the
patch fixed.

Best regards,

Alex





Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2017-12-28 Thread Alex Braunegg
Hi Mike,

Thanks for the confirmation on that. Since the last crash I was having them 
daily until I downgraded back to kernel 4.4 and Xen 4.6 where stability 
resumed. Zero crashes since 24th December.

@Paul, Wei,

Can we get this investigated? It appears that this is a stability blocker for 
Xen releases on newer kernels.

Best regards,

Alex

-----Original Message-----
From: Michael Collins [mailto:m...@ark-net.org] 
Sent: Friday, 29 December 2017 5:05 AM
To: Alex Braunegg; 'Juergen Gross'; xen-devel@lists.xenproject.org
Cc: 'Paul Durrant'; 'Wei Liu'
Subject: Re: [Xen-devel] [BUG] kernel bug encountered at 
drivers/net/xen-netback/netback.c:430!

Alex,

  I saw this same issue when running a kernel 4.13+, switched 
back to 4.11 and the problem has not resurfaced.  I would like to 
understand the root cause of this issue.

Mike



Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2017-12-22 Thread Alex Braunegg
Hi all,

Another crash this morning:

vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3a
[ cut here ]
kernel BUG at drivers/net/xen-netback/netback.c:430!
invalid opcode:  [#1] SMP
Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E) xen_netback(E) 
nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E) ipmi_si(E) 
ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE) 
spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E) i2c_piix4(E) 
i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E) sg(E) raid1(E) 
sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E) libahci(E) 
dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
CPU: 0 PID: 14238 Comm: vif2.0-q0-deall Tainted: P   OE   
4.14.6-1.el6.x86_64 #1
Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
task: 880059e255c0 task.stack: c90001f64000
RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
RSP: e02b:c90001f67c68 EFLAGS: 00010292
RAX: 0045 RBX: c90001f55000 RCX: 
RDX: 88007f4146e8 RSI: 88007f40db38 RDI: 88007f40db38
RBP: c90001f67e98 R08: 0372 R09: 0373
R10: 0001 R11:  R12: c90001f5e730
R13: 1600 R14: aaab R15: c999bbe8
FS:  7f92865d29a0() GS:88007f40() knlGS:
CS:  e033 DS:  ES:  CR0: 80050033
CR2: ff600400 CR3: 6209c000 CR4: 0660
Call Trace:
 ? _raw_spin_unlock_irqrestore+0x11/0x20
 ? error_exit+0x5/0x20
 ? __update_load_avg_cfs_rq+0x176/0x180
 ? xen_mc_flush+0x87/0x120
 ? xen_load_sp0+0x84/0xa0
 ? __switch_to+0x1c1/0x360
 ? finish_task_switch+0x78/0x240
 ? __schedule+0x192/0x496
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_unlock_irqrestore+0x11/0x20
 xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
 ? do_wait_intr+0x80/0x80
 ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
 kthread+0x106/0x140
 ? kthread_destroy_worker+0x60/0x60
 ret_from_fork+0x25/0x30
Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48 c7 
c6 10 2b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 c9 06 e1 <0f> 0b 0f 0b 48 8b 53 
20 89 c1 48 c7 c6 48 2b 55 a0 31 c0 45 31 
RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP: c90001f67c68
---[ end trace 130de0b7e39d0eea ]---

Best regards,

Alex



Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2017-12-21 Thread Juergen Gross
On 22/12/17 07:40, Alex Braunegg wrote:
> Hi all,
> 
> Experienced the same issue again today:

Ccing the maintainers.


Juergen


Re: [Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2017-12-21 Thread Alex Braunegg
Hi all,

Experienced the same issue again today:


=

vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x2f
[ cut here ]
kernel BUG at drivers/net/xen-netback/netback.c:430!
invalid opcode:  [#1] SMP
Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
ipmi_si(E) ipmi_msghandler(E) k10temp(E) zfs(POE) zcommon(POE) znvpair(POE)
icp(POE) spl(OE) zavl(POE) zunicode(POE) tpm_infineon(E) sp5100_tco(E)
i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E)
libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
CPU: 0 PID: 12636 Comm: vif2.0-q0-deall Tainted: P   OE
4.14.6-1.el6.x86_64 #1
Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
task: 880062518000 task.stack: c90004f88000
RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
RSP: e02b:c90004f8bc68 EFLAGS: 00010292
RAX: 0045 RBX: c9fcd000 RCX: 
RDX: 88007f4146e8 RSI: 88007f40db38 RDI: 88007f40db38
RBP: c90004f8be98 R08: 037d R09: 037e
R10: 0001 R11:  R12: c9fd6730
R13: 1600 R14: aaab R15: c999bbe8
FS:  7f40c63639a0() GS:88007f40() knlGS:
CS:  e033 DS:  ES:  CR0: 80050033
CR2: ff600400 CR3: 6375f000 CR4: 0660
Call Trace:
 ? error_exit+0x5/0x20
 ? __update_load_avg_cfs_rq+0x176/0x180
 ? xen_mc_flush+0x87/0x120
 ? xen_load_sp0+0x84/0xa0
 ? __switch_to+0x1c1/0x360
 ? finish_task_switch+0x78/0x240
 ? __schedule+0x192/0x496
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_unlock_irqrestore+0x11/0x20
 xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
 ? do_wait_intr+0x80/0x80
 ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
 kthread+0x106/0x140
 ? kthread_destroy_worker+0x60/0x60
 ? kthread_destroy_worker+0x60/0x60
 ret_from_fork+0x25/0x30
Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48
c7 c6 10 5b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 99 06 e1 <0f> 0b 0f 0b 48
8b 53 20 89 c1 48 c7 c6 48 5b 55 a0 31 c0 45 31 
RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP:
c90004f8bc68
---[ end trace 010682c76619a1bd ]---


=

Best regards,

Alex


[Xen-devel] [BUG] kernel bug encountered at drivers/net/xen-netback/netback.c:430!

2017-12-20 Thread Alex Braunegg
Hi all,

I experienced the following bug whilst using a Xen VM. What happened was
that this morning a single Xen VM suddenly terminated without cause with the
following being logged in dmesg. 

Only 1 VM experienced an issue (out of 2 which were running), the other
remained up and fully functional until I attempted to restart the crashed VM
which triggered the kernel bug.

Kernel: 4.14.6
Xen:    4.8.2


=

vif vif-2-0 vif2.0: Trying to unmap invalid handle! pending_idx: 0x3f
[ cut here ]
kernel BUG at drivers/net/xen-netback/netback.c:430!
invalid opcode:  [#1] SMP
Modules linked in: xt_physdev(E) iptable_filter(E) ip_tables(E)
xen_netback(E) nfsd(E) lockd(E) grace(E) nfs_acl(E) auth_rpcgss(E) sunrpc(E)
ipmi_si(E) ipmi_msghandler(E) zfs(POE) zcommon(POE) znvpair(POE) icp(POE)
spl(OE) zavl(POE) zunicode(POE) k10temp(E) tpm_infineon(E) sp5100_tco(E)
i2c_piix4(E) i2c_core(E) ohci_pci(E) ohci_hcd(E) tg3(E) ptp(E) pps_core(E)
sg(E) raid1(E) sd_mod(E) ata_generic(E) pata_acpi(E) pata_atiixp(E) ahci(E)
libahci(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) dax(E)
CPU: 0 PID: 13163 Comm: vif2.0-q0-deall Tainted: P   OE
4.14.6-1.el6.x86_64 #1
Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
task: 8800595cc980 task.stack: c900028e
RIP: e030:xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback]
RSP: e02b:c900028e3c68 EFLAGS: 00010292
RAX: 0045 RBX: c90002969000 RCX: 
RDX: 88007f4146e8 RSI: 88007f40db38 RDI: 88007f40db38
RBP: c900028e3e98 R08: 037b R09: 037c
R10: 0001 R11:  R12: c90002972730
R13: 1600 R14: aaab R15: c999bbe8
FS:  7fee260ff9a0() GS:88007f40() knlGS:
CS:  e033 DS:  ES:  CR0: 80050033
CR2: ff600400 CR3: 62815000 CR4: 0660
Call Trace:
 ? error_exit+0x5/0x20
 ? __update_load_avg_cfs_rq+0x176/0x180
 ? xen_mc_flush+0x87/0x120
 ? xen_load_sp0+0x84/0xa0
 ? __switch_to+0x1c1/0x360
 ? finish_task_switch+0x78/0x240
 ? __schedule+0x192/0x496
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_lock_irqsave+0x1a/0x3c
 ? _raw_spin_unlock_irqrestore+0x11/0x20
 xenvif_dealloc_kthread+0x68/0xf0 [xen_netback]
 ? do_wait_intr+0x80/0x80
 ? xenvif_map_frontend_data_rings+0xe0/0xe0 [xen_netback]
 kthread+0x106/0x140
 ? kthread_destroy_worker+0x60/0x60
 ? kthread_destroy_worker+0x60/0x60
 ret_from_fork+0x25/0x30
Code: 89 df 49 83 c4 02 e8 e5 f5 ff ff 4d 39 ec 75 e8 eb a2 48 8b 43 20 48
c7 c6 10 3b 55 a0 48 8b b8 20 03 00 00 31 c0 e8 85 b9 06 e1 <0f> 0b 0f 0b 48
8b 53 20 89 c1 48 c7 c6 48 3b 55 a0 31 c0 45 31 
RIP: xenvif_tx_dealloc_action+0x1bb/0x230 [xen_netback] RSP:
c900028e3c68
---[ end trace 7d827dae67002ffc ]---


=

The section of relevant kernel code is:


=

static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
					     u16 pending_idx)
{
	if (unlikely(queue->grant_tx_handle[pending_idx] ==
		     NETBACK_INVALID_HANDLE)) {
		netdev_err(queue->vif->dev,
			   "Trying to unmap invalid handle! pending_idx: 0x%x\n",
			   pending_idx);
		BUG();
	}
	queue->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
}


=
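
As a reading aid, the check above can be modelled in user space. The following C sketch is an assumption-laden illustration, not netback code: demo_queue, DEMO_INVALID_HANDLE, DEMO_MAX_PENDING and the demo_* functions are all hypothetical names, and the reset function returns false at the point where the kernel code would hit BUG(). It shows that resetting the same slot twice is one way to reach that point; whether that is what happens on this system is not established by the trace alone.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_INVALID_HANDLE UINT32_MAX /* stand-in for NETBACK_INVALID_HANDLE */
#define DEMO_MAX_PENDING 256           /* arbitrary table size for this sketch */

/* Hypothetical, much reduced model of the per-queue handle table. */
struct demo_queue {
	uint32_t grant_tx_handle[DEMO_MAX_PENDING];
};

/* Mark every slot as unmapped. */
static void demo_queue_init(struct demo_queue *queue)
{
	for (unsigned int i = 0; i < DEMO_MAX_PENDING; i++)
		queue->grant_tx_handle[i] = DEMO_INVALID_HANDLE;
}

/* Record a mapped handle for a slot. */
static void demo_grant_handle_set(struct demo_queue *queue,
				  uint16_t pending_idx, uint32_t handle)
{
	queue->grant_tx_handle[pending_idx] = handle;
}

/*
 * Model of the reset check: returns false where the kernel code would
 * log "Trying to unmap invalid handle!" and call BUG().
 */
static bool demo_grant_handle_reset(struct demo_queue *queue,
				    uint16_t pending_idx)
{
	if (queue->grant_tx_handle[pending_idx] == DEMO_INVALID_HANDLE)
		return false;
	queue->grant_tx_handle[pending_idx] = DEMO_INVALID_HANDLE;
	return true;
}
```

After demo_queue_init() and demo_grant_handle_set(&q, 0x3f, 7), the first demo_grant_handle_reset(&q, 0x3f) succeeds and a second call on the same index fails, which corresponds to the BUG() path in the real function.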

In an attempt to recover from this situation I restarted / destroyed (xl
restart  / xl destroy ) the VM to recover its state and the
following error messages were logged at the console:


=

libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus:
/etc/xen/scripts/block remove [25271] died due to fatal signal Segmentation
fault
libxl: error: libxl_device.c:1080:device_backend_callback: unable to remove
device with path /local/domain/0/backend/vif/2/0
libxl: error: libxl.c:1647:devices_destroy_cb: libxl__devices_destroy failed
for 2


=

After this the physical system hung and then restarted with nothing else
logged; everything came back up and operational, including the VM that
crashed.

Further details (xl dmesg, xl info) attached.

Best regards,

Alex Braunegg
 Xen 4.8.2
(XEN) Xen version 4.8.2 () (gcc (GCC) 4.6.2 20111027 (Red Hat 
4.6.2-1)) debug=n  Sun Dec 17 14:32:09 EST 2017
(XEN) Latest ChangeSet: 
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: dom0_mem=2048M,max:2048M cpufreq=xen dom0_max_vcpus=1 
dom0_vcpus_pin
(XEN) Video