Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-10 Thread Mike Galbraith
On Tue, 2017-04-11 at 00:23 +0300, Michael S. Tsirkin wrote:
> On Sat, Apr 08, 2017 at 07:01:34AM +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 21:56 +0300, Michael S. Tsirkin wrote:
> > 
> > > OK. test3 and test4 are now pushed: test3 should fix your hang,
> > > test4 is trying to fix a crash reported independently.
> > 
> > test3 does not fix the post hibernate hang business that I can easily
> > reproduce, those are NFS, and at least as old as 4.4.  Host/guest,
> > dunno, put 4.4 on both, guest hangs intermittently.
> 
> OK so IIUC you agree it's a good idea to send test4 to Linus, right?

Well, my box agrees that that is a viable option.

> Hibernation's still broken but that's not a regression.

Yup.

> > [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> > [] rpc_wait_bit_killable+0x1e/0xb0 [sunrpc]
> > [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> > [] autoremove_wake_function+0x50/0x50
> > [] call_decode+0x850/0x850 [sunrpc]
> > [] call_decode+0x850/0x850 [sunrpc]
> > [] __rpc_execute+0x14e/0x440 [sunrpc]
> > [] ktime_get+0x35/0xa0
> > [] rpc_run_task+0x120/0x170 [sunrpc]
> > [] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
> > [] _nfs4_proc_getattr+0xb0/0xc0 [nfsv4]
> > [] path_lookupat+0xd2/0x100
> > [] nfs4_proc_getattr+0x5c/0xe0 [nfsv4]
> > [] __nfs_revalidate_inode+0xa0/0x300 [nfs]
> > [] nfs_getattr+0x95/0x250 [nfs]
> > [] vfs_statx+0x7b/0xc0
> > [] SYSC_newstat+0x20/0x40
> > [] entry_SYSCALL_64_fastpath+0x1a/0xa9
> > [] 0x
> > 
> > I noted no _other_ misbehavior in either kernel, w/wo threadirqs.
> > 
> >   -Mike
> 
> Interesting. I would guess virtio net does not complete some
> packets. So you were unable to find an old guest where this
> works fine?

I just tried my opensuse 13.2 clone.  It works markedly less fine,
turns into a brick either on the way down or back up in short order.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-10 Thread Michael S. Tsirkin
On Sat, Apr 08, 2017 at 07:01:34AM +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 21:56 +0300, Michael S. Tsirkin wrote:
> 
> > OK. test3 and test4 are now pushed: test3 should fix your hang,
> > test4 is trying to fix a crash reported independently.
> 
> test3 does not fix the post hibernate hang business that I can easily
> reproduce, those are NFS, and at least as old as 4.4.  Host/guest,
> dunno, put 4.4 on both, guest hangs intermittently.

OK so IIUC you agree it's a good idea to send test4 to Linus, right?
Hibernation's still broken but that's not a regression.

> [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> [] rpc_wait_bit_killable+0x1e/0xb0 [sunrpc]
> [] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> [] autoremove_wake_function+0x50/0x50
> [] call_decode+0x850/0x850 [sunrpc]
> [] call_decode+0x850/0x850 [sunrpc]
> [] __rpc_execute+0x14e/0x440 [sunrpc]
> [] ktime_get+0x35/0xa0
> [] rpc_run_task+0x120/0x170 [sunrpc]
> [] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
> [] _nfs4_proc_getattr+0xb0/0xc0 [nfsv4]
> [] path_lookupat+0xd2/0x100
> [] nfs4_proc_getattr+0x5c/0xe0 [nfsv4]
> [] __nfs_revalidate_inode+0xa0/0x300 [nfs]
> [] nfs_getattr+0x95/0x250 [nfs]
> [] vfs_statx+0x7b/0xc0
> [] SYSC_newstat+0x20/0x40
> [] entry_SYSCALL_64_fastpath+0x1a/0xa9
> [] 0x
> 
> I noted no _other_ misbehavior in either kernel, w/wo threadirqs.
> 
>   -Mike

Interesting. I would guess virtio net does not complete some
packets. So you were unable to find an old guest where this
works fine?

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Fri, 2017-04-07 at 21:56 +0300, Michael S. Tsirkin wrote:

> OK. test3 and test4 are now pushed: test3 should fix your hang,
> test4 is trying to fix a crash reported independently.

test3 does not fix the post hibernate hang business that I can easily
reproduce, those are NFS, and at least as old as 4.4.  Host/guest,
dunno, put 4.4 on both, guest hangs intermittently.

[] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[] rpc_wait_bit_killable+0x1e/0xb0 [sunrpc]
[] __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[] autoremove_wake_function+0x50/0x50
[] call_decode+0x850/0x850 [sunrpc]
[] call_decode+0x850/0x850 [sunrpc]
[] __rpc_execute+0x14e/0x440 [sunrpc]
[] ktime_get+0x35/0xa0
[] rpc_run_task+0x120/0x170 [sunrpc]
[] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
[] _nfs4_proc_getattr+0xb0/0xc0 [nfsv4]
[] path_lookupat+0xd2/0x100
[] nfs4_proc_getattr+0x5c/0xe0 [nfsv4]
[] __nfs_revalidate_inode+0xa0/0x300 [nfs]
[] nfs_getattr+0x95/0x250 [nfs]
[] vfs_statx+0x7b/0xc0
[] SYSC_newstat+0x20/0x40
[] entry_SYSCALL_64_fastpath+0x1a/0xa9
[] 0x

I noted no _other_ misbehavior in either kernel, w/wo threadirqs.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Michael S. Tsirkin
On Fri, Apr 07, 2017 at 04:29:53PM +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 16:35 +0300, Michael S. Tsirkin wrote:
> 
> > Oh wait, I still put the ctx feature patches in there :(
> > Pls ignore, I'll update when I've fixed it up. Sorry about the noise.
> 
> Both worked fine w/wo threadirqs.
> 
>   -Mike

OK. test3 and test4 are now pushed: test3 should fix your hang,
test4 is trying to fix a crash reported independently.

Will push to linux-next once I hear from you.

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Fri, 2017-04-07 at 16:35 +0300, Michael S. Tsirkin wrote:

> Oh wait, I still put the ctx feature patches in there :(
> Pls ignore, I'll update when I've fixed it up. Sorry about the noise.

Both worked fine w/wo threadirqs.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Michael S. Tsirkin
On Fri, Apr 07, 2017 at 04:20:12PM +0300, Michael S. Tsirkin wrote:
> On Fri, Apr 07, 2017 at 09:22:02AM +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote:
> > > On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote:
> > > > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> > > > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> > > > 
> > > > > > Test tag works fine here w/wo threadirqs, RT works as well.
> > > > > > 
> > > > > > -Mike
> > > > > 
> > > > > Thanks a lot.
> > > > > OK I pushed out two new tags
> > > > >   test1 with just the cleanup reverts
> > > > >   test2 with a bugfix in this area
> > > > > 
> > > > > 
> > > > > I would very much appreciate your testing report on both -
> > > > > should be ok but better make sure.
> > > > 
> > > > Ok, once it percolates out I'll do that.
> > > 
> > > for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict.
> > 
> > But test2 works fine w/wo threadirqs.
> 
> Oops. This is what one gets by pushing at 2am. I fixed that one up
> (still didn't even build as I'm in the middle of a conference).
> Also it's actually the reverse: test2 is just the revert, test1 has
> one more bugfix.
> 
> So I'm inclined to push test2 out to linux-next for now, and will
> add test1 later if it fares well.
> 
> Mike, your testing is very much appreciated!

Oh wait, I still put the ctx feature patches in there :(
Pls ignore, I'll update when I've fixed it up. Sorry about the noise.

> -- 
> MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Michael S. Tsirkin
On Fri, Apr 07, 2017 at 09:22:02AM +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote:
> > > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> > > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> > > 
> > > > > Test tag works fine here w/wo threadirqs, RT works as well.
> > > > > 
> > > > >   -Mike
> > > > 
> > > > Thanks a lot.
> > > > OK I pushed out two new tags
> > > > test1 with just the cleanup reverts
> > > > test2 with a bugfix in this area
> > > > 
> > > > 
> > > > I would very much appreciate your testing report on both -
> > > > should be ok but better make sure.
> > > 
> > > Ok, once it percolates out I'll do that.
> > 
> > for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict.
> 
> But test2 works fine w/wo threadirqs.

Oops. This is what one gets by pushing at 2am. I fixed that one up
(still didn't even build as I'm in the middle of a conference).
Also it's actually the reverse: test2 is just the revert, test1 has
one more bugfix.

So I'm inclined to push test2 out to linux-next for now, and will
add test1 later if it fares well.

Mike, your testing is very much appreciated!

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Fri, 2017-04-07 at 09:22 +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote:
> > > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> > > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> > > 
> > > > > Test tag works fine here w/wo threadirqs, RT works as well.
> > > > > 
> > > > >   -Mike
> > > > 
> > > > Thanks a lot.
> > > > OK I pushed out two new tags
> > > > test1 with just the cleanup reverts
> > > > test2 with a bugfix in this area
> > > > 
> > > > 
> > > > I would very much appreciate your testing report on both -
> > > > should be ok but better make sure.
> > > 
> > > Ok, once it percolates out I'll do that.
> > 
> > for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict.
> 
> But test2 works fine w/wo threadirqs.

(CONFIG_DEBUG_SHIRQ=y as well btw)


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Fri, 2017-04-07 at 09:05 +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote:
> > On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> > > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> > 
> > > > Test tag works fine here w/wo threadirqs, RT works as well.
> > > > 
> > > > -Mike
> > > 
> > > Thanks a lot.
> > > OK I pushed out two new tags
> > >   test1 with just the cleanup reverts
> > >   test2 with a bugfix in this area
> > > 
> > > 
> > > I would very much appreciate your testing report on both -
> > > should be ok but better make sure.
> > 
> > Ok, once it percolates out I'll do that.
> 
> for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict.

But test2 works fine w/wo threadirqs.


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Fri, 2017-04-07 at 08:44 +0200, Mike Galbraith wrote:
> On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> > On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> 
> > > Test tag works fine here w/wo threadirqs, RT works as well.
> > > 
> > >   -Mike
> > 
> > Thanks a lot.
> > OK I pushed out two new tags
> > test1 with just the cleanup reverts
> > test2 with a bugfix in this area
> > 
> > 
> > I would very much appreciate your testing report on both -
> > should be ok but better make sure.
> 
> Ok, once it percolates out I'll do that.

for_linus-10-g960bef2a6172 contains a -ENOBUILD merge conflict.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Fri, 2017-04-07 at 09:24 +0300, Michael S. Tsirkin wrote:
> On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:

> > Test tag works fine here w/wo threadirqs, RT works as well.
> > 
> > -Mike
> 
> Thanks a lot.
> OK I pushed out two new tags
>   test1 with just the cleanup reverts
>   test2 with a bugfix in this area
> 
> 
> I would very much appreciate your testing report on both -
> should be ok but better make sure.

Ok, once it percolates out I'll do that.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Michael S. Tsirkin
On Fri, Apr 07, 2017 at 08:03:19AM +0200, Mike Galbraith wrote:
> On Thu, 2017-04-06 at 00:38 +0300, Michael S. Tsirkin wrote:
> 
> > What I did is revert the refactorings while keeping the affinity API -
> > we can safely postpone them until the next release without loss of
> > functionality. But that's on top of my testing tree so it has unrelated
> > stuff as well. I'm rather confident they aren't fixing the issues but
> > I'll prepare a bugfix-only tree now for testing.
> 
> Test tag works fine here w/wo threadirqs, RT works as well.
> 
>   -Mike

Thanks a lot.
OK I pushed out two new tags
test1 with just the cleanup reverts
test2 with a bugfix in this area


I would very much appreciate your testing report on both -
should be ok but better make sure.
Unfortunately it's past 2am here so I don't have the time to
test - and I'm at a conference so not a lot of time during
the day either.

Christoph, I still think your cleanups were a good idea,
but we need to get this release into a stable shape ASAP.
Let's try again for the next release, OK?

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-07 Thread Mike Galbraith
On Thu, 2017-04-06 at 00:38 +0300, Michael S. Tsirkin wrote:

> What I did is revert the refactorings while keeping the affinity API -
> we can safely postpone them until the next release without loss of
> functionality. But that's on top of my testing tree so it has unrelated
> stuff as well. I'm rather confident they aren't fixing the issues but
> I'll prepare a bugfix-only tree now for testing.

Test tag works fine here w/wo threadirqs, RT works as well.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-05 Thread Michael S. Tsirkin
On Wed, Apr 05, 2017 at 08:29:34AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 05, 2017 at 06:24:50AM +0200, Mike Galbraith wrote:
> > On Wed, 2017-04-05 at 06:51 +0300, Michael S. Tsirkin wrote:
> > 
> > > Any issues at all left with this tree?
> > > In particular any regressions?
> > 
> > Nothing blatantly obvious in a testdrive that lasted a couple minutes.
> > I'd have to beat on it a bit to look for things beyond the reported,
> > but can't afford to do that right now.
> 
> Can you check where the issues appear?  I'd like to do a pure revert
> of the shared interrupts, but that tree has a lot more in it..

What I did is revert the refactorings while keeping the affinity API -
we can safely postpone them until the next release without loss of
functionality. But that's on top of my testing tree so it has unrelated
stuff as well. I'm rather confident they aren't fixing the issues but
I'll prepare a bugfix-only tree now for testing.

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-05 Thread Mike Galbraith
On Wed, 2017-04-05 at 08:29 +0200, Christoph Hellwig wrote:

> Can you check where the issues appear?  I'd like to do a pure revert
> of the shared interrupts, but that tree has a lot more in it..

Not immediately, one of my several pots is emitting black smoke.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-05 Thread Christoph Hellwig
On Mon, Apr 03, 2017 at 07:14:22PM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2017 at 04:18:23PM +0200, Christoph Hellwig wrote:
> > Mike,
> > 
> > can you try the patch below?
> > 
> > ---
> > From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
> > From: Christoph Hellwig 
> > Date: Mon, 3 Apr 2017 14:51:35 +0200
> > Subject: virtio_pci: don't use shared irq for virtqueues
> > 
> > Reimplement the shared irq feature manually, as we might have a larger
> > number of virtqueues than the core shared interrupt code can handle
> > in threaded interrupt mode.
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  drivers/virtio/virtio_pci_common.c | 142 
> > +
> >  drivers/virtio/virtio_pci_common.h |   1 +
> >  2 files changed, 83 insertions(+), 60 deletions(-)
> 
> Well the original patch this is trying to fix is
> 07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507 which dropped just 40 lines
> with documentation. It did this by re-using error handling to switch
> from per-vq to non-per-vq mode. Now this has separate flows for errors
> and per-vq non-per-vq switch and (I think, as a result) is adding 140
> lines which doesn't make me very happy.

The above adds 23 lines.  We could entangle both loops again, but I'm
not sure it's going to buy us much.


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-05 Thread Christoph Hellwig
On Wed, Apr 05, 2017 at 06:24:50AM +0200, Mike Galbraith wrote:
> On Wed, 2017-04-05 at 06:51 +0300, Michael S. Tsirkin wrote:
> 
> > Any issues at all left with this tree?
> > In particular any regressions?
> 
> Nothing blatantly obvious in a testdrive that lasted a couple minutes. 
>  I'd have to beat on it a bit to look for things beyond the reported,
> but can't afford to do that right now.

Can you check where the issues appear?  I'd like to do a pure revert
of the shared interrupts, but that tree has a lot more in it..


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Wed, 2017-04-05 at 06:51 +0300, Michael S. Tsirkin wrote:

> Any issues at all left with this tree?
> In particular any regressions?

Nothing blatantly obvious in a testdrive that lasted a couple minutes. 
 I'd have to beat on it a bit to look for things beyond the reported,
but can't afford to do that right now.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Wed, Apr 05, 2017 at 05:24:30AM +0200, Mike Galbraith wrote:
> On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote:
> > On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote:
> > > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > since I couldn't reproduce, I decided it's worth trying to see
> > > > what happens if we revert back to before 5c34d002dcc7.
> > > > 
> > > > 
> > > > Could you please test a tag "test" in my tree above?
> > > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > 
> > > Nogo.
> > > 
> > > git@homer:..git/vhost> git remote update
> > > Fetching origin
> > > git@homer:..git/vhost> git show
> > > 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > 
> > Maybe because it's a tag not a head. Pls try
> > git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/test
> 
> That worked.  Checked out/building.

Thanks a lot for the testing.

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Wed, Apr 05, 2017 at 05:40:06AM +0200, Mike Galbraith wrote:
> On Wed, 2017-04-05 at 05:24 +0200, Mike Galbraith wrote:
> > On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote:
> > > > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:
> > > > 
> > > > > since I couldn't reproduce, I decided it's worth trying to see
> > > > > what happens if we revert back to before 5c34d002dcc7.
> > > > > 
> > > > > 
> > > > > Could you please test a tag "test" in my tree above?
> > > > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > > 
> > > > Nogo.
> > > > 
> > > > git@homer:..git/vhost> git remote update
> > > > Fetching origin
> > > > git@homer:..git/vhost> git show
> > > > 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > 
> > > Maybe because it's a tag not a head. Pls try
> > > > git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/test
> > 
> > That worked.  Checked out/building.
> 
> vbox hibernated gripe free, w/wo threadirqs.
> 
>   -Mike

Any issues at all left with this tree?
In particular any regressions?

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Wed, 2017-04-05 at 05:24 +0200, Mike Galbraith wrote:
> On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote:
> > On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote:
> > > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:
> > > 
> > > > since I couldn't reproduce, I decided it's worth trying to see
> > > > what happens if we revert back to before 5c34d002dcc7.
> > > > 
> > > > 
> > > > Could you please test a tag "test" in my tree above?
> > > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > 
> > > Nogo.
> > > 
> > > git@homer:..git/vhost> git remote update
> > > Fetching origin
> > > git@homer:..git/vhost> git show
> > > 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > 
> > Maybe because it's a tag not a head. Pls try
> > git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/test
> 
> That worked.  Checked out/building.

vbox hibernated gripe free, w/wo threadirqs.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Wed, 2017-04-05 at 06:13 +0300, Michael S. Tsirkin wrote:
> On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote:
> > On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:
> > 
> > > since I couldn't reproduce, I decided it's worth trying to see
> > > what happens if we revert back to before 5c34d002dcc7.
> > > 
> > > 
> > > Could you please test a tag "test" in my tree above?
> > > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > 
> > Nogo.
> > 
> > git@homer:..git/vhost> git remote update
> > Fetching origin
> > git@homer:..git/vhost> git show
> > 6d88af1bf359417eb821370294ba489bdf7f5ab8
> > fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8
> 
> Maybe because it's a tag not a head. Pls try
> git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/test

That worked.  Checked out/building.


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Wed, Apr 05, 2017 at 05:09:09AM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:
> 
> > since I couldn't reproduce, I decided it's worth trying to see
> > what happens if we revert back to before 5c34d002dcc7.
> > 
> > 
> > Could you please test a tag "test" in my tree above?
> > It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8
> 
> Nogo.
> 
> git@homer:..git/vhost> git remote update
> Fetching origin
> git@homer:..git/vhost> git show 6d88af1bf359417eb821370294ba489bdf7f5ab8
> fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8

Maybe because it's a tag not a head. Pls try
git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/test

-- 
MST
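
A rough offline sketch of why `git remote update` may not have brought in the
tagged commit while fetching `refs/tags/test` by name would. It assumes the tag
points at a commit that is not on any branch. The `upstream` and `clone`
repositories, the `demo` identity and the `g` helper are all made up for
illustration, and nothing here touches the real vhost tree.

```shell
#!/bin/sh
# Local, throwaway reproduction; every name below is hypothetical.
set -e
work=$(mktemp -d)
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# "upstream" stands in for the remote tree, "clone" for the tester's checkout.
g init -q "$work/upstream"
g -C "$work/upstream" commit -q --allow-empty -m "base"
g clone -q "$work/upstream" "$work/clone"

# Upstream now tags a commit that is reachable from no branch.
g -C "$work/upstream" checkout -q --detach
g -C "$work/upstream" commit -q --allow-empty -m "test-only commit"
g -C "$work/upstream" tag test
sha=$(g -C "$work/upstream" rev-parse test)
g -C "$work/upstream" checkout -q -

# A plain update follows branch history only, so the tagged commit
# does not arrive in the clone.
g -C "$work/clone" remote update >/dev/null 2>&1
if g -C "$work/clone" cat-file -e "$sha" 2>/dev/null; then
    before=present
else
    before=missing
fi

# Naming the tag ref explicitly fetches it and records it in FETCH_HEAD,
# which can then be checked out for building.
g -C "$work/clone" fetch -q origin refs/tags/test
fetched=$(g -C "$work/clone" rev-parse FETCH_HEAD)
g -C "$work/clone" checkout -q --detach FETCH_HEAD

echo "object before explicit fetch: $before"
echo "FETCH_HEAD: $fetched"
```

Against a real remote, the URL form quoted above takes the place of `origin`.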


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 22:03 +0300, Michael S. Tsirkin wrote:

> since I couldn't reproduce, I decided it's worth trying to see
> what happens if we revert back to before 5c34d002dcc7.
> 
> 
> Could you please test a tag "test" in my tree above?
> It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8

Nogo.

git@homer:..git/vhost> git remote update
Fetching origin
git@homer:..git/vhost> git show 6d88af1bf359417eb821370294ba489bdf7f5ab8
fatal: bad object 6d88af1bf359417eb821370294ba489bdf7f5ab8


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Wed, 2017-04-05 at 00:31 +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 04, 2017 at 08:38:35PM +0200, Mike Galbraith wrote:
> > On Tue, 2017-04-04 at 21:00 +0300, Michael S. Tsirkin wrote:
> > 
> > > And just making double sure, the 1st version that has the issue
> > > is 5c34d002dcc7, isn't it? I'm asking because subject says so
> > > but then goes on to list subject from another commit.
> > > This one is:
> > >   > virtio_pci: remove struct virtio_pci_vq_info
> > 
> > When the hibernation related warnings started I don't know, I wasn't
> > targeting that, those fell out of subsequent testing.  I started out
> > hunting console breakage point w. threaded irqs, which is 5c34d002dcc7.
> 
> OK but 5c34d002dcc7 isn't "virtio_pci: use shared
> interrupts for virtqueues".

Heh, wrong sha.. $subject does however correctly identify in quotes the
origin of the threaded irq woes.
 
> I'm confused at this point. I would appreciate the summary of
> which versions were tested and what did you see. Testing
> a revert might also help.

I already tested full revert.  I went looking for what busted kvm for
RT kernels, extracted the virtio series and quilt bisected that to use
shared interrupts.  I was going to just use my little turn off
multiport hacklet to put spinning kworker on the back burner until the
dust settled, but noticed that there was more going on, and none of it
is RT specific (thus freeing up a back burner).

From there, it's all test what you/Christoph post, as you post it, in
virgin source.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 08:38:35PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 21:00 +0300, Michael S. Tsirkin wrote:
> 
> > And just making double sure, the 1st version that has the issue
> > is 5c34d002dcc7, isn't it? I'm asking because subject says so
> > but then goes on to list subject from another commit.
> > This one is:
> > > virtio_pci: remove struct virtio_pci_vq_info
> 
> When the hibernation related warnings started I don't know, I wasn't
> targeting that, those fell out of subsequent testing.  I started out
> hunting console breakage point w. threaded irqs, which is 5c34d002dcc7.

OK but 5c34d002dcc7 isn't "virtio_pci: use shared
interrupts for virtqueues".

> 
>   -Mike

I'm confused at this point. I would appreciate the summary of
which versions were tested and what did you see. Testing
a revert might also help.

Thanks a lot for your testing!

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 07:54:36PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 19:40 +0200, Mike Galbraith wrote:
> > On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote:
> > 
> > > I couldn't reproduce it - let's make sure we are using the
> > > same tree. Could you pls try
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
> > > 
> > > It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44
> > 
> > Things that make ya go hmm...
> 
> Making double sure we're on the same page...
> 
> git@homer:..git/vhost> git branch
> * linux-next
>   master
> git@homer:..git/vhost> git describe
> warning: tag 'for_linus' is really 'tags_for_linus' here
> for_linus-220128-gcc79d42a7d7e
> git@homer:..git/vhost> git status
> On branch linux-next
> Your branch is up-to-date with 'origin/linux-next'.
> Changes not staged for commit:
>   (use "git add <file>..." to update what will be committed)
>   (use "git checkout -- <file>..." to discard changes in working directory)
> 
> modified:   Makefile
> modified:   scripts/setlocalversion
> 
> no changes added to commit (use "git add" and/or "git commit -a")
> git@homer:..git/vhost>
> 
> Modifications are me whacking '+' sign and -rc5.. I don't do those.


Since I couldn't reproduce, I decided it's worth trying to see
what happens if we revert back to before 5c34d002dcc7.


Could you please test a tag "test" in my tree above?
It should point at 6d88af1bf359417eb821370294ba489bdf7f5ab8


That has reverts for code refactorings since 5c34d002dcc7
inclusive. If this finally works, maybe you could
go back and see which of the reverts helps?

The idea is that this only has refactorings nicely isolated,
if all else fails we can even do the reverts without losing
functionality.

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 21:00 +0300, Michael S. Tsirkin wrote:

> And just making double sure, the 1st version that has the issue
> is 5c34d002dcc7, isn't it? I'm asking because subject says so
> but then goes on to list subject from another commit.
> This one is:
>   > virtio_pci: remove struct virtio_pci_vq_info

When the hibernation related warnings started I don't know, I wasn't
targeting that, those fell out of subsequent testing.  I started out
hunting console breakage point w. threaded irqs, which is 5c34d002dcc7.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 07:54:36PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 19:40 +0200, Mike Galbraith wrote:
> > On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote:
> > 
> > > I couldn't reproduce it - let's make sure we are using the
> > > same tree. Could you pls try
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
> > > 
> > > It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44
> > 
> > Things that make ya go hmm...
> 
> Making double sure we're on the same page...
> 
> git@homer:..git/vhost> git branch
> * linux-next
>   master
> git@homer:..git/vhost> git describe
> warning: tag 'for_linus' is really 'tags_for_linus' here
> for_linus-220128-gcc79d42a7d7e
> git@homer:..git/vhost> git status
> On branch linux-next
> Your branch is up-to-date with 'origin/linux-next'.
> Changes not staged for commit:
>   (use "git add <file>..." to update what will be committed)
>   (use "git checkout -- <file>..." to discard changes in working directory)
> 
> modified:   Makefile
> modified:   scripts/setlocalversion
> 
> no changes added to commit (use "git add" and/or "git commit -a")
> git@homer:..git/vhost>
> 
> Modifications are me whacking '+' sign and -rc5.. I don't do those.

And just making double sure, the 1st version that has the issue
is 5c34d002dcc7, isn't it? I'm asking because subject says so
but then goes on to list subject from another commit.
This one is:
virtio_pci: remove struct virtio_pci_vq_info

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 19:40 +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote:
> 
> > I couldn't reproduce it - let's make sure we are using the
> > same tree. Could you pls try
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
> > 
> > It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44
> 
> Things that make ya go hmm...

Making double sure we're on the same page...

git@homer:..git/vhost> git branch
* linux-next
  master
git@homer:..git/vhost> git describe
warning: tag 'for_linus' is really 'tags_for_linus' here
for_linus-220128-gcc79d42a7d7e
git@homer:..git/vhost> git status
On branch linux-next
Your branch is up-to-date with 'origin/linux-next'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

modified:   Makefile
modified:   scripts/setlocalversion

no changes added to commit (use "git add" and/or "git commit -a")
git@homer:..git/vhost>

Modifications are me whacking '+' sign and -rc5.. I don't do those.


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 18:30 +0300, Michael S. Tsirkin wrote:

> I couldn't reproduce it - let's make sure we are using the
> same tree. Could you pls try
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next 
> 
> It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44

Things that make ya go hmm...

[   87.940161] ------------[ cut here ]------------
[   87.940180] WARNING: CPU: 0 PID: 97 at drivers/pci/msi.c:1251 pci_irq_vector+0xcb/0xe0
[   87.940181] Modules linked in: dm_mod(E) fuse(E) ebtable_filter(E) 
ebtables(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) 
nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) 
xt_limit(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) 
xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) 
ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) 
nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) 
ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) 
snd_hda_intel(E) snd_hda_codec(E) joydev(E) snd_hda_core(E) snd_hwdep(E) 
snd_pcm(E) snd_timer(E) snd(E) 8139too(E) ppdev(E) soundcore(E) parport_pc(E) 
i2c_piix4(E)
[   87.940206]  parport(E) virtio_balloon(E) crct10dif_pclmul(E) 
crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) serio_raw(E) 
acpi_cpufreq(E) pcbc(E) button(E) aesni_intel(E) pcspkr(E) aes_x86_64(E) 
crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) 
lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) jbd2(E) mbcache(E) hid_generic(E) 
usbhid(E) ata_generic(E) ata_piix(E) sr_mod(E) cdrom(E) virtio_blk(E) 
virtio_rng(E) virtio_console(E) qxl(E) drm_kms_helper(E) syscopyarea(E) 
sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ehci_pci(E) ttm(E) uhci_hcd(E) 
ehci_hcd(E) floppy(E) ahci(E) libahci(E) virtio_pci(E) drm(E) virtio_ring(E) 
virtio(E) usbcore(E) libata(E) 8139cp(E) mii(E) sg(E) scsi_mod(E) autofs4(E)
[   87.940233] CPU: 0 PID: 97 Comm: kworker/u16:1 Tainted: GE   
4.11.0-default #1
[   87.940234] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[   87.940240] Workqueue: events_unbound async_run_entry_fn
[   87.940241] Call Trace:
[   87.940246]  ? dump_stack+0x5c/0x85
[   87.940255]  ? __warn+0xc4/0xe0
[   87.940258]  ? pci_pm_poweroff+0xf0/0xf0
[   87.940269]  ? pci_irq_vector+0xcb/0xe0
[   87.940272]  ? vp_synchronize_vectors+0x3e/0x50 [virtio_pci]
[   87.940275]  ? virtcons_freeze+0x1a/0xd0 [virtio_console]
[   87.940276]  ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
[   87.940277]  ? pci_pm_freeze+0x59/0xe0
[   87.940281]  ? dpm_run_callback+0x4d/0x170
[   87.940283]  ? __device_suspend+0x11f/0x3b0
[   87.940283]  ? pm_dev_dbg+0x70/0x70
[   87.940284]  ? async_suspend+0x1a/0x90
[   87.940286]  ? async_run_entry_fn+0x34/0x160
[   87.940287]  ? process_one_work+0x164/0x430
[   87.940288]  ? worker_thread+0x135/0x4d0
[   87.940290]  ? kthread+0xff/0x140
[   87.940291]  ? rescuer_thread+0x3c0/0x3c0
[   87.940292]  ? kthread_park+0x80/0x80
[   87.940293]  ? kthread_park+0x80/0x80
[   87.940299]  ? ret_from_fork+0x26/0x40
[   87.940300] ---[ end trace 5d65fe0efc4b61d7 ]---


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 04:18:02PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 16:38 +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > > > Mike,
> > > > > > 
> > > > > > can you try the patch below?
> > > > > 
> > > > > No more spinning kworker woes, but I still have a warning on
> > > > > hibernate, threadirqs invariant.  I'm also seeing intermittent post
> > > > > hibernate hang funnies in virgin source +- this patch, and without
> > > > > threadirqs.
> > > > > 
> > > > > [  110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > > > > 
> > > > > -Mike
> > > > 
> > > > I just sent a patch fixing that.
> > > > However I think we want to print a message when MSI fails to work so we
> > > > know guest is falling back on legacy interrupts.
> > > 
> > > The warning persists.
> > > 
> > > [  137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > 
> > Can you post the rest of the backtrace? Is it still in the console?
> 
> This is from a dump of post hibernate loop dying vbox I captured and
> squirreled away, so pid is different.  I'm not absolutely certain that
> I didn't have my local patch set re-applied when I did this, so I'll
> rebuild in the a.m..  My stuff is unrelated, so this should be fine.
> 
> [  328.475988] ------------[ cut here ]------------
> [  328.476002] WARNING: CPU: 6 PID: 313 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> [  328.476003] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) 
> nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) 
> xt_limit(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) 
> af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) 
> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) 
> iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) 
> nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
> nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) 
> ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) 
> snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) joydev(E) snd_hwdep(E) 
> snd_pcm(E) snd_timer(E) snd(E) 8139too(E) soundcore(E) i2c_piix4(E) 
> virtio_balloon(E) crct10dif_pclmul(E)
> [  328.476019]  crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) parport_pc(E) 
> acpi_cpufreq(E) pcbc(E) button(E) parport(E) aesni_intel(E) aes_x86_64(E) 
> serio_raw(E) pcspkr(E) crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E) 
> auth_rpcgss(E) nfs_acl(E) lockd(E) dm_mod(E) grace(E) sunrpc(E) ext4(E) 
> crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) 
> ata_generic(E) virtio_blk(E) virtio_rng(E) virtio_console(E) ata_piix(E) 
> qxl(E) drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ehci_pci(E) 
> sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) ehci_hcd(E) libahci(E) 
> crc32c_intel(E) ttm(E) virtio_pci(E) virtio_ring(E) 8139cp(E) virtio(E) 
> usbcore(E) floppy(E) mii(E) drm(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
> [  328.476037] CPU: 6 PID: 313 Comm: kworker/u16:2 Tainted: GE   
> 4.11.0-default #20
> [  328.476038] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
> [  328.476041] Workqueue: events_unbound async_run_entry_fn
> [  328.476042] Call Trace:
> [  328.476056]  ? dump_stack+0x5c/0x85
> [  328.476058]  ? __warn+0xc4/0xe0
> [  328.476060]  ? pci_pm_poweroff+0xf0/0xf0
> [  328.476062]  ? pci_irq_vector+0xb1/0xe0
> [  328.476064]  ? vp_del_vqs+0xcb/0x120 [virtio_pci]
> [  328.476066]  ? remove_common+0x60/0x80 [virtio_rng]
> [  328.476067]  ? virtrng_freeze+0xa/0x10 [virtio_rng]
> [  328.476068]  ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
> [  328.476069]  ? pci_pm_freeze+0x59/0xe0
> [  328.476070]  ? dpm_run_callback+0x4d/0x170
> [  328.476071]  ? __device_suspend+0x11f/0x3b0
> [  328.476072]  ? pm_dev_dbg+0x70/0x70
> [  328.476072]  ? async_suspend+0x1a/0x90
> [  328.476082]  ? async_run_entry_fn+0x34/0x160
> [  328.476083]  ? process_one_work+0x164/0x430
> [  328.476084]  ? worker_thread+0x135/0x4d0
> [  328.476085]  ? kthread+0xff/0x140
> [  328.476086]  ? rescuer_thread+0x3c0/0x3c0
> [  328.476087]  ? kthread_park+0x80/0x80
> [  328.476088]  ? do_group_exit+0x39/0xa0
> [  328.476090]  ? ret_from_fork+0x26/0x40
> [  328.476091] ---[ end trace a045c2118936902f ]---


I couldn't reproduce it - let's make sure we are using the
same tree. Could you pls try

git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next 

It's currently at cc79d42a7d7e57ff64f406a1fd3740afebac0b44
-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 04:18:02PM +0200, Mike Galbraith wrote:
> On Tue, 2017-04-04 at 16:38 +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > > > Mike,
> > > > > > 
> > > > > > can you try the patch below?
> > > > > 
> > > > > No more spinning kworker woes, but I still have a warning on
> > > > > hibernate, threadirqs invariant.  I'm also seeing intermittent post
> > > > > hibernate hang funnies in virgin source +- this patch, and without
> > > > > threadirqs.
> > > > > 
> > > > > [  110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > > > > 
> > > > > -Mike
> > > > 
> > > > I just sent a patch fixing that.
> > > > However I think we want to print a message when MSI fails to work so we
> > > > know guest is falling back on legacy interrupts.
> > > 
> > > The warning persists.
> > > 
> > > [  137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > 
> > Can you post the rest of the backtrace? Is it still in the console?
> 
> This is from a dump of post hibernate loop dying vbox I captured and
> squirreled away, so pid is different.  I'm not absolutely certain that
> I didn't have my local patch set re-applied when I did this, so I'll
> rebuild in the a.m..  My stuff is unrelated, so this should be fine.
> 
> [  328.475988] ------------[ cut here ]------------
> [  328.476002] WARNING: CPU: 6 PID: 313 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> [  328.476003] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) 
> nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) 
> xt_limit(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) 
> af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) 
> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) 
> iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) 
> nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
> nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) 
> ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) 
> snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) joydev(E) snd_hwdep(E) 
> snd_pcm(E) snd_timer(E) snd(E) 8139too(E) soundcore(E) i2c_piix4(E) 
> virtio_balloon(E) crct10dif_pclmul(E)
> [  328.476019]  crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) parport_pc(E) 
> acpi_cpufreq(E) pcbc(E) button(E) parport(E) aesni_intel(E) aes_x86_64(E) 
> serio_raw(E) pcspkr(E) crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E) 
> auth_rpcgss(E) nfs_acl(E) lockd(E) dm_mod(E) grace(E) sunrpc(E) ext4(E) 
> crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) 
> ata_generic(E) virtio_blk(E) virtio_rng(E) virtio_console(E) ata_piix(E) 
> qxl(E) drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ehci_pci(E) 
> sysfillrect(E) sysimgblt(E) ahci(E) fb_sys_fops(E) ehci_hcd(E) libahci(E) 
> crc32c_intel(E) ttm(E) virtio_pci(E) virtio_ring(E) 8139cp(E) virtio(E) 
> usbcore(E) floppy(E) mii(E) drm(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
> [  328.476037] CPU: 6 PID: 313 Comm: kworker/u16:2 Tainted: GE   
> 4.11.0-default #20
> [  328.476038] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
> [  328.476041] Workqueue: events_unbound async_run_entry_fn
> [  328.476042] Call Trace:
> [  328.476056]  ? dump_stack+0x5c/0x85
> [  328.476058]  ? __warn+0xc4/0xe0
> [  328.476060]  ? pci_pm_poweroff+0xf0/0xf0
> [  328.476062]  ? pci_irq_vector+0xb1/0xe0
> [  328.476064]  ? vp_del_vqs+0xcb/0x120 [virtio_pci]
> [  328.476066]  ? remove_common+0x60/0x80 [virtio_rng]
> [  328.476067]  ? virtrng_freeze+0xa/0x10 [virtio_rng]
> [  328.476068]  ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
> [  328.476069]  ? pci_pm_freeze+0x59/0xe0
> [  328.476070]  ? dpm_run_callback+0x4d/0x170
> [  328.476071]  ? __device_suspend+0x11f/0x3b0
> [  328.476072]  ? pm_dev_dbg+0x70/0x70
> [  328.476072]  ? async_suspend+0x1a/0x90
> [  328.476082]  ? async_run_entry_fn+0x34/0x160
> [  328.476083]  ? process_one_work+0x164/0x430
> [  328.476084]  ? worker_thread+0x135/0x4d0
> [  328.476085]  ? kthread+0xff/0x140
> [  328.476086]  ? rescuer_thread+0x3c0/0x3c0
> [  328.476087]  ? kthread_park+0x80/0x80
> [  328.476088]  ? do_group_exit+0x39/0xa0
> [  328.476090]  ? ret_from_fork+0x26/0x40
> [  328.476091] ---[ end trace a045c2118936902f ]---

Interesting, it's rng this time. I'll try that.

-- 
MST
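
The trace quoted above reaches pci_irq_vector() via virtrng_freeze
[virtio_rng], which is what the "rng this time" remark refers to. A rough
Python sketch for pulling the module-tagged frames out of such a trace is
below. The function name and the regular expression are illustrative
assumptions, not an existing kernel tool.

```python
import re

# Matches frames such as "? vp_del_vqs+0xcb/0x120 [virtio_pci]".
# Frames without a trailing "[module]" (built-in code) are skipped.
FRAME = re.compile(r"\?\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+[ \t]+\[(\w+)\]")

def module_frames(trace: str) -> list:
    """Return (function, module) pairs for frames that name a module."""
    return FRAME.findall(trace)

trace = """
[  328.476062]  ? pci_irq_vector+0xb1/0xe0
[  328.476064]  ? vp_del_vqs+0xcb/0x120 [virtio_pci]
[  328.476066]  ? remove_common+0x60/0x80 [virtio_rng]
[  328.476067]  ? virtrng_freeze+0xa/0x10 [virtio_rng]
[  328.476068]  ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
"""

for func, mod in module_frames(trace):
    print(f"{func:20s} {mod}")
```

Running the same function over the other trace in this thread would list
virtcons_freeze [virtio_console] instead, so the driver whose freeze hook
triggered the warning is visible at a glance.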


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Mike Galbraith
On Tue, 2017-04-04 at 16:38 +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> > On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > > Mike,
> > > > > 
> > > > > can you try the patch below?
> > > > 
> > > > No more spinning kworker woes, but I still have a warning on
> > > > hibernate, threadirqs invariant.  I'm also seeing intermittent post
> > > > hibernate hang funnies in virgin source +- this patch, and without
> > > > threadirqs.
> > > > 
> > > > [  110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> > > > 
> > > > -Mike
> > > 
> > > I just sent a patch fixing that.
> > > However I think we want to print a message when MSI fails to work so we
> > > know guest is falling back on legacy interrupts.
> > 
> > The warning persists.
> > 
> > [  137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
> 
> Can you post the rest of the backtrace? Is it still in the console?

This is from a dump of post hibernate loop dying vbox I captured and
squirreled away, so pid is different.  I'm not absolutely certain that
I didn't have my local patch set re-applied when I did this, so I'll
rebuild in the a.m..  My stuff is unrelated, so this should be fine.

[  328.475988] ------------[ cut here ]------------
[  328.476002] WARNING: CPU: 6 PID: 313 at drivers/pci/msi.c:1261 pci_irq_vector+0xb1/0xe0
[  328.476003] Modules linked in: fuse(E) ebtable_filter(E) ebtables(E) 
nf_log_ipv6(E) xt_pkttype(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) 
xt_limit(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) 
af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) 
nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) 
iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) 
nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) 
ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) 
snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) joydev(E) snd_hwdep(E) 
snd_pcm(E) snd_timer(E) snd(E) 8139too(E) soundcore(E) i2c_piix4(E) 
virtio_balloon(E) crct10dif_pclmul(E)
[  328.476019]  crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) parport_pc(E) 
acpi_cpufreq(E) pcbc(E) button(E) parport(E) aesni_intel(E) aes_x86_64(E) 
serio_raw(E) pcspkr(E) crypto_simd(E) glue_helper(E) cryptd(E) nfsd(E) 
auth_rpcgss(E) nfs_acl(E) lockd(E) dm_mod(E) grace(E) sunrpc(E) ext4(E) 
crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) 
ata_generic(E) virtio_blk(E) virtio_rng(E) virtio_console(E) ata_piix(E) qxl(E) 
drm_kms_helper(E) syscopyarea(E) uhci_hcd(E) ehci_pci(E) sysfillrect(E) 
sysimgblt(E) ahci(E) fb_sys_fops(E) ehci_hcd(E) libahci(E) crc32c_intel(E) 
ttm(E) virtio_pci(E) virtio_ring(E) 8139cp(E) virtio(E) usbcore(E) floppy(E) 
mii(E) drm(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
[  328.476037] CPU: 6 PID: 313 Comm: kworker/u16:2 Tainted: GE   
4.11.0-default #20
[  328.476038] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[  328.476041] Workqueue: events_unbound async_run_entry_fn
[  328.476042] Call Trace:
[  328.476056]  ? dump_stack+0x5c/0x85
[  328.476058]  ? __warn+0xc4/0xe0
[  328.476060]  ? pci_pm_poweroff+0xf0/0xf0
[  328.476062]  ? pci_irq_vector+0xb1/0xe0
[  328.476064]  ? vp_del_vqs+0xcb/0x120 [virtio_pci]
[  328.476066]  ? remove_common+0x60/0x80 [virtio_rng]
[  328.476067]  ? virtrng_freeze+0xa/0x10 [virtio_rng]
[  328.476068]  ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
[  328.476069]  ? pci_pm_freeze+0x59/0xe0
[  328.476070]  ? dpm_run_callback+0x4d/0x170
[  328.476071]  ? __device_suspend+0x11f/0x3b0
[  328.476072]  ? pm_dev_dbg+0x70/0x70
[  328.476072]  ? async_suspend+0x1a/0x90
[  328.476082]  ? async_run_entry_fn+0x34/0x160
[  328.476083]  ? process_one_work+0x164/0x430
[  328.476084]  ? worker_thread+0x135/0x4d0
[  328.476085]  ? kthread+0xff/0x140
[  328.476086]  ? rescuer_thread+0x3c0/0x3c0
[  328.476087]  ? kthread_park+0x80/0x80
[  328.476088]  ? do_group_exit+0x39/0xa0
[  328.476090]  ? ret_from_fork+0x26/0x40
[  328.476091] ---[ end trace a045c2118936902f ]---


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-04 Thread Michael S. Tsirkin
On Tue, Apr 04, 2017 at 06:02:52AM +0200, Mike Galbraith wrote:
> On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> > On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > > Mike,
> > > > 
> > > > can you try the patch below?
> > > 
> > > No more spinning kworker woes, but I still have a warning on hibernate,
> > > threadirqs invariant.  I'm also seeing intermittent post hibernate hang
> > > funnies in virgin source +- this patch, and without threadirqs.
> > > 
> > > [  110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 
> > > pci_irq_vector+0xb1/0xe0
> > > 
> > >   -Mike
> > 
> > I just sent a patch fixing that.
> > However I think we want to print a message when MSI fails to work so we
> > know guest is falling back on legacy interrupts.
> 
> The warning persists.
> 
> [  137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261 
> pci_irq_vector+0xb1/0xe0

Can you post the rest of the backtrace? Is it still in the console?

> WRT the post hibernate hang business, that is apparently not part of
> the 4.11 woes (at least not solely), as 4.10.8 did not survive a 10
> hibernate cycle loop.  RT is better at reproducing trouble (shrug, it
> frequently is), but it matters not whether I'm running 4.10, master or
> master-rt, they will all hang.
> 
> WRT gripe, I wedged virtio_pci-fix-msix-vector-tracking-on-cleanup in
> on top, but it wasn't impressed.
> 
>   -Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Mike Galbraith
On Mon, 2017-04-03 at 21:11 +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> > On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > > Mike,
> > > 
> > > can you try the patch below?
> > 
> > No more spinning kworker woes, but I still have a warning on hibernate,
> > threadirqs invariant.  I'm also seeing intermittent post hibernate hang
> > funnies in virgin source +- this patch, and without threadirqs.
> > 
> > [  110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 
> > pci_irq_vector+0xb1/0xe0
> > 
> >   -Mike
> 
> I just sent a patch fixing that.
> However I think we want to print a message when MSI fails to work so we
> know guest is falling back on legacy interrupts.

The warning persists.

[  137.656423] WARNING: CPU: 1 PID: 535 at drivers/pci/msi.c:1261 
pci_irq_vector+0xb1/0xe0

WRT the post hibernate hang business, that is apparently not part of
the 4.11 woes (at least not solely), as 4.10.8 did not survive a 10
hibernate cycle loop.  RT is better at reproducing trouble (shrug, it
frequently is), but it matters not whether I'm running 4.10, master or
master-rt, they will all hang.

WRT gripe, I wedged virtio_pci-fix-msix-vector-tracking-on-cleanup in
on top, but it wasn't impressed.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2017 at 07:56:32PM +0200, Mike Galbraith wrote:
> On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> > Mike,
> > 
> > can you try the patch below?
> 
> No more spinning kworker woes, but I still have a warning on hibernate,
> threadirqs invariant.  I'm also seeing intermittent post hibernate hang
> funnies in virgin source +- this patch, and without threadirqs.
> 
> [  110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 
> pci_irq_vector+0xb1/0xe0
> 
>   -Mike

I just sent a patch fixing that.
However I think we want to print a message when MSI fails to work so we
know guest is falling back on legacy interrupts.

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Mike Galbraith
On Mon, 2017-04-03 at 16:18 +0200, Christoph Hellwig wrote:
> Mike,
> 
> can you try the patch below?

No more spinning kworker woes, but I still have a warning on hibernate,
threadirqs invariant.  I'm also seeing intermittent post hibernate hang
funnies in virgin source +- this patch, and without threadirqs.

[  110.223953] WARNING: CPU: 5 PID: 452 at drivers/pci/msi.c:1261 
pci_irq_vector+0xb1/0xe0

-Mike



Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2017 at 04:18:23PM +0200, Christoph Hellwig wrote:
> Mike,
> 
> can you try the patch below?
> 
> ---
> From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Mon, 3 Apr 2017 14:51:35 +0200
> Subject: virtio_pci: don't use shared irq for virtqueues
> 
> Reimplement the shared irq feature manually, as we might have a larger
> number of virtqueues than the core shared interrupt code can handle
> in threaded interrupt mode.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/virtio/virtio_pci_common.c | 142 
> +
>  drivers/virtio/virtio_pci_common.h |   1 +
>  2 files changed, 83 insertions(+), 60 deletions(-)

Well, the original patch this is trying to fix is
07ec51480b5eb1233f8c1b0f5d7a7c8d1247c507, which dropped just 40 lines
with documentation. It did this by re-using error handling to switch
from per-vq to non-per-vq mode. Now this has separate flows for errors
and for the per-vq/non-per-vq switch and (I think, as a result) is adding
140 lines, which doesn't make me very happy.

> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index 590534910dc6..6dd719543410 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -137,6 +137,9 @@ void vp_del_vqs(struct virtio_device *vdev)
>   kfree(vp_dev->msix_vector_map);
>   }
>  
> + /* free the shared virtuqueue irq if we don't use per-vq irqs */

typo

> + if (vp_dev->shared_vq_vec)
> + free_irq(pci_irq_vector(vp_dev->pci_dev, 1), vp_dev);

So we used to have enums for 1 and 0. I think it was cleaner.
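
[Editorial sketch, not part of the original mail.] The point about enums can be illustrated with a small standalone C fragment. The constant names VP_MSIX_CONFIG_VECTOR and VP_MSIX_VQ_VECTOR are, as far as I recall, the ones earlier versions of virtio_pci_common.h used, so treat them as an assumption; struct demo_vp_dev and demo_vectors_to_free() are hypothetical stand-ins invented for this example and only mirror the order in which the quoted vp_del_vqs() hunk frees its two fixed vectors.

```c
#include <assert.h>
#include <stdbool.h>

/* Named MSI-X vector indices, replacing the bare 0 and 1 in the hunk above.
 * Names assumed from older virtio_pci_common.h; values match the patch. */
enum {
	VP_MSIX_CONFIG_VECTOR = 0,	/* configuration-change interrupt */
	VP_MSIX_VQ_VECTOR = 1,		/* shared (or first) virtqueue interrupt */
};

/* Hypothetical stand-in for the driver state; only the flag the hunk tests. */
struct demo_vp_dev {
	bool shared_vq_vec;
};

/*
 * Fill 'out' with the fixed vector indices to release, in the same order as
 * the quoted vp_del_vqs() hunk: the shared virtqueue vector first (only when
 * it is in use), then the config vector. Returns how many entries were set.
 */
static int demo_vectors_to_free(const struct demo_vp_dev *dev, int out[2])
{
	int n = 0;

	assert(dev && out);
	if (dev->shared_vq_vec)
		out[n++] = VP_MSIX_VQ_VECTOR;
	out[n++] = VP_MSIX_CONFIG_VECTOR;
	return n;
}
```

With named constants, a call site such as pci_irq_vector(vp_dev->pci_dev, VP_MSIX_VQ_VECTOR) states which interrupt is being released without the reader having to remember what 1 means.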


>   free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev);
>   pci_free_irq_vectors(vp_dev->pci_dev);
>  }
> @@ -147,10 +150,10 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>  {
>   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>   const char *name = dev_name(&vp_dev->vdev.dev);
> - int i, j, err = -ENOMEM, allocated_vectors, nvectors;
> + struct pci_dev *pdev = vp_dev->pci_dev;
> + int i, err = -ENOMEM, nvectors;
>   unsigned flags = PCI_IRQ_MSIX;
> - bool shared = false;
> - u16 msix_vec;
> + u16 msix_vec = 0;
>  
>   if (desc) {
>   flags |= PCI_IRQ_AFFINITY;
> @@ -162,19 +165,18 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>   if (callbacks[i])
>   nvectors++;
>  
> - /* Try one vector per queue first. */
> - err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
> - nvectors, flags, desc);
> + /* Try one vector for config and one per queue first. */
> + err = pci_alloc_irq_vectors_affinity(pdev, nvectors, nvectors, flags,
> + desc);
>   if (err < 0) {
>   /* Fallback to one vector for config, one shared for queues. */
> - shared = true;
> - err = pci_alloc_irq_vectors(vp_dev->pci_dev, 2, 2,
> + nvectors = 2;
> + vp_dev->shared_vq_vec = true;
> + err = pci_alloc_irq_vectors(pdev, nvectors, nvectors,
>   PCI_IRQ_MSIX);
>   if (err < 0)
>   return err;
>   }
> - if (err < 0)
> - return err;
>  
>   vp_dev->msix_vectors = nvectors;
>   vp_dev->msix_names = kmalloc_array(nvectors,
> @@ -194,79 +196,99 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>   }
>  
>   /* Set the vector used for configuration */
> - snprintf(vp_dev->msix_names[0], sizeof(*vp_dev->msix_names),
> + snprintf(vp_dev->msix_names[msix_vec], sizeof(*vp_dev->msix_names),
>"%s-config", name);
> - err = request_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_config_changed,
> - 0, vp_dev->msix_names[0], vp_dev);
> + err = request_irq(pci_irq_vector(pdev, msix_vec), vp_config_changed, 0,
> +   vp_dev->msix_names[msix_vec], vp_dev);
>   if (err)
>   goto out_free_msix_affinity_masks;
>  
>   /* Verify we had enough resources to assign the vector */
> - if (vp_dev->config_vector(vp_dev, 0) == VIRTIO_MSI_NO_VECTOR) {
> + if (vp_dev->config_vector(vp_dev, msix_vec) == VIRTIO_MSI_NO_VECTOR) {
>   err = -EBUSY;
>   goto out_free_config_irq;
>   }
>  
> - vp_dev->msix_vector_map = kmalloc_array(nvqs,
> - sizeof(*vp_dev->msix_vector_map), GFP_KERNEL);
> - if (!vp_dev->msix_vector_map)
> - goto out_disable_config_irq;
> -
> - allocated_vectors = j = 1; /* vector 0 is the config interrupt */
> - for (i = 0; i < nvqs; ++i) {
> - if (!names[i]) {
> - vqs[i] = NULL;
> - continue;
> - }
> -
> - if 

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2017 at 04:18:23PM +0200, Christoph Hellwig wrote:
> Mike,
> 
> can you try the patch below?

It's really easy to test on qemu so I will - just add a dummy
virtio-serial-pci device with -device virtio-serial-pci and
add threadirqs to kernel command line.

However, it doesn't look like this will fix the error recovery
for when request_irq fails; it will just make the error less likely.

So we still need to look into that: failure should recover
and use the INTx path. ATM it causes hibernation to hang.
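
[Editorial sketch, not part of the original mail.] The recovery behaviour asked for here can be pictured as a fallback chain: try per-virtqueue MSI-X, then a shared MSI-X vector, then legacy INTx, undoing any partial setup before each retry and logging when a fallback happens. The following is a userspace model under stated assumptions, not the driver's code: every name (demo_find_vqs, demo_irq_mode, the fake setup and teardown helpers) is hypothetical, and the real driver would call the PCI and IRQ core APIs where this sketch calls a function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Interrupt modes in order of preference; all names here are hypothetical. */
enum demo_irq_mode {
	DEMO_MSIX_PER_VQ,	/* one MSI-X vector per virtqueue */
	DEMO_MSIX_SHARED,	/* one config vector, one shared by all queues */
	DEMO_INTX,		/* legacy line interrupt */
	DEMO_NONE,		/* nothing worked */
};

/* Setup returns 0 on success and a negative value on failure. */
typedef int (*demo_setup_fn)(enum demo_irq_mode mode, void *ctx);
typedef void (*demo_teardown_fn)(void *ctx);

static const char *demo_mode_name(enum demo_irq_mode mode)
{
	switch (mode) {
	case DEMO_MSIX_PER_VQ:	return "per-vq MSI-X";
	case DEMO_MSIX_SHARED:	return "shared MSI-X";
	case DEMO_INTX:		return "INTx";
	default:		return "none";
	}
}

/*
 * Try each mode in turn. A failed attempt is torn down before the next one,
 * so a request failure in one mode cannot leave stale state behind. A message
 * is printed whenever something other than the preferred mode is selected.
 */
static enum demo_irq_mode demo_find_vqs(demo_setup_fn setup,
					demo_teardown_fn teardown, void *ctx)
{
	static const enum demo_irq_mode order[] = {
		DEMO_MSIX_PER_VQ, DEMO_MSIX_SHARED, DEMO_INTX,
	};
	size_t i;

	assert(setup && teardown);
	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (setup(order[i], ctx) == 0) {
			if (i > 0)
				fprintf(stderr, "demo: falling back to %s interrupts\n",
					demo_mode_name(order[i]));
			return order[i];
		}
		teardown(ctx);	/* undo partial setup before retrying */
	}
	return DEMO_NONE;
}

/* Fake device for experimenting: modes below 'first_working' fail. */
struct demo_fake_dev {
	enum demo_irq_mode first_working;
	int teardowns;
};

static int demo_fake_setup(enum demo_irq_mode mode, void *ctx)
{
	struct demo_fake_dev *dev = ctx;

	return mode >= dev->first_working ? 0 : -1;
}

static void demo_fake_teardown(void *ctx)
{
	struct demo_fake_dev *dev = ctx;

	dev->teardowns++;
}
```

For example, a struct demo_fake_dev with first_working set to DEMO_INTX makes demo_find_vqs() fail twice, tear down twice, print one fallback message, and return DEMO_INTX. The fallback message is the kind of notice suggested elsewhere in this thread for the case where MSI does not work.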

> ---
> From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Mon, 3 Apr 2017 14:51:35 +0200
> Subject: virtio_pci: don't use shared irq for virtqueues
> 
> Reimplement the shared irq feature manually, as we might have a larger
> number of virtqueues than the core shared interrupt code can handle
> in threaded interrupt mode.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/virtio/virtio_pci_common.c | 142 
> +
>  drivers/virtio/virtio_pci_common.h |   1 +
>  2 files changed, 83 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index 590534910dc6..6dd719543410 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -137,6 +137,9 @@ void vp_del_vqs(struct virtio_device *vdev)
>   kfree(vp_dev->msix_vector_map);
>   }
>  
> + /* free the shared virtuqueue irq if we don't use per-vq irqs */
> + if (vp_dev->shared_vq_vec)
> + free_irq(pci_irq_vector(vp_dev->pci_dev, 1), vp_dev);
>   free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev);
>   pci_free_irq_vectors(vp_dev->pci_dev);
>  }
> @@ -147,10 +150,10 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>  {
>   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>   const char *name = dev_name(&vp_dev->vdev.dev);
> - int i, j, err = -ENOMEM, allocated_vectors, nvectors;
> + struct pci_dev *pdev = vp_dev->pci_dev;
> + int i, err = -ENOMEM, nvectors;
>   unsigned flags = PCI_IRQ_MSIX;
> - bool shared = false;
> - u16 msix_vec;
> + u16 msix_vec = 0;
>  
>   if (desc) {
>   flags |= PCI_IRQ_AFFINITY;
> @@ -162,19 +165,18 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>   if (callbacks[i])
>   nvectors++;
>  
> - /* Try one vector per queue first. */
> - err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
> - nvectors, flags, desc);
> + /* Try one vector for config and one per queue first. */
> + err = pci_alloc_irq_vectors_affinity(pdev, nvectors, nvectors, flags,
> + desc);
>   if (err < 0) {
>   /* Fallback to one vector for config, one shared for queues. */
> - shared = true;
> - err = pci_alloc_irq_vectors(vp_dev->pci_dev, 2, 2,
> + nvectors = 2;
> + vp_dev->shared_vq_vec = true;
> + err = pci_alloc_irq_vectors(pdev, nvectors, nvectors,
>   PCI_IRQ_MSIX);
>   if (err < 0)
>   return err;
>   }
> - if (err < 0)
> - return err;
>  
>   vp_dev->msix_vectors = nvectors;
>   vp_dev->msix_names = kmalloc_array(nvectors,
> @@ -194,79 +196,99 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
> unsigned nvqs,
>   }
>  
>   /* Set the vector used for configuration */
> - snprintf(vp_dev->msix_names[0], sizeof(*vp_dev->msix_names),
> + snprintf(vp_dev->msix_names[msix_vec], sizeof(*vp_dev->msix_names),
>"%s-config", name);
> - err = request_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_config_changed,
> - 0, vp_dev->msix_names[0], vp_dev);
> + err = request_irq(pci_irq_vector(pdev, msix_vec), vp_config_changed, 0,
> +   vp_dev->msix_names[msix_vec], vp_dev);
>   if (err)
>   goto out_free_msix_affinity_masks;
>  
>   /* Verify we had enough resources to assign the vector */
> - if (vp_dev->config_vector(vp_dev, 0) == VIRTIO_MSI_NO_VECTOR) {
> + if (vp_dev->config_vector(vp_dev, msix_vec) == VIRTIO_MSI_NO_VECTOR) {
>   err = -EBUSY;
>   goto out_free_config_irq;
>   }
>  
> - vp_dev->msix_vector_map = kmalloc_array(nvqs,
> - sizeof(*vp_dev->msix_vector_map), GFP_KERNEL);
> - if (!vp_dev->msix_vector_map)
> - goto out_disable_config_irq;
> -
> - allocated_vectors = j = 1; /* vector 0 is the config interrupt */
> - for (i = 0; i < nvqs; ++i) {
> - if (!names[i]) {
> - vqs[i] = NULL;
> - continue;
> - }
> -
> - if (callbacks[i])
> - 

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-04-03 Thread Christoph Hellwig
Mike,

can you try the patch below?

---
From fe41a30b54878cc631623b7511267125e0da4b15 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Mon, 3 Apr 2017 14:51:35 +0200
Subject: virtio_pci: don't use shared irq for virtqueues

Reimplement the shared irq feature manually, as we might have a larger
number of virtqueues than the core shared interrupt code can handle
in threaded interrupt mode.

Signed-off-by: Christoph Hellwig 
---
 drivers/virtio/virtio_pci_common.c | 142 +
 drivers/virtio/virtio_pci_common.h |   1 +
 2 files changed, 83 insertions(+), 60 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 590534910dc6..6dd719543410 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -137,6 +137,9 @@ void vp_del_vqs(struct virtio_device *vdev)
kfree(vp_dev->msix_vector_map);
}
 
+   /* free the shared virtqueue irq if we don't use per-vq irqs */
+   if (vp_dev->shared_vq_vec)
+   free_irq(pci_irq_vector(vp_dev->pci_dev, 1), vp_dev);
free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev);
pci_free_irq_vectors(vp_dev->pci_dev);
 }
@@ -147,10 +150,10 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
const char *name = dev_name(&vp_dev->vdev.dev);
-   int i, j, err = -ENOMEM, allocated_vectors, nvectors;
+   struct pci_dev *pdev = vp_dev->pci_dev;
+   int i, err = -ENOMEM, nvectors;
unsigned flags = PCI_IRQ_MSIX;
-   bool shared = false;
-   u16 msix_vec;
+   u16 msix_vec = 0;
 
if (desc) {
flags |= PCI_IRQ_AFFINITY;
@@ -162,19 +165,18 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
if (callbacks[i])
nvectors++;
 
-   /* Try one vector per queue first. */
-   err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
-   nvectors, flags, desc);
+   /* Try one vector for config and one per queue first. */
+   err = pci_alloc_irq_vectors_affinity(pdev, nvectors, nvectors, flags,
+   desc);
if (err < 0) {
/* Fallback to one vector for config, one shared for queues. */
-   shared = true;
-   err = pci_alloc_irq_vectors(vp_dev->pci_dev, 2, 2,
+   nvectors = 2;
+   vp_dev->shared_vq_vec = true;
+   err = pci_alloc_irq_vectors(pdev, nvectors, nvectors,
PCI_IRQ_MSIX);
if (err < 0)
return err;
}
-   if (err < 0)
-   return err;
 
vp_dev->msix_vectors = nvectors;
vp_dev->msix_names = kmalloc_array(nvectors,
@@ -194,79 +196,99 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
}
 
/* Set the vector used for configuration */
-   snprintf(vp_dev->msix_names[0], sizeof(*vp_dev->msix_names),
+   snprintf(vp_dev->msix_names[msix_vec], sizeof(*vp_dev->msix_names),
 "%s-config", name);
-   err = request_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_config_changed,
-   0, vp_dev->msix_names[0], vp_dev);
+   err = request_irq(pci_irq_vector(pdev, msix_vec), vp_config_changed, 0,
+ vp_dev->msix_names[msix_vec], vp_dev);
if (err)
goto out_free_msix_affinity_masks;
 
/* Verify we had enough resources to assign the vector */
-   if (vp_dev->config_vector(vp_dev, 0) == VIRTIO_MSI_NO_VECTOR) {
+   if (vp_dev->config_vector(vp_dev, msix_vec) == VIRTIO_MSI_NO_VECTOR) {
err = -EBUSY;
goto out_free_config_irq;
}
 
-   vp_dev->msix_vector_map = kmalloc_array(nvqs,
-   sizeof(*vp_dev->msix_vector_map), GFP_KERNEL);
-   if (!vp_dev->msix_vector_map)
-   goto out_disable_config_irq;
-
-   allocated_vectors = j = 1; /* vector 0 is the config interrupt */
-   for (i = 0; i < nvqs; ++i) {
-   if (!names[i]) {
-   vqs[i] = NULL;
-   continue;
-   }
-
-   if (callbacks[i])
-   msix_vec = allocated_vectors;
-   else
-   msix_vec = VIRTIO_MSI_NO_VECTOR;
-
-   vqs[i] = vp_dev->setup_vq(vp_dev, i, callbacks[i], names[i],
-   msix_vec);
-   if (IS_ERR(vqs[i])) {
-   err = PTR_ERR(vqs[i]);
-   goto out_remove_vqs;
+   msix_vec++;
+
+   /*
+* Use a different vector for each queue if they are available,
+* else share the same vector for all VQs.
+*/
+   if (vp_dev->shared_vq_vec) {
+
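
One way to read "reimplement the shared irq feature manually" is: request the shared vector once and let that single handler walk all virtqueues, rather than registering one IRQF_SHARED handler per vq. The rest of the patch is cut off here, so what follows is not the patch's code, just a minimal userspace sketch under that assumption. `struct queue`, `queue_service()` and `shared_vector_handler()` are made-up names, not the driver's API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue; not the kernel's struct. */
struct queue {
	bool pending;		/* work is waiting to be picked up */
	unsigned int handled;	/* how often this queue's callback ran */
};

/* Per-queue callback: returns true if the queue had work. */
static bool queue_service(struct queue *q)
{
	if (!q->pending)
		return false;
	q->pending = false;
	q->handled++;
	return true;
}

/*
 * The one handler registered for the shared vector: poll every queue.
 * Returns how many queues had work, so the caller can tell a spurious
 * interrupt (0) from a handled one.
 */
static size_t shared_vector_handler(struct queue *queues, size_t nqueues)
{
	size_t i, serviced = 0;

	for (i = 0; i < nqueues; i++)
		if (queue_service(&queues[i]))
			serviced++;
	return serviced;
}
```

With a single registration the number of queues is bounded only by the array the handler walks, not by any per-line limit on shared handlers.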

Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-31 Thread Michael S. Tsirkin
On Fri, Mar 31, 2017 at 10:20:49AM +0200, Christoph Hellwig wrote:
> On Fri, Mar 31, 2017 at 06:22:31AM +0300, Michael S. Tsirkin wrote:
> > I'm not sure why it fails after 32 on 64 bit, but as
> > virtio devices aren't limited to 32 vqs it looks like we
> > should go back to requesting the irq only once for all vqs.
> 
> Meh.
> 
> > 
> > Christoph, should I just revert for now, or do you
> > want to look into a smaller patch for this?
> 
> I think we'll need to do a different patch than just a simple revert,
> mostly because so much infrastructure depends on the patch.
> 
> I'll take a look over the weekend.
> 
> > Another question is looking into intx support - that
> > should work but it seems to be broken at the moment.
> 
> Does it?  I'm pretty sure I tested it back when I came up with the
> series by artificially disabling MSI-X in the kernel.  I can try this
> again, though.

I'm not 100% sure.  What I see is that we do not handle a failure to
request irqs correctly: we seem to fall back on INTx, but the
following freeze then blows up trying to free non-existent
vectors.

It does not seem to trigger with just MSI-X off, so maybe that is
simply a failure to recover from an error correctly.
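
To illustrate what "recover from an error correctly" would need here, a minimal userspace sketch of the usual pattern: record what was actually requested, unwind exactly that on failure, and have teardown free only what is recorded. This is not the virtio_pci code; `struct dev_irqs`, `irqs_setup()` and `irqs_teardown()` are hypothetical names, and `fail_at` only simulates a failing request.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_VECS 8

/* Hypothetical bookkeeping: which vectors currently have a handler. */
struct dev_irqs {
	bool requested[MAX_VECS];
	size_t nrequested;
};

/*
 * Request vectors 0..want-1.  fail_at simulates a request failure at
 * that index (pass want for "no failure").  On failure, everything
 * requested so far is released again before returning.
 */
static int irqs_setup(struct dev_irqs *d, size_t want, size_t fail_at)
{
	size_t i;

	for (i = 0; i < want && i < MAX_VECS; i++) {
		if (i == fail_at)
			goto unwind;
		d->requested[i] = true;
		d->nrequested++;
	}
	return 0;

unwind:
	while (i-- > 0) {
		d->requested[i] = false;
		d->nrequested--;
	}
	return -1;
}

/* Free only what is recorded; returns how many vectors were freed. */
static size_t irqs_teardown(struct dev_irqs *d)
{
	size_t i, freed = 0;

	for (i = 0; i < MAX_VECS; i++) {
		if (d->requested[i]) {
			d->requested[i] = false;
			freed++;
		}
	}
	d->nrequested = 0;
	return freed;
}
```

After a failed setup the teardown path finds nothing to free, which is the property the freeze path seems to be missing.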


-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-31 Thread Christoph Hellwig
On Fri, Mar 31, 2017 at 06:22:31AM +0300, Michael S. Tsirkin wrote:
> I'm not sure why it fails after 32 on 64 bit, but as
> virtio devices aren't limited to 32 vqs it looks like we
> should go back to requesting the irq only once for all vqs.

Meh.

> 
> Christoph, should I just revert for now, or do you
> want to look into a smaller patch for this?

I think we'll need to do a different patch than just a simple revert,
mostly because so much infrastructure depends on the patch.

I'll take a look over the weekend.

> Another question is looking into intx support - that
> should work but it seems to be broken at the moment.

Does it?  I'm pretty sure I tested it back when I came up with the
series by artificially disabling MSI-X in the kernel.  I can try this
again, though.


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-30 Thread Michael S. Tsirkin
On Fri, Mar 31, 2017 at 04:23:35AM +0300, Michael S. Tsirkin wrote:
> On Thu, Mar 30, 2017 at 09:20:35AM +0200, Mike Galbraith wrote:
> > On Thu, 2017-03-30 at 05:10 +0200, Mike Galbraith wrote:
> > 
> > > WRT spin, you should need do nothing more than boot with threadirqs,
> > > that's 100% repeatable here in absolutely virgin source.
> > 
> > No idea why virtqueue_get_buf() in __send_control_msg() fails forever
> > with threadirqs, but marking that vq as being busted (it clearly is)
> > results in one gripe, and a vbox that seemingly cares not one whit that
> > something went missing.  CONFIG_DEBUG_SHIRQ OTOH notices, mutters
> > something that sounds like "idiot" when I hibernate the thing ;-)
> > 
> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index e9b7e0b3cabe..831406dae1cb 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -567,6 +567,7 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
> > struct scatterlist sg[1];
> > struct virtqueue *vq;
> > unsigned int len;
> > +   unsigned long deadline = jiffies+1;
> >  
> > if (!use_multiport(portdev))
> > return 0;
> > @@ -583,9 +584,13 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
> >  
> > if (virtqueue_add_outbuf(vq, sg, 1, &portdev->cpkt, GFP_ATOMIC) == 0) {
> > virtqueue_kick(vq);
> > -   while (!virtqueue_get_buf(vq, &len)
> > -   && !virtqueue_is_broken(vq))
> > +   while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq)) {
> > cpu_relax();
> > +   if (time_after(jiffies, deadline)) {
> > +   trace_printk("Aw crap, I'm stuck.. breaking device\n");
> > +   virtio_break_device(portdev->vdev);
> > +   }
> > +   }
> > }
> >  
> > spin_unlock(&portdev->c_ovq_lock);
> 
> 
> OK so with your help I was able to reproduce. Surprisingly easy:
> 
> 1. add threadirqs
> 2. add to qemu -device virtio-serial-pci -no-shutdown
> 3. within guest, do echo disk > /sys/power/state
> 
> This produces a warning. Looking deeper into it, I find:
> the device has 64 vqs. This line
> 
>err = request_irq(pci_irq_vector(vp_dev->pci_dev, msix_vec),
>   vring_interrupt, IRQF_SHARED,
>   vp_dev->msix_names[j], vqs[i]);
> 
> fails after assigning interrupts to 33 vqs.
> Is there a limit to how many threaded irqs can share a line?

In fact it fails on the 33rd one, and I see this:

/*
 * Unlikely to have 32 resp 64 irqs sharing one line,
 * but who knows.
 */
if (thread_mask == ~0UL) {
printk(KERN_ERR "%s +%d\n", __FILE__, __LINE__);
ret = -EBUSY;
goto out_mask;
}


I'm not sure why it fails after 32 on 64 bit, but as
virtio devices aren't limited to 32 vqs it looks like we
should go back to requesting the irq only once for all vqs.
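
To make the limit concrete, here is a minimal userspace sketch of one-bit-per-sharer bookkeeping, which is what the `thread_mask == ~0UL` check quoted above suggests. `struct shared_line` and `line_add_sharer()` are hypothetical names for illustration, not the kernel's code, and the sketch does not explain why the failure shows up at the 33rd request rather than the 65th on 64 bit.

```c
#include <limits.h>

/* Hypothetical per-line bookkeeping; not the kernel's struct. */
struct shared_line {
	unsigned long used_mask;	/* one bit per registered sharer */
};

/* Number of bits in the mask, i.e. the hard cap on sharers. */
#define LINE_MAX_SHARERS ((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Claim the lowest free bit for a new sharer.
 * Returns the bit index, or -1 once every bit is taken.
 */
static int line_add_sharer(struct shared_line *line)
{
	int bit;

	if (line->used_mask == ~0UL)
		return -1;	/* same condition as the quoted check */

	for (bit = 0; bit < LINE_MAX_SHARERS; bit++) {
		unsigned long candidate = 1UL << bit;

		if (!(line->used_mask & candidate)) {
			line->used_mask |= candidate;
			return bit;
		}
	}
	return -1;
}
```

With this kind of scheme the number of sharers can never exceed the width of an unsigned long, however many vqs the device exposes.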

Christoph, should I just revert for now, or do you
want to look into a smaller patch for this?

Another question is looking into intx support - that
should work but it seems to be broken at the moment.


> 
> If so we need to rethink the whole approach.
> 
> Still looking into it.
> 
> Christoph, any idea?
> 
> 
> -- 
> MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-30 Thread Mike Galbraith
On Thu, 2017-03-30 at 05:10 +0200, Mike Galbraith wrote:

> WRT spin, you should need do nothing more than boot with threadirqs,
> that's 100% repeatable here in absolutely virgin source.

No idea why virtqueue_get_buf() in __send_control_msg() fails forever
with threadirqs, but marking that vq as being busted (it clearly is)
results in one gripe, and a vbox that seemingly cares not one whit that
something went missing.  CONFIG_DEBUG_SHIRQ OTOH notices, mutters
something that sounds like "idiot" when I hibernate the thing ;-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index e9b7e0b3cabe..831406dae1cb 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -567,6 +567,7 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
struct scatterlist sg[1];
struct virtqueue *vq;
unsigned int len;
+   unsigned long deadline = jiffies+1;
 
if (!use_multiport(portdev))
return 0;
@@ -583,9 +584,13 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
 
if (virtqueue_add_outbuf(vq, sg, 1, &portdev->cpkt, GFP_ATOMIC) == 0) {
virtqueue_kick(vq);
-   while (!virtqueue_get_buf(vq, &len)
-   && !virtqueue_is_broken(vq))
+   while (!virtqueue_get_buf(vq, &len) && !virtqueue_is_broken(vq)) {
cpu_relax();
+   if (time_after(jiffies, deadline)) {
+   trace_printk("Aw crap, I'm stuck.. breaking device\n");
+   virtio_break_device(portdev->vdev);
+   }
+   }
}
 
spin_unlock(&portdev->c_ovq_lock);
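
The bounded wait in the patch above can be sketched in userspace roughly as follows. This is an illustration only: it swaps the jiffies/time_after() deadline for a plain poll budget so that the behaviour is deterministic, and `poll_fn`, `poll_bounded()` and `countdown_done()` are hypothetical names, not kernel or virtio APIs.

```c
#include <stdbool.h>

/* Hypothetical poll callback: returns true once the awaited event happened. */
typedef bool (*poll_fn)(void *ctx);

/*
 * Poll until done(ctx) reports completion or max_polls attempts are used up.
 * Returns true on completion, false if we gave up; the caller then decides
 * how to mark the resource as broken.
 */
static bool poll_bounded(poll_fn done, void *ctx, unsigned long max_polls)
{
	unsigned long i;

	for (i = 0; i < max_polls; i++) {
		if (done(ctx))
			return true;
	}
	return false;
}

/* Example condition: reports completion once *remaining has reached zero. */
static bool countdown_done(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

In the patch the give-up branch calls virtio_break_device(), which makes virtqueue_is_broken() true and so ends the loop; here the caller gets `false` back and has to act on it.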


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 23:19 +0300, Michael S. Tsirkin wrote:
> >    &portdev->max_nr_ports) == 0) {
> > @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[
> >  
> >  static unsigned int features[] = {
> >   VIRTIO_CONSOLE_F_SIZE,
> > +#ifndef CONFIG_IRQ_FORCED_THREADING
> >   VIRTIO_CONSOLE_F_MULTIPORT,
> > +#endif
> >  };
> 
> These look kind of questionable.
> Is this part needed?

I would have sworn it was, but double checking, nope, it's not.

Hm, so I could make a prettier bandaid with a runtime check.. but it'd
remain a bandaid, so I'll go do some beans 'n' biscuits work instead.

-Mike


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Mike Galbraith
On Wed, 2017-03-29 at 23:10 +0300, Michael S. Tsirkin wrote:

> Poking at this some more, I was able to reproduce at
> least some warnings. I still do not see a spin
> but is there a chance this helps your case too?

Well, it's down to one warning, clean on the way back up.

WRT spin, you should need do nothing more than boot with threadirqs,
that's 100% repeatable here in absolutely virgin source.  Attaching
(obese enterprise-ish) config.

[  174.147626] ------------[ cut here ]------------
[  174.147640] WARNING: CPU: 7 PID: 339 at drivers/pci/msi.c:1251 pci_irq_vector+0xcb/0xe0
[  174.147640] Modules linked in: dm_mod(E) fuse(E) ebtable_filter(E) 
ebtables(E) nf_log_ipv6(E) rpcsec_gss_krb5(E) xt_pkttype(E) nfsv4(E) 
nf_log_ipv4(E) nf_log_common(E) dns_resolver(E) xt_LOG(E) xt_limit(E) nfs(E) 
fscache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) 
xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) 
ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) 
nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) 
nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) 
ip6table_filter(E) ip6_tables(E) x_tables(E) snd_hda_codec_generic(E) 
snd_hda_intel(E) snd_hda_codec(E) joydev(E) snd_hda_core(E) snd_hwdep(E) 
snd_pcm(E) snd_timer(E) crct10dif_pclmul(E) snd(E) crc32_pclmul(E) 
ghash_clmulni_intel(E)
[  174.147664]  pcbc(E) soundcore(E) 8139too(E) aesni_intel(E) i2c_piix4(E) 
ppdev(E) aes_x86_64(E) virtio_balloon(E) crypto_simd(E) parport_pc(E) 
glue_helper(E) serio_raw(E) pcspkr(E) parport(E) cryptd(E) button(E) 
acpi_cpufreq(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) 
ext4(E) crc16(E) jbd2(E) mbcache(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) 
ata_generic(E) ata_piix(E) virtio_console(E) virtio_blk(E) virtio_rng(E) qxl(E) 
drm_kms_helper(E) syscopyarea(E) sysfillrect(E) ehci_pci(E) sysimgblt(E) 
ahci(E) fb_sys_fops(E) libahci(E) ttm(E) uhci_hcd(E) ehci_hcd(E) virtio_pci(E) 
virtio_ring(E) drm(E) crc32c_intel(E) 8139cp(E) libata(E) usbcore(E) mii(E) 
virtio(E) floppy(E) sg(E) scsi_mod(E) autofs4(E)
[  174.147702] CPU: 7 PID: 339 Comm: kworker/u16:3 Tainted: GE   4.11.0-default #2
[  174.147702] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[  174.147707] Workqueue: events_unbound async_run_entry_fn
[  174.147708] Call Trace:
[  174.147713]  ? dump_stack+0x5c/0x85
[  174.147718]  ? __warn+0xc4/0xe0
[  174.147721]  ? pci_pm_poweroff+0xf0/0xf0
[  174.147722]  ? pci_irq_vector+0xcb/0xe0
[  174.147725]  ? vp_synchronize_vectors+0x3e/0x50 [virtio_pci]
[  174.147727]  ? virtcons_freeze+0x1f/0xa0 [virtio_console]
[  174.147729]  ? virtio_pci_freeze+0x19/0x40 [virtio_pci]
[  174.147730]  ? pci_pm_freeze+0x59/0xe0
[  174.147737]  ? dpm_run_callback+0x4d/0x170
[  174.147738]  ? __device_suspend+0x11f/0x3b0
[  174.147739]  ? pm_dev_dbg+0x70/0x70
[  174.147739]  ? async_suspend+0x1a/0x90
[  174.147740]  ? async_run_entry_fn+0x34/0x160
[  174.147742]  ? process_one_work+0x164/0x430
[  174.147743]  ? worker_thread+0x135/0x4d0
[  174.147744]  ? kthread+0xff/0x140
[  174.147745]  ? rescuer_thread+0x3c0/0x3c0
[  174.147746]  ? kthread_park+0x80/0x80
[  174.147753]  ? ret_from_fork+0x26/0x40
[  174.147754] ---[ end trace 02cd3f1b527dc954 ]---

config.xz
Description: application/xz


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Michael S. Tsirkin
On Wed, Mar 29, 2017 at 08:23:22AM +0200, Mike Galbraith wrote:
> On Mon, 2017-03-27 at 20:18 +0200, Mike Galbraith wrote:
> 
> > BTW, WRT RT woes with $subject, I tried booting a generic kernel with
> > threadirqs, and bingo, same deal, just a bit more painful than for RT,
> > where there's no watchdog moaning accompanying the (preemptible) spin.
> 
> BTW++: the last hunk of this bandaid may be a bug fix.  With only the
> first two, box tried to use uninitialized stuff on hibernate, went
> boom.  Looks like that may be possible without help from me.
> 
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -2058,7 +2058,7 @@ static int virtcons_probe(struct virtio_
>   portdev->max_nr_ports = 1;
>  
>   /* Don't test MULTIPORT at all if we're rproc: not a valid feature! */
> - if (!is_rproc_serial(vdev) &&
> + if (!is_rproc_serial(vdev) && !IS_ENABLED(CONFIG_IRQ_FORCED_THREADING) &&
>   virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
>    struct virtio_console_config, max_nr_ports,
>    &portdev->max_nr_ports) == 0) {
> @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[
>  
>  static unsigned int features[] = {
>   VIRTIO_CONSOLE_F_SIZE,
> +#ifndef CONFIG_IRQ_FORCED_THREADING
>   VIRTIO_CONSOLE_F_MULTIPORT,
> +#endif
>  };

These look kind of questionable.
Is this part needed?

>  static struct virtio_device_id rproc_serial_id_table[] = {
> @@ -2202,14 +2204,16 @@ static int virtcons_freeze(struct virtio
>  
>   vdev->config->reset(vdev);
>  
> - virtqueue_disable_cb(portdev->c_ivq);
> + if (use_multiport(portdev))
> +         virtqueue_disable_cb(portdev->c_ivq);
>   cancel_work_sync(&portdev->control_work);
>   cancel_work_sync(&portdev->config_work);
>   /*
>    * Once more: if control_work_handler() was running, it would
>    * enable the cb as the last step.
>    */
> - virtqueue_disable_cb(portdev->c_ivq);
> + if (use_multiport(portdev))
> +         virtqueue_disable_cb(portdev->c_ivq);
>   remove_controlq_data(portdev);
>  
>   list_for_each_entry(port, &portdev->ports, list) {

This looks real. No idea why interrupt sharing would
trigger anything like this, but go figure.
Could you please submit this separately, with
your Signed-off-by?

-- 
MST


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Michael S. Tsirkin
On Wed, Mar 29, 2017 at 08:23:22AM +0200, Mike Galbraith wrote:
> On Mon, 2017-03-27 at 20:18 +0200, Mike Galbraith wrote:
> 
> > BTW, WRT RT woes with $subject, I tried booting a generic kernel with
> > threadirqs, and bingo, same deal, just a bit more painful than for RT,
> > where there's no watchdog moaning accompanying the (preemptible) spin.
> 
> BTW++: the last hunk of this bandaid may be a bug fix.  With only the
> first two, box tried to use uninitialized stuff on hibernate, went
> boom.  Looks like that may be possible without help from me.
> 
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -2058,7 +2058,7 @@ static int virtcons_probe(struct virtio_
>   portdev->max_nr_ports = 1;
>  
>   /* Don't test MULTIPORT at all if we're rproc: not a valid feature! */
> - if (!is_rproc_serial(vdev) &&
> + if (!is_rproc_serial(vdev) && !IS_ENABLED(CONFIG_IRQ_FORCED_THREADING) &&
>       virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
>                            struct virtio_console_config, max_nr_ports,
>                            &portdev->max_nr_ports) == 0) {
> @@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[
>  
>  static unsigned int features[] = {
>   VIRTIO_CONSOLE_F_SIZE,
> +#ifndef CONFIG_IRQ_FORCED_THREADING
>   VIRTIO_CONSOLE_F_MULTIPORT,
> +#endif
>  };
>  
>  static struct virtio_device_id rproc_serial_id_table[] = {
> @@ -2202,14 +2204,16 @@ static int virtcons_freeze(struct virtio
>  
>   vdev->config->reset(vdev);
>  
> - virtqueue_disable_cb(portdev->c_ivq);
> + if (use_multiport(portdev))
> +         virtqueue_disable_cb(portdev->c_ivq);
>   cancel_work_sync(&portdev->control_work);
>   cancel_work_sync(&portdev->config_work);
>   /*
>    * Once more: if control_work_handler() was running, it would
>    * enable the cb as the last step.
>    */
> - virtqueue_disable_cb(portdev->c_ivq);
> + if (use_multiport(portdev))
> +         virtqueue_disable_cb(portdev->c_ivq);
>   remove_controlq_data(portdev);
>  
>   list_for_each_entry(port, &portdev->ports, list) {


Poking at this some more, I was able to reproduce at
least some warnings. I still do not see a spin
but is there a chance this helps your case too?

commit 85039ca3162295759cf986aa753778043a90012c
Author: Michael S. Tsirkin 
Date:   Wed Mar 29 23:02:28 2017 +0300

virtio_pci: fix msix vector tracking on cleanup

virtio pci tracks allocated vectors in a variable: msix_vectors. This
isn't reset on del_vqs, as a result if reset is called after vqs are
deleted we try to synchronize non-existing irqs producing a (probably
harmless) warning.

Fixes: 07ec51480b5e ("virtio_pci: use shared interrupts for virtqueues")
Signed-off-by: Michael S. Tsirkin 

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index baae423..a70bed6 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -151,6 +151,7 @@ void vp_del_vqs(struct virtio_device *vdev)
    }
 
    free_irq(pci_irq_vector(vp_dev->pci_dev, 0), vp_dev);
+   vp_dev->msix_vectors = 0;
    pci_free_irq_vectors(vp_dev->pci_dev);
 }
 
@@ -294,6 +295,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 out_free_msix_names:
    kfree(vp_dev->msix_names);
 out_free_irq_vectors:
+   vp_dev->msix_vectors = 0;
    pci_free_irq_vectors(vp_dev->pci_dev);
    return err;
 }
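
For anyone following along, here is a rough userspace sketch of the stale-counter problem the commit message describes. It is not kernel code: struct toy_vp_dev and every toy_* function are made-up stand-ins, and the model assumes only what the commit message says, namely that reset synchronizes as many vectors as msix_vectors claims exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_VECTORS 8

/* Hypothetical stand-in for the driver's per-device state; not the kernel type. */
struct toy_vp_dev {
	bool irq_allocated[MAX_VECTORS]; /* which vectors currently exist */
	int msix_vectors;                /* how many vectors we believe exist */
	int stale_syncs;                 /* syncs attempted on freed vectors */
};

/* Models a successful vector allocation for n vectors. */
static void toy_find_vqs(struct toy_vp_dev *d, int n)
{
	assert(n >= 0 && n <= MAX_VECTORS);
	for (int i = 0; i < n; i++)
		d->irq_allocated[i] = true;
	d->msix_vectors = n;
}

/* Models deleting the vqs; reset_count selects patched vs. unpatched behaviour. */
static void toy_del_vqs(struct toy_vp_dev *d, bool reset_count)
{
	for (int i = 0; i < MAX_VECTORS; i++)
		d->irq_allocated[i] = false;
	if (reset_count)
		d->msix_vectors = 0; /* the one-line change from the patch */
}

/* Models reset: synchronize every vector the counter says is live. */
static void toy_reset(struct toy_vp_dev *d)
{
	for (int i = 0; i < d->msix_vectors; i++) {
		if (!d->irq_allocated[i]) {
			d->stale_syncs++;
			fprintf(stderr, "warning: sync on freed vector %d\n", i);
		}
	}
}

/* Runs one allocate/delete/reset cycle and returns the stale sync count. */
static int toy_run(bool with_fix)
{
	struct toy_vp_dev d = {0};

	toy_find_vqs(&d, 3);
	toy_del_vqs(&d, with_fix);
	toy_reset(&d);
	return d.stale_syncs;
}
```

Calling toy_run(false) walks three vectors that no longer exist, while toy_run(true) walks none, which is the whole effect of zeroing the counter before the vectors are freed.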


Re: Random guest crashes since 5c34d002dcc7 ("virtio_pci: use shared interrupts for virtqueues")

2017-03-29 Thread Mike Galbraith
On Mon, 2017-03-27 at 20:18 +0200, Mike Galbraith wrote:

> BTW, WRT RT woes with $subject, I tried booting a generic kernel with
> threadirqs, and bingo, same deal, just a bit more painful than for RT,
> where there's no watchdog moaning accompanying the (preemptible) spin.

BTW++: the last hunk of this bandaid may be a bug fix.  With only the
first two, box tried to use uninitialized stuff on hibernate, went
boom.  Looks like that may be possible without help from me.

--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -2058,7 +2058,7 @@ static int virtcons_probe(struct virtio_
portdev->max_nr_ports = 1;
 
/* Don't test MULTIPORT at all if we're rproc: not a valid feature! */
-   if (!is_rproc_serial(vdev) &&
+   if (!is_rproc_serial(vdev) && !IS_ENABLED(CONFIG_IRQ_FORCED_THREADING) &&
        virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
                             struct virtio_console_config, max_nr_ports,
                             &portdev->max_nr_ports) == 0) {
@@ -2179,7 +2179,9 @@ static struct virtio_device_id id_table[
 
 static unsigned int features[] = {
VIRTIO_CONSOLE_F_SIZE,
+#ifndef CONFIG_IRQ_FORCED_THREADING
VIRTIO_CONSOLE_F_MULTIPORT,
+#endif
 };
 
 static struct virtio_device_id rproc_serial_id_table[] = {
@@ -2202,14 +2204,16 @@ static int virtcons_freeze(struct virtio
 
vdev->config->reset(vdev);
 
-   virtqueue_disable_cb(portdev->c_ivq);
+   if (use_multiport(portdev))
+       virtqueue_disable_cb(portdev->c_ivq);
    cancel_work_sync(&portdev->control_work);
    cancel_work_sync(&portdev->config_work);
    /*
     * Once more: if control_work_handler() was running, it would
     * enable the cb as the last step.
     */
-   virtqueue_disable_cb(portdev->c_ivq);
+   if (use_multiport(portdev))
+       virtqueue_disable_cb(portdev->c_ivq);
    remove_controlq_data(portdev);
 
    list_for_each_entry(port, &portdev->ports, list) {
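
To spell out what the last hunk guards against, a small userspace sketch follows. It assumes that the control input queue is only set up when MULTIPORT is negotiated, so without it c_ivq must not be touched on freeze. The toy_* types and functions are hypothetical stand-ins, and a NULL pointer stands in for the uninitialized field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's virtqueue or ports_device types. */
struct toy_vq {
	bool cb_enabled;
};

struct toy_portdev {
	bool multiport;       /* was the MULTIPORT feature negotiated? */
	struct toy_vq *c_ivq; /* control input queue; only valid with multiport */
};

/* Models use_multiport(): control queues exist only in multiport mode. */
static bool toy_use_multiport(const struct toy_portdev *pd)
{
	return pd->multiport;
}

/*
 * Models the guarded call from the freeze path.
 * Returns true if the control queue was actually touched.
 */
static bool toy_freeze_disable_cb(struct toy_portdev *pd)
{
	if (!toy_use_multiport(pd))
		return false; /* no control queue: leave c_ivq alone */
	pd->c_ivq->cb_enabled = false;
	return true;
}
```

With multiport set and a valid queue, toy_freeze_disable_cb() turns the callback off; with multiport clear it returns without dereferencing c_ivq at all, which is the behaviour the unguarded freeze path lacks.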

