Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-27 Thread Michael S. Tsirkin
On Tue, Mar 26, 2024 at 03:46:29PM +0000, Will Deacon wrote: > On Tue, Mar 26, 2024 at 11:43:13AM +0000, Will Deacon wrote: > > On Tue, Mar 26, 2024 at 09:38:55AM +0000, Keir Fraser wrote: > > > On Tue, Mar 26, 2024 at 03:49:02AM -0400, Michael S. Tsirkin wrote: > > > > > Secondly, the debugging

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-26 Thread Gavin Shan
On 3/27/24 09:14, Gavin Shan wrote: On 3/27/24 01:46, Will Deacon wrote: On Tue, Mar 26, 2024 at 11:43:13AM +0000, Will Deacon wrote: Ok, long shot after eyeballing the vhost code, but does the diff below help at all? It looks like vhost_vq_avail_empty() can advance the value saved in

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-26 Thread Gavin Shan
On 3/27/24 01:46, Will Deacon wrote: On Tue, Mar 26, 2024 at 11:43:13AM +0000, Will Deacon wrote: Ok, long shot after eyeballing the vhost code, but does the diff below help at all? It looks like vhost_vq_avail_empty() can advance the value saved in 'vq->avail_idx' but without the read

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-26 Thread Will Deacon
On Tue, Mar 26, 2024 at 11:43:13AM +0000, Will Deacon wrote: > On Tue, Mar 26, 2024 at 09:38:55AM +0000, Keir Fraser wrote: > > On Tue, Mar 26, 2024 at 03:49:02AM -0400, Michael S. Tsirkin wrote: > > > > Secondly, the debugging code is enhanced so that the available head for > > > >

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-26 Thread Will Deacon
On Tue, Mar 26, 2024 at 09:38:55AM +0000, Keir Fraser wrote: > On Tue, Mar 26, 2024 at 03:49:02AM -0400, Michael S. Tsirkin wrote: > > > Secondly, the debugging code is enhanced so that the available head for > > > (last_avail_idx - 1) is read for twice and recorded. It means the > > > available

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-26 Thread Keir Fraser
On Tue, Mar 26, 2024 at 03:49:02AM -0400, Michael S. Tsirkin wrote: > On Mon, Mar 25, 2024 at 05:34:29PM +1000, Gavin Shan wrote: > > > > On 3/20/24 17:14, Michael S. Tsirkin wrote: > > > On Wed, Mar 20, 2024 at 03:24:16PM +1000, Gavin Shan wrote: > > > > On 3/20/24 10:49, Michael S. Tsirkin

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-26 Thread Michael S. Tsirkin
On Mon, Mar 25, 2024 at 05:34:29PM +1000, Gavin Shan wrote: > > On 3/20/24 17:14, Michael S. Tsirkin wrote: > > On Wed, Mar 20, 2024 at 03:24:16PM +1000, Gavin Shan wrote: > > > On 3/20/24 10:49, Michael S. Tsirkin wrote: > > > > > diff --git a/drivers/virtio/virtio_ring.c

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-25 Thread Gavin Shan
On 3/20/24 17:14, Michael S. Tsirkin wrote: On Wed, Mar 20, 2024 at 03:24:16PM +1000, Gavin Shan wrote: On 3/20/24 10:49, Michael S. Tsirkin wrote:> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 6f7e5010a673..79456706d0bd 100644 ---

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-21 Thread Gavin Shan
On 3/21/24 03:15, Keir Fraser wrote: On Wed, Mar 20, 2024 at 03:24:16PM +1000, Gavin Shan wrote: Before this patch was posted, I had debugging code to record last 16 transactions to the available and used queue from guest and host side. It did reveal the wrong head was fetched from the

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-20 Thread Keir Fraser
On Wed, Mar 20, 2024 at 03:24:16PM +1000, Gavin Shan wrote: > > Before this patch was posted, I had debugging code to record last 16 > transactions > to the available and used queue from guest and host side. It did reveal the > wrong > head was fetched from the available queue. > > [

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-20 Thread Michael S. Tsirkin
On Wed, Mar 20, 2024 at 03:24:16PM +1000, Gavin Shan wrote: > On 3/20/24 10:49, Michael S. Tsirkin wrote: > > > I think you are wasting the time with these tests. Even if it helps what > > does this tell us? Try setting a flag as I suggested elsewhere. > > Then check it in vhost. > > Or here's

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Gavin Shan
On 3/20/24 10:49, Michael S. Tsirkin wrote:> I think you are wasting the time with these tests. Even if it helps what does this tell us? Try setting a flag as I suggested elsewhere. Then check it in vhost. Or here's another idea - possibly easier. Copy the high bits from index into ring itself.

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Michael S. Tsirkin
On Wed, Mar 20, 2024 at 09:56:58AM +1000, Gavin Shan wrote: > On 3/20/24 04:22, Will Deacon wrote: > > On Tue, Mar 19, 2024 at 02:59:23PM +1000, Gavin Shan wrote: > > > On 3/19/24 02:59, Will Deacon wrote: > > > > >drivers/virtio/virtio_ring.c | 12 +--- > > > > >1 file changed, 9

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Gavin Shan
On 3/20/24 04:22, Will Deacon wrote: On Tue, Mar 19, 2024 at 02:59:23PM +1000, Gavin Shan wrote: On 3/19/24 02:59, Will Deacon wrote: drivers/virtio/virtio_ring.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/virtio/virtio_ring.c

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Will Deacon
On Tue, Mar 19, 2024 at 02:59:23PM +1000, Gavin Shan wrote: > On 3/19/24 02:59, Will Deacon wrote: > > On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: > > > The issue is reported by Yihuang Yu who have 'netperf' test on > > > NVidia's grace-grace and grace-hopper machines. The

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Will Deacon
On Tue, Mar 19, 2024 at 03:36:31AM -0400, Michael S. Tsirkin wrote: > On Mon, Mar 18, 2024 at 04:59:24PM +0000, Will Deacon wrote: > > On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > > index

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Michael S. Tsirkin
On Tue, Mar 19, 2024 at 06:08:27PM +1000, Gavin Shan wrote: > On 3/19/24 17:09, Michael S. Tsirkin wrote: > > On Tue, Mar 19, 2024 at 04:49:50PM +1000, Gavin Shan wrote: > > > > > > On 3/19/24 16:43, Michael S. Tsirkin wrote: > > > > On Tue, Mar 19, 2024 at 04:38:49PM +1000, Gavin Shan wrote: > >

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Michael S. Tsirkin
On Tue, Mar 19, 2024 at 04:54:15PM +1000, Gavin Shan wrote: > On 3/19/24 16:10, Michael S. Tsirkin wrote: > > On Tue, Mar 19, 2024 at 02:09:34AM -0400, Michael S. Tsirkin wrote: > > > On Tue, Mar 19, 2024 at 02:59:23PM +1000, Gavin Shan wrote: > > > > On 3/19/24 02:59, Will Deacon wrote: > [...] >

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Gavin Shan
On 3/19/24 17:09, Michael S. Tsirkin wrote: On Tue, Mar 19, 2024 at 04:49:50PM +1000, Gavin Shan wrote: On 3/19/24 16:43, Michael S. Tsirkin wrote: On Tue, Mar 19, 2024 at 04:38:49PM +1000, Gavin Shan wrote: On 3/19/24 16:09, Michael S. Tsirkin wrote: diff --git

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Gavin Shan
On 3/19/24 17:04, Michael S. Tsirkin wrote: On Tue, Mar 19, 2024 at 04:54:15PM +1000, Gavin Shan wrote: On 3/19/24 16:10, Michael S. Tsirkin wrote: On Tue, Mar 19, 2024 at 02:09:34AM -0400, Michael S. Tsirkin wrote: On Tue, Mar 19, 2024 at 02:59:23PM +1000, Gavin Shan wrote: On 3/19/24

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Michael S. Tsirkin
On Mon, Mar 18, 2024 at 04:59:24PM +0000, Will Deacon wrote: > On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: > > The issue is reported by Yihuang Yu who have 'netperf' test on > > NVidia's grace-grace and grace-hopper machines. The 'netperf' > > client is started in the VM hosted by

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Michael S. Tsirkin
On Tue, Mar 19, 2024 at 04:49:50PM +1000, Gavin Shan wrote: > > On 3/19/24 16:43, Michael S. Tsirkin wrote: > > On Tue, Mar 19, 2024 at 04:38:49PM +1000, Gavin Shan wrote: > > > On 3/19/24 16:09, Michael S. Tsirkin wrote: > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c > > > > > >

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Michael S. Tsirkin
On Tue, Mar 19, 2024 at 04:54:15PM +1000, Gavin Shan wrote: > On 3/19/24 16:10, Michael S. Tsirkin wrote: > > On Tue, Mar 19, 2024 at 02:09:34AM -0400, Michael S. Tsirkin wrote: > > > On Tue, Mar 19, 2024 at 02:59:23PM +1000, Gavin Shan wrote: > > > > On 3/19/24 02:59, Will Deacon wrote: > [...] >

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Gavin Shan
On 3/19/24 16:10, Michael S. Tsirkin wrote: On Tue, Mar 19, 2024 at 02:09:34AM -0400, Michael S. Tsirkin wrote: On Tue, Mar 19, 2024 at 02:59:23PM +1000, Gavin Shan wrote: On 3/19/24 02:59, Will Deacon wrote: [...] diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Gavin Shan
On 3/19/24 16:43, Michael S. Tsirkin wrote: On Tue, Mar 19, 2024 at 04:38:49PM +1000, Gavin Shan wrote: On 3/19/24 16:09, Michael S. Tsirkin wrote: diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 49299b1f9ec7..7d852811c912 100644 ---

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Michael S. Tsirkin
On Tue, Mar 19, 2024 at 04:38:49PM +1000, Gavin Shan wrote: > On 3/19/24 16:09, Michael S. Tsirkin wrote: > > > > > > diff --git a/drivers/virtio/virtio_ring.c > > > > > b/drivers/virtio/virtio_ring.c > > > > > index 49299b1f9ec7..7d852811c912 100644 > > > > > --- a/drivers/virtio/virtio_ring.c

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Gavin Shan
On 3/19/24 16:09, Michael S. Tsirkin wrote: diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 49299b1f9ec7..7d852811c912 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -687,9 +687,15 @@ static inline int virtqueue_add_split(struct

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Michael S. Tsirkin
On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: > The issue is reported by Yihuang Yu who have 'netperf' test on > NVidia's grace-grace and grace-hopper machines. The 'netperf' > client is started in the VM hosted by grace-hopper machine, > while the 'netperf' server is running on

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Michael S. Tsirkin
On Tue, Mar 19, 2024 at 02:09:34AM -0400, Michael S. Tsirkin wrote: > On Tue, Mar 19, 2024 at 02:59:23PM +1000, Gavin Shan wrote: > > On 3/19/24 02:59, Will Deacon wrote: > > > On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: > > > > The issue is reported by Yihuang Yu who have

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-19 Thread Michael S. Tsirkin
On Tue, Mar 19, 2024 at 02:59:23PM +1000, Gavin Shan wrote: > On 3/19/24 02:59, Will Deacon wrote: > > On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: > > > The issue is reported by Yihuang Yu who have 'netperf' test on > > > NVidia's grace-grace and grace-hopper machines. The

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-18 Thread Gavin Shan
On 3/19/24 02:59, Will Deacon wrote: On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: The issue is reported by Yihuang Yu who have 'netperf' test on NVidia's grace-grace and grace-hopper machines. The 'netperf' client is started in the VM hosted by grace-hopper machine, while the

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-18 Thread Will Deacon
On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: > The issue is reported by Yihuang Yu who have 'netperf' test on > NVidia's grace-grace and grace-hopper machines. The 'netperf' > client is started in the VM hosted by grace-hopper machine, > while the 'netperf' server is running on

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-18 Thread Michael S. Tsirkin
On Mon, Mar 18, 2024 at 09:41:45AM +1000, Gavin Shan wrote: > On 3/18/24 02:50, Michael S. Tsirkin wrote: > > On Fri, Mar 15, 2024 at 09:24:36PM +1000, Gavin Shan wrote: > > > > > > On 3/15/24 21:05, Michael S. Tsirkin wrote: > > > > On Fri, Mar 15, 2024 at 08:45:10PM +1000, Gavin Shan wrote: > >

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-17 Thread Gavin Shan
On 3/18/24 02:50, Michael S. Tsirkin wrote: On Fri, Mar 15, 2024 at 09:24:36PM +1000, Gavin Shan wrote: On 3/15/24 21:05, Michael S. Tsirkin wrote: On Fri, Mar 15, 2024 at 08:45:10PM +1000, Gavin Shan wrote: Yes, I guess smp_wmb() ('dmb') is buggy on NVidia's grace-hopper platform. I tried

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-17 Thread Michael S. Tsirkin
On Fri, Mar 15, 2024 at 09:24:36PM +1000, Gavin Shan wrote: > > On 3/15/24 21:05, Michael S. Tsirkin wrote: > > On Fri, Mar 15, 2024 at 08:45:10PM +1000, Gavin Shan wrote: > > > > > Yes, I guess smp_wmb() ('dmb') is buggy on NVidia's grace-hopper > > > > > platform. I tried > > > to reproduce it

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-15 Thread Gavin Shan
On 3/15/24 21:05, Michael S. Tsirkin wrote: On Fri, Mar 15, 2024 at 08:45:10PM +1000, Gavin Shan wrote: Yes, I guess smp_wmb() ('dmb') is buggy on NVidia's grace-hopper platform. I tried to reproduce it with my own driver where one thread writes to the shared buffer and another thread reads

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-15 Thread Michael S. Tsirkin
On Fri, Mar 15, 2024 at 08:45:10PM +1000, Gavin Shan wrote: > > + Will, Catalin and Matt from Nvidia > > On 3/14/24 22:59, Michael S. Tsirkin wrote: > > On Thu, Mar 14, 2024 at 10:50:15PM +1000, Gavin Shan wrote: > > > On 3/14/24 21:50, Michael S. Tsirkin wrote: > > > > On Thu, Mar 14, 2024 at

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-15 Thread Gavin Shan
+ Will, Catalin and Matt from Nvidia On 3/14/24 22:59, Michael S. Tsirkin wrote: On Thu, Mar 14, 2024 at 10:50:15PM +1000, Gavin Shan wrote: On 3/14/24 21:50, Michael S. Tsirkin wrote: On Thu, Mar 14, 2024 at 08:15:22PM +1000, Gavin Shan wrote: On 3/14/24 18:05, Michael S. Tsirkin wrote:

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-14 Thread Michael S. Tsirkin
On Thu, Mar 14, 2024 at 10:50:15PM +1000, Gavin Shan wrote: > On 3/14/24 21:50, Michael S. Tsirkin wrote: > > On Thu, Mar 14, 2024 at 08:15:22PM +1000, Gavin Shan wrote: > > > On 3/14/24 18:05, Michael S. Tsirkin wrote: > > > > On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: > > > > >

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-14 Thread Gavin Shan
On 3/14/24 21:50, Michael S. Tsirkin wrote: On Thu, Mar 14, 2024 at 08:15:22PM +1000, Gavin Shan wrote: On 3/14/24 18:05, Michael S. Tsirkin wrote: On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: The issue is reported by Yihuang Yu who have 'netperf' test on NVidia's grace-grace

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-14 Thread Michael S. Tsirkin
On Thu, Mar 14, 2024 at 08:15:22PM +1000, Gavin Shan wrote: > On 3/14/24 18:05, Michael S. Tsirkin wrote: > > On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: > > > The issue is reported by Yihuang Yu who have 'netperf' test on > > > NVidia's grace-grace and grace-hopper machines. The

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-14 Thread Gavin Shan
On 3/14/24 18:05, Michael S. Tsirkin wrote: On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: The issue is reported by Yihuang Yu who have 'netperf' test on NVidia's grace-grace and grace-hopper machines. The 'netperf' client is started in the VM hosted by grace-hopper machine, while

Re: [PATCH] virtio_ring: Fix the stale index in available ring

2024-03-14 Thread Michael S. Tsirkin
On Thu, Mar 14, 2024 at 05:49:23PM +1000, Gavin Shan wrote: > The issue is reported by Yihuang Yu who have 'netperf' test on > NVidia's grace-grace and grace-hopper machines. The 'netperf' > client is started in the VM hosted by grace-hopper machine, > while the 'netperf' server is running on