Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-06 Thread Boris Ostrovsky

On 05/06/2015 03:12 AM, Jan Beulich wrote:

On 05.05.15 at 18:11,  wrote:

On 05/05/2015 11:58 AM, Jan Beulich wrote:

On 05.05.15 at 17:46,  wrote:

On 05/04/2015 05:14 AM, Andrew Cooper wrote:

On 04/05/2015 09:52, Jan Beulich wrote:

On 04.05.15 at 04:16,  wrote:

--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)

void cacheline_flush(char * addr)

{
+mb();
clflush(addr);
+mb();
}

I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I
think "serializing" needs to be changed to "fencing", as serialization
is not what we really care about here. If you and the maintainers
agree, I could certainly fix up both aspects while committing.

On the subject of writebacks, we should get around to alternating-up the
use of clflushopt and clwb, either of which would be better than a
clflush in this case (avoiding the need for the leading mfence).

However, the ISA extension document does not indicate which processors
will have support for these new instructions.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

We really should add support for this. On shutting down a very large
guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
This was due to serializing nature of CLFLUSH.

But flushing the IOMMU isn't being done via CPU instructions, but
rather via commands sent to the IOMMU. I.e. I'm somewhat
confused by your reply.

I didn't mean flushing IOMMU itself, sorry. I meant
__iommu_flush_cache() (or whatever it's equivalent we had in the
product, which was 4.1-based).

In that case I wonder how much of that flushing is really necessary
during IOMMU teardown. VT-d maintainers?


I should mention that we saw this on 4.1, where iommu teardown is done 
in the domain destruction code path. After we backported code that moves 
this into a tasklet things got much better.


Of course we still are doing flushing, even if in the background, and 
therefore system resources are still being consumed (and memory is not 
available until flushing is done). So if this is unnecessary then there 
are good reasons not to do it.


-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-06 Thread Jan Beulich
>>> On 06.05.15 at 09:26,  wrote:
> On 2015/5/6 15:12, Jan Beulich wrote:
> On 05.05.15 at 18:11,  wrote:
>>> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>> On 05.05.15 at 17:46,  wrote:
> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>> On 04.05.15 at 04:16,  wrote:
 --- a/xen/drivers/passthrough/vtd/x86/vtd.c
 +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
 @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)

 void cacheline_flush(char * addr)
 {
 +mb();
 clflush(addr);
 +mb();
 }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>>> think "serializing" needs to be changed to "fencing", as serialization
>>> is not what we really care about here. If you and the maintainers
>>> agree, I could certainly fix up both aspects while committing.
>> On the subject of writebacks, we should get around to alternating-up the
>> use of clflushopt and clwb, either of which would be better than a
>> clflush in this case (avoiding the need for the leading mfence).
>>
>> However, the ISA extension document does not indicate which processors
>> will have support for these new instructions.
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>  
>
> We really should add support for this. On shutting down a very large
> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
> This was due to serializing nature of CLFLUSH.
 But flushing the IOMMU isn't being done via CPU instructions, but
 rather via commands sent to the IOMMU. I.e. I'm somewhat
 confused by your reply.
>>>
>>> I didn't mean flushing IOMMU itself, sorry. I meant
>>> __iommu_flush_cache() (or whatever it's equivalent we had in the
>>> product, which was 4.1-based).
>>
>> In that case I wonder how much of that flushing is really necessary
> 
> Sorry, what is that case?

The flushing CPU of side caches during IOMMU teardown for a guest.

>> during IOMMU teardown. VT-d maintainers?
>>
> 
> In most cases __iommu_flush_cache() is used to flush any remapping 
> structures into memory then IOMMU can get proper data.

Right, but here we're talking about teardown. Since the IOMMU
isn't to use any mappings anymore anyway for the dying guest,
there's little point in flushing respective changes from caches (or
at least that flushing could be limited to the ripping out of top
level structures, provided these get zapped before anything
hanging off of them).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-06 Thread Chen, Tiejun

On 2015/5/6 15:12, Jan Beulich wrote:

On 05.05.15 at 18:11,  wrote:

On 05/05/2015 11:58 AM, Jan Beulich wrote:

On 05.05.15 at 17:46,  wrote:

On 05/04/2015 05:14 AM, Andrew Cooper wrote:

On 04/05/2015 09:52, Jan Beulich wrote:

On 04.05.15 at 04:16,  wrote:

--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)

void cacheline_flush(char * addr)
{
+mb();
clflush(addr);
+mb();
}

I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I
think "serializing" needs to be changed to "fencing", as serialization
is not what we really care about here. If you and the maintainers
agree, I could certainly fix up both aspects while committing.

On the subject of writebacks, we should get around to alternating-up the
use of clflushopt and clwb, either of which would be better than a
clflush in this case (avoiding the need for the leading mfence).

However, the ISA extension document does not indicate which processors
will have support for these new instructions.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

We really should add support for this. On shutting down a very large
guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
This was due to serializing nature of CLFLUSH.

But flushing the IOMMU isn't being done via CPU instructions, but
rather via commands sent to the IOMMU. I.e. I'm somewhat
confused by your reply.


I didn't mean flushing IOMMU itself, sorry. I meant
__iommu_flush_cache() (or whatever it's equivalent we had in the
product, which was 4.1-based).


In that case I wonder how much of that flushing is really necessary


Sorry, what is that case?


during IOMMU teardown. VT-d maintainers?



In most cases __iommu_flush_cache() is used to flush any remapping 
structures into memory then IOMMU can get proper data.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-06 Thread Jan Beulich
>>> On 06.05.15 at 08:47,  wrote:
> On 2015/5/5 17:24, Jan Beulich wrote:
>> Btw, isn't the !iommus_incoherent check in that function inverted?
>> I.e. why would we _not_ need to flush caches when IOMMUs
>> are not coherent (and why would flushing be needed when they're
>> coherent anyway)?
>>
> 
> I guess you're misunderstanding this
> 
>  if ( !ecap_coherent(iommu->ecap) )
>  iommus_incoherent = 1;
> 
> So here !iommus_incoherent means IOMMU is coherent and then we don't 
> need to flush cache in this case.

Ah, indeed - I overlooked the "in" in the variable name (together with
the thus resulting double negation in the expression). Sorry for the
noise.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-06 Thread Jan Beulich
>>> On 05.05.15 at 18:11,  wrote:
> On 05/05/2015 11:58 AM, Jan Beulich wrote:
> On 05.05.15 at 17:46,  wrote:
>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
 On 04/05/2015 09:52, Jan Beulich wrote:
 On 04.05.15 at 04:16,  wrote:
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>
>>void cacheline_flush(char * addr)
>>{
>> +mb();
>>clflush(addr);
>> +mb();
>>}
> I think the purpose of the flush is to force write back, not to evict
> the cache line, and if so wmb() would appear to be sufficient. As
> the SDM says that's not the case, a comment explaining why wmb()
> is not sufficient would seem necessary. Plus in the description I
> think "serializing" needs to be changed to "fencing", as serialization
> is not what we really care about here. If you and the maintainers
> agree, I could certainly fix up both aspects while committing.
 On the subject of writebacks, we should get around to alternating-up the
 use of clflushopt and clwb, either of which would be better than a
 clflush in this case (avoiding the need for the leading mfence).

 However, the ISA extension document does not indicate which processors
 will have support for these new instructions.
>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf 
>>>
>>> We really should add support for this. On shutting down a very large
>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>>> This was due to serializing nature of CLFLUSH.
>> But flushing the IOMMU isn't being done via CPU instructions, but
>> rather via commands sent to the IOMMU. I.e. I'm somewhat
>> confused by your reply.
> 
> I didn't mean flushing IOMMU itself, sorry. I meant 
> __iommu_flush_cache() (or whatever it's equivalent we had in the 
> product, which was 4.1-based).

In that case I wonder how much of that flushing is really necessary
during IOMMU teardown. VT-d maintainers?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-06 Thread Chen, Tiejun

On 2015/5/5 23:46, Boris Ostrovsky wrote:

On 05/04/2015 05:14 AM, Andrew Cooper wrote:

On 04/05/2015 09:52, Jan Beulich wrote:

On 04.05.15 at 04:16,  wrote:

--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
  void cacheline_flush(char * addr)
  {
+mb();
  clflush(addr);
+mb();
  }

I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I
think "serializing" needs to be changed to "fencing", as serialization
is not what we really care about here. If you and the maintainers
agree, I could certainly fix up both aspects while committing.

On the subject of writebacks, we should get around to alternating-up the
use of clflushopt and clwb, either of which would be better than a
clflush in this case (avoiding the need for the leading mfence).

However, the ISA extension document does not indicate which processors
will have support for these new instructions.


https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

We really should add support for this. On shutting down a very large
guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
This was due to serializing nature of CLFLUSH.



CLFLUSHOPT/CLWB instructions are ordered only by store-fencing 
operations, so at least we still need SFENCE.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-05 Thread Chen, Tiejun

On 2015/5/5 17:24, Jan Beulich wrote:

On 05.05.15 at 04:45,  wrote:

Does this work for everyone?


Please first of all explain why the interfaces in asm/flushtlb.h can't
be used here (at least when flushing entire pages). Because - as


I also don't understand any reason we didn't use this previously on 
IOMMU side...



said before - for a complete fix you'd need to deal with the CLFLUSH
use(s) elsewhere in the system too.


Looks we need to do this in xen/arch/x86/flushtlb.c:flush_area_local().



Btw, isn't the !iommus_incoherent check in that function inverted?
I.e. why would we _not_ need to flush caches when IOMMUs
are not coherent (and why would flushing be needed when they're
coherent anyway)?



I guess you're misunderstanding this

if ( !ecap_coherent(iommu->ecap) )
iommus_incoherent = 1;

So here !iommus_incoherent means IOMMU is coherent and then we don't 
need to flush cache in this case.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-05 Thread Boris Ostrovsky

On 05/05/2015 11:58 AM, Jan Beulich wrote:

On 05.05.15 at 17:46,  wrote:

On 05/04/2015 05:14 AM, Andrew Cooper wrote:

On 04/05/2015 09:52, Jan Beulich wrote:

On 04.05.15 at 04:16,  wrote:

--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
   
   void cacheline_flush(char * addr)

   {
+mb();
   clflush(addr);
+mb();
   }

I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I
think "serializing" needs to be changed to "fencing", as serialization
is not what we really care about here. If you and the maintainers
agree, I could certainly fix up both aspects while committing.

On the subject of writebacks, we should get around to alternating-up the
use of clflushopt and clwb, either of which would be better than a
clflush in this case (avoiding the need for the leading mfence).

However, the ISA extension document does not indicate which processors
will have support for these new instructions.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

We really should add support for this. On shutting down a very large
guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
This was due to serializing nature of CLFLUSH.

But flushing the IOMMU isn't being done via CPU instructions, but
rather via commands sent to the IOMMU. I.e. I'm somewhat
confused by your reply.


I didn't mean flushing IOMMU itself, sorry. I meant 
__iommu_flush_cache() (or whatever it's equivalent we had in the 
product, which was 4.1-based).


-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-05 Thread Jan Beulich
>>> On 05.05.15 at 17:46,  wrote:
> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>> On 04.05.15 at 04:16,  wrote:
 --- a/xen/drivers/passthrough/vtd/x86/vtd.c
 +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
 @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
   
   void cacheline_flush(char * addr)
   {
 +mb();
   clflush(addr);
 +mb();
   }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>>> think "serializing" needs to be changed to "fencing", as serialization
>>> is not what we really care about here. If you and the maintainers
>>> agree, I could certainly fix up both aspects while committing.
>> On the subject of writebacks, we should get around to alternating-up the
>> use of clflushopt and clwb, either of which would be better than a
>> clflush in this case (avoiding the need for the leading mfence).
>>
>> However, the ISA extension document does not indicate which processors
>> will have support for these new instructions.
> 
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf 
> 
> We really should add support for this. On shutting down a very large 
> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU. 
> This was due to serializing nature of CLFLUSH.

But flushing the IOMMU isn't being done via CPU instructions, but
rather via commands sent to the IOMMU. I.e. I'm somewhat
confused by your reply.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-05 Thread Boris Ostrovsky

On 05/04/2015 05:14 AM, Andrew Cooper wrote:

On 04/05/2015 09:52, Jan Beulich wrote:

On 04.05.15 at 04:16,  wrote:

--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
  
  void cacheline_flush(char * addr)

  {
+mb();
  clflush(addr);
+mb();
  }

I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I
think "serializing" needs to be changed to "fencing", as serialization
is not what we really care about here. If you and the maintainers
agree, I could certainly fix up both aspects while committing.

On the subject of writebacks, we should get around to alternating-up the
use of clflushopt and clwb, either of which would be better than a
clflush in this case (avoiding the need for the leading mfence).

However, the ISA extension document does not indicate which processors
will have support for these new instructions.


https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

We really should add support for this. On shutting down a very large 
guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU. 
This was due to serializing nature of CLFLUSH.


-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-05 Thread Jan Beulich
>>> On 05.05.15 at 04:45,  wrote:
> Does this work for everyone?

Please first of all explain why the interfaces in asm/flushtlb.h can't
be used here (at least when flushing entire pages). Because - as
said before - for a complete fix you'd need to deal with the CLFLUSH
use(s) elsewhere in the system too.

Btw, isn't the !iommus_incoherent check in that function inverted?
I.e. why would we _not_ need to flush caches when IOMMUs
are not coherent (and why would flushing be needed when they're
coherent anyway)?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Chen, Tiejun

On 2015/5/4 18:43, Jan Beulich wrote:

On 04.05.15 at 12:39,  wrote:

On 2015/5/4 16:52, Jan Beulich wrote:

On 04.05.15 at 04:16,  wrote:

--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)

   void cacheline_flush(char * addr)
   {
+mb();
   clflush(addr);
+mb();
   }


I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I


Seems wmb() is not sufficient here.

"CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
to be ordered by any other fencing, serializing or other CLFLUSH
instruction."


Right - that's what I said in the second sentence.



Thanks for all guys' comments.

Does this work for everyone?

xen/vt-d: need barriers to workaround CLFLUSH

clflush is a light weight but not fencing instruction, so
hence memory fence is necessary to make sure all load/store
visible before flush cache line.

Signed-off-by: Tiejun Chen 

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c

index c7bda73..1248a17 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -170,8 +170,15 @@ static void __iommu_flush_cache(void *addr, 
unsigned int size)

 if ( clflush_size == 0 )
 clflush_size = get_cache_line_size();

+/*
+ * CLFLUSH is only ordered by the MFENCE instruction.
+ * It is not guaranteed to be ordered by any other fencing,
+ * serializing or other CLFLUSH instruction.
+ */
+mb();
 for ( i = 0; i < size; i += clflush_size )
 cacheline_flush((char *)addr + i);
+mb();
 }

 void iommu_flush_cache_entry(void *addr, unsigned int size)

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Tian, Kevin
> From: Chen, Tiejun
> Sent: Monday, May 04, 2015 7:26 PM
> 
> On 2015/5/4 18:52, Tian, Kevin wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Monday, May 04, 2015 6:44 PM
> >>
> > On 04.05.15 at 12:39,  wrote:
> >>> On 2015/5/4 16:52, Jan Beulich wrote:
> >>> On 04.05.15 at 04:16,  wrote:
> > --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> > +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> > @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
> >
> >void cacheline_flush(char * addr)
> >{
> > +mb();
> >clflush(addr);
> > +mb();
> >}
> 
>  I think the purpose of the flush is to force write back, not to evict
>  the cache line, and if so wmb() would appear to be sufficient. As
>  the SDM says that's not the case, a comment explaining why wmb()
>  is not sufficient would seem necessary. Plus in the description I
> >>>
> >>> Seems wmb() is not sufficient here.
> >>>
> >>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
> >>> to be ordered by any other fencing, serializing or other CLFLUSH
> >>> instruction."
> >>
> >> Right - that's what I said in the second sentence.
> >>
> >
> > btw why do we need two fences here? Suppose we just care about
> > writes before the flush point...
> >
> 
> The first MFENCE guarantees all load/store visible before flush cache
> line. But the second MFENCE just makes sure CLFLUSH is not ordered by
> that ensuing load/store, right?
> 

It's not an usual case to have another store to same destination address
right after CLFLUSH. Usually there's some protocol to do another update
until device completes access. Store to different address is not a concern.
But yes if just looking at this interface alone, we'd better not make
assumption about the caller so two mb() look necessary...

btw as Jan commented in another thread, if there's no other caller of
this function, it might be more reasonable to expand it into the bigger
loop in __iommu_flush_cache so you don't need to fence for each
cache line flush.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-05-04:
 On 04.05.15 at 11:14,  wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>> On 04.05.15 at 04:16,  wrote:
 --- a/xen/drivers/passthrough/vtd/x86/vtd.c
 +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
 @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
 
  void cacheline_flush(char * addr)  {
 +mb();
  clflush(addr);
 +mb();
  }
>>> I think the purpose of the flush is to force write back, not to
>>> evict the cache line, and if so wmb() would appear to be
>>> sufficient. As the SDM says that's not the case, a comment
>>> explaining why wmb() is not sufficient would seem necessary. Plus
>>> in the description I think "serializing" needs to be changed to
>>> "fencing", as serialization is not what we really care about here.
>>> If you and the maintainers agree, I could certainly fix up both aspects 
>>> while committing.
>> 
>> On the subject of writebacks, we should get around to alternating-up
>> the use of clflushopt and clwb, either of which would be better than
>> a clflush in this case (avoiding the need for the leading mfence).
> 
> Plus the barrier would perhaps rather sit around the loop invoking
> cacheline_flush() in __iommu_flush_cache(), and I wonder whether VT-d

Agree. It's better to batch the flush operations, like:

@@ -167,11 +167,15 @@ static void __iommu_flush_cache(void *addr, unsigned int 
size)
 if ( !iommus_incoherent )
 return;
 
+mb();
+
 if ( clflush_size == 0 )
 clflush_size = get_cache_line_size();
 
 for ( i = 0; i < size; i += clflush_size )
 cacheline_flush((char *)addr + i);
+
+mb();
 }

> code shouldn't use available flushing code elsewhere in the system,
> and whether that code then wouldn't need barriers added (or use
> clflushopt/clwb as you
> suggest) instead.


> 
> Jan


Best regards,
Yang


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Andrew Cooper
On 04/05/2015 16:22, Konrad Rzeszutek Wilk wrote:
> On Mon, May 04, 2015 at 06:39:56PM +0800, Chen, Tiejun wrote:
>> On 2015/5/4 16:52, Jan Beulich wrote:
>> On 04.05.15 at 04:16,  wrote:
 --- a/xen/drivers/passthrough/vtd/x86/vtd.c
 +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
 @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)

  void cacheline_flush(char * addr)
  {
 +mb();
  clflush(addr);
 +mb();
  }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>> Seems wmb() is not sufficient here.
>>
>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to
>> be ordered by any other fencing, serializing or other CLFLUSH instruction."
> That is incorrect. We have observed that CLFLUSH instruction do serialize each
> other. That is if on a core you send a bunch of CLFLUSH it stalls the 
> pipeline.
>
> Cc-ing Boris who discovered this.

Simply stalling the pipeline says nothing about its ordering with
respect to other instructions.

It is plausible that certain processor pipelines employ stricter
restrictions on CLFLUSH, but Xen must not assume that this is the case
in general, especially as it is in direct contradiction to both the
Intel and AMD instruction manuals.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Konrad Rzeszutek Wilk
On Mon, May 04, 2015 at 06:39:56PM +0800, Chen, Tiejun wrote:
> On 2015/5/4 16:52, Jan Beulich wrote:
> On 04.05.15 at 04:16,  wrote:
> >>--- a/xen/drivers/passthrough/vtd/x86/vtd.c
> >>+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> >>@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
> >>
> >>  void cacheline_flush(char * addr)
> >>  {
> >>+mb();
> >>  clflush(addr);
> >>+mb();
> >>  }
> >
> >I think the purpose of the flush is to force write back, not to evict
> >the cache line, and if so wmb() would appear to be sufficient. As
> >the SDM says that's not the case, a comment explaining why wmb()
> >is not sufficient would seem necessary. Plus in the description I
> 
> Seems wmb() is not sufficient here.
> 
> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to
> be ordered by any other fencing, serializing or other CLFLUSH instruction."

That is incorrect. We have observed that CLFLUSH instruction do serialize each
other. That is if on a core you send a bunch of CLFLUSH it stalls the pipeline.

Cc-ing Boris who discovered this.
> 
> Thanks
> Tiejun
> 
> >think "serializing" needs to be changed to "fencing", as serialization
> >is not what we really care about here. If you and the maintainers
> >agree, I could certainly fix up both aspects while committing.
> >
> >Jan
> >
> >
> >
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Chen, Tiejun

On 2015/5/4 18:52, Tian, Kevin wrote:

From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: Monday, May 04, 2015 6:44 PM


On 04.05.15 at 12:39,  wrote:

On 2015/5/4 16:52, Jan Beulich wrote:

On 04.05.15 at 04:16,  wrote:

--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)

   void cacheline_flush(char * addr)
   {
+mb();
   clflush(addr);
+mb();
   }


I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I


Seems wmb() is not sufficient here.

"CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
to be ordered by any other fencing, serializing or other CLFLUSH
instruction."


Right - that's what I said in the second sentence.



btw why do we need two fences here? Suppose we just care about
writes before the flush point...



The first MFENCE guarantees all load/store visible before flush cache 
line. But the second MFENCE just makes sure CLFLUSH is not ordered by 
that ensuing load/store, right?


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, May 04, 2015 6:44 PM
> 
> >>> On 04.05.15 at 12:39,  wrote:
> > On 2015/5/4 16:52, Jan Beulich wrote:
> > On 04.05.15 at 04:16,  wrote:
> >>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> >>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> >>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
> >>>
> >>>   void cacheline_flush(char * addr)
> >>>   {
> >>> +mb();
> >>>   clflush(addr);
> >>> +mb();
> >>>   }
> >>
> >> I think the purpose of the flush is to force write back, not to evict
> >> the cache line, and if so wmb() would appear to be sufficient. As
> >> the SDM says that's not the case, a comment explaining why wmb()
> >> is not sufficient would seem necessary. Plus in the description I
> >
> > Seems wmb() is not sufficient here.
> >
> > "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
> > to be ordered by any other fencing, serializing or other CLFLUSH
> > instruction."
> 
> Right - that's what I said in the second sentence.
> 

btw why do we need two fences here? Suppose we just care about
writes before the flush point...

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Jan Beulich
>>> On 04.05.15 at 12:39,  wrote:
> On 2015/5/4 16:52, Jan Beulich wrote:
> On 04.05.15 at 04:16,  wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>
>>>   void cacheline_flush(char * addr)
>>>   {
>>> +mb();
>>>   clflush(addr);
>>> +mb();
>>>   }
>>
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
> 
> Seems wmb() is not sufficient here.
> 
> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed 
> to be ordered by any other fencing, serializing or other CLFLUSH 
> instruction."

Right - that's what I said in the second sentence.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Chen, Tiejun

On 2015/5/4 16:52, Jan Beulich wrote:

On 04.05.15 at 04:16,  wrote:

--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)

  void cacheline_flush(char * addr)
  {
+mb();
  clflush(addr);
+mb();
  }


I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I


Seems wmb() is not sufficient here.

"CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed 
to be ordered by any other fencing, serializing or other CLFLUSH 
instruction."


Thanks
Tiejun


think "serializing" needs to be changed to "fencing", as serialization
is not what we really care about here. If you and the maintainers
agree, I could certainly fix up both aspects while committing.

Jan





___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Jan Beulich
>>> On 04.05.15 at 11:14,  wrote:
> On 04/05/2015 09:52, Jan Beulich wrote:
> On 04.05.15 at 04:16,  wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>  
>>>  void cacheline_flush(char * addr)
>>>  {
>>> +mb();
>>>  clflush(addr);
>>> +mb();
>>>  }
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
>> think "serializing" needs to be changed to "fencing", as serialization
>> is not what we really care about here. If you and the maintainers
>> agree, I could certainly fix up both aspects while committing.
> 
> On the subject of writebacks, we should get around to alternating-up the
> use of clflushopt and clwb, either of which would be better than a
> clflush in this case (avoiding the need for the leading mfence).

Plus the barrier would perhaps rather sit around the loop
invoking cacheline_flush() in __iommu_flush_cache(), and I
wonder whether VT-d code shouldn't use available flushing
code elsewhere in the system, and whether that code then
wouldn't need barriers added (or use clflushopt/clwb as you
suggest) instead.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Andrew Cooper
On 04/05/2015 09:52, Jan Beulich wrote:
 On 04.05.15 at 04:16,  wrote:
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>  
>>  void cacheline_flush(char * addr)
>>  {
>> +mb();
>>  clflush(addr);
>> +mb();
>>  }
> I think the purpose of the flush is to force write back, not to evict
> the cache line, and if so wmb() would appear to be sufficient. As
> the SDM says that's not the case, a comment explaining why wmb()
> is not sufficient would seem necessary. Plus in the description I
> think "serializing" needs to be changed to "fencing", as serialization
> is not what we really care about here. If you and the maintainers
> agree, I could certainly fix up both aspects while committing.

On the subject of writebacks, we should get around to alternating-up the
use of clflushopt and clwb, either of which would be better than a
clflush in this case (avoiding the need for the leading mfence).

However, the ISA extension document does not indicate which processors
will have support for these new instructions.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Jan Beulich
>>> On 04.05.15 at 04:16,  wrote:
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>  
>  void cacheline_flush(char * addr)
>  {
> +mb();
>  clflush(addr);
> +mb();
>  }

I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I
think "serializing" needs to be changed to "fencing", as serialization
is not what we really care about here. If you and the maintainers
agree, I could certainly fix up both aspects while committing.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-03 Thread Zhang, Yang Z
Chen, Tiejun wrote on 2015-05-04:
> clflush is a light weight but not serializing instruction, so hence
> memory fence is necessary to make sure all load/store visible before flush 
> cache line.
> 
> Signed-off-by: Tiejun Chen 

Acked-by: Yang Zhang 

> ---
>  xen/drivers/passthrough/vtd/x86/vtd.c | 2 ++
>  1 file changed, 2 insertions(+)
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c
> b/xen/drivers/passthrough/vtd/x86/vtd.c index 109234e..fd2ff04 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c +++
> b/xen/drivers/passthrough/vtd/x86/vtd.c @@ -56,7 +56,9 @@ unsigned int
> get_cache_line_size(void)
> 
>  void cacheline_flush(char * addr)
>  {
> +mb();
>  clflush(addr);
> +mb();
>  }
>  
>  void flush_all_cache()


Best regards,
Yang



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel