Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
On 05/06/2015 03:12 AM, Jan Beulich wrote:
> On 05.05.15 at 18:11, wrote:
>> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>> On 05.05.15 at 17:46, wrote:
>>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, wrote:
>>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>>
>>>>>>>  void cacheline_flush(char * addr)
>>>>>>>  {
>>>>>>> +    mb();
>>>>>>>      clflush(addr);
>>>>>>> +    mb();
>>>>>>>  }
>>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>>> is not what we really care about here. If you and the maintainers
>>>>>> agree, I could certainly fix up both aspects while committing.
>>>>> On the subject of writebacks, we should get around to alternating-up the
>>>>> use of clflushopt and clwb, either of which would be better than a
>>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>>
>>>>> However, the ISA extension document does not indicate which processors
>>>>> will have support for these new instructions.
>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>>>
>>>> We really should add support for this. On shutting down a very large
>>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>>>> This was due to the serializing nature of CLFLUSH.
>>> But flushing the IOMMU isn't being done via CPU instructions, but
>>> rather via commands sent to the IOMMU. I.e. I'm somewhat confused
>>> by your reply.
>> I didn't mean flushing the IOMMU itself, sorry. I meant
>> __iommu_flush_cache() (or whatever its equivalent we had in the
>> product, which was 4.1-based).
> In that case I wonder how much of that flushing is really necessary
> during IOMMU teardown. VT-d maintainers?

I should mention that we saw this on 4.1, where IOMMU teardown is done
in the domain destruction code path. After we backported the code that
moves this into a tasklet, things got much better. Of course we are
still doing the flushing, even if in the background, and therefore
system resources are still being consumed (and memory is not available
until the flushing is done). So if this is unnecessary then there are
good reasons not to do it.

-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
>>> On 06.05.15 at 09:26, wrote:
> On 2015/5/6 15:12, Jan Beulich wrote:
>> On 05.05.15 at 18:11, wrote:
>>> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>>> On 05.05.15 at 17:46, wrote:
>>>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>>> On 04.05.15 at 04:16, wrote:
>>>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>>>
>>>>>>>>  void cacheline_flush(char * addr)
>>>>>>>>  {
>>>>>>>> +    mb();
>>>>>>>>      clflush(addr);
>>>>>>>> +    mb();
>>>>>>>>  }
>>>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>>>> is not what we really care about here. If you and the maintainers
>>>>>>> agree, I could certainly fix up both aspects while committing.
>>>>>> On the subject of writebacks, we should get around to alternating-up the
>>>>>> use of clflushopt and clwb, either of which would be better than a
>>>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>>>
>>>>>> However, the ISA extension document does not indicate which processors
>>>>>> will have support for these new instructions.
>>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>>>>
>>>>> We really should add support for this. On shutting down a very large
>>>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>>>>> This was due to the serializing nature of CLFLUSH.
>>>> But flushing the IOMMU isn't being done via CPU instructions, but
>>>> rather via commands sent to the IOMMU. I.e. I'm somewhat confused
>>>> by your reply.
>>> I didn't mean flushing the IOMMU itself, sorry. I meant
>>> __iommu_flush_cache() (or whatever its equivalent we had in the
>>> product, which was 4.1-based).
>> In that case I wonder how much of that flushing is really necessary
>
> Sorry, what is that case?

The flushing of CPU-side caches during IOMMU teardown for a guest.

>> during IOMMU teardown. VT-d maintainers?
>
> In most cases __iommu_flush_cache() is used to flush any remapping
> structures into memory so the IOMMU can get proper data.

Right, but here we're talking about teardown. Since the IOMMU isn't
going to use any mappings for the dying guest anymore anyway, there's
little point in flushing the respective changes from the caches (or at
least that flushing could be limited to the ripping out of the top
level structures, provided these get zapped before anything hanging
off of them).

Jan
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
On 2015/5/6 15:12, Jan Beulich wrote:
> On 05.05.15 at 18:11, wrote:
>> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>> On 05.05.15 at 17:46, wrote:
>>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, wrote:
>>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>>
>>>>>>>  void cacheline_flush(char * addr)
>>>>>>>  {
>>>>>>> +    mb();
>>>>>>>      clflush(addr);
>>>>>>> +    mb();
>>>>>>>  }
>>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>>> is not what we really care about here. If you and the maintainers
>>>>>> agree, I could certainly fix up both aspects while committing.
>>>>> On the subject of writebacks, we should get around to alternating-up the
>>>>> use of clflushopt and clwb, either of which would be better than a
>>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>>
>>>>> However, the ISA extension document does not indicate which processors
>>>>> will have support for these new instructions.
>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>>>
>>>> We really should add support for this. On shutting down a very large
>>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>>>> This was due to the serializing nature of CLFLUSH.
>>> But flushing the IOMMU isn't being done via CPU instructions, but
>>> rather via commands sent to the IOMMU. I.e. I'm somewhat confused
>>> by your reply.
>> I didn't mean flushing the IOMMU itself, sorry. I meant
>> __iommu_flush_cache() (or whatever its equivalent we had in the
>> product, which was 4.1-based).
> In that case I wonder how much of that flushing is really necessary

Sorry, what is that case?

> during IOMMU teardown. VT-d maintainers?

In most cases __iommu_flush_cache() is used to flush the remapping
structures out to memory so that the IOMMU can then read correct data.

Thanks
Tiejun
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
>>> On 06.05.15 at 08:47, wrote:
> On 2015/5/5 17:24, Jan Beulich wrote:
>> Btw, isn't the !iommus_incoherent check in that function inverted?
>> I.e. why would we _not_ need to flush caches when IOMMUs
>> are not coherent (and why would flushing be needed when they're
>> coherent anyway)?
>
> I guess you're misunderstanding this:
>
>     if ( !ecap_coherent(iommu->ecap) )
>         iommus_incoherent = 1;
>
> So here !iommus_incoherent means the IOMMU is coherent, and then we
> don't need to flush the cache in this case.

Ah, indeed - I overlooked the "in" in the variable name (together with
the thus resulting double negation in the expression). Sorry for the
noise.

Jan
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
>>> On 05.05.15 at 18:11, wrote:
> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>> On 05.05.15 at 17:46, wrote:
>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>> On 04.05.15 at 04:16, wrote:
>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>
>>>>>>  void cacheline_flush(char * addr)
>>>>>>  {
>>>>>> +    mb();
>>>>>>      clflush(addr);
>>>>>> +    mb();
>>>>>>  }
>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>> is not what we really care about here. If you and the maintainers
>>>>> agree, I could certainly fix up both aspects while committing.
>>>> On the subject of writebacks, we should get around to alternating-up the
>>>> use of clflushopt and clwb, either of which would be better than a
>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>
>>>> However, the ISA extension document does not indicate which processors
>>>> will have support for these new instructions.
>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>>
>>> We really should add support for this. On shutting down a very large
>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>>> This was due to the serializing nature of CLFLUSH.
>> But flushing the IOMMU isn't being done via CPU instructions, but
>> rather via commands sent to the IOMMU. I.e. I'm somewhat confused
>> by your reply.
> I didn't mean flushing the IOMMU itself, sorry. I meant
> __iommu_flush_cache() (or whatever its equivalent we had in the
> product, which was 4.1-based).

In that case I wonder how much of that flushing is really necessary
during IOMMU teardown. VT-d maintainers?

Jan
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
On 2015/5/5 23:46, Boris Ostrovsky wrote:
> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>>> On 04.05.15 at 04:16, wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>>  void cacheline_flush(char * addr)
>>>>  {
>>>> +    mb();
>>>>      clflush(addr);
>>>> +    mb();
>>>>  }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>>> think "serializing" needs to be changed to "fencing", as serialization
>>> is not what we really care about here. If you and the maintainers
>>> agree, I could certainly fix up both aspects while committing.
>> On the subject of writebacks, we should get around to alternating-up the
>> use of clflushopt and clwb, either of which would be better than a
>> clflush in this case (avoiding the need for the leading mfence).
>>
>> However, the ISA extension document does not indicate which processors
>> will have support for these new instructions.
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>
> We really should add support for this. On shutting down a very large
> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
> This was due to the serializing nature of CLFLUSH.

CLFLUSHOPT/CLWB instructions are ordered only by store-fencing
operations, so at least we would still need SFENCE.

Thanks
Tiejun
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
On 2015/5/5 17:24, Jan Beulich wrote:
>>>> On 05.05.15 at 04:45, wrote:
>> Does this work for everyone?
>
> Please first of all explain why the interfaces in asm/flushtlb.h can't
> be used here (at least when flushing entire pages).

I also don't understand any reason why we didn't use this previously on
the IOMMU side...

> Because - as said before - for a complete fix you'd need to deal with
> the CLFLUSH use(s) elsewhere in the system too.

Looks like we need to do this in xen/arch/x86/flushtlb.c:flush_area_local().

> Btw, isn't the !iommus_incoherent check in that function inverted?
> I.e. why would we _not_ need to flush caches when IOMMUs
> are not coherent (and why would flushing be needed when they're
> coherent anyway)?

I guess you're misunderstanding this:

    if ( !ecap_coherent(iommu->ecap) )
        iommus_incoherent = 1;

So here !iommus_incoherent means the IOMMU is coherent, and then we
don't need to flush the cache in this case.

Thanks
Tiejun
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
On 05/05/2015 11:58 AM, Jan Beulich wrote:
> On 05.05.15 at 17:46, wrote:
>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>> On 04.05.15 at 04:16, wrote:
>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>
>>>>>  void cacheline_flush(char * addr)
>>>>>  {
>>>>> +    mb();
>>>>>      clflush(addr);
>>>>> +    mb();
>>>>>  }
>>>> I think the purpose of the flush is to force write back, not to evict
>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>> is not sufficient would seem necessary. Plus in the description I
>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>> is not what we really care about here. If you and the maintainers
>>>> agree, I could certainly fix up both aspects while committing.
>>> On the subject of writebacks, we should get around to alternating-up the
>>> use of clflushopt and clwb, either of which would be better than a
>>> clflush in this case (avoiding the need for the leading mfence).
>>>
>>> However, the ISA extension document does not indicate which processors
>>> will have support for these new instructions.
>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>
>> We really should add support for this. On shutting down a very large
>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>> This was due to the serializing nature of CLFLUSH.
> But flushing the IOMMU isn't being done via CPU instructions, but
> rather via commands sent to the IOMMU. I.e. I'm somewhat confused
> by your reply.

I didn't mean flushing the IOMMU itself, sorry. I meant
__iommu_flush_cache() (or whatever its equivalent we had in the
product, which was 4.1-based).

-boris
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
>>> On 05.05.15 at 17:46, wrote:
> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>>> On 04.05.15 at 04:16, wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>>  void cacheline_flush(char * addr)
>>>>  {
>>>> +    mb();
>>>>      clflush(addr);
>>>> +    mb();
>>>>  }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>>> think "serializing" needs to be changed to "fencing", as serialization
>>> is not what we really care about here. If you and the maintainers
>>> agree, I could certainly fix up both aspects while committing.
>> On the subject of writebacks, we should get around to alternating-up the
>> use of clflushopt and clwb, either of which would be better than a
>> clflush in this case (avoiding the need for the leading mfence).
>>
>> However, the ISA extension document does not indicate which processors
>> will have support for these new instructions.
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>
> We really should add support for this. On shutting down a very large
> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
> This was due to the serializing nature of CLFLUSH.

But flushing the IOMMU isn't being done via CPU instructions, but
rather via commands sent to the IOMMU. I.e. I'm somewhat confused
by your reply.

Jan
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
On 05/04/2015 05:14 AM, Andrew Cooper wrote:
> On 04/05/2015 09:52, Jan Beulich wrote:
>> On 04.05.15 at 04:16, wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>
>>>  void cacheline_flush(char * addr)
>>>  {
>>> +    mb();
>>>      clflush(addr);
>>> +    mb();
>>>  }
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
>> think "serializing" needs to be changed to "fencing", as serialization
>> is not what we really care about here. If you and the maintainers
>> agree, I could certainly fix up both aspects while committing.
> On the subject of writebacks, we should get around to alternating-up the
> use of clflushopt and clwb, either of which would be better than a
> clflush in this case (avoiding the need for the leading mfence).
>
> However, the ISA extension document does not indicate which processors
> will have support for these new instructions.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

We really should add support for this. On shutting down a very large
guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
This was due to the serializing nature of CLFLUSH.

-boris
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
>>> On 05.05.15 at 04:45, wrote:
> Does this work for everyone?

Please first of all explain why the interfaces in asm/flushtlb.h can't
be used here (at least when flushing entire pages). Because - as said
before - for a complete fix you'd need to deal with the CLFLUSH use(s)
elsewhere in the system too.

Btw, isn't the !iommus_incoherent check in that function inverted?
I.e. why would we _not_ need to flush caches when IOMMUs
are not coherent (and why would flushing be needed when they're
coherent anyway)?

Jan
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
On 2015/5/4 18:43, Jan Beulich wrote:
> On 04.05.15 at 12:39, wrote:
>> On 2015/5/4 16:52, Jan Beulich wrote:
>>> On 04.05.15 at 04:16, wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>>  void cacheline_flush(char * addr)
>>>>  {
>>>> +    mb();
>>>>      clflush(addr);
>>>> +    mb();
>>>>  }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>> Seems wmb() is not sufficient here.
>>
>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
>> to be ordered by any other fencing, serializing or other CLFLUSH
>> instruction."
> Right - that's what I said in the second sentence.

Thanks for everyone's comments. Does this work for everyone?

xen/vt-d: need barriers to workaround CLFLUSH

CLFLUSH is a lightweight but non-fencing instruction, hence memory
fences are necessary to make sure all loads/stores are visible before
we flush the cache line.

Signed-off-by: Tiejun Chen

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c7bda73..1248a17 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -170,8 +170,15 @@ static void __iommu_flush_cache(void *addr, unsigned int size)
     if ( clflush_size == 0 )
         clflush_size = get_cache_line_size();
 
+    /*
+     * CLFLUSH is only ordered by the MFENCE instruction.
+     * It is not guaranteed to be ordered by any other fencing,
+     * serializing or other CLFLUSH instruction.
+     */
+    mb();
     for ( i = 0; i < size; i += clflush_size )
         cacheline_flush((char *)addr + i);
+    mb();
 }
 
 void iommu_flush_cache_entry(void *addr, unsigned int size)

Thanks
Tiejun
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
> From: Chen, Tiejun
> Sent: Monday, May 04, 2015 7:26 PM
>
> On 2015/5/4 18:52, Tian, Kevin wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Monday, May 04, 2015 6:44 PM
>>
>>> On 04.05.15 at 12:39, wrote:
>>> On 2015/5/4 16:52, Jan Beulich wrote:
>>>> On 04.05.15 at 04:16, wrote:
>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>
>>>>>  void cacheline_flush(char * addr)
>>>>>  {
>>>>> +    mb();
>>>>>      clflush(addr);
>>>>> +    mb();
>>>>>  }
>>>> I think the purpose of the flush is to force write back, not to evict
>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>> is not sufficient would seem necessary. Plus in the description I
>>> Seems wmb() is not sufficient here.
>>>
>>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
>>> to be ordered by any other fencing, serializing or other CLFLUSH
>>> instruction."
>> Right - that's what I said in the second sentence.
>
> btw why do we need two fences here? Suppose we just care about
> writes before the flush point...
>
> The first MFENCE guarantees all loads/stores are visible before we
> flush the cache line. But the second MFENCE just makes sure CLFLUSH
> is not reordered with the ensuing loads/stores, right?

It's not a usual case to have another store to the same destination
address right after CLFLUSH. Usually there's some protocol to not do
another update until the device completes its access. A store to a
different address is not a concern. But yes, if just looking at this
interface alone, we'd better not make assumptions about the caller, so
two mb()s look necessary...

btw as Jan commented in another thread, if there's no other caller of
this function, it might be more reasonable to expand it into the bigger
loop in __iommu_flush_cache so you don't need to fence for each cache
line flush.

Thanks
Kevin
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
Jan Beulich wrote on 2015-05-04:
> On 04.05.15 at 11:14, wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>>> On 04.05.15 at 04:16, wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>>  void cacheline_flush(char * addr)
>>>>  {
>>>> +    mb();
>>>>      clflush(addr);
>>>> +    mb();
>>>>  }
>>> I think the purpose of the flush is to force write back, not to
>>> evict the cache line, and if so wmb() would appear to be
>>> sufficient. As the SDM says that's not the case, a comment
>>> explaining why wmb() is not sufficient would seem necessary. Plus
>>> in the description I think "serializing" needs to be changed to
>>> "fencing", as serialization is not what we really care about here.
>>> If you and the maintainers agree, I could certainly fix up both aspects
>>> while committing.
>>
>> On the subject of writebacks, we should get around to alternating-up
>> the use of clflushopt and clwb, either of which would be better than
>> a clflush in this case (avoiding the need for the leading mfence).
>
> Plus the barrier would perhaps rather sit around the loop invoking
> cacheline_flush() in __iommu_flush_cache(), and I wonder whether VT-d

Agree. It's better to batch the flush operations, like:

@@ -167,11 +167,15 @@ static void __iommu_flush_cache(void *addr, unsigned int size)
     if ( !iommus_incoherent )
         return;
 
+    mb();
+
     if ( clflush_size == 0 )
         clflush_size = get_cache_line_size();
 
     for ( i = 0; i < size; i += clflush_size )
         cacheline_flush((char *)addr + i);
+
+    mb();
 }

> code shouldn't use available flushing code elsewhere in the system,
> and whether that code then wouldn't need barriers added (or use
> clflushopt/clwb as you suggest) instead.
>
> Jan

Best regards,
Yang
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
On 04/05/2015 16:22, Konrad Rzeszutek Wilk wrote:
> On Mon, May 04, 2015 at 06:39:56PM +0800, Chen, Tiejun wrote:
>> On 2015/5/4 16:52, Jan Beulich wrote:
>>> On 04.05.15 at 04:16, wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>>  void cacheline_flush(char * addr)
>>>>  {
>>>> +    mb();
>>>>      clflush(addr);
>>>> +    mb();
>>>>  }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>> Seems wmb() is not sufficient here.
>>
>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to
>> be ordered by any other fencing, serializing or other CLFLUSH instruction."
> That is incorrect. We have observed that CLFLUSH instructions do serialize
> each other. That is, if on a core you send a bunch of CLFLUSHes it stalls
> the pipeline.
>
> Cc-ing Boris who discovered this.

Simply stalling the pipeline says nothing about its ordering with
respect to other instructions. It is plausible that certain processor
pipelines employ stricter restrictions on CLFLUSH, but Xen must not
assume that this is the case in general, especially as it is in direct
contradiction to both the Intel and AMD instruction manuals.

~Andrew
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
On Mon, May 04, 2015 at 06:39:56PM +0800, Chen, Tiejun wrote:
> On 2015/5/4 16:52, Jan Beulich wrote:
>> On 04.05.15 at 04:16, wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>
>>>  void cacheline_flush(char * addr)
>>>  {
>>> +    mb();
>>>      clflush(addr);
>>> +    mb();
>>>  }
>>
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
>
> Seems wmb() is not sufficient here.
>
> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to
> be ordered by any other fencing, serializing or other CLFLUSH instruction."

That is incorrect. We have observed that CLFLUSH instructions do
serialize each other. That is, if on a core you send a bunch of
CLFLUSHes it stalls the pipeline.

Cc-ing Boris who discovered this.

> Thanks
> Tiejun
>
>> think "serializing" needs to be changed to "fencing", as serialization
>> is not what we really care about here. If you and the maintainers
>> agree, I could certainly fix up both aspects while committing.
>>
>> Jan
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
On 2015/5/4 18:52, Tian, Kevin wrote:
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, May 04, 2015 6:44 PM
>
>> On 04.05.15 at 12:39, wrote:
>> On 2015/5/4 16:52, Jan Beulich wrote:
>>> On 04.05.15 at 04:16, wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>>  void cacheline_flush(char * addr)
>>>>  {
>>>> +    mb();
>>>>      clflush(addr);
>>>> +    mb();
>>>>  }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>> Seems wmb() is not sufficient here.
>>
>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
>> to be ordered by any other fencing, serializing or other CLFLUSH
>> instruction."
> Right - that's what I said in the second sentence.
>
> btw why do we need two fences here? Suppose we just care about
> writes before the flush point...

The first MFENCE guarantees all loads/stores are visible before we
flush the cache line. But the second MFENCE just makes sure CLFLUSH is
not reordered with the ensuing loads/stores, right?

Thanks
Tiejun
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, May 04, 2015 6:44 PM
>
>>> On 04.05.15 at 12:39, wrote:
>> On 2015/5/4 16:52, Jan Beulich wrote:
>>> On 04.05.15 at 04:16, wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>>  void cacheline_flush(char * addr)
>>>>  {
>>>> +    mb();
>>>>      clflush(addr);
>>>> +    mb();
>>>>  }
>>>
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>>
>> Seems wmb() is not sufficient here.
>>
>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
>> to be ordered by any other fencing, serializing or other CLFLUSH
>> instruction."
>
> Right - that's what I said in the second sentence.

btw why do we need two fences here? Suppose we just care about
writes before the flush point...

Thanks
Kevin
Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
>>> On 04.05.15 at 12:39, wrote:
> On 2015/5/4 16:52, Jan Beulich wrote:
>> On 04.05.15 at 04:16, wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>
>>>  void cacheline_flush(char * addr)
>>>  {
>>> +    mb();
>>>      clflush(addr);
>>> +    mb();
>>>  }
>>
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
>
> Seems wmb() is not sufficient here.
>
> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
> to be ordered by any other fencing, serializing or other CLFLUSH
> instruction."

Right - that's what I said in the second sentence.

Jan
On 2015/5/4 16:52, Jan Beulich wrote:
>>> On 04.05.15 at 04:16, wrote:
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>
>>  void cacheline_flush(char * addr)
>>  {
>> +    mb();
>>      clflush(addr);
>> +    mb();
>>  }
>
> I think the purpose of the flush is to force write back, not to evict
> the cache line, and if so wmb() would appear to be sufficient. As
> the SDM says that's not the case, a comment explaining why wmb()
> is not sufficient would seem necessary. Plus in the description I

Seems wmb() is not sufficient here.

"CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
to be ordered by any other fencing, serializing or other CLFLUSH
instruction."

Thanks
Tiejun

> think "serializing" needs to be changed to "fencing", as serialization
> is not what we really care about here. If you and the maintainers
> agree, I could certainly fix up both aspects while committing.
>
> Jan
>>> On 04.05.15 at 11:14, wrote:
> On 04/05/2015 09:52, Jan Beulich wrote:
>> On 04.05.15 at 04:16, wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>
>>>  void cacheline_flush(char * addr)
>>>  {
>>> +    mb();
>>>      clflush(addr);
>>> +    mb();
>>>  }
>>
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
>> think "serializing" needs to be changed to "fencing", as serialization
>> is not what we really care about here. If you and the maintainers
>> agree, I could certainly fix up both aspects while committing.
>
> On the subject of writebacks, we should get around to alternating-up the
> use of clflushopt and clwb, either of which would be better than a
> clflush in this case (avoiding the need for the leading mfence).

Plus the barrier would perhaps rather sit around the loop invoking
cacheline_flush() in __iommu_flush_cache(), and I wonder whether VT-d
code shouldn't use available flushing code elsewhere in the system, and
whether that code then wouldn't need barriers added (or use
clflushopt/clwb as you suggest) instead.

Jan
On 04/05/2015 09:52, Jan Beulich wrote:
>>> On 04.05.15 at 04:16, wrote:
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>
>>  void cacheline_flush(char * addr)
>>  {
>> +    mb();
>>      clflush(addr);
>> +    mb();
>>  }
>
> I think the purpose of the flush is to force write back, not to evict
> the cache line, and if so wmb() would appear to be sufficient. As
> the SDM says that's not the case, a comment explaining why wmb()
> is not sufficient would seem necessary. Plus in the description I
> think "serializing" needs to be changed to "fencing", as serialization
> is not what we really care about here. If you and the maintainers
> agree, I could certainly fix up both aspects while committing.

On the subject of writebacks, we should get around to alternating-up the
use of clflushopt and clwb, either of which would be better than a
clflush in this case (avoiding the need for the leading mfence).

However, the ISA extension document does not indicate which processors
will have support for these new instructions.

~Andrew
>>> On 04.05.15 at 04:16, wrote:
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>
>  void cacheline_flush(char * addr)
>  {
> +    mb();
>      clflush(addr);
> +    mb();
>  }

I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I
think "serializing" needs to be changed to "fencing", as serialization
is not what we really care about here. If you and the maintainers
agree, I could certainly fix up both aspects while committing.

Jan
Chen, Tiejun wrote on 2015-05-04:
> clflush is a lightweight but not serializing instruction, hence a
> memory fence is necessary to make sure all loads/stores are visible
> before flushing the cache line.
>
> Signed-off-by: Tiejun Chen

Acked-by: Yang Zhang

> ---
>  xen/drivers/passthrough/vtd/x86/vtd.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c
> b/xen/drivers/passthrough/vtd/x86/vtd.c
> index 109234e..fd2ff04 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>
>  void cacheline_flush(char * addr)
>  {
> +    mb();
>      clflush(addr);
> +    mb();
>  }
>
>  void flush_all_cache()

Best regards,
Yang