[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-13 Thread Chris Wilson
On Tue, Jan 12, 2016 at 06:06:34PM -0800, Linus Torvalds wrote: > On Tue, Jan 12, 2016 at 4:55 PM, Chris Wilson > wrote: > > > > The double clflush() remains a mystery. > > Actually, I think it's explainable. > > It's wrong to do the clflush *after* the GPU has done the write, which > seems to

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-13 Thread Linus Torvalds
On Wed, Jan 13, 2016 at 4:34 AM, Chris Wilson wrote: > > Forgive me for being dense, but if we overwrite the GPU data in the > backing struct page with the cacheline from the CPU, how do we see the > results from the GPU afterwards? Hmm. Good point. Ok, all the symptoms just say "writes from GP

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-13 Thread Chris Wilson
On Tue, Jan 12, 2016 at 02:07:35PM -0800, Linus Torvalds wrote: > On Tue, Jan 12, 2016 at 1:13 PM, Chris Wilson > wrote: > > Indeed. So I replaced the post-clflush_cache_range() clflush() with a > > udelay(10) instead, and the corruption vanished. Putting the udelay(10) > > before the clflush_cac

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-12 Thread Chris Wilson
On Tue, Jan 12, 2016 at 09:05:19AM -0800, Linus Torvalds wrote: > On Tue, Jan 12, 2016 at 8:37 AM, Chris Wilson > wrote: > > On Mon, Jan 11, 2016 at 09:05:06PM +, Chris Wilson wrote: > >> I can narrow down the principal buggy path by doing the clflush(vend-1) > >> in the callers at least. > >

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-12 Thread Linus Torvalds
On Tue, Jan 12, 2016 at 6:42 PM, Andy Lutomirski wrote: > > Since barriers are on my mind: how strong a barrier is needed to > prevent cache fills from being speculated across the barrier? I don't think there are *any* architectural guarantees. I suspect that a real serializing instruction shoul

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-12 Thread Andy Lutomirski
On Tue, Jan 12, 2016 at 6:06 PM, Linus Torvalds wrote: > On Tue, Jan 12, 2016 at 4:55 PM, Chris Wilson > wrote: >> >> The double clflush() remains a mystery. > > Actually, I think it's explainable. > > It's wrong to do the clflush *after* the GPU has done the write, which > seems to be what you

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-12 Thread Linus Torvalds
On Tue, Jan 12, 2016 at 4:55 PM, Chris Wilson wrote: > > The double clflush() remains a mystery. Actually, I think it's explainable. It's wrong to do the clflush *after* the GPU has done the write, which seems to be what you are doing. Why? If the GPU really isn't cache coherent, what can hap

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-12 Thread Chris Wilson
On Mon, Jan 11, 2016 at 09:05:06PM +, Chris Wilson wrote: > I can narrow down the principal buggy path by doing the clflush(vend-1) > in the callers at least. That leads to the suspect path being a read back of a cache line from main memory that was just written to by the GPU. Writes to memory

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-12 Thread Linus Torvalds
On Tue, Jan 12, 2016 at 1:13 PM, Chris Wilson wrote: > > That is a continual worry. To try and assuage that fear, I sent 8x > flush gpu writes between the end of the copy and setting the "I'm done" > flag. The definition of the GPU flush is that it both flushes all > previous writes before it com

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-12 Thread H. Peter Anvin
On January 11, 2016 3:28:01 AM PST, Chris Wilson wrote: >On Sat, Jan 09, 2016 at 02:36:03PM -0800, Andy Lutomirski wrote: >> On Sat, Jan 9, 2016 at 12:01 AM, Chris Wilson > wrote: >> > On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote: >> >> On 01/07/16 14:29, H. Peter Anvin wrote: >

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-12 Thread Linus Torvalds
On Tue, Jan 12, 2016 at 8:37 AM, Chris Wilson wrote: > On Mon, Jan 11, 2016 at 09:05:06PM +, Chris Wilson wrote: >> I can narrow down the principal buggy path by doing the clflush(vend-1) >> in the callers at least. > > That leads to the suspect path being a read back of a cache line from > m

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-11 Thread Chris Wilson
On Mon, Jan 11, 2016 at 12:11:05PM -0800, Linus Torvalds wrote: > On Mon, Jan 11, 2016 at 3:28 AM, Chris Wilson > wrote: > > > > Bizarrely, > > > > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c > > index 6000ad7..cf074400 100644 > > --- a/arch/x86/mm/pageattr.c > > +++ b/arch/x86/m

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-11 Thread Linus Torvalds
On Mon, Jan 11, 2016 at 3:28 AM, Chris Wilson wrote: > > Bizarrely, > > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c > index 6000ad7..cf074400 100644 > --- a/arch/x86/mm/pageattr.c > +++ b/arch/x86/mm/pageattr.c > @@ -141,6 +141,7 @@ void clflush_cache_range(void *vaddr, unsigned

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-11 Thread Chris Wilson
On Sat, Jan 09, 2016 at 02:36:03PM -0800, Andy Lutomirski wrote: > On Sat, Jan 9, 2016 at 12:01 AM, Chris Wilson > wrote: > > On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote: > >> On 01/07/16 14:29, H. Peter Anvin wrote: > >> > > >> > I would be very interested in knowing if replac

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-09 Thread Andy Lutomirski
On Sat, Jan 9, 2016 at 12:01 AM, Chris Wilson wrote: > On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote: >> On 01/07/16 14:29, H. Peter Anvin wrote: >> > >> > I would be very interested in knowing if replacing the final clflushopt >> > with a clflush would resolve your problems (in

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-09 Thread Chris Wilson
On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote: > On 01/07/16 14:29, H. Peter Anvin wrote: > > > > I would be very interested in knowing if replacing the final clflushopt > > with a clflush would resolve your problems (in which case the last mb() > > shouldn't be necessary either.)

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-08 Thread H. Peter Anvin
On 01/07/16 14:32, H. Peter Anvin wrote: > > Nevermind. CLFLUSH is not ordered with regards to CLFLUSHOPT to the > same cache line. > *Except* to the same cache line, dammit. -hpa

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread Chris Wilson
On Thu, Jan 07, 2016 at 01:05:35PM -0800, H. Peter Anvin wrote: > On 01/07/16 11:44, Chris Wilson wrote: > > > > Now I feel silly. Looking at the .s, there is no difference with the > > addition of the barrier to clflush_cache_range(). And sure enough > > letting the test run for longer, we see a

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread Chris Wilson
On Thu, Jan 07, 2016 at 09:55:51AM -0800, Andy Lutomirski wrote: > On Thu, Jan 7, 2016 at 2:16 AM, Chris Wilson > wrote: > >> + /* GCC (4.9.1 and 5.2.1 at least) appears to be very confused when > >> + * meeting this alternative() and demonstrably miscompiles loops > >> + * iteratin

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread H. Peter Anvin
On 01/07/16 14:29, H. Peter Anvin wrote: > > I would be very interested in knowing if replacing the final clflushopt > with a clflush would resolve your problems (in which case the last mb() > shouldn't be necessary either.) > Nevermind. CLFLUSH is not ordered with regards to CLFLUSHOPT to the

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread H. Peter Anvin
On 01/07/16 13:54, Chris Wilson wrote: > > Whilst you are looking at this asm, note that we reload > boot_cpu_data.x86_cflush_size everytime around the loop. That's a small > but noticeable extra cost (especially when we are only flushing a single > cacheline). > I did notice that; I don't know

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread H. Peter Anvin
On 01/07/16 11:44, Chris Wilson wrote: > > Now I feel silly. Looking at the .s, there is no difference with the > addition of the barrier to clflush_cache_range(). And sure enough > letting the test run for longer, we see a failure. I fell for a placebo. > > The failing assertion is always on the

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread Chris Wilson
On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote: > During testing we observed that the last cacheline was not being flushed > from a > > mb() > for (addr = addr & -clflush_size; addr < end; addr += clflush_size) > clflushopt(); > mb() > > loop (where t

[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread Andy Lutomirski
On Thu, Jan 7, 2016 at 2:16 AM, Chris Wilson wrote: > On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote: >> During testing we observed that the last cacheline was not being flushed >> from a >> >> mb() >> for (addr = addr & -clflush_size; addr < end; addr += clflush_size) >

[PATCH] x86: Add an explicit barrier() to clflushopt()

2015-10-19 Thread Borislav Petkov
On Mon, Oct 19, 2015 at 12:05:29PM +0100, Chris Wilson wrote: > In order to add the clobbers, I had to adjust the macro slightly: > > +#define alternative_output(oldinstr, newinstr, feature, output)\ > + asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) \ > +

[PATCH] x86: Add an explicit barrier() to clflushopt()

2015-10-19 Thread Ross Zwisler
On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote: > During testing we observed that the last cacheline was not being flushed > from a > > mb() > for (addr = addr & -clflush_size; addr < end; addr += clflush_size) > clflushopt(); > mb() > > loop (where t

[PATCH] x86: Add an explicit barrier() to clflushopt()

2015-10-19 Thread Borislav Petkov
On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote: > During testing we observed that the last cacheline was not being flushed > from a > > mb() > for (addr = addr & -clflush_size; addr < end; addr += clflush_size) > clflushopt(); > mb() > > loop (where t

[PATCH] x86: Add an explicit barrier() to clflushopt()

2015-10-19 Thread Chris Wilson
On Mon, Oct 19, 2015 at 12:16:12PM +0200, Borislav Petkov wrote: > On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote: > > Adding a barrier() into clflushopt() is enough for GCC to dtrt, but > > solving why GCC is not seeing the constraints from the alternative_io() > > would be smarter..

[PATCH] x86: Add an explicit barrier() to clflushopt()

2015-10-19 Thread Chris Wilson
During testing we observed that the last cacheline was not being flushed from a mb() for (addr = addr & -clflush_size; addr < end; addr += clflush_size) clflushopt(); mb() loop (where the initial addr and end were not cacheline aligned). Changing the loop