On Tue, Jan 12, 2016 at 06:06:34PM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 4:55 PM, Chris Wilson
> wrote:
> >
> > The double clflush() remains a mystery.
>
> Actually, I think it's explainable.
>
> It's wrong to do the clflush *after* the GPU has done the write, which
> seems to
On Wed, Jan 13, 2016 at 4:34 AM, Chris Wilson
wrote:
>
> Forgive me for being dense, but if we overwrite the GPU data in the
> backing struct page with the cacheline from the CPU, how do we see the
> results from the GPU afterwards?
Hmm. Good point.
Ok, all the symptoms just say "writes from GP
On Tue, Jan 12, 2016 at 02:07:35PM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 1:13 PM, Chris Wilson
> wrote:
> > Indeed. So I replaced the post-clflush_cache_range() clflush() with a
> > udelay(10) instead, and the corruption vanished. Putting the udelay(10)
> > before the clflush_cac
On Tue, Jan 12, 2016 at 09:05:19AM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 8:37 AM, Chris Wilson
> wrote:
> > On Mon, Jan 11, 2016 at 09:05:06PM +, Chris Wilson wrote:
> >> I can narrow down the principal buggy path by doing the clflush(vend-1)
> >> in the callers at least.
> >
On Tue, Jan 12, 2016 at 6:42 PM, Andy Lutomirski wrote:
>
> Since barriers are on my mind: how strong a barrier is needed to
> prevent cache fills from being speculated across the barrier?
I don't think there are *any* architectural guarantees.
I suspect that a real serializing instruction shoul
On Tue, Jan 12, 2016 at 6:06 PM, Linus Torvalds
wrote:
> On Tue, Jan 12, 2016 at 4:55 PM, Chris Wilson
> wrote:
>>
>> The double clflush() remains a mystery.
>
> Actually, I think it's explainable.
>
> It's wrong to do the clflush *after* the GPU has done the write, which
> seems to be what you
On Tue, Jan 12, 2016 at 4:55 PM, Chris Wilson
wrote:
>
> The double clflush() remains a mystery.
Actually, I think it's explainable.
It's wrong to do the clflush *after* the GPU has done the write, which
seems to be what you are doing.
Why?
If the GPU really isn't cache coherent, what can hap
On Mon, Jan 11, 2016 at 09:05:06PM +, Chris Wilson wrote:
> I can narrow down the principal buggy path by doing the clflush(vend-1)
> in the callers at least.
That leads to the suspect path being a read back of a cache line from
main memory that was just written to by the GPU. Writes to memory
On Tue, Jan 12, 2016 at 1:13 PM, Chris Wilson
wrote:
>
> That is a continual worry. To try and assuage that fear, I sent 8x
> flush gpu writes between the end of the copy and setting the "I'm done"
> flag. The definition of the GPU flush is that it both flushes all
> previous writes before it com
On January 11, 2016 3:28:01 AM PST, Chris Wilson
wrote:
>On Sat, Jan 09, 2016 at 02:36:03PM -0800, Andy Lutomirski wrote:
>> On Sat, Jan 9, 2016 at 12:01 AM, Chris Wilson
> wrote:
>> > On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote:
>> >> On 01/07/16 14:29, H. Peter Anvin wrote:
>
On Tue, Jan 12, 2016 at 8:37 AM, Chris Wilson
wrote:
> On Mon, Jan 11, 2016 at 09:05:06PM +, Chris Wilson wrote:
>> I can narrow down the principal buggy path by doing the clflush(vend-1)
>> in the callers at least.
>
> That leads to the suspect path being a read back of a cache line from
> m
On Mon, Jan 11, 2016 at 12:11:05PM -0800, Linus Torvalds wrote:
> On Mon, Jan 11, 2016 at 3:28 AM, Chris Wilson
> wrote:
> >
> > Bizarrely,
> >
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index 6000ad7..cf074400 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/m
On Mon, Jan 11, 2016 at 3:28 AM, Chris Wilson
wrote:
>
> Bizarrely,
>
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 6000ad7..cf074400 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -141,6 +141,7 @@ void clflush_cache_range(void *vaddr, unsigned
On Sat, Jan 09, 2016 at 02:36:03PM -0800, Andy Lutomirski wrote:
> On Sat, Jan 9, 2016 at 12:01 AM, Chris Wilson
> wrote:
> > On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote:
> >> On 01/07/16 14:29, H. Peter Anvin wrote:
> >> >
> >> > I would be very interested in knowing if replac
On Sat, Jan 9, 2016 at 12:01 AM, Chris Wilson
wrote:
> On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote:
>> On 01/07/16 14:29, H. Peter Anvin wrote:
>> >
>> > I would be very interested in knowing if replacing the final clflushopt
>> > with a clflush would resolve your problems (in
On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote:
> On 01/07/16 14:29, H. Peter Anvin wrote:
> >
> > I would be very interested in knowing if replacing the final clflushopt
> > with a clflush would resolve your problems (in which case the last mb()
> > shouldn't be necessary either.)
On 01/07/16 14:32, H. Peter Anvin wrote:
>
> Nevermind. CLFLUSH is not ordered with regards to CLFLUSHOPT to the
> same cache line.
>
*Except* to the same cache line, dammit.
-hpa
On Thu, Jan 07, 2016 at 01:05:35PM -0800, H. Peter Anvin wrote:
> On 01/07/16 11:44, Chris Wilson wrote:
> >
> > Now I feel silly. Looking at the .s, there is no difference with the
> > addition of the barrier to clflush_cache_range(). And sure enough
> > letting the test run for longer, we see a
On Thu, Jan 07, 2016 at 09:55:51AM -0800, Andy Lutomirski wrote:
> On Thu, Jan 7, 2016 at 2:16 AM, Chris Wilson
> wrote:
> >> + /* GCC (4.9.1 and 5.2.1 at least) appears to be very confused when
> >> + * meeting this alternative() and demonstrably miscompiles loops
> >> + * iteratin
On 01/07/16 14:29, H. Peter Anvin wrote:
>
> I would be very interested in knowing if replacing the final clflushopt
> with a clflush would resolve your problems (in which case the last mb()
> shouldn't be necessary either.)
>
Nevermind. CLFLUSH is not ordered with regards to CLFLUSHOPT to the
On 01/07/16 13:54, Chris Wilson wrote:
>
> Whilst you are looking at this asm, note that we reload
> boot_cpu_data.x86_cflush_size everytime around the loop. That's a small
> but noticeable extra cost (especially when we are only flushing a single
> cacheline).
>
I did notice that; I don't know
On 01/07/16 11:44, Chris Wilson wrote:
>
> Now I feel silly. Looking at the .s, there is no difference with the
> addition of the barrier to clflush_cache_range(). And sure enough
> letting the test run for longer, we see a failure. I fell for a placebo.
>
> The failing assertion is always on the
On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote:
> During testing we observed that the last cacheline was not being flushed
> from a
>
> mb()
> for (addr = addr & -clflush_size; addr < end; addr += clflush_size)
> clflushopt();
> mb()
>
> loop (where t
On Thu, Jan 7, 2016 at 2:16 AM, Chris Wilson
wrote:
> On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote:
>> During testing we observed that the last cacheline was not being flushed
>> from a
>>
>> mb()
>> for (addr = addr & -clflush_size; addr < end; addr += clflush_size)
>
On Mon, Oct 19, 2015 at 12:05:29PM +0100, Chris Wilson wrote:
> In order to add the clobbers, I had to adjust the macro slightly:
>
> +#define alternative_output(oldinstr, newinstr, feature, output)\
> + asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) \
> +
On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote:
> During testing we observed that the last cacheline was not being flushed
> from a
>
> mb()
> for (addr = addr & -clflush_size; addr < end; addr += clflush_size)
> clflushopt();
> mb()
>
> loop (where t
On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote:
> During testing we observed that the last cacheline was not being flushed
> from a
>
> mb()
> for (addr = addr & -clflush_size; addr < end; addr += clflush_size)
> clflushopt();
> mb()
>
> loop (where t
On Mon, Oct 19, 2015 at 12:16:12PM +0200, Borislav Petkov wrote:
> On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote:
> > Adding a barrier() into clflushopt() is enough for GCC to dtrt, but
> > solving why GCC is not seeing the constraints from the alternative_io()
> > would be smarter..
During testing we observed that the last cacheline was not being flushed
from a
mb()
for (addr = addr & -clflush_size; addr < end; addr += clflush_size)
clflushopt();
mb()
loop (where the initial addr and end were not cacheline aligned).
Changing the loop
29 matches
Mail list logo