Re: [beta patch] SSE copy_page() / clear_page()

2001-02-20 Thread Manfred Spraul
Pavel Machek wrote: > > > > > + __asm__ __volatile__( > > > > + "mov %1, %0\n\t" > > > > + : "=r" (i) > > > > + : "r" (kaddr+offset)); /* load tlb entry */ > > > > + for(i=0;i > > > + __asm__ __volatile__( > > > > +

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-20 Thread Alan Cox
> > Does the prefetch instruction fault on PIII/PIV then - the K7 one appears not > > to be a source of faults > > My fault. I was told that prefetch instructions are always > non-faulting. I also thought it was non faulting

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-20 Thread Pavel Machek
> > > + __asm__ __volatile__( > > > + "mov %1, %0\n\t" > > > + : "=r" (i) > > > + : "r" (kaddr+offset)); /* load tlb entry */ > > > + for(i=0;i > > + __asm__ __volatile__( > > > + "prefetchnta (

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-20 Thread Alan Cox
> > + __asm__ __volatile__( > > + "mov %1, %0\n\t" > > + : "=r" (i) > > + : "r" (kaddr+offset)); /* load tlb entry */ > > + for(i=0;i > + __asm__ __volatile__( > > + "prefetchnta (

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-20 Thread Pavel Machek
Hi! > --- 2.4/mm/filemap.c Wed Feb 14 10:51:42 2001 > +++ build-2.4/mm/filemap.c Wed Feb 14 22:11:44 2001 > @@ -1248,6 +1248,20 @@ > size = count; > > kaddr = kmap(page); > + if (size > 128) { > + int i; > + __asm__ __volatile__( > +

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-16 Thread Andrew Morton
Manfred Spraul wrote: > > Intel Pentium III and P 4 have hardcoded "fast stringcopy" operations > that invalidate whole cachelines during write (documented in the most > obvious place: multiprocessor management, memory ordering) Which are dramatically slower than a simple `mov' loop for just abo

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-14 Thread Manfred Spraul
I have another idea for sse, and this one is far safer: only use sse prefetch, leave the string operations for the actual copy. The prefetch operations only prefetch and don't touch the sse registers, so there are neither reentrancy nor interrupt problems. I tried the attached hack^H^H^H^Hpatch, and rea
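
A rough user-space sketch of the idea above, for readers who want something to compile: prefetch the source ahead of time, then let the ordinary copy routine do the work. This is not the attached patch. The helper name `prefetch_then_copy`, the 64-byte line size and the 4096-byte chunk are assumptions made for illustration, and `memcpy` stands in for the kernel's string-operation copy. `__builtin_prefetch` is the GCC/Clang builtin; with locality 0 it is typically emitted as `prefetchnta` on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CACHE_LINE 64    /* assumed line size, not taken from the patch */
#define SKETCH_CHUNK      4096  /* assumed chunk: prefetch one page, then copy it */

/* Hypothetical helper: issue prefetch hints for a chunk of the source,
 * then copy that chunk with the ordinary copy routine. The prefetch is
 * only a hint and uses no SSE registers, so nothing has to be saved. */
static void *prefetch_then_copy(void *dst, const void *src, size_t len)
{
    const char *s = src;
    char *d = dst;
    size_t done = 0;

    assert(dst != NULL && src != NULL);

    while (done < len) {
        size_t n = len - done;
        if (n > SKETCH_CHUNK)
            n = SKETCH_CHUNK;

        /* rw = 0 (read), locality = 0 (non-temporal hint) */
        for (size_t off = 0; off < n; off += SKETCH_CACHE_LINE)
            __builtin_prefetch(s + done + off, 0, 0);

        memcpy(d + done, s + done, n);
        done += n;
    }
    return dst;
}
```

Whether this is any faster than a bare copy depends on the CPU and on where the data currently lives, so it would need measuring rather than assuming.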

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-10 Thread Manfred Spraul
Manfred Spraul wrote: > > copy_*_user is probably not worth the effort for a Pentium III, but even > for that function I don't see a problem with SSE, as long as > > * the clobbered registers are stored on the stack (and not in > thread.i387.fxsave) > * the SSE/SSE2 instructions can't cause SIM

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-10 Thread Manfred Spraul
Doug Ledford wrote: > > It's not whether or not your particular code does it. It's whether or not it > can happen in the framework within which you are using the FPU regs. No, with > just copy/clear page using these things it won't happen. But if you add an > SSE zero page function, who's to s

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-10 Thread Doug Ledford
Manfred Spraul wrote: > > Doug Ledford wrote: > > > > > I have this strong suspicion that your kernel will lock up in a bad way > > > if you have somebody do something like divide by zero without actually > > > touching a single FP instruction after the divide (so that the error has > > > happene

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-10 Thread Manfred Spraul
Doug Ledford wrote: > > > I have this strong suspicion that your kernel will lock up in a bad way > > if you have somebody do something like divide by zero without actually > > touching a single FP instruction after the divide (so that the error has > > happened, but has not yet been raised as an

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-09 Thread Doug Ledford
Linus Torvalds wrote: > > In article <[EMAIL PROTECTED]>, > Manfred Spraul <[EMAIL PROTECTED]> wrote: > > > >* use sse for normal memcopy. The main advantage of sse over mmx is > >that only the clobbered registers must be saved, not the full fpu state. > > > >* verify that the code doesn't brea

Re: [beta patch] SSE copy_page() / clear_page()

2001-02-09 Thread Linus Torvalds
In article <[EMAIL PROTECTED]>, Manfred Spraul <[EMAIL PROTECTED]> wrote: > >* use sse for normal memcopy. The main advantage of sse over mmx is >that only the clobbered registers must be saved, not the full fpu state. > >* verify that the code doesn't break SSE enabled apps. >I checked a sse en

[beta patch] SSE copy_page() / clear_page()

2001-02-09 Thread Manfred Spraul
I wrote a kernel patch that replaces the standard copy_page()/clear_page() functions on Pentium III and Pentium IV with SSE instructions. If you have access to a Pentium 4 it would be great if you could download the user space test apps from http://colorfullife.com/~manfred/sse/ and run them.
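
For orientation, a minimal user-space sketch of what SSE-based page clear and copy routines can look like. It is not the patch itself, which is not reproduced in this listing. The names `sketch_clear_page` and `sketch_copy_page`, the 4096-byte page size and the choice of non-temporal stores (`_mm_stream_ps`) are assumptions made for illustration. Kernel code would also have to save and restore whatever xmm registers it clobbers, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

#define SKETCH_PAGE_SIZE 4096   /* assumed page size (i386) */

/* Hypothetical user-space analogue of clear_page(): zero one page
 * using 16-byte non-temporal stores. The page must be 16-byte aligned. */
static void sketch_clear_page(void *page)
{
    assert(((uintptr_t)page & 15) == 0);
#if defined(__SSE__)
    float *p = page;
    const __m128 zero = _mm_setzero_ps();

    for (int i = 0; i < SKETCH_PAGE_SIZE / 4; i += 4)
        _mm_stream_ps(p + i, zero);
    _mm_sfence();   /* order the streaming stores before later accesses */
#else
    memset(page, 0, SKETCH_PAGE_SIZE);   /* fallback when SSE is unavailable */
#endif
}

/* Hypothetical user-space analogue of copy_page(): aligned 16-byte loads
 * paired with non-temporal stores. Both pages must be 16-byte aligned. */
static void sketch_copy_page(void *to, const void *from)
{
    assert(((uintptr_t)to & 15) == 0);
    assert(((uintptr_t)from & 15) == 0);
#if defined(__SSE__)
    const float *s = from;
    float *d = to;

    for (int i = 0; i < SKETCH_PAGE_SIZE / 4; i += 4)
        _mm_stream_ps(d + i, _mm_load_ps(s + i));
    _mm_sfence();
#else
    memcpy(to, from, SKETCH_PAGE_SIZE);
#endif
}
```

The streaming stores bypass the cache on the write side, which is one reason such routines can behave very differently from a string-operation copy depending on whether the destination page is used soon afterwards.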