Pavel Machek wrote:
>
> > > > + __asm__ __volatile__(
> > > > + "mov %1, %0\n\t"
> > > > + : "=r" (i)
> > > > + : "r" (kaddr+offset)); /* load tlb entry */
> > > > + for(i=0;i > > > + __asm__ __volatile__(
> > > > +
> > Does the prefetch instruction fault on PIII/PIV then - the K7 one appears not
> > to be a source of faults
>
> My fault. I was told that prefetch instructions are always
> non-faulting.
I also thought it was non-faulting.
> > > + __asm__ __volatile__(
> > > + "mov %1, %0\n\t"
> > > + : "=r" (i)
> > > + : "r" (kaddr+offset)); /* load tlb entry */
> > > + for(i=0;i > > + __asm__ __volatile__(
> > > + "prefetchnta (
Hi!
> --- 2.4/mm/filemap.c Wed Feb 14 10:51:42 2001
> +++ build-2.4/mm/filemap.c Wed Feb 14 22:11:44 2001
> @@ -1248,6 +1248,20 @@
> size = count;
>
> kaddr = kmap(page);
> + if (size > 128) {
> + int i;
> + __asm__ __volatile__(
> +
Manfred Spraul wrote:
>
> Intel Pentium III and P 4 have hardcoded "fast stringcopy" operations
> that invalidate whole cachelines during write (documented in the most
> obvious place: multiprocessor management, memory ordering)
Which are dramatically slower than a simple `mov' loop for just
abo
I have another idea for SSE, and this one is far safer:
only use SSE prefetch, and leave the string operations for the actual copy.
The prefetch operations only prefetch and don't touch the SSE registers,
so there are neither reentrancy nor interrupt problems.
I tried the attached hack^H^H^H^Hpatch, and rea
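
The prefetch-only idea above can be sketched in user space roughly as
follows. This is only an illustration under stated assumptions, not the
attached patch: copy_with_prefetch() and CACHE_LINE are made-up names, the
32-byte line size is an assumption, and __builtin_prefetch (a GCC/Clang
builtin, here with the "no temporal locality" hint) stands in for a
hand-written prefetchnta. The real load before the loop mirrors the
"load tlb entry" step in the quoted diff.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cache line size; hypothetical constant, not from the patch. */
#define CACHE_LINE 32

/*
 * Hypothetical helper: issue prefetch hints for the source buffer, then
 * let the ordinary copy routine move the data. The prefetches are hints
 * only and use no SSE registers, so nothing has to be saved or restored.
 */
static void copy_with_prefetch(void *dst, const void *src, size_t size)
{
    const char *s = src;
    size_t off;

    if (size > 128) {
        /* A real load of the first byte, so the TLB entry is present. */
        volatile char touch = s[0];
        (void)touch;

        /* One prefetch hint per assumed cache line of the source. */
        for (off = 0; off < size; off += CACHE_LINE)
            __builtin_prefetch(s + off, 0, 0);
    }

    /* The copy itself stays with the normal string/copy operation. */
    memcpy(dst, src, size);
}
```

Calling copy_with_prefetch(dst, src, len) behaves like memcpy() as far as
the result goes; whether the hints help at all depends on the CPU and on
how far ahead of the copy they are issued.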
Manfred Spraul wrote:
>
> copy_*_user is probably not worth the effort for a Pentium III, but even
> for that function I don't see a problem with SSE, as long as
>
> * the clobbered registers are stored on the stack (and not in
> thread.i387.fxsave)
> * the SSE/SSE2 instructions can't cause SIM
Doug Ledford wrote:
>
> It's not whether or not your particular code does it. It's whether or not it
> can happen in the framework within which you are using the FPU regs. No, with
> just copy/clear page using these things it won't happen. But if you add an
> SSE zero page function, who's to s
Manfred Spraul wrote:
>
> Doug Ledford wrote:
> >
> > > I have this strong suspicion that your kernel will lock up in a bad way
> > > > if you have somebody do something like divide by zero without actually
> > > touching a single FP instruction after the divide (so that the error has
> > > happene
Doug Ledford wrote:
>
> > I have this strong suspicion that your kernel will lock up in a bad way
> > if you have somebody do something like divide by zero without actually
> > touching a single FP instruction after the divide (so that the error has
> > happened, but has not yet been raised as an
Linus Torvalds wrote:
>
> In article <[EMAIL PROTECTED]>,
> Manfred Spraul <[EMAIL PROTECTED]> wrote:
> >
> >* use sse for normal memcopy. The main advantage of sse over mmx is
> >that only the clobbered registers must be saved, not the full fpu state.
> >
> >* verify that the code doesn't brea
In article <[EMAIL PROTECTED]>,
Manfred Spraul <[EMAIL PROTECTED]> wrote:
>
>* use sse for normal memcopy. The main advantage of sse over mmx is
>that only the clobbered registers must be saved, not the full fpu state.
>
>* verify that the code doesn't break SSE enabled apps.
>I checked a sse en
I wrote a kernel patch that replaces the standard
copy_page()/clear_page() functions on Pentium III and Pentium 4 with
SSE instructions.
If you have access to a Pentium 4 it would be great if you could
download the user space test apps from
http://colorfullife.com/~manfred/sse/
and run them.