Re: [patch] x86: improved memory barrier implementation

2007-09-30 Thread Nick Piggin
On Sat, Sep 29, 2007 at 09:07:30AM -0700, Linus Torvalds wrote:
> On Sat, 29 Sep 2007, Nick Piggin wrote:
> > >
> > > The non-temporal stores should be basically considered to be "IO", not any
> > > normal memory operation.
> >
> > Maybe you're thinking of uncached / WC? Non-temporal stores to

Re: [patch] x86: improved memory barrier implementation

2007-09-29 Thread Dave Jones
On Fri, Sep 28, 2007 at 05:07:19PM +0100, Alan Cox wrote:
> > Winchip: can any of these CPUs with ooostores do SMP? If not, then smp_wmb
> > can also be a simple barrier on i386 too.
>
> The IDT Winchip can do SMP apparently.

From the Winchip3 (which was the final winchip) specs..

"The IDT

Re: [patch] x86: improved memory barrier implementation

2007-09-29 Thread Linus Torvalds
On Sat, 29 Sep 2007, Nick Piggin wrote:
> >
> > The non-temporal stores should be basically considered to be "IO", not any
> > normal memory operation.
>
> Maybe you're thinking of uncached / WC? Non-temporal stores to cacheable
> RAM apparently can go out of order too, and they are being used in the
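To make the point about non-temporal stores concrete, here is a minimal userspace sketch, not code from the patch or the kernel. It assumes GCC or Clang on an SSE2-capable x86 build and falls back to plain stores elsewhere. The names `nt_buffer`, `nt_done`, `fill_and_publish` and `read_sum` are hypothetical. Non-temporal stores are weakly ordered, so the writer issues `sfence` before setting the flag that tells a reader the data is complete.

```c
#if defined(__SSE2__)
#include <xmmintrin.h>  /* _mm_sfence */
#include <emmintrin.h>  /* _mm_stream_si32 */
#endif

#define NT_WORDS 4

static int nt_buffer[NT_WORDS];   /* hypothetical data area */
static volatile int nt_done;      /* hypothetical "data is ready" flag */

/* Fill the buffer with non-temporal stores, then publish the flag. */
static void fill_and_publish(int value)
{
    for (int i = 0; i < NT_WORDS; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&nt_buffer[i], value);  /* weakly ordered store */
#else
        nt_buffer[i] = value;                   /* fallback: ordinary store */
#endif
    }
#if defined(__SSE2__)
    /* Order the non-temporal stores before the flag store below. */
    _mm_sfence();
#endif
    nt_done = 1;
}

/* Wait for the flag, then sum the buffer contents. */
static int read_sum(void)
{
    int sum = 0;

    while (!nt_done)
        ;   /* spin until the writer has published */
    for (int i = 0; i < NT_WORDS; i++)
        sum += nt_buffer[i];
    return sum;
}
```

With ordinary stores to cacheable memory the `sfence` would be unnecessary on x86. It is only the non-temporal path that needs it, which is why such stores have to be treated separately from normal memory operations.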

Re: [patch] x86: improved memory barrier implementation

2007-09-29 Thread Nick Piggin
On Fri, Sep 28, 2007 at 06:18:31PM +0100, Alan Cox wrote:
> > on the broken ppro stores config option if you just tell me what should
> > be there (again, remember that my patch isn't actually changing anything
> > already there except for smp_rmb side).
>
> The PPro needs rmb to ensure a store doesn't

Re: [patch] x86: improved memory barrier implementation

2007-09-29 Thread Nick Piggin
On Fri, Sep 28, 2007 at 09:15:06AM -0700, Linus Torvalds wrote:
> On Fri, 28 Sep 2007, Alan Cox wrote:
> >
> > However
> > - You've not shown the patch has any performance gain
>
> It would be nice to see this.

Actually, in a userspace test I have (which actually does enough work to trigger out of

Re: [patch] x86: improved memory barrier implementation

2007-09-28 Thread Alan Cox
> on the broken ppro stores config option if you just tell me what should
> be there (again, remember that my patch isn't actually changing anything
> already there except for smp_rmb side).

The PPro needs rmb to ensure a store doesn't go for a walk on the wild side
and pass the read especially when

Re: [patch] x86: improved memory barrier implementation

2007-09-28 Thread Nick Piggin
On Fri, Sep 28, 2007 at 05:07:19PM +0100, Alan Cox wrote:
> > The only alternative is to assume a weak memory model, and add the required
> > barriers to spin_unlock -- something that has been explicitly avoided, but
>
> We have the barriers in spin_unlock already for Pentium Pro and IDT
> Winchip

Re: [patch] x86: improved memory barrier implementation

2007-09-28 Thread Linus Torvalds
On Fri, 28 Sep 2007, Alan Cox wrote:
>
> However
> - You've not shown the patch has any performance gain

It would be nice to see this.

> - You've probably broken Pentium Pro

Probably not a big deal, but yeah, we should have that broken-ppro option.

> - and for modern processors its still not

Re: [patch] x86: improved memory barrier implementation

2007-09-28 Thread Alan Cox
> The only alternative is to assume a weak memory model, and add the required
> barriers to spin_unlock -- something that has been explicitly avoided, but

We have the barriers in spin_unlock already for Pentium Pro and IDT Winchip
systems. The Winchip explicitly supports out of order store (and was
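The two flavours of unlock being argued over can be sketched in portable C11 atomics. This is an illustration under stated assumptions, not the kernel's spinlock code: `sketch_lock`, `sketch_lock_acquire`, `sketch_unlock_release` and `sketch_unlock_full` are hypothetical names. The first unlock is a plain release store, which is enough when the CPU keeps stores in order. The second uses an atomic exchange, which x86 compilers typically turn into an implicitly locked `xchg`, roughly in the spirit of the heavier unlock used for CPUs with out-of-order stores.

```c
#include <stdatomic.h>

/* Hypothetical minimal spinlock: 0 = unlocked, 1 = locked. */
typedef struct {
    atomic_int locked;
} sketch_lock;

/* Spin until the lock word changes from 0 to 1. */
static void sketch_lock_acquire(sketch_lock *l)
{
    int expected = 0;

    while (!atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;   /* a failed exchange overwrote expected; reset it */
}

/* Cheap unlock: a release store, sufficient when stores stay in order. */
static void sketch_unlock_release(sketch_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Conservative unlock: a sequentially consistent exchange acts as a full
 * barrier, at the cost of a locked read-modify-write on x86. */
static void sketch_unlock_full(sketch_lock *l)
{
    (void)atomic_exchange_explicit(&l->locked, 0, memory_order_seq_cst);
}
```

Both unlock variants leave the lock word at 0; they differ only in how strongly they order the critical section's stores against later accesses, which is the cost being discussed for Pentium Pro and Winchip.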

[patch] x86: improved memory barrier implementation

2007-09-28 Thread Nick Piggin
According to latest memory ordering specification documents from Intel and AMD, both manufacturers are committed to in-order loads from cacheable memory for the x86 architecture. Hence, smp_rmb() may be a simple barrier. Also according to those documents, and according to existing practice in
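As a rough illustration of what "a simple barrier" means here: a compiler-only barrier emits no instruction and merely stops the compiler from moving memory accesses across it, leaving load ordering to the CPU. The sketch below is userspace C for GCC or Clang, not the patch itself. `compiler_barrier`, `sketch_smp_rmb`, `publish` and `consume` are made-up names, and the writer side assumes ordinary x86 stores to cacheable memory are not reordered with each other.

```c
/* Compiler-only barrier (GCC/Clang extension): no instruction is emitted,
 * but the "memory" clobber keeps the compiler from reordering or caching
 * memory accesses across this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Sketch of the idea in the patch description: if cacheable loads are
 * in order on x86, a read barrier needs no fence instruction. */
#define sketch_smp_rmb() compiler_barrier()

static int payload;          /* hypothetical data being handed over */
static volatile int ready;   /* hypothetical "payload is valid" flag */

/* Writer: store the data, then set the flag. */
static void publish(int value)
{
    payload = value;
    compiler_barrier();   /* assumption: x86 keeps ordinary stores in order */
    ready = 1;
}

/* Reader: wait for the flag, then read the data. */
static int consume(void)
{
    while (!ready)
        ;                 /* spin until the writer sets the flag */
    sketch_smp_rmb();     /* keep the payload load after the flag load */
    return payload;
}
```

On a CPU that can reorder loads, `sketch_smp_rmb()` would have to expand to a real fence such as `lfence` instead. The claim in the patch description is that x86 cacheable loads never need it.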
