Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-18 Thread Mikulas Patocka
> > > > You already must not place any data structures into WC memory --- for
> > > > example, spinlocks wouldn't work there.
> > >
> > > What do you mean "already"?
> >
> > I mean "in current kernel" (I checked it in 2.6.22)
>
> Ahh, that's not "current kernel", though ;)

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-17 Thread Nick Piggin
On Wed, Oct 17, 2007 at 01:51:17PM +0800, Herbert Xu wrote:
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> >
> > Also, for non-wb memory. I don't think the Intel document referenced
> > says anything about this, but the AMD document says that loads can pass
> > loads (page 8, rule b).
> >
> > This is why our rmb()

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-17 Thread Nick Piggin
On Wed, Oct 17, 2007 at 02:30:32AM +0200, Mikulas Patocka wrote:
> > > You already must not place any data structures into WC memory --- for
> > > example, spinlocks wouldn't work there.
> >
> > What do you mean "already"?
>
> I mean "in current kernel" (I checked it in 2.6.22)

Ahh, that's not "current kernel",

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Herbert Xu
Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> Also, for non-wb memory. I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
>
> This is why our rmb() is still an lfence.

BTW, Xen (in particular, the code in

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
> > You already must not place any data structures into WC memory --- for
> > example, spinlocks wouldn't work there.
>
> What do you mean "already"?

I mean "in current kernel" (I checked it in 2.6.22)

> If we already have drivers loading data from
> WC memory, then rmb() needs to order them, whether or

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Nick Piggin
On Wed, Oct 17, 2007 at 01:05:16AM +0200, Mikulas Patocka wrote:
> > > I see, AMD says that WC memory loads can be out-of-order.
> > >
> > > There is very little usability to it --- framebuffer and AGP aperture is
> > > the only piece of memory that is WC and no kernel structures are placed
> > > there, so it

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
> > I see, AMD says that WC memory loads can be out-of-order.
> >
> > There is very little usability to it --- framebuffer and AGP aperture is
> > the only piece of memory that is WC and no kernel structures are placed
> > there, so it is possible to remove that lfence.
>
> No. In Linux kernel, rmb()

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Nick Piggin
On Tue, Oct 16, 2007 at 12:33:54PM +0200, Mikulas Patocka wrote:
> On Tue, 16 Oct 2007, Nick Piggin wrote:
>
> > > > The cpus also have an explicit set of instructions that deliberately do
> > > > unordered stores/loads, and s/lfence etc are mostly designed for those.
> > >
> > > I know about unordered

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
On Tue, 16 Oct 2007, Nick Piggin wrote:

> > > The cpus also have an explicit set of instructions that deliberately do
> > > unordered stores/loads, and s/lfence etc are mostly designed for those.
> >
> > I know about unordered stores (movnti & similar) --- they basically use
> > write-combining method on

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
On Mon, 15 Oct 2007, H. Peter Anvin wrote:

> Mikulas Patocka wrote:
> >
> > I know about unordered stores (movnti & similar) --- they basically use
> > write-combining method on memory that is normally write-back --- and they
> > need sfence. But which one instruction does unordered load and needs

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread Nick Piggin
On Tue, Oct 16, 2007 at 12:08:01AM +0200, Mikulas Patocka wrote:
> > On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
> > Mikulas Patocka <[EMAIL PROTECTED]> wrote:
> >
> > > > According to latest memory ordering specification documents from
> > > > Intel and AMD, both manufacturers are committed to in-order loads

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread H. Peter Anvin
Mikulas Patocka wrote:
> I know about unordered stores (movnti & similar) --- they basically use
> write-combining method on memory that is normally write-back --- and they
> need sfence. But which one instruction does unordered load and needs lfence?

PREFETCHNTA.

	-hpa

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread Mikulas Patocka
> On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
> Mikulas Patocka <[EMAIL PROTECTED]> wrote:
>
> > > According to latest memory ordering specification documents from
> > > Intel and AMD, both manufacturers are committed to in-order loads
> > > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > > may

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread Arjan van de Ven
On Mon, 15 Oct 2007 22:47:42 +0200 (CEST)
Mikulas Patocka <[EMAIL PROTECTED]> wrote:

> > According to latest memory ordering specification documents from
> > Intel and AMD, both manufacturers are committed to in-order loads
> > from cacheable memory for the x86 architecture. Hence, smp_rmb()
> > may be a

LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread Mikulas Patocka
> According to latest memory ordering specification documents from Intel
> and AMD, both manufacturers are committed to in-order loads from
> cacheable memory for the x86 architecture. Hence, smp_rmb() may be a
> simple barrier.
>
> http://developer.intel.com/products/processor/manuals/318147.pdf
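
For readers following the thread, the distinction under discussion (a "simple barrier" that only constrains the compiler versus an lfence that the CPU executes) can be sketched in user-space C. This is only an illustration, assuming GCC or Clang inline assembly and ordinary write-back memory on x86. The names compiler_barrier(), load_fence(), publish() and try_consume() are hypothetical and are not the kernel's definitions of barrier(), rmb() or smp_rmb().

```c
/* Illustrative sketch only; assumes GCC/Clang and x86 write-back memory.
 * All names here are hypothetical, not kernel APIs. */

/* Compiler-only barrier: emits no instruction, but the compiler may not
 * move memory accesses across it. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Hardware load fence: LFENCE on x86 with SSE2, otherwise a portable
 * acquire fence so the sketch still builds on other targets. */
static inline void load_fence(void)
{
#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__))
    __asm__ __volatile__("lfence" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
#endif
}

struct message {
    int payload;
    int ready;
};

/* Writer: store the payload, then set the flag. On x86 write-back memory
 * the CPU keeps these two stores in order, so only the compiler has to be
 * restrained. Other architectures would need a real store barrier. */
static void publish(struct message *m, int value)
{
    m->payload = value;
    compiler_barrier();
    *(volatile int *)&m->ready = 1;
}

/* Reader: check the flag, then read the payload. use_hw_fence selects
 * between the two candidate read barriers compared in the thread. */
static int try_consume(struct message *m, int *out, int use_hw_fence)
{
    if (!*(volatile int *)&m->ready)
        return 0;
    if (use_hw_fence)
        load_fence();        /* also orders loads the CPU might reorder */
    else
        compiler_barrier();  /* relies on in-order loads from WB memory */
    *out = m->payload;
    return 1;
}
```

With use_hw_fence set to 0 the reader depends entirely on the documented in-order behaviour of loads from cacheable memory; with 1 it pays for an lfence, which is what matters for the non-write-back cases raised in the replies.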
