Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-18 Thread Mikulas Patocka
> > > > You already must not place any data structures into WC memory --- for > > > > example, spinlocks wouldn't work there. > > > > > > What do you mean "already"? > > > > I mean "in current kernel" (I checked it in 2.6.22) > > Ahh, that's not "current kernel", though ;) > > 4071c718555d955a

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-17 Thread Nick Piggin
On Wed, Oct 17, 2007 at 01:51:17PM +0800, Herbert Xu wrote: > Nick Piggin <[EMAIL PROTECTED]> wrote: > > > > Also, for non-wb memory. I don't think the Intel document referenced > > says anything about this, but the AMD document says that loads can pass > > loads (page 8, rule b). > > > > This is

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-17 Thread Nick Piggin
On Wed, Oct 17, 2007 at 02:30:32AM +0200, Mikulas Patocka wrote: > > > You already must not place any data structures into WC memory --- for > > > example, spinlocks wouldn't work there. > > > > What do you mean "already"? > > I mean "in current kernel" (I checked it in 2.6.22) Ahh, that's not

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Herbert Xu
Nick Piggin <[EMAIL PROTECTED]> wrote: > > Also, for non-wb memory. I don't think the Intel document referenced > says anything about this, but the AMD document says that loads can pass > loads (page 8, rule b). > > This is why our rmb() is still an lfence. BTW, Xen (in particular, the code in dr

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
> > You already must not place any data structures into WC memory --- for > > example, spinlocks wouldn't work there. > > What do you mean "already"? I mean "in current kernel" (I checked it in 2.6.22) > If we already have drivers loading data from > WC memory, then rmb() needs to order them, w

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Nick Piggin
On Wed, Oct 17, 2007 at 01:05:16AM +0200, Mikulas Patocka wrote: > > > I see, AMD says that WC memory loads can be out-of-order. > > > > > > There is very little usability to it --- framebuffer and AGP aperture is > > > the only piece of memory that is WC and no kernel structures are placed > >

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
> > I see, AMD says that WC memory loads can be out-of-order. > > > > There is very little usability to it --- framebuffer and AGP aperture is > > the only piece of memory that is WC and no kernel structures are placed > > there, so it is possible to remove that lfence. > > No. In Linux kernel,

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Nick Piggin
On Tue, Oct 16, 2007 at 12:33:54PM +0200, Mikulas Patocka wrote: > > > On Tue, 16 Oct 2007, Nick Piggin wrote: > > > > > The cpus also have an explicit set of instructions that deliberately do > > > > unordered stores/loads, and s/lfence etc are mostly designed for those. > > > > > > I know ab

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-16 Thread Jarek Poplawski
On Tue, Oct 16, 2007 at 02:14:17AM -0700, [EMAIL PROTECTED] wrote: ... > what you don't realize is that Intel (and AMD) have built their business > on making sure that their new CPUs run existing software with no > modifications, (and almost always faster than the old versions). remember > tha

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
On Tue, 16 Oct 2007, Nick Piggin wrote: > > > The cpus also have an explicit set of instructions that deliberately do > > > unordered stores/loads, and s/lfence etc are mostly designed for those. > > > > I know about unordered stores (movnti & similar) --- they basically use > > write-combini

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-16 Thread Mikulas Patocka
On Mon, 15 Oct 2007, H. Peter Anvin wrote: > Mikulas Patocka wrote: > > > > I know about unordered stores (movnti & similar) --- they basically use > > write-combining method on memory that is normally write-back --- and they > > need sfence. But which one instruction does unordered load and need

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-16 Thread david
On Tue, 16 Oct 2007, Jarek Poplawski wrote: On Tue, Oct 16, 2007 at 02:50:33AM +0200, Nick Piggin wrote: On Mon, Oct 15, 2007 at 11:10:00AM +0200, Jarek Poplawski wrote: ... As a matter of fact it's not natural for me at all. I expected the other direction, and I still doubt programmers' int

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-16 Thread Jarek Poplawski
On Tue, Oct 16, 2007 at 02:50:33AM +0200, Nick Piggin wrote: > On Mon, Oct 15, 2007 at 11:10:00AM +0200, Jarek Poplawski wrote: ... > > I'm not performance-words at all, so I can't help you, sorry. But, I > > understand people who care about this, and think there is a popular > > conviction barrier

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-15 Thread Nick Piggin
On Mon, Oct 15, 2007 at 11:10:00AM +0200, Jarek Poplawski wrote: > On Mon, Oct 15, 2007 at 10:09:24AM +0200, Nick Piggin wrote: > ... > > Has performance really been much problem for you? (even before the > > lfence instruction, when you theoretically had to use a locked op)? > > I mean, I'd strugg

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread Nick Piggin
On Tue, Oct 16, 2007 at 12:08:01AM +0200, Mikulas Patocka wrote: > > On Mon, 15 Oct 2007 22:47:42 +0200 (CEST) > > Mikulas Patocka <[EMAIL PROTECTED]> wrote: > > > > > > According to latest memory ordering specification documents from > > > > Intel and AMD, both manufacturers are committed to in-o

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread H. Peter Anvin
Mikulas Patocka wrote: > I know about unordered stores (movnti & similar) --- they basically use write-combining method on memory that is normally write-back --- and they need sfence. But which one instruction does unordered load and needs lfence? PREFETCHNTA. -hpa

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread Mikulas Patocka
> On Mon, 15 Oct 2007 22:47:42 +0200 (CEST) > Mikulas Patocka <[EMAIL PROTECTED]> wrote: > > > > According to latest memory ordering specification documents from > > > Intel and AMD, both manufacturers are committed to in-order loads > > > from cacheable memory for the x86 architecture. Hence, smp

Re: LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread Arjan van de Ven
On Mon, 15 Oct 2007 22:47:42 +0200 (CEST) Mikulas Patocka <[EMAIL PROTECTED]> wrote: > > According to latest memory ordering specification documents from > > Intel and AMD, both manufacturers are committed to in-order loads > > from cacheable memory for the x86 architecture. Hence, smp_rmb() > > m

LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007-10-15 Thread Mikulas Patocka
> According to latest memory ordering specification documents from Intel > and AMD, both manufacturers are committed to in-order loads from > cacheable memory for the x86 architecture. Hence, smp_rmb() may be a > simple barrier. > > http://developer.intel.com/products/processor/manuals/318147.pd

RE: [rfc][patch 3/3] x86: optimise barriers

2007-10-15 Thread David Schwartz
> From: Intel(R) 64 and IA-32 Architectures Software Developer's Manual > Volume 3A: > >"7.2.2 Memory Ordering in P6 and More Recent Processor Families > ... > 1. Reads can be carried out speculatively and in any order. > ..." > > So, it looks to me like almost the 1-st Commandment

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-15 Thread Jarek Poplawski
On Mon, Oct 15, 2007 at 12:17:40PM +0200, Helge Hafting wrote: > Jarek Poplawski wrote: > >On Fri, Oct 12, 2007 at 02:44:51PM +0200, Helge Hafting wrote: > >... > > > >>The point is that we _trust_ Intel when they say "this will work". > >>Therefore, we can use the optimizations. It was never ab

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-15 Thread Helge Hafting
Jarek Poplawski wrote: On Fri, Oct 12, 2007 at 02:44:51PM +0200, Helge Hafting wrote: ... The point is that we _trust_ Intel when they say "this will work". Therefore, we can use the optimizations. It was never about legal matters. If we didn't trust Intel, then we couldn't use their process

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-15 Thread Jarek Poplawski
On Mon, Oct 15, 2007 at 11:09:59AM +0200, Jarek Poplawski wrote: ... > I'm not performance-words at all, so I can't help you, sorry. But, I ...performance-wards?! Looks like serious: I don't even now who I'm not now! Jarek P.

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-15 Thread Jarek Poplawski
On Mon, Oct 15, 2007 at 10:09:24AM +0200, Nick Piggin wrote: ... > Has performance really been much problem for you? (even before the > lfence instruction, when you theoretically had to use a locked op)? > I mean, I'd struggle to find a place in the Linux kernel where there > is actually a measurab

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-15 Thread Nick Piggin
On Mon, Oct 15, 2007 at 09:44:05AM +0200, Jarek Poplawski wrote: > On Fri, Oct 12, 2007 at 08:13:52AM -0700, Linus Torvalds wrote: > > > > > > On Fri, 12 Oct 2007, Jarek Poplawski wrote: > ... > > So no, there's no way a software person could have afforded to say "it > > seems to work on my setu

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-15 Thread Jarek Poplawski
On Fri, Oct 12, 2007 at 08:13:52AM -0700, Linus Torvalds wrote: > > > On Fri, 12 Oct 2007, Jarek Poplawski wrote: ... > So no, there's no way a software person could have afforded to say "it > seems to work on my setup even without the barrier". On a dual-socket > setup with a shared bus, that

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Linus Torvalds
On Fri, 12 Oct 2007, Jarek Poplawski wrote: > > First it looks like a really great thing that it's revealed at last. > But then... there is probably some confusion: did we have to use > ineffective code for so long? I think the chip manufacturers really wanted to keep their options open. Havin

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Jarek Poplawski
On Fri, Oct 12, 2007 at 02:44:51PM +0200, Helge Hafting wrote: ... > The point is that we _trust_ Intel when they say "this will work". > Therefore, we can use the optimizations. It was never about > legal matters. If we didn't trust Intel, then we couldn't > use their processors at all. But ther

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Helge Hafting
Jarek Poplawski wrote: On Fri, Oct 12, 2007 at 10:42:34AM +0200, Helge Hafting wrote: Jarek Poplawski wrote: On 04-10-2007 07:23, Nick Piggin wrote: According to latest memory ordering specification documents from Intel and AMD, both manufacturers are committed to in-order loa

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Jarek Poplawski
On Fri, Oct 12, 2007 at 01:55:10PM +0200, Jarek Poplawski wrote: > On Fri, Oct 12, 2007 at 12:42:38PM +0200, Nick Piggin wrote: ... > > [...] If you can actually come up with a test > > case that triggers load/load or store/store reordering, I'm sure > > Intel / AMD would like to see it ;) > > It

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Jarek Poplawski
On Fri, Oct 12, 2007 at 12:42:38PM +0200, Nick Piggin wrote: > On Fri, Oct 12, 2007 at 11:55:05AM +0200, Jarek Poplawski wrote: > > On Fri, Oct 12, 2007 at 10:57:33AM +0200, Nick Piggin wrote: > > > > > > I don't know quite what you're saying... the CPUs could probably get > > > performance by hav

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Nick Piggin
On Fri, Oct 12, 2007 at 11:55:05AM +0200, Jarek Poplawski wrote: > On Fri, Oct 12, 2007 at 10:57:33AM +0200, Nick Piggin wrote: > > > > I don't know quite what you're saying... the CPUs could probably get > > performance by having weakly ordered loads, OTOH I think the Intel > > ones might already

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Jarek Poplawski
On Fri, Oct 12, 2007 at 11:44:27AM +0200, Nick Piggin wrote: ... > So unless there is reasonable information for us to believe this > will be a problem, IMO the best thing to do is stick with the > specs. Intel is pretty reasonable with documenting errata I think. 100% right - if there are any spe

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Jarek Poplawski
On Fri, Oct 12, 2007 at 10:57:33AM +0200, Nick Piggin wrote: > On Fri, Oct 12, 2007 at 10:25:34AM +0200, Jarek Poplawski wrote: > > On 04-10-2007 07:23, Nick Piggin wrote: > > > According to latest memory ordering specification documents from Intel and > > > AMD, both manufacturers are committed to

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Nick Piggin
On Fri, Oct 12, 2007 at 11:12:13AM +0200, Jarek Poplawski wrote: > On Fri, Oct 12, 2007 at 10:42:34AM +0200, Helge Hafting wrote: > > Jarek Poplawski wrote: > > >On 04-10-2007 07:23, Nick Piggin wrote: > > > > > >>According to latest memory ordering specification documents from Intel and > > >>AM

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Jarek Poplawski
On Fri, Oct 12, 2007 at 10:42:34AM +0200, Helge Hafting wrote: > Jarek Poplawski wrote: > >On 04-10-2007 07:23, Nick Piggin wrote: > > > >>According to latest memory ordering specification documents from Intel and > >>AMD, both manufacturers are committed to in-order loads from cacheable > >>mem

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Nick Piggin
On Fri, Oct 12, 2007 at 10:25:34AM +0200, Jarek Poplawski wrote: > On 04-10-2007 07:23, Nick Piggin wrote: > > According to latest memory ordering specification documents from Intel and > > AMD, both manufacturers are committed to in-order loads from cacheable > > memory > > for the x86 architectu

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Helge Hafting
Jarek Poplawski wrote: On 04-10-2007 07:23, Nick Piggin wrote: According to latest memory ordering specification documents from Intel and AMD, both manufacturers are committed to in-order loads from cacheable memory for the x86 architecture. Hence, smp_rmb() may be a simple barrier. ...

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Jarek Poplawski
On 04-10-2007 07:23, Nick Piggin wrote: > According to latest memory ordering specification documents from Intel and > AMD, both manufacturers are committed to in-order loads from cacheable memory > for the x86 architecture. Hence, smp_rmb() may be a simple barrier. ... Great news! First it looks

[rfc][patch 3/3] x86: optimise barriers

2007-10-03 Thread Nick Piggin
According to latest memory ordering specification documents from Intel and AMD, both manufacturers are committed to in-order loads from cacheable memory for the x86 architecture. Hence, smp_rmb() may be a simple barrier. Also according to those documents, and according to existing practice in Lin
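
[Illustration, not part of the posted patch.] The claim above, that smp_rmb() "may be a simple barrier", can be sketched roughly as follows. This is a minimal sketch, assuming GCC/Clang inline-asm syntax and ordinary write-back (cacheable) memory. Every sketch_* name is hypothetical and is not the kernel's actual definition. Treating the write side as a compiler-only barrier is likewise an assumption of this sketch (x86 keeps ordinary stores to write-back memory in order); it does not hold for the WC memory or non-temporal accesses debated elsewhere in this thread, where lfence/sfence still matter.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not the kernel's definitions. */

/* Compiler-only barrier (GCC/Clang syntax): emits no instruction, but the
 * compiler may not move memory accesses across it. */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")

/* Premise of the patch: loads from cacheable memory are not reordered with
 * other loads on x86, so a read barrier only has to restrain the compiler. */
#define sketch_smp_rmb() sketch_barrier()

/* Assumption of this sketch: ordinary stores to write-back memory are also
 * kept in order, so the write side is treated the same way. */
#define sketch_smp_wmb() sketch_barrier()

static volatile int sketch_ready;   /* flag set once the payload is valid */
static int sketch_payload;

/* Writer: store the payload, then raise the flag. */
static void sketch_publish(int value)
{
    sketch_payload = value;
    sketch_smp_wmb();               /* payload store stays before flag store */
    sketch_ready = 1;
}

/* Reader: check the flag, then load the payload. */
static bool sketch_consume(int *out)
{
    if (!sketch_ready)
        return false;
    sketch_smp_rmb();               /* flag load stays before payload load */
    *out = sketch_payload;
    return true;
}
```

In this sketch the barriers cost nothing at run time. The hardware is assumed to supply the load/load and store/store ordering, so the macros only stop the compiler from reordering the accesses.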