should sync builtins be full optimization barriers?

2011-09-09 Thread Paolo Bonzini
Hi all, sync builtins are described in the documentation as being full memory barriers, with the possible exception of __sync_lock_test_and_set. However, GCC is not enforcing the fact that they are also full _optimization_ barriers. The RTL produced by builtins does not in general include a
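A minimal sketch of the situation described above, assuming GCC or a compatible compiler. If a __sync builtin cannot be relied on to stop the optimizer from moving ordinary memory accesses across it, an explicit compiler-only barrier (an empty asm with a "memory" clobber) can be placed around the call. The names compiler_barrier, publish, payload and counter are hypothetical and do not come from the thread.

```c
/* Compiler-only barrier: emits no instruction, but tells GCC that memory
   may have changed, so loads and stores are not moved across it.
   (Hypothetical macro name, not taken from the thread.) */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int payload;   /* ordinary (non-atomic) global */
static int counter;   /* updated only through __sync builtins */

/* Store to payload, then atomically increment counter.
   Returns the value counter held before the increment. */
static int publish(int value)
{
    payload = value;
    compiler_barrier();                          /* keep the store above the builtin */
    int old = __sync_fetch_and_add(&counter, 1); /* documented as a full memory barrier */
    compiler_barrier();                          /* keep later accesses below it */
    return old;
}
```

If the builtins do act as optimization barriers, the two explicit barriers are redundant but harmless; they only constrain the compiler and add no instructions.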

Re: should sync builtins be full optimization barriers?

2011-09-09 Thread Jakub Jelinek
On Fri, Sep 09, 2011 at 10:07:30AM +0200, Paolo Bonzini wrote: > sync builtins are described in the documentation as being full > memory barriers, with the possible exception of > __sync_lock_test_and_set. However, GCC is not enforcing the fact > that they are also full _optimization_ barriers. T

Re: should sync builtins be full optimization barriers?

2011-09-09 Thread Paolo Bonzini
On 09/09/2011 10:17 AM, Jakub Jelinek wrote: > Is the above analysis correct? Or should the users put explicit > compiler barriers? I'd say they should be optimization barriers too (and at the tree level they I think work that way, being represented as function calls), so if they don't act as

Re: should sync builtins be full optimization barriers?

2011-09-09 Thread Andrew MacLeod
On 09/09/2011 04:17 AM, Jakub Jelinek wrote: On Fri, Sep 09, 2011 at 10:07:30AM +0200, Paolo Bonzini wrote: sync builtins are described in the documentation as being full memory barriers, with the possible exception of __sync_lock_test_and_set. However, GCC is not enforcing the fact that they a

Re: should sync builtins be full optimization barriers?

2011-09-09 Thread Paolo Bonzini
On 09/09/2011 04:22 PM, Andrew MacLeod wrote: Yeah, some of this is part of the ongoing C++0x work... the memory model parameter is going to allow certain types of code movement in optimizers based on whether it's an acquire operation, a release operation, neither, or both. It is ongoing and

Re: should sync builtins be full optimization barriers?

2011-09-09 Thread Geert Bosch
On Sep 9, 2011, at 04:17, Jakub Jelinek wrote: > I'd say they should be optimization barriers too (and at the tree level > they I think work that way, being represented as function calls), so if > they don't act as memory barriers in RTL, the *.md patterns should be > fixed. The only exception s

Re: should sync builtins be full optimization barriers?

2011-09-09 Thread Paolo Bonzini
On Sat, Sep 10, 2011 at 03:09, Geert Bosch wrote: > For example, for atomic objects accessed only from a single processor > (but possibly multiple threads), you'd not want the compiler to reorder > memory accesses to global variables across the atomic operations, but > you wouldn't have to emit

Re: should sync builtins be full optimization barriers?

2011-09-09 Thread Jakub Jelinek
On Fri, Sep 09, 2011 at 09:09:27PM -0400, Geert Bosch wrote: > To be honest, I can't quite see the use of completely unordered > atomic operations, where we do not even prohibit compiler optimizations. > It would seem if we guarantee that a variable will not be accessed > concurrently from any other t

Re: should sync builtins be full optimization barriers?

2011-09-11 Thread Andrew MacLeod
On 09/09/2011 09:09 PM, Geert Bosch wrote: For the C++0x atomic types there are: void A::store(C desired, memory_order order = memory_order_seq_cst) volatile; void A::store(C desired, memory_order order = memory_order_seq_cst); where the first variant (with order = memory_order_relaxed) would a

Re: should sync builtins be full optimization barriers?

2011-09-11 Thread Paolo Bonzini
On 09/11/2011 04:12 PM, Andrew MacLeod wrote: tail->value = othervalue // global variable write atomic_exchange (&var, tail) // acquire operation although the optimizer moving the store of tail->value to AFTER the exchange seems very wrong on the surface, it's really
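To make the two-line example concrete, here is a hedged C11 sketch (the thread itself predates the final C11/C++11 interfaces). It assumes the intent is to initialize a node and then publish it through an atomic pointer; struct node, the type of var and the function publish_tail are hypothetical. With acquire-only ordering on the exchange, the memory model permits the earlier store to tail->value to be moved after it. Requesting release ordering as well (acq_rel here) forbids that movement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Shared atomic pointer; starts out NULL (static storage). */
static _Atomic(struct node *) var;

/* Initialize tail->value, then swap tail into var.
   Returns the pointer var held before the exchange. */
static struct node *publish_tail(struct node *tail, int othervalue)
{
    assert(tail != NULL);
    tail->value = othervalue;   /* ordinary store */
    /* The release half of acq_rel keeps the store above from being
       reordered after the exchange; the acquire half keeps later
       accesses from being reordered before it. */
    return atomic_exchange_explicit(&var, tail, memory_order_acq_rel);
}
```

A reader that obtains the pointer with an acquire load of var is then guaranteed to see the initialized value field.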

Re: should sync builtins be full optimization barriers?

2011-09-11 Thread Geert Bosch
On Sep 11, 2011, at 10:12, Andrew MacLeod wrote: >> To be honest, I can't quite see the use of completely unordered >> atomic operations, where we do not even prohibit compiler optimizations. >> It would seem if we guarantee that a variable will not be accessed >> concurrently from any other thread,

Re: should sync builtins be full optimization barriers?

2011-09-11 Thread Jakub Jelinek
On Sun, Sep 11, 2011 at 03:00:11PM -0400, Geert Bosch wrote: > Also, for relaxed order atomic operations we would only need a single > fence between two accesses (by a thread) to the same atomic object. I'm not aware of any CPUs that would need any kind of fences for that. Nor the compiler should

Re: should sync builtins be full optimization barriers?

2011-09-11 Thread Geert Bosch
On Sep 11, 2011, at 15:11, Jakub Jelinek wrote: > On Sun, Sep 11, 2011 at 03:00:11PM -0400, Geert Bosch wrote: >> Also, for relaxed order atomic operations we would only need a single >> fence between two accesses (by a thread) to the same atomic object. > > I'm not aware of any CPUs that would

Re: should sync builtins be full optimization barriers?

2011-09-11 Thread Jakub Jelinek
On Sun, Sep 11, 2011 at 03:31:15PM -0400, Geert Bosch wrote: > > On Sun, Sep 11, 2011 at 03:00:11PM -0400, Geert Bosch wrote: > >> Also, for relaxed order atomic operations we would only need a single > >> fence between two accesses (by a thread) to the same atomic object. > > > > I'm not aware of

Re: should sync builtins be full optimization barriers?

2011-09-11 Thread Andrew MacLeod
On 09/11/2011 02:22 PM, Paolo Bonzini wrote: On 09/11/2011 04:12 PM, Andrew MacLeod wrote: tail->value = othervalue // global variable write atomic_exchange (&var, tail) // acquire operation although the optimizer moving the store of tail->value to AFTER the exchange

Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Paolo Bonzini
On 09/11/2011 09:00 PM, Geert Bosch wrote: So, if I understand correctly, then operations using relaxed memory order will still need fences, but indeed do not require any optimization barrier. For memory_order_seq_cst we'll need a full barrier, and for the others there is a partial barrier. If

Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Paolo Bonzini
On 09/12/2011 01:22 AM, Andrew MacLeod wrote: You're right that using lock_test_and_set as an exchange is very wrong because of the compiler barrier semantics, but I think this is entirely a red herring in this case. The same problem could happen with a fetch_and_add or even a lock_release opera

Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Geert Bosch
On Sep 12, 2011, at 03:02, Paolo Bonzini wrote: > On 09/11/2011 09:00 PM, Geert Bosch wrote: >> So, if I understand correctly, then operations using relaxed memory >> order will still need fences, but indeed do not require any >> optimization barrier. For memory_order_seq_cst we'll need a full >>

Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Paolo Bonzini
On Mon, Sep 12, 2011 at 20:40, Geert Bosch wrote: > Assuming that statement is true, that would imply that even for relaxed > ordering there has to be an optimization barrier. Clearly fences need to be > used for any atomic accesses, including those with relaxed memory order. > > Consider 4 threa

Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Andrew MacLeod
On 09/12/2011 02:40 PM, Geert Bosch wrote:
  thread 1    thread 2    thread 3    thread 4
  x=1;        r1=x;       x=3;        r3=x;
  x=2;        r2=x;       x=4;        r4=x;
Even with relaxed memory ordering, all modifications to x have to occur in some particular tot

Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Ken Raeburn
On Sep 12, 2011, at 19:19, Andrew MacLeod wrote: > let's say the order of the writes turns out to be 2,4... is it possible for > both writes to be travelling around some bus and have thread 4 actually read > the second one first, followed by the first one? It would imply a lack of > memory co

Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Andy Lutomirski
On 09/12/2011 05:30 PM, Ken Raeburn wrote: > On Sep 12, 2011, at 19:19, Andrew MacLeod wrote: >> let's say the order of the writes turns out to be 2,4... is it possible for >> both writes to be travelling around some bus and have thread 4 actually read >> the second one first, followed by the fi

Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Geert Bosch
On Sep 12, 2011, at 19:19, Andrew MacLeod wrote: > Let's simplify it slightly. The compiler can optimize away x=1 and x=3 as > dead stores (even valid on atomics!), leaving us with 2 modification orders.. > 2,4 or 4,2 > and what you are getting at is you don't think we should ever see > r1==

Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Lawrence Crowl
On 9/9/11, Geert Bosch wrote: > To be honest, I can't quite see the use of completely unordered > atomic operations, where we do not even prohibit compiler optimizations. > It would seem if we guarantee that a variable will not be accessed > concurrently from any other thread, we wouldn't need the op

Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Lawrence Crowl
On 9/11/11, Andrew MacLeod wrote: > On 09/09/2011 09:09 PM, Geert Bosch wrote: >> For the C++0x atomic types there are: >> >> void A::store(C desired, memory_order order = memory_order_seq_cst) >> volatile; >> void A::store(C desired, memory_order order = memory_order_seq_cst); >> >> where the fir

Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Paolo Bonzini
On Tue, Sep 13, 2011 at 03:52, Geert Bosch wrote: > No, it is possible, and actually likely. Basically, the issue is write > buffers. > The coherency mechanisms come into play at a lower level in the > hierarchy (typically at the last-level cache), which is why we need fences > to start with to i

Re: should sync builtins be full optimization barriers?

2011-09-13 Thread Andrew MacLeod
On 09/12/2011 09:52 PM, Geert Bosch wrote: No that's false. Even on systems with nice memory models, such as x86 and SPARC with a TSO model, you need a fence to avoid that a write-load of the same location is forced to make it all the way to coherent memory and not forwarded directly from th

Re: should sync builtins be full optimization barriers?

2011-09-13 Thread Eric Botcazou
> You need fences on x86 to implement Peterson or Dekker spin locks but > only because they involve write-read ordering to different memory > locations (I'm mentioning those spin lock algorithms because they do > not require locked memory accesses). Write-write, read-read and for > the same locat
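The write-read pattern mentioned for Peterson's algorithm can be sketched in C11 as follows. This is an illustration under stated assumptions, not code from the thread: it handles exactly two threads (ids 0 and 1), the names interested, turn, peterson_lock and peterson_unlock are hypothetical, and every atomic access uses the default sequentially consistent ordering. That ordering is what supplies the store-to-load ordering between different locations, which x86 does not provide for plain stores and loads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool interested[2];  /* interested[i]: thread i wants the lock */
static atomic_int  turn;           /* id of the thread that waits on a tie */

static void peterson_lock(int self)
{
    assert(self == 0 || self == 1);
    int other = 1 - self;

    /* Write our own flag and the turn variable ... */
    atomic_store(&interested[self], true);
    atomic_store(&turn, other);

    /* ... then read the other thread's flag, a different location.
       The seq_cst operations keep these loads from being satisfied
       before the stores above become visible. */
    while (atomic_load(&interested[other]) && atomic_load(&turn) == other)
        ;  /* spin until the other thread leaves or yields the turn */
}

static void peterson_unlock(int self)
{
    assert(self == 0 || self == 1);
    atomic_store(&interested[self], false);
}
```

Weakening the stores and loads to relaxed ordering without adding an explicit fence between them would reintroduce the reordering the quoted text warns about.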

Re: should sync builtins be full optimization barriers?

2011-09-13 Thread Geert Bosch
On Sep 13, 2011, at 08:08, Andrew MacLeod wrote: > On 09/12/2011 09:52 PM, Geert Bosch wrote: >> No that's false. Even on systems with nice memory models, such as x86 and >> SPARC with a TSO model, you need a fence to avoid that a write-load of the >> same location is forced to Note that here w

Re: should sync builtins be full optimization barriers?

2011-09-13 Thread Andrew MacLeod
On 09/13/2011 10:58 AM, Geert Bosch wrote: On Sep 13, 2011, at 08:08, Andrew MacLeod wrote: On 09/12/2011 09:52 PM, Geert Bosch wrote: No that's false. Even on systems with nice memory models, such as x86 and SPARC with a TSO model, you need a fence to avoid that a write-load of the same lo

Re: should sync builtins be full optimization barriers?

2011-09-15 Thread Richard Henderson
> > I'd say they should be optimization barriers too (and at the tree level > > they I think work that way, being represented as function calls), so if > > they don't act as memory barriers in RTL, the *.md patterns should be > > fixed. The only exception should be IMHO the __SYNC_MEM_RELAXED > >

Re: should sync builtins be full optimization barriers?

2011-09-15 Thread Paolo Bonzini
On 09/15/2011 06:19 PM, Richard Henderson wrote: I wouldn't go that far. They *used* to be compiler barriers, but clearly something broke at some point without anyone noticing. We don't know how many versions are affected until we debug it. For all we know it broke in 4.5 and 4.4 is fine. 4.4

Re: should sync builtins be full optimization barriers?

2011-09-20 Thread Paolo Bonzini
On 09/15/2011 06:26 PM, Paolo Bonzini wrote: There's no reference to a GCC bug report about this in the thread. Did the folks over at the libdispatch project never think to file one? I asked them to attach a preprocessed testcase somewhere, but they haven't done so yet. :( They now attached

Re: should sync builtins be full optimization barriers?

2011-09-24 Thread Richard Guenther
On Thu, Sep 15, 2011 at 6:26 PM, Paolo Bonzini wrote: > On 09/15/2011 06:19 PM, Richard Henderson wrote: >> >> I wouldn't go that far. They *used* to be compiler barriers, >> but clearly something broke at some point without anyone noticing. >> We don't know how many versions are affected until w

Re: should sync builtins be full optimization barriers?

2011-09-26 Thread Michael Matz
Hi, On Tue, 13 Sep 2011, Andrew MacLeod wrote: > Your example was not about regular stores, it used atomic variables. This reads as if there exist non-atomic variables in the new C++ mem-model. Assuming that this is so, why do those ugly requirements of not introducing new data races also ap

Re: should sync builtins be full optimization barriers?

2011-09-26 Thread Richard Guenther
On Sat, Sep 24, 2011 at 11:24 AM, Richard Guenther wrote: > On Thu, Sep 15, 2011 at 6:26 PM, Paolo Bonzini wrote: >> On 09/15/2011 06:19 PM, Richard Henderson wrote: >>> >>> I wouldn't go that far. They *used* to be compiler barriers, >>> but clearly something broke at some point without anyone

Re: should sync builtins be full optimization barriers?

2011-09-26 Thread Ian Lance Taylor
Michael Matz writes: > Hi, > > On Tue, 13 Sep 2011, Andrew MacLeod wrote: > >> Your example was not about regular stores, it used atomic variables. > > This reads as if there exist non-atomic variables in the new C++ > mem-model. Assuming that this is so, why do those ugly requirements of > n

Re: should sync builtins be full optimization barriers?

2011-09-26 Thread Andrew MacLeod
Hi, On Tue, 13 Sep 2011, Andrew MacLeod wrote: Your example was not about regular stores, it used atomic variables. This reads as if there exist non-atomic variables in the new C++ mem-model. Assuming that this is so, why do those ugly requirements of not introducing new data races also appl

Re: should sync builtins be full optimization barriers?

2011-09-26 Thread James Dennett
On Mon, Sep 26, 2011 at 9:57 AM, Andrew MacLeod wrote: >> Hi, >> >> On Tue, 13 Sep 2011, Andrew MacLeod wrote: >> >>> Your example was not about regular stores, it used atomic variables. >> >> This reads as if there exist non-atomic variables in the new C++ >> mem-model. Assuming that this is so

Re: should sync builtins be full optimization barriers?

2011-09-26 Thread Andrew MacLeod
On 09/26/2011 01:31 PM, James Dennett wrote: On Mon, Sep 26, 2011 at 9:57 AM, Andrew MacLeod wrote: The C++11 memory model asserts that a program containing data races involving *non-atomic* variables has undefined semantics. The compiler is not allowed to introduce any data races into an othe