Re: Memory corruption due to word sharing

2012-02-10 Thread Richard Henderson
On 02/03/2012 12:00 PM, Linus Torvalds wrote: do { load-link %r,%m if (r == value) return 0; add } while (store-conditional %r,%m) return 1; and it is used to implement two *very* common (and critical) reference-counting use cases: - decrement

Re: Memory corruption due to word sharing

2012-02-08 Thread Jason Merrill
On 02/07/2012 05:32 AM, Aldy Hernandez wrote: struct A { virtual void f(); int f1 : 1;--- bit 64 }; struct B : public A { int f2 : 1; // { dg-warning ABI }--- bit 65 int : 0; int f3 : 4; int f4 : 3; }; It is my understanding that f1 and f2 must be in distinct memory regions. So

Re: Memory corruption due to word sharing

2012-02-07 Thread Aldy Hernandez
Testcase is for example g++.dg/abi/bitfield5.C, bit layout annotated: struct A { virtual void f(); int f1 : 1;--- bit 64 }; struct B : public A { int f2 : 1; // { dg-warning ABI }--- bit 65 int : 0; int f3 : 4; int f4 : 3; }; maybe it was a bug (above happens with

Re: Memory corruption due to word sharing

2012-02-06 Thread Torvald Riegel
On Fri, 2012-02-03 at 12:00 -0800, Linus Torvalds wrote: Of course, it you expose some intrinsic for the whole ll/sc model (and you then turn it into cmpxchg on demand), we could literally open-code it. That is likely the most *flexible* approach for a compiler. I think pretty much

Re: Memory corruption due to word sharing

2012-02-03 Thread Richard Guenther
On Fri, 3 Feb 2012, DJ Delorie wrote: Jan Kara j...@suse.cz writes: we've spotted the following mismatch between what kernel folks expect from a compiler and what GCC really does, resulting in memory corruption on some architectures. Consider the following structure: struct x {

Re: Memory corruption due to word sharing

2012-02-03 Thread Matthew Gretton-Dann
On Fri, Feb 03, 2012 at 09:37:22AM +, Richard Guenther wrote: On Fri, 3 Feb 2012, DJ Delorie wrote: Jan Kara j...@suse.cz writes: we've spotted the following mismatch between what kernel folks expect from a compiler and what GCC really does, resulting in memory corruption

Re: Memory corruption due to word sharing

2012-02-03 Thread Andrew MacLeod
And I assume that since the compiler does them, that would now make it impossible for us to gather a list of all the 'lock' prefixes so that we can undo them if it turns out that we are running on a UP machine. When we do SMP operations, we don't just add a lock prefix to it. We do this:

Re: Memory corruption due to word sharing

2012-02-03 Thread Linus Torvalds
On Fri, Feb 3, 2012 at 8:38 AM, Andrew MacLeod amacl...@redhat.com wrote: The atomic intrinsics were created for c++11  memory model compliance, but I am certainly open to enhancements that would make them more useful.   I am planning some enhancements for 4.8 now, and it sounds like you may

Re: Memory corruption due to word sharing

2012-02-03 Thread Andrew MacLeod
On 02/03/2012 12:16 PM, Linus Torvalds wrote: So we have several atomics we use in the kernel, with the more common being - add (and subtract) and cmpchg of both 'int' and 'long' This would be __atomic_fetch_add, __atomic_fetch_sub, and __atomic_compare_exchange. For 4.8

Re: Memory corruption due to word sharing

2012-02-03 Thread Linus Torvalds
On Fri, Feb 3, 2012 at 11:16 AM, Andrew MacLeod amacl...@redhat.com wrote:    The special cases are because older x86 cannot do the generic add_return efficiently - it needs xadd - but can do atomic versions that test the end result and give zero or sign information. Since these are older x86

Re: Memory corruption due to word sharing

2012-02-03 Thread Paul E. McKenney
On Fri, Feb 03, 2012 at 12:00:03PM -0800, Linus Torvalds wrote: On Fri, Feb 3, 2012 at 11:16 AM, Andrew MacLeod amacl...@redhat.com wrote: [ . . . ] Having access to __ATOMIC_ACQUIRE would actually be an improvement - it's just that the architectures that really care about things like that

Re: Memory corruption due to word sharing

2012-02-03 Thread Richard Guenther
(Somehow my reply was private to Aldy ... forwarding to gcc-patches now, given that it contains a patch and we changed topics) On Fri, 3 Feb 2012, Richard Guenther wrote: On Thu, 2 Feb 2012, Aldy Hernandez wrote: Linus Torvalds torva...@linux-foundation.org writes: Seriously - is

Re: Memory corruption due to word sharing

2012-02-03 Thread Richard Guenther
On Fri, 3 Feb 2012, Richard Guenther wrote: On Fri, 3 Feb 2012, Richard Guenther wrote: On Thu, 2 Feb 2012, Aldy Hernandez wrote: Linus Torvalds torva...@linux-foundation.org writes: Seriously - is there any real argument *against* just using the base type as a hint for

RE: Memory corruption due to word sharing

2012-02-02 Thread Bernd Petrovitsch
On Mit, 2012-02-01 at 21:04 +, Boehm, Hans wrote: [...] The C11 memory model potentially adds overhead in only two cases: 1. When current code involves touching a field that wouldn't otherwise be touched. There are odd cases in which this measurably slows down code, but I think all

Re: Memory corruption due to word sharing

2012-02-02 Thread Richard Guenther
On Wed, 1 Feb 2012, Linus Torvalds wrote: On Wed, Feb 1, 2012 at 9:41 AM, Michael Matz m...@suse.de wrote: One problem is that it's not a new problem, GCC emitted similar code since about forever, and still they turned up only now (well, probably because ia64 is dead, but sparc64 should

Re: Memory corruption due to word sharing

2012-02-02 Thread Richard Guenther
On Wed, 1 Feb 2012, Linus Torvalds wrote: Just out of morbid curiosity, what happens if you have totally *separate* variables that just happen to link together? IOW, something like static struct { unsigned bit:1; } onebit; static volatile int var; and they just *happen* to link

Re: Memory corruption due to word sharing

2012-02-02 Thread Richard Guenther
On Wed, 1 Feb 2012, Linus Torvalds wrote: IT IS A GCC BUG. I think you are wrong when you assume that we think it is not a bug. Repeating it repeatedly in all-caps doesn't make the fix appear faster though. Richard.

Re: Memory corruption due to word sharing

2012-02-02 Thread James Courtier-Dutton
On 1 February 2012 15:19, Jan Kara j...@suse.cz wrote:  Hello,  we've spotted the following mismatch between what kernel folks expect from a compiler and what GCC really does, resulting in memory corruption on some architectures. Consider the following structure: struct x {    long a;    

Re: Memory corruption due to word sharing

2012-02-02 Thread David Sterba
On Wed, Feb 01, 2012 at 04:19:18PM +0100, Jan Kara wrote: We actually spotted this race in practice in btrfs on structure fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got corrupted due to update of following bitfield and there seem to be other places in kernel where this

Re: Memory corruption due to word sharing

2012-02-02 Thread Ingo Molnar
* Linus Torvalds torva...@linux-foundation.org wrote: [...] And I realize that compiler people tend to think that loop hoisting etc is absolutely critical for performance, and some big hammer like barrier() makes a compiler person wince. You think it results in horrible code generation

Re: Memory corruption due to word sharing

2012-02-02 Thread Richard Guenther
On Thu, 2 Feb 2012, David Sterba wrote: On Wed, Feb 01, 2012 at 04:19:18PM +0100, Jan Kara wrote: We actually spotted this race in practice in btrfs on structure fs/btrfs/ctree.h:struct btrfs_block_rsv where spinlock content got corrupted due to update of following bitfield and there seem

Re: Memory corruption due to word sharing

2012-02-02 Thread Richard Guenther
On Thu, 2 Feb 2012, James Courtier-Dutton wrote: On 1 February 2012 15:19, Jan Kara j...@suse.cz wrote:  Hello,  we've spotted the following mismatch between what kernel folks expect from a compiler and what GCC really does, resulting in memory corruption on some architectures.

Re: Memory corruption due to word sharing

2012-02-02 Thread Michael Matz
Hi, On Wed, 1 Feb 2012, Linus Torvalds wrote: But I also think that gcc is simply *buggy*, and has made them much nastier than they should be. What gcc *should* have done is to turn bitfield accesses into shift-and-masking of the underlying field as early as possible, and then do all

Re: Memory corruption due to word sharing

2012-02-02 Thread Aldy Hernandez
Linus Torvalds torva...@linux-foundation.org writes: Seriously - is there any real argument *against* just using the base type as a hint for access size? If I'm on the hook for attempting to fix this again, I'd also like to know if there are any arguments against using the base type.

Re: Memory corruption due to word sharing

2012-02-02 Thread Michael Matz
Hi, On Thu, 2 Feb 2012, Aldy Hernandez wrote: Seriously - is there any real argument *against* just using the base type as a hint for access size? If I'm on the hook for attempting to fix this again, I'd also like to know if there are any arguments against using the base type. Sure.

Re: Memory corruption due to word sharing

2012-02-02 Thread Linus Torvalds
On Thu, Feb 2, 2012 at 8:28 AM, Michael Matz m...@suse.de wrote: Sure.  Simplest example: struct s {int i:24;} __attribute__((packed)). You must access only three bytes, no matter what.  The basetype (int) is four bytes. Ok, so here's a really *stupid* (but also really really simple) patch

Re: Memory corruption due to word sharing

2012-02-02 Thread Paul E. McKenney
On Wed, Feb 01, 2012 at 03:11:00PM -0800, Linus Torvalds wrote: On Wed, Feb 1, 2012 at 2:45 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: My (perhaps forlorn and naive) hope is that C++11 memory_order_relaxed will eventually allow ACCESS_ONCE() to be upgraded so that (for example)

Re: Memory corruption due to word sharing

2012-02-02 Thread Linus Torvalds
On Thu, Feb 2, 2012 at 10:42 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: SMP-atomic or percpu atomic? Or both? Only SMP-atomic. And I assume that since the compiler does them, that would now make it impossible for us to gather a list of all the 'lock' prefixes so that we can undo

Re: Memory corruption due to word sharing

2012-02-02 Thread Paul E. McKenney
On Thu, Feb 02, 2012 at 11:08:25AM -0800, Linus Torvalds wrote: On Thu, Feb 2, 2012 at 10:42 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: SMP-atomic or percpu atomic? Or both? Only SMP-atomic. And I assume that since the compiler does them, that would now make it impossible

Re: Memory corruption due to word sharing

2012-02-02 Thread DJ Delorie
Jan Kara j...@suse.cz writes: we've spotted the following mismatch between what kernel folks expect from a compiler and what GCC really does, resulting in memory corruption on some architectures. Consider the following structure: struct x { long a; unsigned int b1; unsigned

Memory corruption due to word sharing

2012-02-01 Thread Jan Kara
Hello, we've spotted the following mismatch between what kernel folks expect from a compiler and what GCC really does, resulting in memory corruption on some architectures. Consider the following structure: struct x { long a; unsigned int b1; unsigned int b2:1; }; We have two

Re: Memory corruption due to word sharing

2012-02-01 Thread Markus Trippelsdorf
On 2012.02.01 at 16:19 +0100, Jan Kara wrote: we've spotted the following mismatch between what kernel folks expect from a compiler and what GCC really does, resulting in memory corruption on some architectures. Consider the following structure: struct x { long a; unsigned int b1;

Re: Memory corruption due to word sharing

2012-02-01 Thread Colin Walters
On Wed, 2012-02-01 at 16:19 +0100, Jan Kara wrote: I've raised the issue with our GCC guys and they said to me that: C does not provide such guarantee, nor can you reliably lock different structure fields with different locks if they share naturally aligned word-size memory regions. The

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 7:19 AM, Jan Kara j...@suse.cz wrote:  we've spotted the following mismatch between what kernel folks expect from a compiler and what GCC really does, resulting in memory corruption on some architectures. This is sad. We've had something like this before due to

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 8:37 AM, Colin Walters walt...@verbum.org wrote: 1) Use the same lock for a given bitfield That's not the problem. All the *bitfield* fields are all accessed under the same word already. 2) Split up the bitfield into different words Again, it's not the bitfield that is

Re: Memory corruption due to word sharing

2012-02-01 Thread Torvald Riegel
On Wed, 2012-02-01 at 16:19 +0100, Jan Kara wrote: I've raised the issue with our GCC guys and they said to me that: C does not provide such guarantee, nor can you reliably lock different structure fields with different locks if they share naturally aligned word-size memory regions. The C++11

Re: Memory corruption due to word sharing

2012-02-01 Thread Jiri Kosina
On Wed, 1 Feb 2012, Linus Torvalds wrote: But the compiler turns the access to the bitfield (in a 32-bit aligned word) into a 64-bit access that accesses the word *next* to it. That word next to it might *be* the lock, for example. So we could literally have this kind of situation:

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 9:08 AM, Torvald Riegel trie...@redhat.com wrote: What do the kernel folks think about the C11 memory model?  If you can spot any issues in there, the GCC community would certainly like to know. I don't think this is about memory models except very tangentially. Gcc

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 9:11 AM, Jiri Kosina jkos...@suse.cz wrote: On Wed, 1 Feb 2012, Linus Torvalds wrote: And I suspect it really is a generic bug that can be shown even with the above trivial example. I have actually tried exactly this earlier today (because while looking at this, I had

Re: Memory corruption due to word sharing

2012-02-01 Thread Michael Matz
Hi, On Wed, 1 Feb 2012, Jiri Kosina wrote: # cat x.c struct x { long a; volatile unsigned int lock; unsigned int full:1; }; void wrong(struct x *ptr) { ptr-full = 1; } In my opinion, this is a clear bug Even that depends (sadly) on who you ask. half-volatile

Re: Memory corruption due to word sharing

2012-02-01 Thread Dennis Clarke
I have actually tried exactly this earlier today (because while looking at this, I had an idea that putting volatile in place could be a workaround, causing gcc to generate a saner code), but it doesn't work either: # cat x.c struct x { long a; volatile unsigned int lock;

Re: Memory corruption due to word sharing

2012-02-01 Thread David Miller
From: Michael Matz m...@suse.de Date: Wed, 1 Feb 2012 18:41:05 +0100 (CET) One problem is that it's not a new problem, GCC emitted similar code since about forever, and still they turned up only now (well, probably because ia64 is dead, but sparc64 should have similar problems). Indeed, on

Re: Memory corruption due to word sharing

2012-02-01 Thread Jeff Law
On 02/01/2012 11:09 AM, David Miller wrote: From: Michael Matzm...@suse.de Date: Wed, 1 Feb 2012 18:41:05 +0100 (CET) One problem is that it's not a new problem, GCC emitted similar code since about forever, and still they turned up only now (well, probably because ia64 is dead, but sparc64

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 9:41 AM, Michael Matz m...@suse.de wrote: One problem is that it's not a new problem, GCC emitted similar code since about forever, and still they turned up only now (well, probably because ia64 is dead, but sparc64 should have similar problems).  The bitfield handling

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 10:09 AM, David Miller da...@davemloft.net wrote: Personally I've avoided C bitfields like the plague in any code I've written. I do agree with that. The kernel largely tries to avoid bitfields, usually because we have some really strict rules about different bitfields,

Re: Memory corruption due to word sharing

2012-02-01 Thread Peter Bergner
On Wed, 2012-02-01 at 13:09 -0500, David Miller wrote: From: Michael Matz m...@suse.de Date: Wed, 1 Feb 2012 18:41:05 +0100 (CET) One problem is that it's not a new problem, GCC emitted similar code since about forever, and still they turned up only now (well, probably because ia64 is

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 10:45 AM, Jeff Law l...@redhat.com wrote: Torvald Riegel I were told that was kernel policy when we brought up the upcoming bitfield semantic changes with some of the linux kernel folks last year. Btw, one reason this is true is that the bitfield ordering/packing is so

Re: Memory corruption due to word sharing

2012-02-01 Thread Torvald Riegel
On Wed, 2012-02-01 at 08:41 -0800, Linus Torvalds wrote: If the gcc people aren't willing to agree that this is actually a flaw in the standard (one that is being addressed, no less) It has been addressed in the standards. and try to fix it, Again, this is being worked on, see

Re: Memory corruption due to word sharing

2012-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2012 at 06:42:54PM +0100, Torvald Riegel wrote: We need a proper memory model. No vague assumptions with lots of hand-waving. If you think that this is simple stuff and can sufficiently described by don't do anything stupid, then please have a look at the issues that the Java

RE: Memory corruption due to word sharing

2012-02-01 Thread Boehm, Hans
corruption due to word sharing On Wed, 2012-02-01 at 08:41 -0800, Linus Torvalds wrote: If the gcc people aren't willing to agree that this is actually a flaw in the standard (one that is being addressed, no less) It has been addressed in the standards. and try to fix it, Again

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 9:42 AM, Torvald Riegel trie...@redhat.com wrote: We need a proper memory model. Not really. The fact is, the kernel will happily take the memory model of the underlying hardware. Trying to impose some compiler description of the memory model is actually horribly bad,

Re: Memory corruption due to word sharing

2012-02-01 Thread Jeff Law
On 02/01/2012 12:44 PM, Boehm, Hans wrote: C11 is a published standard. Last I checked, gcc did not follow many of the above rules. It looks like major changes were recently merged into the gcc trunk, and I haven't had a chance to test those, so it may well be fixed. But more importantly,

Re: Memory corruption due to word sharing

2012-02-01 Thread Alan Cox
So here's basically what the kernel needs: - if we don't touch a field, the compiler doesn't touch it. This is the rule that gcc now violates with bitfields. This is a gcc bug. End of story. The volatile example proves it - anybody who argues otherwise is simply wrong, and is

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 11:40 AM, Jakub Jelinek ja...@redhat.com wrote: Well, the C++11/C11 model doesn't allow to use the underlying type for accesses, consider e.g. struct S { long s1; unsigned int s2 : 5; unsigned int s3 : 19; unsigned char s4; unsigned int s5; }; struct T { long s1 :

Re: Memory corruption due to word sharing

2012-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2012 at 12:01:46PM -0800, Linus Torvalds wrote: On Wed, Feb 1, 2012 at 11:40 AM, Jakub Jelinek ja...@redhat.com wrote: struct S { long s1; unsigned int s2 : 5; unsigned int s3 : 19; unsigned char s4; unsigned int s5; }; struct T { long t1 : 16; unsigned int t2; }; on

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 12:01 PM, Linus Torvalds torva...@linux-foundation.org wrote:  - However, while using the *smallest* possible access may generate correct code, it often generates really *crappy* code. Which is exactly the bug that I reported in  

Re: Memory corruption due to word sharing

2012-02-01 Thread Torvald Riegel
On Wed, 2012-02-01 at 11:47 -0800, Linus Torvalds wrote: On Wed, Feb 1, 2012 at 9:42 AM, Torvald Riegel trie...@redhat.com wrote: We need a proper memory model. Not really. The fact is, the kernel will happily take the memory model of the underlying hardware. You do rely on the

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 12:16 PM, Jakub Jelinek ja...@redhat.com wrote: So the kernel really doesn't care what you do to things *within* the bitfield. But what is *within* the bitfield?  Do you consider s4 or t2 fields (non-bitfield fields that just the ABI wants to pack together with the

Re: Memory corruption due to word sharing

2012-02-01 Thread Torvald Riegel
On Wed, 2012-02-01 at 09:29 -0800, Linus Torvalds wrote: On Wed, Feb 1, 2012 at 9:08 AM, Torvald Riegel trie...@redhat.com wrote: What do the kernel folks think about the C11 memory model? If you can spot any issues in there, the GCC community would certainly like to know. I don't

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 12:41 PM, Torvald Riegel trie...@redhat.com wrote: You do rely on the compiler to do common transformations I suppose: hoist loads out of loops, CSE, etc.  How do you expect the compiler to know whether they are allowed for a particular piece of code or not? We have

RE: Memory corruption due to word sharing

2012-02-01 Thread Boehm, Hans
From: Linus Torvalds We need a proper memory model. Not really. The fact is, the kernel will happily take the memory model of the underlying hardware. Trying to impose some compiler description of the memory model is actually horribly bad, because it automatically also involves

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 12:53 PM, Torvald Riegel trie...@redhat.com wrote: For volatile, I agree. However, the original btrfs example was *without* a volatile, and that's why I raised the memory model point.  This triggered an error in a concurrent execution, so that's memory model land, at

Re: Memory corruption due to word sharing

2012-02-01 Thread Torvald Riegel
On Wed, 2012-02-01 at 12:59 -0800, Linus Torvalds wrote: On Wed, Feb 1, 2012 at 12:41 PM, Torvald Riegel trie...@redhat.com wrote: You do rely on the compiler to do common transformations I suppose: hoist loads out of loops, CSE, etc. How do you expect the compiler to know whether they

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 1:24 PM, Torvald Riegel trie...@redhat.com wrote: It's not the only thing we do. We have cases where it's not that you can't hoist things outside of loops, it's that you have to read things exactly *once*, and then use that particular value (ie the compiler can't be

RE: Memory corruption due to word sharing

2012-02-01 Thread Boehm, Hans
From: Torvald Riegel Oh, one of my favorite (NOT!) pieces of code in the kernel is the implementation of the smp_read_barrier_depends() macro, which on every single architecture except for one (alpha) is a no-op. We have basically 30 or so empty definitions for it, and I think

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 1:25 PM, Boehm, Hans hans.bo...@hp.com wrote: Here are some more interesting ones that illustrate the issues (all declarations are non-local, unless stated otherwise): struct { char a; int b:9; int c:7; char d} x; Is x.b = 1 allowed to overwrite x.a?  C11 says no,

Re: Memory corruption due to word sharing

2012-02-01 Thread Paul E. McKenney
On Wed, Feb 01, 2012 at 12:59:24PM -0800, Linus Torvalds wrote: On Wed, Feb 1, 2012 at 12:41 PM, Torvald Riegel trie...@redhat.com wrote: You do rely on the compiler to do common transformations I suppose: hoist loads out of loops, CSE, etc.  How do you expect the compiler to know whether

Re: Memory corruption due to word sharing

2012-02-01 Thread Linus Torvalds
On Wed, Feb 1, 2012 at 2:45 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: My (perhaps forlorn and naive) hope is that C++11 memory_order_relaxed will eventually allow ACCESS_ONCE() to be upgraded so that (for example) access-once increments can generate a single increment-memory