Re: [PATCH] lockref: remove cpu_relax() again

2013-09-06 Thread Tony Luck
No new Itanium numbers yet ... but I did wonder how this works on multi-socket x86 ... so I tweaked "t.c. to increase threads to 64 to max out my 4-socket Xeon E5-4650 (8 cores/socket 2 threads/core) and also print out the individual scores from each thread. $ ./t /tmp 64 389827 717666 1540293 130

Re: [PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Linus Torvalds
On Thu, Sep 5, 2013 at 12:45 PM, Luck, Tony wrote: > > No. I can change the Linux code to say "cmpxchg.rel" here ... but the > h/w will do exactly the same thing it did when I had "cmpxchg.acq". Oh, so when you said "So we had to back-pedal and keep the "legacy" behavior of a full fence", you me

RE: [PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Luck, Tony
>> Worse still - early processor implementations actually just ignored >> the acquire/release and did a full fence all the time. Unfortunately >> this meant a lot of badly written code that used .acq when they really >> wanted .rel became legacy out in the wild - so when we made a cpu >> that stri

RE: [PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Luck, Tony
> That said, another thing that strikes me is that you have 32 CPU > threads, and the stupid test-program I sent out had MAX_THREADS set to > 16. Did you change that? Becuase if not, then some of the extreme > performance profile might be about how the threads get scheduled on > your machine (HT t

Re: [PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Linus Torvalds
On Thu, Sep 5, 2013 at 11:57 AM, Luck, Tony wrote: > > Not "tons", just two. You can ask for "acquire" or "release" semantics, > there is no relaxed option. Seriously? You can't just do a cache-coherent cmpxchg without extra serialization? Oh well. > Worse still - early processor implementation

RE: [PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Luck, Tony
> Also, it strikes me that ia64 has tons of different versions of > cmpxchg, and the one you use by default is the one with "acquire" > semantics Not "tons", just two. You can ask for "acquire" or "release" semantics, there is no relaxed option. Worse still - early processor implementations actu

Re: [PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Linus Torvalds
On Thu, Sep 5, 2013 at 10:35 AM, Luck, Tony wrote: > > So Linus is right that the cpu_relax() makes things less fair ... but without > it performance sucks so > much that I don't want to use the clever cmpxchg at all - I'm much better off > without it! Hmm. Is this single-socket? The thing real

Re: [PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Waiman Long
On 09/05/2013 11:31 AM, Linus Torvalds wrote: On Thu, Sep 5, 2013 at 6:18 AM, Heiko Carstens wrote: *If* however the cpu_relax() makes sense on other platforms maybe we could add something like we have already with "arch_mutex_cpu_relax()": I actually think it won't. The lockref cmpxchg isn'

RE: [PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Luck, Tony
> And there can't be any livelock, since by definition somebody else > _did_ make progress. In fact, adding the cpu_relax() probably just > makes things much less fair - once somebody else raced on you, the > cpu_relax() now makes it more likely that _another_ cpu does so too. > > That said, let's

Re: [PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Linus Torvalds
On Thu, Sep 5, 2013 at 6:18 AM, Heiko Carstens wrote: > > *If* however the cpu_relax() makes sense on other platforms maybe we could > add something like we have already with "arch_mutex_cpu_relax()": I actually think it won't. The lockref cmpxchg isn't waiting for something to change - it only

RE: [PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Luck, Tony
> *If* however the cpu_relax() makes sense on other platforms maybe we could > add something like we have already with "arch_mutex_cpu_relax()": I'll do some more measurements on ia64. During my first tests cpu_relax() seemed to be a big win - but I only ran "./t" a couple of times. Later (with

Re: [PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Heiko Carstens
On Thu, Sep 05, 2013 at 03:18:14PM +0200, Heiko Carstens wrote: > d472d9d9 "lockref: Relax in cmpxchg loop" added a cpu_relax() call to the > CMPXCHG_LOOP() macro. However to me it seems to be wrong since it is very > likely that the next round will succeed (or the loop will be left). > Even worse:

[PATCH] lockref: remove cpu_relax() again

2013-09-05 Thread Heiko Carstens
d472d9d9 "lockref: Relax in cmpxchg loop" added a cpu_relax() call to the CMPXCHG_LOOP() macro. However to me it seems to be wrong since it is very likely that the next round will succeed (or the loop will be left). Even worse: cpu_relax() is very expensive on s390, since it means yield "my virtual