No new Itanium numbers yet ... but I did wonder how this works
on multi-socket x86 ... so I tweaked "t.c. to increase threads to
64 to max out my 4-socket Xeon E5-4650 (8 cores/socket 2 threads/core)
and also print out the individual scores from each thread.
$ ./t /tmp 64
389827
717666
1540293
130
On Thu, Sep 5, 2013 at 12:45 PM, Luck, Tony wrote:
>
> No. I can change the Linux code to say "cmpxchg.rel" here ... but the
> h/w will do exactly the same thing it did when I had "cmpxchg.acq".
Oh, so when you said "So we had to back-pedal and keep the "legacy"
behavior of a full fence", you me
>> Worse still - early processor implementations actually just ignored
>> the acquire/release and did a full fence all the time. Unfortunately
>> this meant a lot of badly written code that used .acq when they really
>> wanted .rel became legacy out in the wild - so when we made a cpu
>> that stri
> That said, another thing that strikes me is that you have 32 CPU
> threads, and the stupid test-program I sent out had MAX_THREADS set to
> 16. Did you change that? Becuase if not, then some of the extreme
> performance profile might be about how the threads get scheduled on
> your machine (HT t
On Thu, Sep 5, 2013 at 11:57 AM, Luck, Tony wrote:
>
> Not "tons", just two. You can ask for "acquire" or "release" semantics,
> there is no relaxed option.
Seriously? You can't just do a cache-coherent cmpxchg without extra
serialization? Oh well.
> Worse still - early processor implementation
> Also, it strikes me that ia64 has tons of different versions of
> cmpxchg, and the one you use by default is the one with "acquire"
> semantics
Not "tons", just two. You can ask for "acquire" or "release" semantics,
there is no relaxed option.
Worse still - early processor implementations actu
On Thu, Sep 5, 2013 at 10:35 AM, Luck, Tony wrote:
>
> So Linus is right that the cpu_relax() makes things less fair ... but without
> it performance sucks so
> much that I don't want to use the clever cmpxchg at all - I'm much better off
> without it!
Hmm. Is this single-socket? The thing real
On 09/05/2013 11:31 AM, Linus Torvalds wrote:
On Thu, Sep 5, 2013 at 6:18 AM, Heiko Carstens
wrote:
*If* however the cpu_relax() makes sense on other platforms maybe we could
add something like we have already with "arch_mutex_cpu_relax()":
I actually think it won't.
The lockref cmpxchg isn'
> And there can't be any livelock, since by definition somebody else
> _did_ make progress. In fact, adding the cpu_relax() probably just
> makes things much less fair - once somebody else raced on you, the
> cpu_relax() now makes it more likely that _another_ cpu does so too.
>
> That said, let's
On Thu, Sep 5, 2013 at 6:18 AM, Heiko Carstens
wrote:
>
> *If* however the cpu_relax() makes sense on other platforms maybe we could
> add something like we have already with "arch_mutex_cpu_relax()":
I actually think it won't.
The lockref cmpxchg isn't waiting for something to change - it only
> *If* however the cpu_relax() makes sense on other platforms maybe we could
> add something like we have already with "arch_mutex_cpu_relax()":
I'll do some more measurements on ia64. During my first tests cpu_relax()
seemed
to be a big win - but I only ran "./t" a couple of times. Later (with
On Thu, Sep 05, 2013 at 03:18:14PM +0200, Heiko Carstens wrote:
> d472d9d9 "lockref: Relax in cmpxchg loop" added a cpu_relax() call to the
> CMPXCHG_LOOP() macro. However to me it seems to be wrong since it is very
> likely that the next round will succeed (or the loop will be left).
> Even worse:
d472d9d9 "lockref: Relax in cmpxchg loop" added a cpu_relax() call to the
CMPXCHG_LOOP() macro. However to me it seems to be wrong since it is very
likely that the next round will succeed (or the loop will be left).
Even worse: cpu_relax() is very expensive on s390, since it means yield
"my virtual
13 matches
Mail list logo