On Fri, May 13, 2011 at 07:55:44AM +0200, Piotr Wyderski wrote:
Jakub Jelinek wrote:
/* X86_TUNE_USE_INCDEC */
~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),
So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl,
if addl is better (e.g. on Atom incl is very bad compared to addl $1),
it will generate it.
Jakub Jelinek wrote:
And that's the right thing to do.
I concur. But the exchange case remains open.
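The "exchange case" presumably refers to atomic exchange, for which GCC already exposes __sync_lock_test_and_set (mapped to the implicitly-locked xchg instruction on x86). A minimal sketch of how that builtin is typically used, here as a toy spinlock (the function names spin_lock/spin_unlock are illustrative, not part of the thread):

```c
/* Toy spinlock built on the legacy __sync_* builtins.
   On x86, __sync_lock_test_and_set compiles to xchg, which is
   implicitly locked and needs no "lock" prefix. */
static int lock_word;

static void spin_lock(void)
{
    /* Atomically store 1 and return the previous value;
       keep spinning while someone else already holds the lock. */
    while (__sync_lock_test_and_set(&lock_word, 1))
        ; /* busy-wait */
}

static void spin_unlock(void)
{
    /* Release barrier + store 0. */
    __sync_lock_release(&lock_word);
}
```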
Please file an enhancement request in gcc bugzilla.
Done, 48986. I have also noticed several other missing
optimizations in this area, so I'm about to report them too.
Best regards,
On Thu, 12 May 2011, Piotr Wyderski wrote:
Hello,
I'm not sure if it should be better handled as missed optimization,
but there is a certain lack of functionality in the GCC's __sync_*
function family.
I don't think we should add new functions to that family; instead the aim
should be to
On Thu, May 12, 2011 at 06:11:59PM +0200, Piotr Wyderski wrote:
Hello,
I'm not sure if it should be better handled as missed optimization,
but there is a certain lack of functionality in the GCC's __sync_*
function family.
When implementing a reference counting smart pointer, two
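The pattern under discussion can be sketched as follows (the type and function names are illustrative, not from the thread): the increment discards the result, so a flag-clobbering lock incl would do, while the decrement must observe whether the count reached zero.

```c
#include <stdlib.h>

/* Minimal reference-counted object using GCC's legacy
   __sync_* builtins. */
typedef struct {
    int refcount;
    /* payload ... */
} obj_t;

static void obj_ref(obj_t *o)
{
    /* Result unused: a plain "lock incl/addl" suffices. */
    __sync_fetch_and_add(&o->refcount, 1);
}

static void obj_unref(obj_t *o)
{
    /* The new value is needed to detect the last reference,
       so the fetching form of the builtin is used. */
    if (__sync_sub_and_fetch(&o->refcount, 1) == 0)
        free(o);
}
```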
On Thu, May 12, 2011 at 06:11:59PM +0200, Piotr Wyderski wrote:
Unfortunately, on x86/x64 both are compiled in a rather poor way:
__sync_increment:
        lock addl $0x1, (ptr)
which is longer than:
        lock incl (ptr)
GCC actually generates lock incl (ptr) already now, it just depends
on the selected -mtune:
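This is easy to check: the same builtin call compiles to either encoding depending on tuning. A small test function (the name atomic_increment and the compile commands are illustrative; the exact instruction chosen is up to the tuning tables):

```c
/* Compile with, e.g.:
     gcc -O2 -mtune=k8   -S incr.c   # inc-friendly tunings may emit "lock incl"
     gcc -O2 -mtune=atom -S incr.c   # expected to emit "lock addl $1"
   and compare the generated assembly. */
void atomic_increment(int *p)
{
    /* Result discarded, so the compiler is free to pick
       whichever locked increment form the -mtune prefers. */
    __sync_fetch_and_add(p, 1);
}
```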
Jakub Jelinek wrote:
/* X86_TUNE_USE_INCDEC */
~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),
So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl,
if addl is better (e.g. on Atom incl is very bad compared to addl $1),
it will generate it.
Why is lock inc/dec worse than lock add/sub on those targets?