http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48634

Summary: Missed optimization for use of __builtin_ctzll() and __builtin_clzll
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: svfue...@gmail.com
Target: amd64

unsigned long long foo(unsigned long long x)
{
    return __builtin_ctzll(x);
}

Compiles into

    bsf    %rdi,%rax
    cltq
    retq

at -O3 with 4.6.0.

The cltq instruction isn't needed because the bitscan instruction will zero out
the upper 32 bits of rax. Basically, the return value of these intrinsics should
be unsigned long long instead of int on 64-bit machines. The ABI means that the
reverse process of truncating back down to an int costs zero instructions.
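
The range argument above can be checked with a small sketch. The wrapper names ctz64 and clz64 and the helper count_out_of_range are hypothetical, not part of GCC or of this report, and the code assumes unsigned long long is 64 bits wide, as it is on amd64. It only shows that the int result always lies in 0..63, so widening it cannot change the value. Whether a given GCC version then drops the cltq has to be checked in the generated assembly.

```c
#include <assert.h>

/* Hypothetical wrappers (names are not from the report). Both builtins
   are undefined for x == 0, so callers must pass a nonzero value. */
static unsigned long long ctz64(unsigned long long x)
{
    /* The builtin returns int; the result is in 0..63 for nonzero x. */
    return (unsigned long long)__builtin_ctzll(x);
}

static unsigned long long clz64(unsigned long long x)
{
    return (unsigned long long)__builtin_clzll(x);
}

/* Counts single-bit inputs for which the widened result differs from
   the expected bit index. Assumes a 64-bit unsigned long long. */
static int count_out_of_range(void)
{
    int bad = 0;
    for (int i = 0; i < 64; i++) {
        unsigned long long x = 1ULL << i;
        if (ctz64(x) != (unsigned long long)i)
            bad++;
        if (clz64(x) != (unsigned long long)(63 - i))
            bad++;
    }
    return bad;
}
```

Because every result is non-negative and below 64, sign extension from int to a 64-bit type is a no-op for these values, which is the reason the report calls the extra instruction redundant.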