http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
José Salavert Torres <jsalavert at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jsalavert at gmail dot com

--- Comment #8 from José Salavert Torres <jsalavert at gmail dot com> 2012-09-05 10:39:45 UTC ---
Hello, has there been any progress on this issue? Knuth's published approach would be great for 8-bit registers as well. It would also be better to allow different behaviour for each architecture.

On the forums, the implementation described here now looks like this; it seems to use fewer operations:

#include <stdint.h>

static inline unsigned int bitcount32(uint32_t i) {
  //Parallel binary bit add
  i = i - ((i >> 1) & 0x55555555);
  i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
  return (((i + (i >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24;
}

/* 64-bit variant: the original signature was lost; the name bitcount64 is assumed */
static inline unsigned int bitcount64(uint64_t i) {
  //Parallel binary bit add
  i = i - ((i >> 1) & 0x5555555555555555);
  i = (i & 0x3333333333333333) + ((i >> 2) & 0x3333333333333333);
  return (((i + (i >> 4)) & 0xF0F0F0F0F0F0F0F) * 0x101010101010101) >> 56;
}
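The comment asks for the same approach on 8-bit registers. Here is a minimal sketch, assuming the SWAR ("parallel binary bit add") steps of bitcount32 simply carry over to a byte-wide operand. It is not taken from Knuth's text or from GCC. With only one byte, the masks shrink to 8 bits, and the final multiply-and-shift, which sums per-byte counts, reduces to a single nibble add. The names bitcount8, bitcount_naive and check_bitcount8 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit variant of the parallel bit add shown above. */
static inline unsigned int bitcount8(uint8_t x)
{
    unsigned int i = x;
    /* Each 2-bit field now holds the number of set bits in that pair (0..2). */
    i = i - ((i >> 1) & 0x55);
    /* Each 4-bit field now holds the number of set bits in that nibble (0..4). */
    i = (i & 0x33) + ((i >> 2) & 0x33);
    /* Add the two nibble counts; the total (0..8) fits in the low nibble. */
    return (i + (i >> 4)) & 0x0F;
}

/* Reference count: clear the lowest set bit until none remain. */
static unsigned int bitcount_naive(uint32_t x)
{
    unsigned int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Exhaustively compare bitcount8 with the reference for all 256 inputs. */
static void check_bitcount8(void)
{
    for (unsigned int v = 0; v < 256; v++)
        assert(bitcount8((uint8_t)v) == bitcount_naive(v));
}
```

Calling check_bitcount8() from a test driver covers every possible byte. On GCC, __builtin_popcount could serve as a second reference.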