https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97759
Bug ID: 97759 Summary: Could std::has_single_bit implementation be faster? Product: gcc Version: 10.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: gcc-bugs at marehr dot dialup.fu-berlin.de Target Milestone: --- Hello gcc-team, we are thrilled that C++20 offers some efficient bit implementation and that we could exchange some of our own implementation with the standardized ones, making the code more accessible. I replaced our implementation and noticed that `std::has_single_bit` was slower than what we had before by around 30%. (The other functions matched our timings.) Additionally, we have a (micro-)benchmark that compares the standard arithmetic bit trick (https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2) with the implementation where popcount == 1. We decided to use the arithmetic version, because we measured that it was faster than popcount on our machines (mostly intel processors). Interestingly, it seems that the popcount benchmark matches the std::has_single_bit time-wise, so I guess that std::has_single_bit is implemented via popcount. Those timings could be reproduced at an unknown location https://quick-bench.com/q/Y28keu_mSh25WwhO05T4SKrbHpk I don't know how to fix this, but I would expect that the optimizer would recognize popcount=1 and knows that there is a more efficient version. Or change the implementation to arithmetic, where again the optimizer could decide to replace that by a popcount if that is more efficient on some architecture? Thank you!