https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114338
Bug ID: 114338
Summary: (x & (-1 << y)) should be optimized to ((x >> y) << y) or vice versa
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: Explorer09 at gmail dot com
Target Milestone: ---

### Test code

```c
#include <stdint.h>

unsigned int func1(unsigned int x, unsigned char count) {
    return (x >> count) << count;
}

unsigned int func2(unsigned int x, unsigned char count) {
    return x & (~0U << count);
}

uint16_t func3(uint16_t x, unsigned char count) {
    return (x >> count) << count;
}

uint16_t func4(uint16_t x, unsigned char count) {
    return x & (0xFFFF << count);
}
```

### Expected result

func1 and func2 should compile to identical code. The compiler should pick whichever pattern produces the smallest or fastest code for the target processor.

func3 and func4 should also compile to identical code. If the ABI doesn't require the upper bits of the register to be zeroed, the func3 and func4 code could be as small as that of func1 and func2.

### Current result (gcc)

With x86-64 gcc 13.2.0 and the "-Os" option, func2 generates code that is one byte larger than func1. func3 generates a MOVZX instruction (one byte larger than func1) that ideally could be replaced with a MOV. For func4, you can put the test code into Compiler Explorer (godbolt.org) and see the result yourself.

(This is a real case I was facing. I only need to work with a 16-bit input value, and I don't want to internally promote to uint32_t just for optimization purposes.)

### Current result (clang)

With clang 18.1.0 and the "-Os" option (tested in Compiler Explorer (godbolt.org)), func1, func2 and func3 produce identical code on x86-64. It seems that clang does recognize the pattern and optimizes accordingly: for the x86-64 target, ((x >> count) << count) is used; for the ARM and RISC-V targets, ((-1 << count) & x) is used. However, clang doesn't recognize func4 as identical to func3, so there is still room for improvement.