[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569 --- Comment #10 from GCC Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:234d9acfd25aa3b74958e2f1c41e46d483b92d24 commit r17-330-g234d9acfd25aa3b74958e2f1c41e46d483b92d24 Author: Jakub Jelinek Date: Tue May 5 10:45:37 2026 +0200 testsuite: Fix up pr122569*.c tests [PR122569] On Tue, May 05, 2026 at 02:27:23PM +0800, H.J. Lu wrote: > The new tests failed with -m32 on Linux/x86-64: > > FAIL: gcc.dg/tree-ssa/pr122569-1.c scan-tree-dump forwprop1 > "__builtin_ctz|\\.CTZ" > FAIL: gcc.dg/tree-ssa/pr122569-2.c scan-tree-dump forwprop1 > "__builtin_clz|\\.CLZ" > > Should these tests require int128? They should first of all require ctzll resp. clzll effective targets, if there is a function call for those, then it certainly isn't optimized. The problem is that that isn't enough, ia32 is both ctzll and clzll effective target. That is because we handle double-word __builtin_c[tl]zll by doing 2 word ops and one conditional. The tree-ssa-forwprop.cc optimization is checking for whether it can use IFN_CLZ/IFN_CTZ, and that is not the case, because we only use direct optab for that and don't have the double-word unop fallback for that. Rather than int128 I think it is more natural to test for lp64 || llp64. 2026-05-05 Jakub Jelinek PR tree-optimization/122569 * gcc.dg/tree-ssa/pr122569-1.c: Only require __builtin_ctz/.CTZ on ctzll 64-bit targets. * gcc.dg/tree-ssa/pr122569-2.c: Only require __builtin_clz/.CLZ on clzll 64-bit targets. Reviewed-by: Richard Biener
[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569 --- Comment #9 from GCC Commits --- The master branch has been updated by Philipp Tomsich : https://gcc.gnu.org/g:a66820ce3f5b298759042b6543a056b490370310 commit r17-286-ga66820ce3f5b298759042b6543a056b490370310 Author: Philipp Tomsich Date: Thu Apr 9 10:58:23 2026 +0200 tree-optimization/122569 - fix DeBruijn CLZ table validator shift-by-64 UB simplify_count_zeroes validates DeBruijn CLZ tables by computing (1 << (data + 1)) - 1 to simulate the value produced by the OR-cascade b |= b >> 1; ... b |= b >> 32. For 64-bit input with data == 63 (the MSB bit), data + 1 equals HOST_BITS_PER_WIDE_INT, making the shift (HOST_WIDE_INT_1U << 64) undefined behavior. Hosts typically produce 0, so the check (0 * magic) >> 58 == 63 fails and check_table_array returns false. Every well-formed 64-bit DeBruijn CLZ table has an entry mapping the all-ones value to bit 63, so this UB rejected every such table -- including the magic 0x03f79d71b4cb0a89 used in Stockfish's msb(), zstd's bits.h, and cpython's pycore_bitutils.h. Fix by special-casing data + 1 == HOST_BITS_PER_WIDE_INT to use HOST_WIDE_INT_M1U. Only the 64-bit CLZ path is affected. gcc/ChangeLog: PR tree-optimization/122569 * tree-ssa-forwprop.cc (simplify_count_zeroes): Avoid shift-by-HOST_BITS_PER_WIDE_INT UB when computing the all-ones value for the CLZ validator. gcc/testsuite/ChangeLog: PR tree-optimization/122569 * gcc.dg/tree-ssa/pr122569-1.c: New test. * gcc.dg/tree-ssa/pr122569-2.c: New test.
[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569 --- Comment #8 from ptomsich at gcc dot gnu.org --- Submitted as - https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712665.html - https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712666.html
[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569 --- Comment #7 from ptomsich at gcc dot gnu.org --- We found the exact pattern from comment #3 in Microsoft's SEAL crypto: https://github.com/microsoft/SEAL/blob/main/native/src/seal/util/common.h#L435
[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569 --- Comment #6 from ptomsich at gcc dot gnu.org --- The test case from comment #3 is different (and I am still looking where this came from originally): it is a CTZ constant and uses the (value - (value >> 1)) expression to isolate the highest set bit after the or-cascade. In this case, the recognition on the clz_table_index pattern misses.
[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569 --- Comment #5 from ptomsich at gcc dot gnu.org --- The root-cause is in the CLZ table validator in simplify_count_zeroes (gcc/tree-ssa-forwprop.cc): it invokes undefined behaviour when validating 64-bit DeBruijn CLZ tables. The checkfn computes (1 << (data + 1)) - 1 to simulate the value produced by the OR-cascade b |= b >> 1; ... b |= b >> 32 — i.e. the value with all bits from position 0 up to the original MSB set. When the input type is 64-bit and the table entry is 63 (the MSB position, which is the bit position reached when the OR-cascade input has bit 63 set), data + 1 equals HOST_BITS_PER_WIDE_INT, so the shift HOST_WIDE_INT_1U << 64 is UB. In practice the host produces 0, turning the check (0 * magic) >> 58 == 63 into false. The bad entry drops the match count from bits + 1 to exactly bits, and check_table_array returns false because it requires strictly more than bits matches. Every valid 64-bit DeBruijn CLZ table has an entry mapping the all-ones value to bit 63, so this UB rejects every well-formed 64-bit DeBruijn CLZ table — including the magic 0x03f79d71b4cb0a89 found in Stockfish's msb(), zstd's lib/common/bits.h, and cpython's Include/internal/pycore_bitutils.h. The 32-bit CLZ path and the CTZ path are not affected.
[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569 ptomsich at gcc dot gnu.org changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |ptomsich at gcc dot gnu.org CC||ptomsich at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #4 from ptomsich at gcc dot gnu.org --- Confirmed against trunk.
[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569
--- Comment #3 from Feng Xue ---
Here is the real case:
void get_msb_index(unsigned long *result, uint64_t value)
{
static const unsigned long deBruijnTable64[64] = { 63, 0, 58, 1, 59, 47,
53, 2, 60, 39, 48, 27, 54,
33, 42, 3, 61, 51, 37,
40, 49, 18, 28, 20, 55, 30,
34, 11, 43, 14, 22, 4,
62, 57, 46, 52, 38, 26, 32,
41, 50, 36, 17, 19, 29,
10, 13, 21, 56, 45, 25, 31,
35, 16, 9, 12, 44, 24,
15, 8, 23, 7, 6, 5 };
value |= value >> 1;
value |= value >> 2;
value |= value >> 4;
value |= value >> 8;
value |= value >> 16;
value |= value >> 32;
*result = deBruijnTable64[((value - (value >> 1)) *
uint64_t(0x07EDD5E59A4E28C2)) >> 58];
}
[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569 --- Comment #2 from Feng Xue --- (In reply to Richard Biener from comment #1) > Do these come from some actual cases used? Yes, the actual case is for 64-bit integer. I merely split the problem of match failure into three issues above.
[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569 Richard Biener changed: What|Removed |Added CC||rguenth at gcc dot gnu.org Last reconfirmed||2025-11-05 Version|unknown |16.0 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Keywords||missed-optimization --- Comment #1 from Richard Biener --- Do these come from some actual cases used?
