[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn

2026-05-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569

--- Comment #10 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:234d9acfd25aa3b74958e2f1c41e46d483b92d24

commit r17-330-g234d9acfd25aa3b74958e2f1c41e46d483b92d24
Author: Jakub Jelinek 
Date:   Tue May 5 10:45:37 2026 +0200

testsuite: Fix up pr122569*.c tests [PR122569]

On Tue, May 05, 2026 at 02:27:23PM +0800, H.J. Lu wrote:
> The new tests failed with -m32 on Linux/x86-64:
>
> FAIL: gcc.dg/tree-ssa/pr122569-1.c scan-tree-dump forwprop1
> "__builtin_ctz|\\.CTZ"
> FAIL: gcc.dg/tree-ssa/pr122569-2.c scan-tree-dump forwprop1
> "__builtin_clz|\\.CLZ"
>
> Should these tests require int128?

They should first of all require ctzll resp. clzll effective targets,
if there is a function call for those, then it certainly isn't optimized.

The problem is that that isn't enough, ia32 is both ctzll and clzll
effective target.  That is because we handle double-word __builtin_c[tl]zll
by doing 2 word ops and one conditional.
The tree-ssa-forwprop.cc optimization is checking for whether it can use
IFN_CLZ/IFN_CTZ, and that is not the case, because we only use direct optab
for that and don't have the double-word unop fallback for that.

Rather than int128 I think it is more natural to test for lp64 || llp64.

2026-05-05  Jakub Jelinek  

PR tree-optimization/122569
* gcc.dg/tree-ssa/pr122569-1.c: Only require __builtin_ctz/.CTZ
on ctzll 64-bit targets.
* gcc.dg/tree-ssa/pr122569-2.c: Only require __builtin_clz/.CLZ
on clzll 64-bit targets.

Reviewed-by: Richard Biener 

[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn

2026-05-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569

--- Comment #9 from GCC Commits  ---
The master branch has been updated by Philipp Tomsich :

https://gcc.gnu.org/g:a66820ce3f5b298759042b6543a056b490370310

commit r17-286-ga66820ce3f5b298759042b6543a056b490370310
Author: Philipp Tomsich 
Date:   Thu Apr 9 10:58:23 2026 +0200

tree-optimization/122569 - fix DeBruijn CLZ table validator shift-by-64 UB

simplify_count_zeroes validates DeBruijn CLZ tables by computing
(1 << (data + 1)) - 1 to simulate the value produced by the OR-cascade
b |= b >> 1; ... b |= b >> 32.  For 64-bit input with data == 63 (the
MSB bit), data + 1 equals HOST_BITS_PER_WIDE_INT, making the shift
(HOST_WIDE_INT_1U << 64) undefined behavior.  Hosts typically produce
0, so the check (0 * magic) >> 58 == 63 fails and check_table_array
returns false.

Every well-formed 64-bit DeBruijn CLZ table has an entry mapping the
all-ones value to bit 63, so this UB rejected every such table --
including the magic 0x03f79d71b4cb0a89 used in Stockfish's msb(),
zstd's bits.h, and cpython's pycore_bitutils.h.

Fix by special-casing data + 1 == HOST_BITS_PER_WIDE_INT to use
HOST_WIDE_INT_M1U.  Only the 64-bit CLZ path is affected.

gcc/ChangeLog:

PR tree-optimization/122569
* tree-ssa-forwprop.cc (simplify_count_zeroes): Avoid
shift-by-HOST_BITS_PER_WIDE_INT UB when computing the all-ones
value for the CLZ validator.

gcc/testsuite/ChangeLog:

PR tree-optimization/122569
* gcc.dg/tree-ssa/pr122569-1.c: New test.
* gcc.dg/tree-ssa/pr122569-2.c: New test.

[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn

2026-04-10 Thread ptomsich at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569

--- Comment #8 from ptomsich at gcc dot gnu.org ---
Submitted as
- https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712665.html
- https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712666.html

[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn

2026-04-09 Thread ptomsich at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569

--- Comment #7 from ptomsich at gcc dot gnu.org ---
We found the exact pattern from comment #3 in Microsoft's SEAL crypto:
https://github.com/microsoft/SEAL/blob/main/native/src/seal/util/common.h#L435

[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn

2026-04-09 Thread ptomsich at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569

--- Comment #6 from ptomsich at gcc dot gnu.org ---
The test case from comment #3 is different (and I am still looking where this
came from originally): it is a CTZ constant and uses the (value - (value >> 1))
expression to isolate the highest set bit after the or-cascade.

In this case, the recognition on the clz_table_index pattern misses.

[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn

2026-04-09 Thread ptomsich at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569

--- Comment #5 from ptomsich at gcc dot gnu.org ---
The root-cause is in the CLZ table validator in simplify_count_zeroes
(gcc/tree-ssa-forwprop.cc): it invokes undefined behaviour when validating
64-bit DeBruijn CLZ tables. 

The checkfn computes (1 << (data + 1)) - 1 to simulate the value produced by
the OR-cascade b |= b >> 1; ... b |= b >> 32 — i.e. the value with all bits
from position 0 up to the original MSB set. When the input type is 64-bit and
the table entry is 63 (the MSB position, which is the bit position reached when
the OR-cascade input has bit 63 set), data + 1 equals HOST_BITS_PER_WIDE_INT,
so the shift HOST_WIDE_INT_1U << 64 is UB.
In practice the host produces 0, turning the check (0 * magic) >> 58 == 63 into
false. The bad entry drops the match count from bits + 1 to exactly bits, and
check_table_array returns false because it requires strictly more than bits
matches.

Every valid 64-bit DeBruijn CLZ table has an entry mapping the all-ones value
to bit 63, so this UB rejects every well-formed 64-bit DeBruijn CLZ table —
including the magic 0x03f79d71b4cb0a89 found in Stockfish's msb(), zstd's
lib/common/bits.h, and cpython's Include/internal/pycore_bitutils.h.

The 32-bit CLZ path and the CTZ path are not affected.

[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn

2026-04-08 Thread ptomsich at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569

ptomsich at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ptomsich at gcc dot 
gnu.org
 CC||ptomsich at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from ptomsich at gcc dot gnu.org ---
Confirmed against trunk.

[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn

2025-11-05 Thread fxue at os dot amperecomputing.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569

--- Comment #3 from Feng Xue  ---
Here is the real case:

void get_msb_index(unsigned long *result, uint64_t value)
{
static const unsigned long deBruijnTable64[64] = { 63, 0,  58, 1,  59, 47,
53, 2,  60, 39, 48, 27, 54,
   33, 42, 3,  61, 51, 37,
40, 49, 18, 28, 20, 55, 30,
   34, 11, 43, 14, 22, 4, 
62, 57, 46, 52, 38, 26, 32,
   41, 50, 36, 17, 19, 29,
10, 13, 21, 56, 45, 25, 31,
   35, 16, 9,  12, 44, 24,
15, 8,  23, 7,  6,  5 };

value |= value >> 1;
value |= value >> 2;
value |= value >> 4;
value |= value >> 8;
value |= value >> 16;
value |= value >> 32;

*result = deBruijnTable64[((value - (value >> 1)) *
uint64_t(0x07EDD5E59A4E28C2)) >> 58];
}

[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn

2025-11-05 Thread fxue at os dot amperecomputing.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569

--- Comment #2 from Feng Xue  ---
(In reply to Richard Biener from comment #1)
> Do these come from some actual cases used?

Yes, the actual case is for 64-bit integer. I merely split the problem of match
failure into three issues above.

[Bug tree-optimization/122569] Fails to pattern match ctz/clz from DeBruijn

2025-11-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122569

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org
   Last reconfirmed||2025-11-05
Version|unknown |16.0
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Keywords||missed-optimization

--- Comment #1 from Richard Biener  ---
Do these come from some actual cases used?