[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64

2025-08-08 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |14.4
             Status|ASSIGNED                    |RESOLVED

--- Comment #7 from Jakub Jelinek  ---
Fixed for 16+, 15.3+ and 14.4+.

[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64

2025-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413

--- Comment #6 from GCC Commits  ---
The releases/gcc-14 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:edc2388e802853ddc70b04a6de2b3c180a2a8442

commit r14-11941-gedc2388e802853ddc70b04a6de2b3c180a2a8442
Author: Jakub Jelinek 
Date:   Wed Aug 6 11:30:08 2025 +0200

bitint: Fix up INTEGER_CST PHI handling [PR121413]

The following testcase is miscompiled on aarch64-linux.
The problem is in the optimization to shorten large constants
in PHI arguments.
In a couple of places during bitint lowering we compute
minimal precision of constant and if it is significantly
smaller than the precision of the type, store smaller constant
in memory and extend it at runtime (zero or all ones).
Now, in most places that works fine, we handle the stored number
of limbs by loading them from memory and then the rest is
extended.  In the PHI INTEGER_CST argument handling we do
it differently, we don't form there any loops (because we
insert stmt sequences on the edges).
The problem is that we copy the whole _BitInt variable from
memory to the PHI VAR_DECL + initialize the rest to = {} or
memset to -1.  It has
min_prec = CEIL (min_prec, limb_prec) * limb_prec;
precision, so e.g. on x86_64 there is no padding and it works
just fine.  But on aarch64 which has abi_limb_mode TImode
and limb_mode DImode it doesn't in some cases.
In the testcase the constant has 408 bits min precision, rounded up
to limb_prec (64) is 448, i.e. 7 limbs.  But aarch64 with TImode
abi_limb_mode will actually allocate 8 limbs and the most significant
limb is solely padding.  As we want to extend the constant with all
ones, copying the padding (from memory, so 0s) will result in
64 0 bits where 1 bits were needed.
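
The arithmetic above can be checked with a small sketch.  This is not
code from gimple-lower-bitint.cc; round_up and padding_bits are
hypothetical helper names, and the values 64 and 128 are the DImode and
TImode bit sizes mentioned in the text.

```c
#include <assert.h>

/* Round X up to the next multiple of Y (Y must be nonzero).  */
static unsigned
round_up (unsigned x, unsigned y)
{
  assert (y != 0);
  return (x + y - 1) / y * y;
}

/* Hypothetical helper: number of padding bits left when a constant
   needing MIN_PREC bits is rounded up to LIMB_PREC, but its storage
   is allocated in units of ABI_LIMB_PREC.  */
static unsigned
padding_bits (unsigned min_prec, unsigned limb_prec,
              unsigned abi_limb_prec)
{
  unsigned rounded = round_up (min_prec, limb_prec);
  unsigned allocated = round_up (rounded, abi_limb_prec);
  return allocated - rounded;
}
```

With min_prec 408, limb_prec 64 and abi_limb_prec 128 this gives 448
rounded bits, 512 allocated bits and 64 bits of padding; with both
limb precisions equal to 64 (the x86_64 case) the padding is 0.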

The following patch fixes it by detecting this case and setting
min_prec to a multiple of abi limb precision so that it has
no padding.

2025-08-06  Jakub Jelinek  

PR tree-optimization/121413
* gimple-lower-bitint.cc (abi_limb_prec): New variable
(bitint_precision_kind): Initialize it.
(gimple_lower_bitint): Clear it at the start.  For
min_prec > limb_prec decreased precision vars for
INTEGER_CST PHI arguments ensure min_prec is either
prec or multiple of abi_limb_prec.

* gcc.dg/torture/bitint-85.c: New test.

(cherry picked from commit 70aff5112ec25f2391d8048d8c7994160d3cb008)

[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64

2025-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413

--- Comment #5 from GCC Commits  ---
The releases/gcc-15 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:d43ece39b709992681d92b17c1e58b5f152ff247

commit r15-10206-gd43ece39b709992681d92b17c1e58b5f152ff247
Author: Jakub Jelinek 
Date:   Wed Aug 6 11:30:08 2025 +0200

bitint: Fix up INTEGER_CST PHI handling [PR121413]

The following testcase is miscompiled on aarch64-linux.
The problem is in the optimization to shorten large constants
in PHI arguments.
In a couple of places during bitint lowering we compute
minimal precision of constant and if it is significantly
smaller than the precision of the type, store smaller constant
in memory and extend it at runtime (zero or all ones).
Now, in most places that works fine, we handle the stored number
of limbs by loading them from memory and then the rest is
extended.  In the PHI INTEGER_CST argument handling we do
it differently, we don't form there any loops (because we
insert stmt sequences on the edges).
The problem is that we copy the whole _BitInt variable from
memory to the PHI VAR_DECL + initialize the rest to = {} or
memset to -1.  It has
min_prec = CEIL (min_prec, limb_prec) * limb_prec;
precision, so e.g. on x86_64 there is no padding and it works
just fine.  But on aarch64 which has abi_limb_mode TImode
and limb_mode DImode it doesn't in some cases.
In the testcase the constant has 408 bits min precision, rounded up
to limb_prec (64) is 448, i.e. 7 limbs.  But aarch64 with TImode
abi_limb_mode will actually allocate 8 limbs and the most significant
limb is solely padding.  As we want to extend the constant with all
ones, copying the padding (from memory, so 0s) will result in
64 0 bits where 1 bits were needed.

The following patch fixes it by detecting this case and setting
min_prec to a multiple of abi limb precision so that it has
no padding.

2025-08-06  Jakub Jelinek  

PR tree-optimization/121413
* gimple-lower-bitint.cc (abi_limb_prec): New variable
(bitint_precision_kind): Initialize it.
(gimple_lower_bitint): Clear it at the start.  For
min_prec > limb_prec decreased precision vars for
INTEGER_CST PHI arguments ensure min_prec is either
prec or multiple of abi_limb_prec.

* gcc.dg/torture/bitint-85.c: New test.

(cherry picked from commit 70aff5112ec25f2391d8048d8c7994160d3cb008)

[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64

2025-08-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413

--- Comment #4 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:685527a408ea025591c7f887566d7049ddd72c02

commit r16-3041-g685527a408ea025591c7f887566d7049ddd72c02
Author: Jakub Jelinek 
Date:   Wed Aug 6 12:52:47 2025 +0200

bitint: Fix build [PR121413]

Sorry, my bootstrap failed last night because of this, I've fixed it
up and it bootstrapped/regtested fine overnight, but in the morning
forgot to adjust the patch before committing.

Without this there is
.../gimple-lower-bitint.cc:7678:36: error: comparison of integer
expressions of different signedness: 'unsigned int' and 'int'
[-Werror=sign-compare]
 7678 |   if (min_prec > limb_prec && abi_limb_prec > limb_prec)
      |       ~~~~~~~~~^~~~~~~~~~~

2025-08-06  Jakub Jelinek  

PR tree-optimization/121413
* gimple-lower-bitint.cc (gimple_lower_bitint): Fix up last
commit, cast limb_prec to unsigned before comparison.
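
A minimal sketch of the kind of cast the ChangeLog entry describes.
The function name needs_abi_rounding is hypothetical, and the parameter
types are an assumption inferred from the diagnostic (min_prec unsigned,
the limb precisions int); this is not the actual hunk from the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper with the same shape as the condition in the
   diagnostic.  Comparing an unsigned int with an int triggers
   -Wsign-compare; casting the (known positive) int operand to
   unsigned makes both sides the same signedness.  */
static bool
needs_abi_rounding (unsigned int min_prec, int limb_prec,
                    int abi_limb_prec)
{
  assert (limb_prec > 0);
  return min_prec > (unsigned int) limb_prec
         && abi_limb_prec > limb_prec;
}
```

The cast is only safe because the limb precision is a small positive
value; a negative int would become a large unsigned number.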

[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64

2025-08-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:70aff5112ec25f2391d8048d8c7994160d3cb008

commit r16-3034-g70aff5112ec25f2391d8048d8c7994160d3cb008
Author: Jakub Jelinek 
Date:   Wed Aug 6 11:30:08 2025 +0200

bitint: Fix up INTEGER_CST PHI handling [PR121413]

The following testcase is miscompiled on aarch64-linux.
The problem is in the optimization to shorten large constants
in PHI arguments.
In a couple of places during bitint lowering we compute
minimal precision of constant and if it is significantly
smaller than the precision of the type, store smaller constant
in memory and extend it at runtime (zero or all ones).
Now, in most places that works fine, we handle the stored number
of limbs by loading them from memory and then the rest is
extended.  In the PHI INTEGER_CST argument handling we do
it differently, we don't form there any loops (because we
insert stmt sequences on the edges).
The problem is that we copy the whole _BitInt variable from
memory to the PHI VAR_DECL + initialize the rest to = {} or
memset to -1.  It has
min_prec = CEIL (min_prec, limb_prec) * limb_prec;
precision, so e.g. on x86_64 there is no padding and it works
just fine.  But on aarch64 which has abi_limb_mode TImode
and limb_mode DImode it doesn't in some cases.
In the testcase the constant has 408 bits min precision, rounded up
to limb_prec (64) is 448, i.e. 7 limbs.  But aarch64 with TImode
abi_limb_mode will actually allocate 8 limbs and the most significant
limb is solely padding.  As we want to extend the constant with all
ones, copying the padding (from memory, so 0s) will result in
64 0 bits where 1 bits were needed.

The following patch fixes it by detecting this case and setting
min_prec to a multiple of abi limb precision so that it has
no padding.
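
The adjustment described above could look roughly like the following
sketch.  It is an illustration under stated assumptions, not the code
from the patch: adjust_min_prec is a hypothetical name, all values are
plain unsigned integers, and the clamp to prec is my reading of
"either prec or multiple of abi_limb_prec" in the ChangeLog below.

```c
#include <assert.h>

/* Hypothetical helper: pick the precision used for the shortened
   constant so that its in-memory representation has no padding.  */
static unsigned
adjust_min_prec (unsigned min_prec, unsigned prec,
                 unsigned limb_prec, unsigned abi_limb_prec)
{
  assert (limb_prec != 0 && abi_limb_prec >= limb_prec);

  /* Round up to a whole number of limbs, as before the fix.  */
  min_prec = (min_prec + limb_prec - 1) / limb_prec * limb_prec;

  if (min_prec > limb_prec && abi_limb_prec > limb_prec)
    {
      /* The ABI limb is wider than the limb: round up further to a
         multiple of the ABI limb precision ...  */
      min_prec = (min_prec + abi_limb_prec - 1)
                 / abi_limb_prec * abi_limb_prec;
      /* ... but never beyond the precision of the type itself.  */
      if (min_prec > prec)
        min_prec = prec;
    }
  return min_prec;
}
```

For the testcase numbers (408 bits, _BitInt(1024), limb 64, ABI limb
128) this yields 512 instead of 448, so all 8 stored limbs carry value
bits and copying them whole is safe.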

2025-08-06  Jakub Jelinek  

PR tree-optimization/121413
* gimple-lower-bitint.cc (abi_limb_prec): New variable
(bitint_precision_kind): Initialize it.
(gimple_lower_bitint): Clear it at the start.  For
min_prec > limb_prec decreased precision vars for
INTEGER_CST PHI arguments ensure min_prec is either
prec or multiple of abi_limb_prec.

* gcc.dg/torture/bitint-85.c: New test.

[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64

2025-08-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
   Last reconfirmed|                            |2025-08-05
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #2 from Jakub Jelinek  ---
Created attachment 62060
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62060&action=edit
gcc16-pr121413.patch

Untested fix.

[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64

2025-08-05 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413

--- Comment #1 from Jakub Jelinek  ---
Ah, the problem is in the PHI INTEGER_CST argument expansion code on targets
with GET_MODE_BITSIZE (info.abi_limb_mode) > GET_MODE_BITSIZE (info.limb_mode)
like aarch64 (or in the near future loongarch and arm).
In other spots where we emit INTEGER_CSTs with smaller precision into rodata
and then extend, the extension is done properly by extending just the limbs
beyond the precision of the chosen _BitInt type (with precision in multiples
of limb_prec).  In the PHI argument case it is instead done by copying the
whole c and clearing the rest (or memset to -1).
Now, in this particular case min_prec is ~ 408ish, and as limb_prec is 64, we
choose 7*64 = 448 bits for c.  Except on aarch64 c then contains 64 bits of
padding, and when we copy the whole c, we also copy those 64 bits of padding
(and memset only the higher bits to -1).
Either we'd need to carefully copy only the 7 limbs rather than the whole c and
memset the rest, or IMHO, as we allocate 8 limbs in rodata anyway, it might be
easier to just use 512 bits for c with no padding in there.
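
The failure and the two alternatives can be modelled with plain limb
arrays.  This is only a simulation, not compiler code: the function
names are hypothetical, limbs are assumed to be stored least
significant first, and the padding limb in rodata is assumed to be
zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* _BitInt(1024) as 16 limbs of 64 bits; the shortened constant has
   7 value limbs but occupies 8 limbs in memory on aarch64.  */
enum { TOTAL_LIMBS = 16, VALUE_LIMBS = 7, STORED_LIMBS = 8 };

/* Models the problematic path: copy the whole stored constant,
   padding limb included, then fill the remaining limbs with ones.  */
static void
extend_copy_whole (uint64_t dst[TOTAL_LIMBS],
                   const uint64_t c[STORED_LIMBS])
{
  assert (dst != NULL && c != NULL);
  memcpy (dst, c, STORED_LIMBS * sizeof (uint64_t));
  memset (dst + STORED_LIMBS, 0xff,
          (TOTAL_LIMBS - STORED_LIMBS) * sizeof (uint64_t));
}

/* First alternative: copy only the value limbs and fill everything
   above them, including the former padding limb, with ones.  */
static void
extend_copy_value_limbs (uint64_t dst[TOTAL_LIMBS],
                         const uint64_t c[STORED_LIMBS])
{
  assert (dst != NULL && c != NULL);
  memcpy (dst, c, VALUE_LIMBS * sizeof (uint64_t));
  memset (dst + VALUE_LIMBS, 0xff,
          (TOTAL_LIMBS - VALUE_LIMBS) * sizeof (uint64_t));
}
```

With a zero padding limb, extend_copy_whole leaves limb 7 of the result
as 0 where all ones were needed.  The second alternative needs no new
copy routine: if c is emitted with 512 bits of precision, its limb 7 is
already all ones and extend_copy_whole produces the right value.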