[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413
Jakub Jelinek changed:
           What    |Removed                       |Added
----------------------------------------------------------------------------
         Resolution|---                           |FIXED
   Target Milestone|---                           |14.4
             Status|ASSIGNED                      |RESOLVED
--- Comment #7 from Jakub Jelinek ---
Fixed for 16+, 15.3+ and 14.4+.
[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413
--- Comment #6 from GCC Commits ---
The releases/gcc-14 branch has been updated by Jakub Jelinek:
https://gcc.gnu.org/g:edc2388e802853ddc70b04a6de2b3c180a2a8442
commit r14-11941-gedc2388e802853ddc70b04a6de2b3c180a2a8442
Author: Jakub Jelinek
Date: Wed Aug 6 11:30:08 2025 +0200
bitint: Fix up INTEGER_CST PHI handling [PR121413]
The following testcase is miscompiled on aarch64-linux.
The problem is in the optimization to shorten large constants
in PHI arguments.
In a couple of places during bitint lowering we compute
minimal precision of constant and if it is significantly
smaller than the precision of the type, store smaller constant
in memory and extend it at runtime (zero or all ones).
Now, in most places that works fine, we handle the stored number
of limbs by loading them from memory and then the rest is
extended. In the PHI INTEGER_CST argument handling we do
it differently, we don't form there any loops (because we
insert stmt sequences on the edges).
The problem is that we copy the whole _BitInt variable from
memory to the PHI VAR_DECL + initialize the rest to = {} or
memset to -1. It has
min_prec = CEIL (min_prec, limb_prec) * limb_prec;
precision, so e.g. on x86_64 there is no padding and it works
just fine. But on aarch64 which has abi_limb_mode TImode
and limb_mode DImode it doesn't in some cases.
In the testcase the constant has 408 bits min precision, rounded up
to limb_prec (64) is 448, i.e. 7 limbs. But aarch64 with TImode
abi_limb_mode will actually allocate 8 limbs and the most significant
limb is solely padding. As we want to extend the constant with all
ones, copying the padding (from memory, so 0s) will result in
64 0 bits where 1 bits were needed.
The following patch fixes it by detecting this case and setting
min_prec to a multiple of abi limb precision so that it has
no padding.
2025-08-06 Jakub Jelinek
PR tree-optimization/121413
* gimple-lower-bitint.cc (abi_limb_prec): New variable
(bitint_precision_kind): Initialize it.
(gimple_lower_bitint): Clear it at the start. For
min_prec > limb_prec decreased precision vars for
INTEGER_CST PHI arguments ensure min_prec is either
prec or multiple of abi_limb_prec.
* gcc.dg/torture/bitint-85.c: New test.
(cherry picked from commit 70aff5112ec25f2391d8048d8c7994160d3cb008)
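The rounding described in the commit message above can be sketched as follows. This is only an illustration of the arithmetic, not the code from gimple-lower-bitint.cc: the helper names round_up and shortened_prec are hypothetical, and clamping to prec is an assumption based on the ChangeLog wording ("either prec or multiple of abi_limb_prec").

```c
/* Round x up to the next multiple of m (m > 0),
   i.e. the CEIL (x, m) * m pattern from the commit message.  */
unsigned round_up(unsigned x, unsigned m)
{
    return (x + m - 1) / m * m;
}

/* Hypothetical helper: choose the precision of the shortened constant.
   prec          - precision of the _BitInt type (1024 in the PR)
   min_prec      - minimal precision of the constant (408 in the PR)
   limb_prec     - bits per lowering limb (64 on x86_64 and aarch64)
   abi_limb_prec - bits per ABI limb (64 on x86_64, 128 on aarch64)  */
unsigned shortened_prec(unsigned prec, unsigned min_prec,
                        unsigned limb_prec, unsigned abi_limb_prec)
{
    /* Old behaviour: round to whole limbs only.  */
    min_prec = round_up(min_prec, limb_prec);

    /* Sketch of the fix: when the ABI limb is wider than the limb,
       round further so the stored constant has no padding limb.  */
    if (min_prec > limb_prec && abi_limb_prec > limb_prec) {
        min_prec = round_up(min_prec, abi_limb_prec);
        if (min_prec > prec)
            min_prec = prec; /* assumption: never exceed the type */
    }
    return min_prec;
}
```

With the numbers from the testcase, shortened_prec(1024, 408, 64, 64) gives 448 (the x86_64 case, 7 limbs, no padding), while shortened_prec(1024, 408, 64, 128) gives 512 (the aarch64 case, 8 limbs, all of them value bits).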
[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413
--- Comment #5 from GCC Commits ---
The releases/gcc-15 branch has been updated by Jakub Jelinek:
https://gcc.gnu.org/g:d43ece39b709992681d92b17c1e58b5f152ff247
commit r15-10206-gd43ece39b709992681d92b17c1e58b5f152ff247
Author: Jakub Jelinek
Date: Wed Aug 6 11:30:08 2025 +0200
bitint: Fix up INTEGER_CST PHI handling [PR121413]
The following testcase is miscompiled on aarch64-linux.
The problem is in the optimization to shorten large constants
in PHI arguments.
In a couple of places during bitint lowering we compute
minimal precision of constant and if it is significantly
smaller than the precision of the type, store smaller constant
in memory and extend it at runtime (zero or all ones).
Now, in most places that works fine, we handle the stored number
of limbs by loading them from memory and then the rest is
extended. In the PHI INTEGER_CST argument handling we do
it differently, we don't form there any loops (because we
insert stmt sequences on the edges).
The problem is that we copy the whole _BitInt variable from
memory to the PHI VAR_DECL + initialize the rest to = {} or
memset to -1. It has
min_prec = CEIL (min_prec, limb_prec) * limb_prec;
precision, so e.g. on x86_64 there is no padding and it works
just fine. But on aarch64 which has abi_limb_mode TImode
and limb_mode DImode it doesn't in some cases.
In the testcase the constant has 408 bits min precision, rounded up
to limb_prec (64) is 448, i.e. 7 limbs. But aarch64 with TImode
abi_limb_mode will actually allocate 8 limbs and the most significant
limb is solely padding. As we want to extend the constant with all
ones, copying the padding (from memory, so 0s) will result in
64 0 bits where 1 bits were needed.
The following patch fixes it by detecting this case and setting
min_prec to a multiple of abi limb precision so that it has
no padding.
2025-08-06 Jakub Jelinek
PR tree-optimization/121413
* gimple-lower-bitint.cc (abi_limb_prec): New variable
(bitint_precision_kind): Initialize it.
(gimple_lower_bitint): Clear it at the start. For
min_prec > limb_prec decreased precision vars for
INTEGER_CST PHI arguments ensure min_prec is either
prec or multiple of abi_limb_prec.
* gcc.dg/torture/bitint-85.c: New test.
(cherry picked from commit 70aff5112ec25f2391d8048d8c7994160d3cb008)
[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413
--- Comment #4 from GCC Commits ---
The master branch has been updated by Jakub Jelinek:
https://gcc.gnu.org/g:685527a408ea025591c7f887566d7049ddd72c02
commit r16-3041-g685527a408ea025591c7f887566d7049ddd72c02
Author: Jakub Jelinek
Date: Wed Aug 6 12:52:47 2025 +0200
bitint: Fix build [PR121413]
Sorry, my bootstrap failed last night because of this, I've fixed it up
and it bootstrapped/regtested fine overnight, but in the morning forgot
to adjust the patch before committing.
Without this there is
.../gimple-lower-bitint.cc:7678:36: error: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Werror=sign-compare]
 7678 | if (min_prec > limb_prec && abi_limb_prec > limb_prec)
      | ~^~~
2025-08-06 Jakub Jelinek
PR tree-optimization/121413
* gimple-lower-bitint.cc (gimple_lower_bitint): Fix up last commit, cast
limb_prec to unsigned before comparison.
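As a minimal illustration of the build fix, the comparison from the diagnostic can be written with the cast the ChangeLog describes. The function name needs_abi_rounding is hypothetical, and the parameter types are assumptions inferred from the error message (min_prec unsigned, limb_prec int); the type of abi_limb_prec is not stated in the thread and is assumed to be int here.

```c
/* Hypothetical stand-in for the condition at gimple-lower-bitint.cc:7678.
   Casting limb_prec makes both operands of the first comparison unsigned,
   which avoids -Wsign-compare; limb_prec is positive, so the cast does
   not change its value.  */
int needs_abi_rounding(unsigned min_prec, int limb_prec, int abi_limb_prec)
{
    return min_prec > (unsigned) limb_prec && abi_limb_prec > limb_prec;
}
```

For example, needs_abi_rounding(448, 64, 128) is true for the aarch64-like configuration, while needs_abi_rounding(448, 64, 64) is false when both limb precisions match.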
[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413
--- Comment #3 from GCC Commits ---
The master branch has been updated by Jakub Jelinek :
https://gcc.gnu.org/g:70aff5112ec25f2391d8048d8c7994160d3cb008
commit r16-3034-g70aff5112ec25f2391d8048d8c7994160d3cb008
Author: Jakub Jelinek
Date: Wed Aug 6 11:30:08 2025 +0200
bitint: Fix up INTEGER_CST PHI handling [PR121413]
The following testcase is miscompiled on aarch64-linux.
The problem is in the optimization to shorten large constants
in PHI arguments.
In a couple of places during bitint lowering we compute
minimal precision of constant and if it is significantly
smaller than the precision of the type, store smaller constant
in memory and extend it at runtime (zero or all ones).
Now, in most places that works fine, we handle the stored number
of limbs by loading them from memory and then the rest is
extended. In the PHI INTEGER_CST argument handling we do
it differently, we don't form there any loops (because we
insert stmt sequences on the edges).
The problem is that we copy the whole _BitInt variable from
memory to the PHI VAR_DECL + initialize the rest to = {} or
memset to -1. It has
min_prec = CEIL (min_prec, limb_prec) * limb_prec;
precision, so e.g. on x86_64 there is no padding and it works
just fine. But on aarch64 which has abi_limb_mode TImode
and limb_mode DImode it doesn't in some cases.
In the testcase the constant has 408 bits min precision, rounded up
to limb_prec (64) is 448, i.e. 7 limbs. But aarch64 with TImode
abi_limb_mode will actually allocate 8 limbs and the most significant
limb is solely padding. As we want to extend the constant with all
ones, copying the padding (from memory, so 0s) will result in
64 0 bits where 1 bits were needed.
The following patch fixes it by detecting this case and setting
min_prec to a multiple of abi limb precision so that it has
no padding.
2025-08-06 Jakub Jelinek
PR tree-optimization/121413
* gimple-lower-bitint.cc (abi_limb_prec): New variable
(bitint_precision_kind): Initialize it.
(gimple_lower_bitint): Clear it at the start. For
min_prec > limb_prec decreased precision vars for
INTEGER_CST PHI arguments ensure min_prec is either
prec or multiple of abi_limb_prec.
* gcc.dg/torture/bitint-85.c: New test.
[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413
Jakub Jelinek changed:
           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
   Last reconfirmed|                              |2025-08-05
     Ever confirmed|0                             |1
             Status|UNCONFIRMED                   |ASSIGNED
--- Comment #2 from Jakub Jelinek ---
Created attachment 62060
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62060&action=edit
gcc16-pr121413.patch
Untested fix.
[Bug tree-optimization/121413] wrong code with _BitInt(1024) on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121413
--- Comment #1 from Jakub Jelinek ---
Ah, the problem is in the PHI INTEGER_CST argument expansion code on targets
with GET_MODE_BITSIZE (info.abi_limb_mode) > GET_MODE_BITSIZE (info.limb_mode)
like aarch64 (or in the near future loongarch and arm).
In other spots where we emit INTEGER_CSTs with smaller precision into rodata
and then extend, the extension is done properly by extending just the limbs
beyond the precision of the chosen _BitInt type (with precision in multiples
of limb_prec). In the PHI argument case it is instead done by copying the
whole c and clearing the rest (or memset to -1).
Now, in this particular case min_prec is roughly 408, and as limb_prec is 64,
we choose 7*64 = 448 bits for c. Except that on aarch64 c then contains
64 bits of padding, and when we copy the whole c, we also copy those 64 bits
of padding (and memset to -1 only the bits above them).
Either we'd need to carefully copy only the 7 limbs rather than the whole c
and memset the rest, or IMHO, since we allocate 8 limbs in rodata anyway, it
might be easier to just use 512 bits for c with no padding in there.
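The effect described in this comment can be reproduced with plain 64-bit limbs. The sketch below is a simulation, not GCC's generated code: all function names are hypothetical, limbs are assumed to be stored least significant first, and the constant -2^407 is an assumed stand-in for the testcase's value (it also needs 408 bits of precision).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOTAL_LIMBS 16 /* _BitInt(1024) viewed as sixteen 64-bit limbs */

/* Mimics the PHI edge sequence from the comment: copy the whole shortened
   constant c (all of its allocated limbs) and memset the rest to -1.  */
void init_phi_var(uint64_t dst[TOTAL_LIMBS], const uint64_t *c, size_t c_limbs)
{
    memcpy(dst, c, c_limbs * sizeof(uint64_t));
    memset(dst + c_limbs, 0xff, (TOTAL_LIMBS - c_limbs) * sizeof(uint64_t));
}

/* Before the fix: c is a 448-bit constant, but an aarch64-like ABI
   allocates 8 limbs for it, so limb 7 is padding (zero in rodata).  */
void lower_with_padding(uint64_t dst[TOTAL_LIMBS])
{
    static const uint64_t c[8] = {
        0, 0, 0, 0, 0, 0,
        UINT64_C(0xffffffffff800000), /* limb 6: sign bit is bit 407 */
        0                             /* limb 7: padding, not value bits */
    };
    init_phi_var(dst, c, 8);
}

/* After the fix: c uses 512 bits, so limb 7 holds sign-extension bits.  */
void lower_without_padding(uint64_t dst[TOTAL_LIMBS])
{
    static const uint64_t c[8] = {
        0, 0, 0, 0, 0, 0,
        UINT64_C(0xffffffffff800000), /* limb 6: sign bit is bit 407 */
        UINT64_MAX                    /* limb 7: all ones, real value bits */
    };
    init_phi_var(dst, c, 8);
}
```

After lower_with_padding, limb 7 of the destination is zero while limbs 8 to 15 are all ones, which is the "64 0 bits where 1 bits were needed" situation. After lower_without_padding, limbs 7 to 15 are all ones, as the sign extension requires.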
