[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623
--- Comment #18 from GCC Commits ---
The master branch has been updated by Jakub Jelinek :
https://gcc.gnu.org/g:92142019b6cd0cf1fe483203cf3ec451a9848a42
commit r15-7454-g92142019b6cd0cf1fe483203cf3ec451a9848a42
Author: Jakub Jelinek
Date: Mon Feb 10 10:40:22 2025 +0100
i386: Change RTL representation of bt[lq] [PR118623]
The following testcase is miscompiled because of RTL represententation
of bt{l,q} insn followed by e.g. j{c,nc} being misleading to what it
actually does.
Let's look e.g. at
(define_insn_and_split "*jcc_bt"
[(set (pc)
(if_then_else (match_operator 0 "bt_comparison_operator"
[(zero_extract:SWI48
(match_operand:SWI48 1 "nonimmediate_operand")
(const_int 1)
(match_operand:QI 2 "nonmemory_operand"))
(const_int 0)])
(label_ref (match_operand 3))
(pc)))
(clobber (reg:CC FLAGS_REG))]
"(TARGET_USE_BT || optimize_function_for_size_p (cfun))
&& (CONST_INT_P (operands[2])
? (INTVAL (operands[2]) < GET_MODE_BITSIZE (mode)
&& INTVAL (operands[2])
>= (optimize_function_for_size_p (cfun) ? 8 : 32))
: !memory_operand (operands[1], mode))
&& ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (reg:CCC FLAGS_REG)
(compare:CCC
(zero_extract:SWI48
(match_dup 1)
(const_int 1)
(match_dup 2))
(const_int 0)))
(set (pc)
(if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
(label_ref (match_dup 3))
(pc)))]
{
operands[0] = shallow_copy_rtx (operands[0]);
PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
})
The define_insn part in RTL describes exactly what it does,
jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ).
The problem is with what it splits into.
put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU,
nc for NE and GEU and ICEs otherwise.
CCCmode is used mainly for carry out of add/adc, borrow out of sub/sbb,
in those cases e.g. for add we have
(set (reg:CCC flags) (compare:CCC (plus:M x y) x))
and use (ltu (reg:CCC flags) (const_int 0)) for carry set and
(geu (reg:CCC flags) (const_int 0)) for carry not set. These cases
model in RTL what is actually happening, compare in infinite precision
x from the result of finite precision addition in M mode and if it is
less than unsigned (i.e. overflow happened), carry is set.
Another use of CCCmode is in UNSPEC_* patterns, those are used with
(eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset,
given the UNSPEC no big deal, the middle-end doesn't know what means
set or unset.
But for the bt{l,q}; j{c,nc} case the above splits it into
(set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0)))
for bt and
(set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref)
(pc)))
for the bit set case (so that the jump expands to jc) and ne for
the bit not set case (so that the jump expands to jnc).
Similarly for the different splitters for cmov and set{c,nc} etc.
The problem is that when the middle-end reads this RTL, it feels
the exact opposite to it. If zero_extract is 1, flags is set
to comparison of 1 and 0 and that would mean using ne ne in the
if_then_else, and vice versa.
So, in order to better describe in RTL what is actually happening,
one possibility would be to swap the behavior of put_condition_code
and use NE + LTU -> c and EQ + GEU -> nc rather than the current
EQ + LTU -> c and NE + GEU -> nc; and adjust everything. The
following patch uses a more limited approach, instead of representing
bt{l,q}; j{c,nc} case as written above it uses
(set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract)))
and
(set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref)
(pc)))
which uses the existing put_condition_code but describes what the
insns actually do in RTL clearly. If zero_extract is 1,
then flags are LTU, 0U < 1U, if zero_extract is 0, then flags are GEU,
0U >= 0U. The patch adjusts the *bt define_insn and all the
splitters to it and its comparisons/conditional moves/setXX.
2025-02-10 Jakub Jelinek
PR target/118623
* config/i386/i386.md (*bt): Represent bt as
compare:CCC of const0_rtx and zero_extract rather than
zero_extract and const0_rtx.
(*bt_mask): Likewise.
(*jcc_bt): Likewise. Use LTU and GEU as flags test
instead
[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 --- Comment #17 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #15) > Created attachment 60411 [details] > gcc15-pr118623.patch > > Untested patch which seems to work for me on the new testcases and > i386.exp=bt*.c so far. When there's a borrow, Carry flag is set. swap operands order to (compare:CCC (const_int 0) (zero_extract ...)) looks like a good idea.
[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 --- Comment #16 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #14) > So, if (reg:CCC flags) being non-zero in RTL means nc and (reg:CCC flags) > being zero in RTL means c, shouldn't *bt be using (compare:CCC > (zero_extract ...) (const_int 1)) rather than 0? IMO, yes. When the bit is 1, there're no borrow in the (compare:CCC (zero_extract ...) (const_int 1)), so Carry flag is set. Otherwise the carry flag is cleared.
[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 --- Comment #15 from Jakub Jelinek --- Created attachment 60411 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60411&action=edit gcc15-pr118623.patch Untested patch which seems to work for me on the new testcases and i386.exp=bt*.c so far.
[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 --- Comment #14 from Jakub Jelinek --- So, if (reg:CCC flags) being non-zero in RTL means nc and (reg:CCC flags) being zero in RTL means c, shouldn't *bt be using (compare:CCC (zero_extract ...) (const_int 1)) rather than 0?
[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623
Jakub Jelinek changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org
--- Comment #13 from Jakub Jelinek ---
So, there is the cmov splitter added in r12-1525 and extended in r12-7751. The
r12-1525 version of this was:
;; Help combine recognize bt followed by cmov
(define_split
[(set (match_operand:SWI248 0 "register_operand")
(if_then_else:SWI248
(ne
(zero_extract:SWI48
(match_operand:SWI48 1 "register_operand")
(const_int 1)
(zero_extend:SI (match_operand:QI 2 "register_operand")))
(const_int 0))
(match_operand:SWI248 3 "nonimmediate_operand")
(match_operand:SWI248 4 "nonimmediate_operand")))]
"TARGET_USE_BT && TARGET_CMOVE
&& !(MEM_P (operands[3]) && MEM_P (operands[4]))
&& ix86_pre_reload_split ()"
[(set (reg:CCC FLAGS_REG)
(compare:CCC
(zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
(const_int 0)))
(set (match_dup 0)
(if_then_else:SWI248 (eq (reg:CCC FLAGS_REG) (const_int 0))
(match_dup 3)
(match_dup 4)))]
{
operands[2] = lowpart_subreg (SImode, operands[2], QImode);
})
So, in my reading, op0 = (op1 & (1 << op2)) ? op3 : op4 is what is being
matched.
And by purely reading the RTL the replacement is the other way around, flags
is set to comparison of the zero extraction (i.e. 1 or 0) against 0 and if it
is
0, op3 is used, otherwise op4. But %C1 actually prints for (eq (reg:CCC flags)
(const_int 0))
c rather than nc and for ne nc rather than c.
[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623
--- Comment #12 from Hongtao Liu ---
1370Trying 35 -> 20:
1371 35: flags:CCC=cmp(zero_extract(r104:SI,0x1,r105:SI#0),0)
1372 REG_DEAD r104:SI
1373 REG_DEAD r105:SI
1374 20: pc={(flags:CCC!=0)?L26:pc}
1375 REG_BR_PROB 1073741831
1376 REG_DEAD flags:CCZ
1377Successfully matched this instruction:
1378(set (pc)
1379(if_then_else (ne (zero_extract:SI (reg/v:SI 104 [ g ])
1380(const_int 1 [0x1])
1381(subreg:QI (reg:SI 105 [ _1 ]) 0))
1382(const_int 0 [0]))
1383(label_ref:DI 26)
1384(pc)))
The problem is here, {(flags:CCC!=0)?L26:pc} means when bit CCC is not set
which should be equal to
(eq (zero_extract:SI (reg/v:SI 104 [ g ])
1380 (const_int 1 [0x1])
1381 (subreg:QI (reg:SI 105 [ _1 ]) 0) 0).
Not *ne*, Looks like rtl simplication handle CFLAGS differently from backend?
In i386.md, there's explicitly reverse for the condition code when handling
zero_extract against zero with CCCmode.
19178(define_insn_and_split "*jcc_bt"
19179 [(set (pc)
19180(if_then_else (match_operator 0 "bt_comparison_operator"
19181[(zero_extract:SWI48
19182 (match_operand:SWI48 1 "nonimmediate_operand")
19183 (const_int 1)
19184 (match_operand:QI 2 "nonmemory_operand"))
19185 (const_int 0)])
19186 (label_ref (match_operand 3))
19187 (pc)))
19188 (clobber (reg:CC FLAGS_REG))]
19189 "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
19190 && (CONST_INT_P (operands[2])
19191 ? (INTVAL (operands[2]) < GET_MODE_BITSIZE (mode)
19192 && INTVAL (operands[2])
19193 >= (optimize_function_for_size_p (cfun) ? 8 : 32))
19194 : !memory_operand (operands[1], mode))
19195 && ix86_pre_reload_split ()"
19196 "#"
19197 "&& 1"
19198 [(set (reg:CCC FLAGS_REG)
19199(compare:CCC
19200 (zero_extract:SWI48
19201(match_dup 1)
19202(const_int 1)
19203(match_dup 2))
19204 (const_int 0)))
19205 (set (pc)
19206(if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
19207 (label_ref (match_dup 3))
19208 (pc)))]
19209{
19210 operands[0] = shallow_copy_rtx (operands[0]);
19211 PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
19212})
[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623
--- Comment #11 from Hongtao Liu ---
283(insn 8 7 9 2 (set (reg:SI 107)
284(const_int 1 [0x1])) "test.c":3:7 -1
285 (nil))
286(insn 9 8 10 2 (parallel [
287(set (reg:SI 106 [ e_7 ])
288(ashift:SI (reg:SI 107)
289(subreg:QI (reg:SI 105 [ _1 ]) 0)))
290(clobber (reg:CC 17 flags))
291]) "test.c":3:7 -1
292 (nil))
293(insn 10 9 11 2 (parallel [
294(set (reg:SI 108 [ _9 ])
295(and:SI (reg:SI 106 [ e_7 ])
296(reg/v:SI 104 [ g ])))
297(clobber (reg:CC 17 flags))
298]) "test.c":4:9 -1
299 (nil))
300(insn 11 10 12 2 (set (reg:CCZ 17 flags)
301(compare:CCZ (reg:SI 108 [ _9 ])
302(const_int 0 [0]))) "test.c":4:6 -1
303 (nil))
304(jump_insn 12 11 13 2 (set (pc)
305(if_then_else (ne (reg:CCZ 17 flags)
306(const_int 0 [0]))
307(label_ref:DI 31)
308(pc))) "test.c":4:6 1458 {*jcc}
309 (int_list:REG_BR_PROB 524952380 (nil))
310 -> 31)
311;; succ: 5 [48.9% (guessed)] count:524952376 (estimated locally,
freq 0.4889)
312;; 4 [51.1% (guessed)] count:548789448 (estimated locally,
freq 0.5111) (FALLTHRU)
313
314;; basic block 4, loop depth 0, count 548789448 (estimated locally, freq
0.5111), maybe hot
315;; prev block 2, next block 5, flags: (NEW, REACHABLE, RTL, MODIFIED,
VISITED)
316;; pred: 2 [51.1% (guessed)] count:548789448 (estimated locally,
freq 0.5111) (FALLTHRU)
317(note 13 12 14 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
318(insn 14 13 28 4 (parallel [
319(set (reg:SI 103 [ _14 ])
320(plus:SI (reg/v:SI 104 [ g ])
321(const_int 5 [0x5])))
322(clobber (reg:CC 17 flags))
323]) "test.c":12:22 discrim 1 -1
324 (nil))
325(jump_insn 28 14 29 4 (set (pc)
326(label_ref 15)) -1
327 (nil)
328 -> 15)
329;; succ: 6 [always] count:548789448 (estimated locally, freq 0.5111)
330
331(barrier 29 28 31)
332;; basic block 5, loop depth 0, count 524952376 (estimated locally, freq
0.4889), maybe hot
333;; prev block 4, next block 6, flags: (NEW, REACHABLE, RTL, MODIFIED)
334;; pred: 2 [48.9% (guessed)] count:524952376 (estimated locally,
freq 0.4889)
335(code_label 31 29 30 5 4 (nil) [1 uses])
336(note 30 31 4 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
337(insn 4 30 15 5 (set (reg:SI 103 [ _14 ])
338(reg/v:SI 104 [ g ])) -1
339 (nil))
340;; succ: 6 [always] count:524952376 (estimated locally, freq 0.4889)
(FALLTHRU)
341
342;; basic block 6, loop depth 0, count 1073741824 (estimated locally, freq
1.), maybe hot
343;; prev block 5, next block 7, flags: (NEW, REACHABLE, RTL, MODIFIED,
VISITED)
344;; pred: 5 [always] count:524952376 (estimated locally, freq 0.4889)
(FALLTHRU)
345;; 4 [always] count:548789448 (estimated locally, freq 0.5111)
346(code_label 15 4 16 6 2 (nil) [1 uses])
347(note 16 15 17 6 [bb 6] NOTE_INSN_BASIC_BLOCK)
348(insn 17 16 18 6 (parallel [
It looks it's already incorrect in the expand, if g & e != 0, it jumps to bb 4,
which is from return 0, but when it's equals to 0, it fallthru to return 5
case.
[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 --- Comment #10 from Hongtao Liu --- > > r12-7751-g919fbffef07555 > > that might have just exposed a latent issue Should be, the guilty commit just extent a splitter to handle reversed condition, didn't see anything abnormal.
[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 --- Comment #9 from rguenther at suse dot de --- On Thu, 23 Jan 2025, sjames at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 > > Sam James changed: > >What|Removed |Added > > CC||liuhongt at gcc dot gnu.org > Summary|[12/13/14/15 regression]|[12/13/14/15 regression] >|Miscompile with -O2/3 and |Miscompile with -O2/3 and >|-O0/1 |-O0/1 since >||r12-7751-g919fbffef07555 > > --- Comment #8 from Sam James --- > r12-7751-g919fbffef07555 that might have just exposed a latent issue
[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 Sam James changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org Summary|[12/13/14/15 regression]|[12/13/14/15 regression] |Miscompile with -O2/3 and |Miscompile with -O2/3 and |-O0/1 |-O0/1 since ||r12-7751-g919fbffef07555 --- Comment #8 from Sam James --- r12-7751-g919fbffef07555
