[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103
--- Comment #16 from GCC Commits ---
The master branch has been updated by Pan Li :
https://gcc.gnu.org/g:1c8e6734d2dd3a6236d94c6e4e0c6780f35ede9f
commit r15-7421-g1c8e6734d2dd3a6236d94c6e4e0c6780f35ede9f
Author: Pan Li
Date: Fri Feb 7 14:21:35 2025 +0800
RISC-V: Make VXRM as global register [PR118103]
Inspired by PR118103, the VXRM register should be treated almost the
same as the FRM register, aka cooperatively-managed global register.
Thus, add the VXRM to global_regs to avoid the elimination by the
late-combine pass.
For example as below code:
21 â
22 â void compute ()
23 â {
24 â size_t vl = __riscv_vsetvl_e16m1 (N);
25 â vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
26 â vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
27 â vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb,
__RISCV_VXRM_RDN, vl);
28 â
29 â __riscv_vse16_v_u16m1 (c, vc, vl);
30 â }
31 â
32 â int main ()
33 â {
34 â initialize ();
35 â compute();
36 â
37 â return 0;
38 â }
After compile with -march=rv64gcv -O3, we will have:
30 â compute:
31 â csrwi vxrm,2
32 â lui a3,%hi(a)
33 â lui a4,%hi(b)
34 â addia4,a4,%lo(b)
35 â vsetivlizero,4,e16,m1,ta,ma
36 â addia3,a3,%lo(a)
37 â vle16.v v2,0(a4)
38 â vle16.v v1,0(a3)
39 â lui a4,%hi(c)
40 â addia4,a4,%lo(c)
41 â vaaddu.vv v1,v1,v2
42 â vse16.v v1,0(a4)
43 â ret
44 â .size compute, .-compute
45 â .section.text.startup,"ax",@progbits
46 â .align 1
47 â .globl main
48 â .type main, @function
49 â main:
| // csrwi vxrm,2 deleted after inline
50 â addisp,sp,-16
51 â sd ra,8(sp)
52 â callinitialize
53 â lui a3,%hi(a)
54 â lui a4,%hi(b)
55 â vsetivlizero,4,e16,m1,ta,ma
56 â addia4,a4,%lo(b)
57 â addia3,a3,%lo(a)
58 â vle16.v v2,0(a4)
59 â vle16.v v1,0(a3)
60 â lui a4,%hi(c)
61 â addia4,a4,%lo(c)
62 â li a0,0
63 â vaaddu.vv v1,v1,v2
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/118103
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_conditional_register_usage): Add
the VXRM as the global_regs.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118103-2.c: New test.
* gcc.target/riscv/rvv/base/pr118103-run-2.c: New test.
Signed-off-by: Pan Li
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103 --- Comment #15 from Vineet Gupta --- (In reply to Li Pan from comment #12) > > Hi Vineet, > > Could you please help to double check if below can help to resolve the cam4 > issue? Thanks and feel free to let me know if any help is required. > > https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674502.html Yep it does. Thx for the fix.
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103 Jeffrey A. Law changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #14 from Jeffrey A. Law --- Fixed on the trunk.
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103
--- Comment #13 from GCC Commits ---
The master branch has been updated by Pan Li :
https://gcc.gnu.org/g:55d288d4ff5360c572f2a017ba9385840ac5134e
commit r15-7215-g55d288d4ff5360c572f2a017ba9385840ac5134e
Author: Pan Li
Date: Sat Jan 25 15:45:10 2025 +0800
RISC-V: Make FRM as global register [PR118103]
After we enabled the labe-combine pass after the mode-switching pass, it
will try to combine below insn patterns into op. Aka:
(insn 40 5 41 2 (set (reg:SI 11 a1 [151])
(reg:SI 69 frm)) "pr118103-simple.c":67:15 2712 {frrmsi}
(nil))
(insn 41 40 7 2 (set (reg:SI 69 frm)
(const_int 2 [0x2])) "pr118103-simple.c":69:8 2710 {fsrmsi_restore}
(nil))
(insn 42 10 11 2 (set (reg:SI 69 frm)
(reg:SI 11 a1 [151])) "pr118103-simple.c":70:8 2710 {fsrmsi_restore}
(nil))
trying to combine definition of r11 in:
40: a1:SI=frm:SI
into:
42: frm:SI=a1:SI
instruction becomes a no-op:
(set (reg:SI 69 frm)
(reg:SI 69 frm))
original cost = 4 + 4 (weighted: 8.00), replacement cost =
2147483647; keeping replacement
rescanning insn with uid = 42.
updating insn 42 in-place
verify found no changes in insn with uid = 42.
deleting insn 40
For example we have code as blow:
9 â int test_exampe () {
10 â test ();
11 â
12 â size_t vl = 4;
13 â vfloat16m1_t va = __riscv_vle16_v_f16m1(a, vl);
14 â va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN,
vl);
15 â va = __riscv_vfmsac_vv_f16m1(va, va, va, vl);
16 â
17 â __riscv_vse16_v_f16m1(b, va, vl);
18 â
19 â return 0;
20 â }
it will be compiled to:
53 â main:
54 â addisp,sp,-16
55 â sd ra,8(sp)
56 â callinitialize
57 â lui a6,%hi(b)
58 â lui a2,%hi(a)
59 â addia3,a6,%lo(b)
60 â addia2,a2,%lo(a)
61 â li a4,4
62 â .L8:
63 â fsrmi 2
64 â vsetvli a5,a4,e16,m1,ta,ma
65 â vle16.v v1,0(a2)
66 â sllia1,a5,1
67 â subwa4,a4,a5
68 â add a2,a2,a1
69 â vfnmadd.vv v1,v1,v1
>> The fsrm a0 insn is deleted by late-combine <<
70 â vfmsub.vv v1,v1,v1
71 â vse16.v v1,0(a3)
72 â add a3,a3,a1
73 â bgt a4,zero,.L8
74 â lh a4,%lo(b)(a6)
75 â li a5,-20480
76 â addia5,a5,-1382
77 â bne a4,a5,.L14
78 â ld ra,8(sp)
79 â li a0,0
80 â addisp,sp,16
81 â jr ra
This patch would like to add the FRM register to the global_regs as it
is a cooperatively-managed global register. And then the fsrm insn will
not be eliminated by late-combine. The related spec17 cam4 failure may
also caused by this issue too.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/118103
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_conditional_register_usage): Add
the FRM as the global_regs.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118103-1.c: New test.
* gcc.target/riscv/rvv/base/pr118103-run-1.c: New test.
Signed-off-by: Pan Li
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103 --- Comment #12 from Li Pan --- (In reply to Li Pan from comment #10) > (In reply to Vineet Gupta from comment #8) > > A fix for PR/118464 is posted to list [1] which also cures this issue. > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674498.html > > Thanks Vineet, it seems we can also have a try for cooperatively-managed > global register, will keep you posted. Hi Vineet, Could you please help to double check if below can help to resolve the cam4 issue? Thanks and feel free to let me know if any help is required. https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674502.html
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103 --- Comment #11 from Li Pan --- TARGET_CONDITIONAL_REGISTER_USAGE can help to resolve this issue, let me have a try for regression test. But looks we don't need to emit_volatile_frm anymore here, but it is another refactor later.
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103 --- Comment #10 from Li Pan --- (In reply to Vineet Gupta from comment #8) > A fix for PR/118464 is posted to list [1] which also cures this issue. > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674498.html Thanks Vineet, it seems we can also have a try for cooperatively-managed global register, will keep you posted.
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103
--- Comment #9 from Li Pan ---
(In reply to Richard Sandiford from comment #7)
> The problem seems to be in the modelling of the FRM register.
> CALL_USED_REGISTERS says that the register is call-clobbered/caller-save,
> which means:
>
> (a) it is not treated as live on entry to the function
> (b) all calls are assumed to clobber the contents
>
> So in:
>
> ;; entry block defs 1 [ra] 2 [sp] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14
> [a4] 15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47
> [fa5] 48 [fa6] 49 [fa7]
> …
> (note 63 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (note 2 63 10 2 NOTE_INSN_FUNCTION_BEG)
> (call_insn 10 2 68 2 (parallel [
> (call (mem:SI (symbol_ref:DI ("initialize") [flags 0x3]
> ) [0 initialize S4 A32])
> (const_int 0 [0]))
> (use (unspec:SI [
> (const_int 0 [0])
> ] UNSPEC_CALLEE_CC))
> (clobber (reg:SI 1 ra))
> ]) "/tmp/a.c":30:3 456 {call_internal}
> (expr_list:REG_CALL_DECL (symbol_ref:DI ("initialize") [flags 0x3]
> )
> (expr_list:REG_EH_REGION (const_int 0 [0])
> (nil)))
> (nil))
> (insn 68 10 3 2 (set (reg:SI 10 a0 [165])
> (reg:SI 69 frm)) "/tmp/a.c":18:36 2725 {frrmsi}
> (nil))
>
> the FRM on entry to insn 68 is in a sense doubly undefined: it wasn't
> defined on entry, and any value that it did have is in any case clobbered by
> the call.
>
> If FRM is in fact a call-preserved/callee-save register, that should be
> described using CALL_REALLY_USED_REGISTERS. If instead FRM is a
> cooperatively-managed global register, global_regs[FRM_REGNUM] should be set
> to true.
Thanks Richard, I think it is cooperatively-managed global register. Let me
double check where should I set this.
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103 Vineet Gupta changed: What|Removed |Added CC||vineetg at gcc dot gnu.org --- Comment #8 from Vineet Gupta --- A fix for PR/118464 is posted to list [1] which also cures this issue. [1] https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674498.html
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103
Richard Sandiford changed:
What|Removed |Added
Status|ASSIGNED|NEW
Assignee|rsandifo at gcc dot gnu.org|unassigned at gcc dot
gnu.org
--- Comment #7 from Richard Sandiford ---
The problem seems to be in the modelling of the FRM register.
CALL_USED_REGISTERS says that the register is call-clobbered/caller-save, which
means:
(a) it is not treated as live on entry to the function
(b) all calls are assumed to clobber the contents
So in:
;; entry block defs 1 [ra] 2 [sp] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4]
15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5]
48 [fa6] 49 [fa7]
…
(note 63 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 63 10 2 NOTE_INSN_FUNCTION_BEG)
(call_insn 10 2 68 2 (parallel [
(call (mem:SI (symbol_ref:DI ("initialize") [flags 0x3]
) [0 initialize S4 A32])
(const_int 0 [0]))
(use (unspec:SI [
(const_int 0 [0])
] UNSPEC_CALLEE_CC))
(clobber (reg:SI 1 ra))
]) "/tmp/a.c":30:3 456 {call_internal}
(expr_list:REG_CALL_DECL (symbol_ref:DI ("initialize") [flags 0x3]
)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil)))
(nil))
(insn 68 10 3 2 (set (reg:SI 10 a0 [165])
(reg:SI 69 frm)) "/tmp/a.c":18:36 2725 {frrmsi}
(nil))
the FRM on entry to insn 68 is in a sense doubly undefined: it wasn't defined
on entry, and any value that it did have is in any case clobbered by the call.
If FRM is in fact a call-preserved/callee-save register, that should be
described using CALL_REALLY_USED_REGISTERS. If instead FRM is a
cooperatively-managed global register, global_regs[FRM_REGNUM] should be set to
true.
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103 Richard Sandiford changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2025-01-24 Status|UNCONFIRMED |ASSIGNED CC||rsandifo at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #6 from Richard Sandiford --- Mine.
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103
--- Comment #5 from Li Pan ---
bisect late-combine2 results in some invalid combine here for main function.
(insn 40 5 41 2 (set (reg:SI 11 a1 [151])
(reg:SI 69 frm)) "pr118103-simple.c":67:15 2712 {frrmsi}
(nil))
(insn 41 40 7 2 (set (reg:SI 69 frm)
(const_int 2 [0x2])) "pr118103-simple.c":69:8 2710 {fsrmsi_restore}
(nil))
(insn 42 10 11 2 (set (reg:SI 69 frm)
(reg:SI 11 a1 [151])) "pr118103-simple.c":70:8 2710 {fsrmsi_restore}
(nil))
trying to combine definition of r11 in:
40: a1:SI=frm:SI
into:
42: frm:SI=a1:SI
instruction becomes a no-op:
(set (reg:SI
69 frm)
(reg:SI 69 frm))
original cost = 4 + 4 (weighted:
8.00), replacement cost = 2147483647; keeping replacement
rescanning insn with uid = 42.
updating insn 42 in-place
verify found no changes in insn with uid = 42.
deleting insn 40
deleting insn with uid = 40
But when the same code in another function instead of main, we will have
trying to combine definition of r12 in:
18: a2:SI=frm:SI
into:
20: frm:SI=a2:SI
-- cannot satisfy all definitions and uses in insn 20
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103 --- Comment #4 from Li Pan --- gcc-14 has the correct behavior and mostly some middle-end change I guess. └─(11:39:07 on master⚑ ✭)──> riscv64-linux-gnu-gcc-14 --version riscv64-linux-gnu-gcc-14 (Ubuntu 14.2.0-4ubuntu2~24.04) 14.2.0 70 │ test_exampe: 71 │ .LFB2: 72 │ .cfi_startproc 73 │ frrma3 74 │ fsrmi 2 75 │ lla a2,.LANCHOR0 76 │ lla a4,.LANCHOR0+8 77 │ vsetivlizero,4,e16,m1,ta,ma 78 │ vle16.v v1,0(a2) 79 │ vfnmadd.vv v1,v1,v1 80 │ fsrma3 81 │ vfmsub.vv v1,v1,v1 82 │ vse16.v v1,0(a4) 83 │ ret 84 │ .cfi_endproc 85 │ .LFE2: 86 │ .size test_exampe, .-test_exampe 87 │ .section.text.startup,"ax",@progbits 88 │ .align 1 89 │ .globl main 90 │ .type main, @function 91 │ main: 92 │ .LFB3: 93 │ .cfi_startproc 94 │ addisp,sp,-16 95 │ .cfi_def_cfa_offset 16 96 │ sd ra,8(sp) 97 │ .cfi_offset 1, -8 98 │ callinitialize 99 │ frrma3 100 │ fsrmi 2 101 │ lla a2,.LANCHOR0 102 │ lla a4,.LANCHOR0+8 103 │ vsetivlizero,4,e16,m1,ta,ma 104 │ vle16.v v1,0(a2) 105 │ vfnmadd.vv v1,v1,v1 106 │ fsrma3 107 │ vfmsub.vv v1,v1,v1 108 │ vse16.v v1,0(a4) 109 │ callprint_result
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103
--- Comment #3 from Li Pan ---
Interesting the test_example in a separate function other than main will have
the frm restore insn, but there will be no such frm in main function.
62 │ test_exampe:
63 │ frrma2
64 │ fsrmi 2
65 │ lui a3,%hi(a)
66 │ addia3,a3,%lo(a)
67 │ lui a5,%hi(b)
68 │ vsetivlizero,4,e16,m1,ta,ma
69 │ vle16.v v1,0(a3)
70 │ addia5,a5,%lo(b)
71 │ vfnmadd.vv v1,v1,v1
72 │ fsrma2// Has FRM
73 │ vfmsub.vv v1,v1,v1
74 │ vse16.v v1,0(a5)
75 │ ret
76 │ .size test_exampe, .-test_exampe
77 │ .section.text.startup,"ax",@progbits
78 │ .align 1
79 │ .globl main
80 │ .type main, @function
81 │ main:
82 │ addisp,sp,-16
83 │ sd ra,8(sp)
84 │ callinitialize
85 │ fsrmi 2
86 │ lui a3,%hi(a)
87 │ addia3,a3,%lo(a)
88 │ lui a5,%hi(b)
89 │ vsetivlizero,4,e16,m1,ta,ma
90 │ vle16.v v1,0(a3)
91 │ addia5,a5,%lo(b)
92 │ vfnmadd.vv v1,v1,v1 // No FRM
93 │ vfmsub.vv v1,v1,v1
94 │ vse16.v v1,0(a5)
95 │ callprint_result
96 │ ld ra,8(sp)
97 │ li a0,0
98 │ addisp,sp,16
99 │ jr ra
52 │ void test_exampe () {
53 │ int avl = dataLen;
54 │ float16_t* ptr_a = a; float16_t* ptr_b = b;
55 │ size_t vl = __riscv_vsetvl_e16m1(avl);
56 │ vfloat16m1_t va = __riscv_vle16_v_f16m1(ptr_a, vl);
57 │ va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl);
58 │ va = __riscv_vfmsac_vv_f16m1(va, va, va, vl);
59 │ __riscv_vse16_v_f16m1(ptr_b, va, vl);
60 │ }
61 │
62 │ int main(){
63 │ initialize();
64 │
65 │ int avl = dataLen;
66 │ float16_t* ptr_a = a; float16_t* ptr_b = b;
67 │ size_t vl = __riscv_vsetvl_e16m1(avl);
68 │ vfloat16m1_t va = __riscv_vle16_v_f16m1(ptr_a, vl);
69 │ va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl);
70 │ va = __riscv_vfmsac_vv_f16m1(va, va, va, vl);
71 │ __riscv_vse16_v_f16m1(ptr_b, va, vl);
72 │
73 │ print_result ();
74 │
75 │ return 0;
76 │ }
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103 --- Comment #2 from Li Pan --- (In reply to Li Pan from comment #1) > Ack, let me try to reproduce this. Reproduced, the inlined compute delete the restore FRM somewhere, will take a look into it.
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103 Li Pan changed: What|Removed |Added CC||pan2.li at intel dot com --- Comment #1 from Li Pan --- Ack, let me try to reproduce this.
[Bug target/118103] [15 Regression] GCC miscompile rvv intrinsics at `-O3`, missing the `fsrm` instruction to the recover status of frm CSR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118103 Richard Biener changed: What|Removed |Added Keywords||wrong-code Target Milestone|--- |15.0 Target||riscv
