https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110573
            Bug ID: 110573
           Summary: MIPS64: Enhancement PR of load of pointer to atomic
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: luke.geeson at cs dot ucl.ac.uk
                CC: luke.geeson at cs dot ucl.ac.uk
  Target Milestone: ---
            Target: MIPS64

This is my first report for GCC, so please forgive me if I make a mistake.

This is an enhancement report - the behaviour of the program is ok, but an instruction could be removed to be consistent with the non-atomic variant of the code below.

Consider the following code, compiled with GCC 13.1.0 built for MIPS64 (https://godbolt.org/z/as68sEWda):
```
void P1() {
  int r0;
  r0 = *y;
  if(r0 == (1)) {
    atomic_store_explicit(x,1,memory_order_release);
  }
  *P1_r0 = r0;
}
```
When compiled using `-O1 -pthread -std=c11 -g -c`, the branch to label L7 loads a pointer to `P1_r0` in its delay slot. Likewise, `P1_r0` is loaded on the line above L7 when the branch is taken.
```
#... (code in if branch)
        ld $3,%got_disp(x)($5)
        ld $3,%got_disp(P1_r0)($5)   # ld P1_r0 on branch taken
.L7:
        ld $3,0($3)
        jr $31
        sw $2,0($3)
.L6:
        ld $3,0($3)
        sync
        li $4,1 # 0x1
        sw $4,0($3)
        b .L7
        ld $3,%got_disp(P1_r0)($5)   # ld P1_r0 on branch not taken
```
The ld could be moved into L7, thus saving one instruction:
```
#... (code in if branch)
        ld $3,%got_disp(x)($5)
.L7:
        ld $3,0($3)
        jr $31
        sw $2,0($3)
.L6:
        ld $3,%got_disp(P1_r0)($5)
        ld $3,0($3)
        sync
        li $4,1 # 0x1
        b .L7
        sw $4,0($3)
```
The above optimisation already occurs if x is non-atomic (see https://godbolt.org/z/8dhxvsE18).

The optimisation can also be applied at `-O2` (https://godbolt.org/z/8aMj6xqTq).

I hope this helps.
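
For anyone reproducing this without the Compiler Explorer links, the snippet in the report omits its declarations. Below is a minimal self-contained sketch of what they might look like. The types of `x`, `y` and `P1_r0` are assumptions inferred from the assembly (64-bit pointer loads followed by 32-bit stores), and `x_plain` and `P1_nonatomic` are hypothetical names added only to give the non-atomic variant something to compare against.

```c
#include <stdatomic.h>

/* Assumed declarations: the report does not show them. The types are
   inferred from the assembly (ld of a pointer, then sw of an int). */
atomic_int *x;      /* pointer to an atomic int */
int *y;             /* pointer to a plain int */
int *P1_r0;         /* pointer to where the observed value is written */

/* Hypothetical non-atomic counterpart of x, for comparison only. */
int *x_plain;

/* The function from the report: release store to *x if *y == 1. */
void P1(void) {
  int r0;
  r0 = *y;
  if (r0 == 1) {
    atomic_store_explicit(x, 1, memory_order_release);
  }
  *P1_r0 = r0;
}

/* Hypothetical variant with a plain store instead of the atomic one. */
void P1_nonatomic(void) {
  int r0;
  r0 = *y;
  if (r0 == 1) {
    *x_plain = 1;
  }
  *P1_r0 = r0;
}
```

Compiling this file for a MIPS64 target with the flags from the report (`-O1 -pthread -std=c11 -g -c`) should allow the code generated for the two functions to be compared side by side, although the exact output may differ from the listings above depending on how the globals are declared.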