https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110573

            Bug ID: 110573
           Summary: MIPS64: Enhancement PR of load of pointer to atomic
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: luke.geeson at cs dot ucl.ac.uk
                CC: luke.geeson at cs dot ucl.ac.uk
  Target Milestone: ---
            Target: MIPS64

This is my first report for GCC, so please forgive me if I make a mistake. This
is an enhancement report: the program behaves correctly, but one instruction
could be removed to make the code consistent with the non-atomic variant of
the code below.

Consider the following code, compiled with GCC 13.1.0 for MIPS64
(https://godbolt.org/z/as68sEWda):

```
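#include <stdatomic.h>

// Declarations assumed here; the original snippet does not show them.
atomic_int *x;
int *y;
int *P1_r0;

void P1(void) {
  int r0;
  r0 = *y;
  if (r0 == 1) {
    atomic_store_explicit(x, 1, memory_order_release);
  }
  *P1_r0 = r0;
}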
```
When compiled with `-O1 -pthread -std=c11 -g -c`, the branch to label `.L7`
loads the address of `P1_r0` in its delay slot. `P1_r0` is likewise loaded on
the line above `.L7` when the branch is taken.


```
                                            #... (code in if branch)
        ld      $3,%got_disp(x)($5)

        ld      $3,%got_disp(P1_r0)($5)     # ld P1_r0 on branch taken
.L7:
        ld      $3,0($3)
        jr      $31
        sw      $2,0($3)

.L6:
        ld      $3,0($3)
        sync
        li      $4,1                        # 0x1
        sw      $4,0($3)
        b       .L7
        ld      $3,%got_disp(P1_r0)($5)   # ld P1_r0 on branch not taken
```


The ld could be moved into L7, thus saving one instruction:


```
                                            #... (code in if branch)
        ld      $3,%got_disp(x)($5)
.L7:
        ld      $3,0($3)
        jr      $31
        sw      $2,0($3)

.L6:    ld      $3,%got_disp(P1_r0)($5)
        ld      $3,0($3)
        sync
        li      $4,1                        # 0x1
        b       .L7
        sw      $4,0($3)
```
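At the source level, the idea behind the proposal is that the address of `P1_r0` (what `ld $3,%got_disp(P1_r0)($5)` fetches from the GOT) is the same on both paths, because the final `*P1_r0 = r0` runs whether or not the branch is taken. The sketch below makes that explicit. The declarations and the name `P1_shared` are hypothetical, and writing the code this way does not by itself force GCC to emit any particular instruction schedule.

```c
#include <stdatomic.h>

// Assumed declarations (hypothetical; the report does not show them).
atomic_int *x;
int *y;
int *P1_r0;

// Hypothetical source-level illustration: the address of P1_r0 does not
// depend on the branch, so it only needs to be obtained once and can be
// shared by both paths.
void P1_shared(void) {
  int **p1_r0_addr = &P1_r0;  // address of P1_r0, obtained once
  int r0 = *y;
  if (r0 == 1) {
    atomic_store_explicit(x, 1, memory_order_release);
  }
  // Join point: both paths store r0 through P1_r0. The value of P1_r0 is
  // still read here, after the release store, as in the original code.
  **p1_r0_addr = r0;
}
```

Calling `P1_shared` with `*y == 1` stores 1 to both `*x` and `*P1_r0`; with any other value only `*P1_r0` is written.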

The above optimisation already occurs when `x` is non-atomic (see
https://godbolt.org/z/8dhxvsE18).
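For comparison, here is a sketch of what the non-atomic variant presumably looks like. Its exact source is an assumption (the report links it but does not reproduce it): `x` becomes a plain `int *` and the store becomes an ordinary assignment, so no release ordering and hence no `sync` is involved.

```c
// Hypothetical non-atomic counterpart of P1 (assumed shape).
int *x;      // plain int instead of atomic_int
int *y;
int *P1_r0;

void P1(void) {
  int r0;
  r0 = *y;
  if (r0 == 1) {
    *x = 1;  // ordinary store: no release ordering, so no fence is needed
  }
  *P1_r0 = r0;
}
```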

The same optimisation can also be applied at `-O2`
(https://godbolt.org/z/8aMj6xqTq).

I hope this helps.
