http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53938

             Bug #: 53938
           Summary: ARM target generates sub-optimal code (extra
                    instructions) on load from memory
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: gregpsm...@live.co.uk


Created attachment 27781
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27781
Example C source code

We are targeting an embedded device and do a lot of work accessing an FPGA
(but this applies just as well to memory access). It has annoyed me for years
that GCC emits unnecessary code, wasting memory and cycles, when reading
8-bit and 16-bit values.

The attached source file shows opportunities to generate better code. When
compiled with:

gcc -c -O3 -mcpu=arm946e-s codegen.c

It compiles to (I have added comments):

<DeviceAccess>
    mov   r2, #0xE0000000  // base address of the device
    ldrb  r1, [r2]         // load an unsigned byte, 0 extend
    ldrb  r12, [r2]        // load signed byte - WHY NOT ldrsb?
    and   r1, r1, #0xFF    // WHAT IS THIS FOR
    ldrh  r3, [r2]         // load unsigned short
    tst   r1, #0x80        // if (i & 0x80)
    movne r1, #0           //     i = 0
    lsl   r12, r12, #24    // sign extend j (but could be avoided)
    tst   r3, #0x80        // if (k & 0x80)
    ldrh  r0, [r2]         // load signed short - WHY NOT ldrsh?
    movne r3, #0           //     k = 0
    add   r1, r1, r12, asr #24 // add sign extended
    add   r3, r1, r3
    lsl   r0, r0, #16      // sign extend l
    add   r0, r3, r0, asr #16
    bx lr
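
The attachment is not reproduced in this message. Going only by the comments
in the listing above, the function might look roughly like the sketch below.
The name device_access, the exact types, and the base parameter are
assumptions; the real code reads from the fixed address 0xE0000000, and a
parameter is used here only so the sketch can be run on a host.

```c
#include <stdint.h>

/* Hypothetical reconstruction, NOT the attached file.
 * base must be at least 2-byte aligned for the 16-bit reads. */
int device_access(const volatile void *base)
{
    uint8_t  i = *(const volatile uint8_t  *)base;  /* ldrb                */
    int8_t   j = *(const volatile int8_t   *)base;  /* ldrsb would do here */
    uint16_t k = *(const volatile uint16_t *)base;  /* ldrh                */
    int16_t  l = *(const volatile int16_t  *)base;  /* ldrsh would do here */

    if (i & 0x80)
        i = 0;
    if (k & 0x80)
        k = 0;

    /* All four values are promoted to int before the additions. */
    return i + j + k + l;
}
```

On the target, the call would presumably be something like
device_access((const volatile void *)0xE0000000).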

There are two issues:

1) The and r1, r1, #0xFF is completely redundant: ldrb has already
zero-extended the byte. This does not occur when loading the unsigned short
(which is why I included the similar code for loading an unsigned short).
2) There is unnecessary sign extension taking place. ARM has had signed loads
of 8-bit and 16-bit values (ldrsb, ldrsh) since v4. Spotting this has to be
opportunistic, as there are offset restrictions on those loads.
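
To make issue 2 concrete, here is a small C sketch of why the shift pairs are
redundant: lsl #24 / asr #24 (and lsl #16 / asr #16) compute exactly the value
a signed load would have put in the register. The function names are
hypothetical. The casts and signed right shifts rely on GCC's documented
implementation-defined behaviour (modulo conversion, arithmetic right shift).

```c
#include <stdint.h>

/* What the current output does after ldrb: lsl #24, then asr #24. */
int32_t sext8_by_shifts(uint32_t loaded)
{
    return (int32_t)(loaded << 24) >> 24;
}

/* What the current output does after ldrh: lsl #16, then asr #16. */
int32_t sext16_by_shifts(uint32_t loaded)
{
    return (int32_t)(loaded << 16) >> 16;
}

/* What ldrsb / ldrsh would leave in the register directly. */
int32_t sext8_by_load(uint32_t loaded)
{
    return (int8_t)loaded;
}

int32_t sext16_by_load(uint32_t loaded)
{
    return (int16_t)loaded;
}
```

For every 8-bit or 16-bit input the two variants agree, so the signed load
can replace the unsigned load plus both shifts whenever its addressing mode
allows.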

Ideally the code would look like:

    mov   r2, #0xE0000000 // base address of the device
    ldrb  r1, [r2]        // load an unsigned byte, 0 extend
    ldrsb r12, [r2]       // load signed byte
    ldrh  r3, [r2]        // load unsigned short
    tst   r1, #0x80       // if (i & 0x80)
    movne r1, #0          //     i = 0
    tst   r3, #0x80       // if (k & 0x80)
    ldrsh r0, [r2]        // load signed short, extend to 32-bits
    movne r3, #0          //     k = 0
    add   r1, r1, r12     // add sign extended
    add   r3, r1, r3
    add   r0, r3, r0
    bx lr
