http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56197



             Bug #: 56197
           Summary: [SH] Use calculated jump address instead of using a jump table
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: olege...@gcc.gnu.org
            Target: sh*-*-*





I ran across this one while checking out PR 55146.

If there are a lot of cases in a switch and the length of the case blocks is
more or less constant, it can be beneficial to calculate the jump address and
eliminate the jump table.  For example, code such as



int
test (int arg)
{
  int rc;
  switch (arg)
    {
    case 0:
      asm ("nop\n\tnop\n\t"
           "mov r4,%0"
           : "=r" (rc)
           : "r" (arg));
      break;
    case 1:
      asm ("nop\n\tnop\n\t"
           "mov r5,%0"
           : "=r" (rc)
           : "r" (arg));
      break;
    case 2:
      asm ("nop\n\tnop\n\t"
           "mov r6,%0"
           : "=r" (rc)
           : "r" (arg));

    [...]

    case 9:
      asm ("nop\n\tnop\n\t"
           "mov r7,%0"
           : "=r" (rc)
           : "r" (arg));
      break;
    }
  return rc;
}





Compiling this with -O2 results in:



_test:
        mov     #9,r1
        cmp/hi  r1,r4
        bt      .L2
        mova    .L4,r0
        mov.b   @(r0,r4),r4
        add     r0,r4
        jmp     @r4
        nop
        .align 2
.L4:
        .byte   .L3-.L4
        .byte   .L5-.L4
        .byte   .L6-.L4
        .byte   .L7-.L4
        .byte   .L8-.L4
        .byte   .L9-.L4
        .byte   .L10-.L4
        .byte   .L11-.L4
        .byte   .L12-.L4
        .byte   .L13-.L4
        .align 1
.L13:
        mov     #9,r0
        nop
        nop
        mov     r7,r0
        .align 2
.L2:
        rts
        nop
        .align 1
.L12:
        mov     #8,r0

        [...]



With a large number of cases, the jump table can become large and is likely to
cause data cache misses.  The following might be better in that case (assuming
that each case block occupies 16 bytes):



        mov     #9,r1
        cmp/hi  r1,r4
        bt      .L2
        shll2   r4
        shll2   r4
        add     #.Lcase_0 - .Lcase_default,r4
        braf    r4
        nop



.Lcase_default:
        rts
        nop

        .align 4
.Lcase_0:
        mov     #0,r0
        nop
        nop
        mov     r4,r0
        rts
        nop

        .align 4
.Lcase_1:

        [...]



        .align 4
.Lcase_9:
        mov     #9,r0
        nop
        nop
        mov     r7,r0
        rts
        nop



However, this requires the case blocks to be laid out in ascending order of
their case values, and the length of the case blocks should not vary too much.
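
To make the address arithmetic and the size constraint a bit more concrete,
here is a rough C sketch.  It is not GCC code.  All names in it
(offset_from_table, offset_calculated, calculated_jump_ok, the waste
threshold) are made up for illustration, and the 10 cases and the 16-byte
slot size are simply taken from the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values from the example: 10 cases, 16-byte slots.  */
enum { NUM_CASES = 10, BLOCK_SHIFT = 4, BLOCK_SIZE = 1 << BLOCK_SHIFT };

/* Jump table variant: the offset of each case block is loaded from
   memory, which is where the data cache misses can come from.  */
static const unsigned char jump_table[NUM_CASES] = {
  0, 16, 32, 48, 64, 80, 96, 112, 128, 144
};

static size_t
offset_from_table (unsigned int idx)
{
  return jump_table[idx];
}

/* Calculated variant: the two shll2 insns amount to idx << 4,
   so no memory access is needed.  */
static size_t
offset_calculated (unsigned int idx)
{
  return (size_t) idx << BLOCK_SHIFT;
}

/* Both variants must yield the same offsets when the blocks are laid
   out in ascending order with a fixed stride.  */
static void
check_variants_agree (void)
{
  for (unsigned int i = 0; i < NUM_CASES; i++)
    assert (offset_from_table (i) == offset_calculated (i));
}

/* Hypothetical profitability check: every block has to fit into one
   slot, and the padding must not exceed max_waste_percent of the
   total size of all slots.  */
static int
calculated_jump_ok (const size_t *block_len, size_t n,
                    unsigned int max_waste_percent)
{
  size_t used = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (block_len[i] > BLOCK_SIZE)
        return 0;
      used += block_len[i];
    }
  size_t total = n * BLOCK_SIZE;
  return (total - used) * 100 <= (size_t) max_waste_percent * total;
}
```

In the example each case block is 6 insns, i.e. 12 bytes, so 4 bytes of
padding per 16-byte slot are wasted.  Whether a check like this would use a
percentage threshold or something else entirely is an open question.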



Maybe this optimization could also be beneficial on targets other than SH.  At
least PR 43462 looks somewhat related to it.
