https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77438

            Bug ID: 77438
           Summary: MMX intrinsic on x86_64 generates bloated code
           Product: gcc
           Version: 4.8.4
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acahalan at gmail dot com
  Target Milestone: ---

#include <mmintrin.h>
__m64 __attribute__((noinline)) mmx(__m64 x, __m64 y) { return _mm_add_pi8(x, y); }

That compiles to six instructions (movq, movdq2q, paddb, movq, movq, ret),
and values even get moved through the stack. On x86_64 the __m64 arguments
already arrive in xmm registers, so good code would just do the operation in
an xmm register instead of moving it to an mm register. Failing that, gcc
could at least avoid using the stack.
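
For comparison, here is a minimal sketch of the same byte-wise add done
entirely in xmm registers with SSE2 intrinsics. The function name
add_bytes_sse2 and the uint64_t interface are illustrative assumptions, not
part of the original test case. It only shows the operation one would hope to
see, and says nothing about what any particular gcc version emits for it.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add eight packed bytes (wrapping per byte)
 * without touching the MMX register file. x86_64 only, since
 * _mm_cvtsi64_si128 and _mm_cvtsi128_si64 need a 64-bit target. */
uint64_t add_bytes_sse2(uint64_t x, uint64_t y)
{
    __m128i a = _mm_cvtsi64_si128((long long)x);  /* low 64 bits = x */
    __m128i b = _mm_cvtsi64_si128((long long)y);  /* low 64 bits = y */
    __m128i sum = _mm_add_epi8(a, b);             /* paddb on xmm registers */
    return (uint64_t)_mm_cvtsi128_si64(sum);      /* extract low 64 bits */
}
```

Each byte lane wraps independently, the same as _mm_add_pi8, so no carry
propagates between lanes.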
