http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57037



             Bug #: 57037
           Summary: GCC does not generate non-temporal stores on i386 with
                    SSE2+
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: anl...@gmx.de





Hello,



it appears that gcc does not generate non-temporal stores,
which are available on i386 with SSE2 or later.  This is an
important optimization for some memory-bandwidth-limited codes.



Example: for the stream triad kernel,



subroutine stream_kernel_triad (a, b, c, n, s)
  integer         , intent(in) :: n
  double precision             :: a(*), b(*), c(*)
  double precision, intent(in) :: s

  integer :: j
  do j = 1,n
     a(j) = b(j) + s*c(j)
  end do
end subroutine stream_kernel_triad



the Intel compiler generates vectorized code whose throughput
on my Core2 is 25% higher than with the generation of
non-temporal stores disabled (i.e. when compiling with
"-opt-streaming-stores never").



gfortran (using -Ofast -fprefetch-loop-arrays) exactly
reproduces the performance of the Intel compiler without
non-temporal stores.  So this appears to be an important
optimization that gcc is missing.
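
For illustration only, here is a minimal hand-written C sketch of the
same triad using SSE2 intrinsics, where _mm_stream_pd is the intrinsic
for the non-temporal store instruction (movntpd).  The function name
triad_stream and the peel/remainder structure are assumptions made for
this example; it is not the output of any compiler and not a proposed
patch.  Without SSE2 it falls back to a plain scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics, including _mm_stream_pd */
#endif

/* Triad a[j] = b[j] + s*c[j], writing a[] with non-temporal stores
   where SSE2 is available.  Name and layout are illustrative only. */
void triad_stream(double *a, const double *b, const double *c,
                  size_t n, double s)
{
    size_t j = 0;
#if defined(__SSE2__)
    /* Scalar peel: _mm_stream_pd requires a 16-byte aligned address. */
    while (j < n && ((uintptr_t)(a + j) & 15u) != 0) {
        a[j] = b[j] + s * c[j];
        ++j;
    }
    const __m128d vs = _mm_set1_pd(s);
    for (; j + 2 <= n; j += 2) {
        __m128d vb = _mm_loadu_pd(b + j);   /* unaligned loads are fine */
        __m128d vc = _mm_loadu_pd(c + j);
        /* Store two doubles without allocating the line in cache. */
        _mm_stream_pd(a + j, _mm_add_pd(vb, _mm_mul_pd(vs, vc)));
    }
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#endif
    /* Remainder, or the whole loop when SSE2 is not available. */
    for (; j < n; ++j)
        a[j] = b[j] + s * c[j];
}
```

Because a(j) is only written and never read in the loop, bypassing the
cache avoids reading each destination line from memory before it is
overwritten, which is where the saving in memory traffic comes from.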



Harald
