https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68261
Bug ID: 68261
Summary: GCC needs to use optimized version of memcpy
Product: gcc
Version: 5.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: geir at cray dot com
Target Milestone: ---

The memcpy routine for GCC needs to be faster. The following test case shows
that the Intel compiler's implementation of memcpy is almost twice as fast as
GCC's. I realize that memcpy is a part of GLIBC, but the GCC compiler should
take advantage of the targeting information being provided and the context of
the memcpy call in order to generate better code:

$ cat test_memcpy.cpp
#include <stdio.h>
#include <string.h>
#include <omp.h>
#include <vector>

extern "C" void memcpy_custom(double* out, double* in, int length);

int main(int argn, char** argv)
{
    int repeat = 200;
    int N = (1 << 20);
    std::vector<double> inp(N, 1);
    std::vector<double> out(N, 2);

    double t = -omp_get_wtime();
    if (argn == 1)
    {
        for (int i = 0; i < repeat; i++)
        {
            memcpy(&out[0], &inp[0], N * sizeof(double));
        }
    }
    else
    {
        for (int i = 0; i < repeat; i++)
        {
            memcpy_custom(&out[0], &inp[0], N);
        }
    }
    t += omp_get_wtime();

    printf("performance: %.4f MB/sec.\n", repeat * N * sizeof(double) / t / (1 << 20));
}

$ cat memcpy_custom.cpp
extern "C" void memcpy_custom(double* out, double* in, int length)
{
    for (int i = 0; i < length; i++) out[i] = in[i];
}

GCC g++ performance:

$ g++ --version
g++ (GCC) 5.1.0 20150422 (Cray Inc.)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ g++ -march=corei7-avx -o gcc.out -O3 -fopenmp memcpy_custom.cpp test_memcpy.cpp
$ ./gcc.out
performance: 6977.5857 MB/sec.

Intel icpc performance:

$ icpc --version
icpc (ICC) 15.0.3 20150407
Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.
$ icpc -mavx -o intel.out -O3 -qopenmp memcpy_custom.cpp test_memcpy.cpp
$ ./intel.out
performance: 13055.0563 MB/sec.

Performance of GCC can be improved by implementing a simple "custom" version
of memcpy (passing any command-line argument makes the test call memcpy_custom
instead of memcpy):

$ ./gcc.out 1
performance: 11619.4630 MB/sec.
$ ./intel.out 1
performance: 13068.3777 MB/sec.
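
For anyone who wants to repeat the comparison without OpenMP, the two code paths can be sketched in a single file timed with std::chrono. This is only a sketch under stated assumptions: the names copy_loop, copy_memcpy and copy_bandwidth_mib_per_s are hypothetical and not part of the report, and because everything is visible in one translation unit the optimizer may treat the loop differently than it treats the separately compiled memcpy_custom, so the numbers will not necessarily match the ones reported here.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: element-wise copy, the same shape as memcpy_custom.
void copy_loop(double* out, const double* in, std::size_t length) {
    for (std::size_t i = 0; i < length; i++) out[i] = in[i];
}

// Hypothetical helper: the same copy through the library routine.
void copy_memcpy(double* out, const double* in, std::size_t length) {
    std::memcpy(out, in, length * sizeof(double));
}

// Runs `repeat` copies of `n` doubles and returns the throughput in MiB/s.
// Returns -1.0 if the destination does not match the source afterwards.
template <typename CopyFn>
double copy_bandwidth_mib_per_s(CopyFn copy, std::size_t n, int repeat) {
    std::vector<double> in(n, 1.0);
    std::vector<double> out(n, 2.0);

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeat; i++) {
        copy(out.data(), in.data(), n);
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double> seconds = stop - start;

    // Check the result so that a broken copy is never reported as a fast one.
    if (out != in) return -1.0;

    double mib = static_cast<double>(repeat) * static_cast<double>(n) *
                 sizeof(double) / (1 << 20);
    return mib / seconds.count();
}
```

Calling copy_bandwidth_mib_per_s(copy_memcpy, 1 << 20, 200) and copy_bandwidth_mib_per_s(copy_loop, 1 << 20, 200) from main mirrors the two branches of the test case; a negative return value means the copied data did not match.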