[PATCH 2/3] x86: Update memcpy/memset inline strategies for Skylake family CPUs

2021-03-22 Thread H.J. Lu via Gcc-patches
Simplify memcpy and memset inline strategies to avoid branches for Skylake family CPUs:

1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector load and store for up to 16 * 16 (256) bytes when the data size is fixed and known.
2. Inline only if data size is known to be <= 256.
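To make the distinction between "fixed and known" and run-time sizes concrete, here is a minimal sketch. The type and function names (block256, copy_fixed, clear_fixed, copy_variable) are hypothetical and not taken from the patch. Whether a given GCC build actually expands the first two calls inline depends on its version and on options such as -O2 -mtune=skylake, so treat the comments as the intent described in the patch summary, not as guaranteed code generation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: 256 bytes, the upper bound named in the patch summary. */
struct block256 { unsigned char bytes[256]; };

/* The size is a compile-time constant (256 bytes), so this call is a
   candidate for expansion into plain integer/vector loads and stores. */
void copy_fixed(struct block256 *dst, const struct block256 *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Same idea for clearing: the constant size is what CLEAR_RATIO applies to. */
void clear_fixed(struct block256 *dst)
{
    memset(dst, 0, sizeof *dst);
}

/* The size is only known at run time, so the compiler cannot prove that
   n <= 256; under the described strategy this is expected to stay a call. */
void copy_variable(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Comparing the assembly for copy_fixed and copy_variable (for example with -S) is one way to check what a particular compiler does with each case.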

Re: [PATCH 2/3] x86: Update memcpy/memset inline strategies for Skylake family CPUs

2021-04-05 Thread H.J. Lu via Gcc-patches
On Mon, Mar 22, 2021 at 6:16 AM H.J. Lu wrote:
>
> Simplify memcpy and memset inline strategies to avoid branches for
> Skylake family CPUs:
>
> 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
>    load and store for up to 16 * 16 (256) bytes when the data size is
>    fixed an

Re: [PATCH 2/3] x86: Update memcpy/memset inline strategies for Skylake family CPUs

2021-04-05 Thread Jan Hubicka
> > /* skylake_cost should produce code tuned for Skylake familly of CPUs. */
> > static stringop_algs skylake_memcpy[2] = {
> > - {libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
> > - {libcall, {{16, loop, false}, {512, unrolled_loop, false},
> > - {-1, libca
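For readers unfamiliar with these tables, the following is a simplified model of how a row such as the first removed line quoted above can be read: each {max, algorithm} pair covers sizes up to max, and -1 means "any larger size". The types and the pick_alg helper are hypothetical and only approximate the idea; the real selection logic in GCC's i386 backend takes more inputs into account, and the boolean flag in each entry is omitted here.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the algorithm names in the table. */
enum alg { ALG_LIBCALL, ALG_LOOP, ALG_UNROLLED_LOOP, ALG_REP_PREFIX_4_BYTE };

struct strategy { long max; enum alg alg; };   /* max == -1: no upper bound */

#define MAX_ENTRIES 4

struct algs {
    enum alg unknown_size;                     /* used when size is not known */
    size_t count;                              /* number of valid entries */
    struct strategy size[MAX_ENTRIES];
};

/* Model of the removed row:
   {libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}} */
static const struct algs old_row = {
    ALG_LIBCALL, 2, { {1024, ALG_REP_PREFIX_4_BYTE}, {-1, ALG_LIBCALL} }
};

/* Return the first entry whose bound covers the size; a negative
   known_size stands for "size unknown at compile time". */
enum alg pick_alg(const struct algs *t, long known_size)
{
    if (known_size < 0)
        return t->unknown_size;
    for (size_t i = 0; i < t->count; i++)
        if (t->size[i].max == -1 || known_size <= t->size[i].max)
            return t->size[i].alg;
    return ALG_LIBCALL;                        /* fallback if no entry matches */
}
```

Under this model, the removed row maps any known size up to 1024 bytes to rep_prefix_4_byte and everything else to a library call.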

Re: [PATCH 2/3] x86: Update memcpy/memset inline strategies for Skylake family CPUs

2021-04-05 Thread H.J. Lu via Gcc-patches
On Mon, Apr 5, 2021 at 2:14 PM Jan Hubicka wrote:
>
> > > /* skylake_cost should produce code tuned for Skylake familly of CPUs. */
> > > static stringop_algs skylake_memcpy[2] = {
> > > - {libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
> > > - {libcall, {{16, loo

Re: [PATCH 2/3] x86: Update memcpy/memset inline strategies for Skylake family CPUs

2021-04-06 Thread Hongyu Wang via Gcc-patches
> Do you know which of the three changes (preferring reps/stosb,
> CLEAR_RATIO and algorithm choice changes) cause the two speedups
> on eembc?

An extracted testcase from nnet_test in https://godbolt.org/z/c8KdsohTP

This loop is transformed to builtin_memcpy and builtin_memset with size 280. Curre
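As an illustration of the kind of loop being discussed, here is a hypothetical sketch; it is not the testcase from the link above. It assumes 8-byte doubles, so 35 elements give the 280-byte size mentioned in the message. GCC's loop pattern recognition (-ftree-loop-distribute-patterns, enabled at higher optimization levels) may turn loops like these into memcpy and memset calls with a constant size, though that depends on the compiler version and flags.

```c
#include <stddef.h>

/* Hypothetical element count: 35 * sizeof(double) == 280 bytes
   on ABIs where double is 8 bytes. */
#define N 35

/* Both loops have a constant trip count, so a compiler that recognizes
   the patterns sees a copy and a clear of known size (280 bytes),
   which is above the 256-byte inline limit described in the patch. */
void reset_and_copy(double *restrict out, double *restrict acc,
                    const double *restrict in)
{
    for (size_t i = 0; i < N; i++)
        out[i] = in[i];          /* copy loop: memcpy candidate */

    for (size_t i = 0; i < N; i++)
        acc[i] = 0.0;            /* zeroing loop: memset candidate */
}
```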

Re: [PATCH 2/3] x86: Update memcpy/memset inline strategies for Skylake family CPUs

2021-04-06 Thread Jan Hubicka
> > Do you know which of the three changes (preferring reps/stosb,
> > CLEAR_RATIO and algorithm choice changes) cause the two speedups
> > on eembc?
>
> An extracted testcase from nnet_test in https://godbolt.org/z/c8KdsohTP
>
> This loop is transformed to builtin_memcpy and builtin_memset with si

Re: [PATCH 2/3] x86: Update memcpy/memset inline strategies for Skylake family CPUs

2021-04-06 Thread H.J. Lu via Gcc-patches
On Tue, Apr 6, 2021 at 2:51 AM Jan Hubicka wrote:
>
> > > Do you know which of the three changes (preferring reps/stosb,
> > > CLEAR_RATIO and algorithm choice changes) cause the two speedups
> > > on eembc?
> >
> > An extracted testcase from nnet_test in https://godbolt.org/z/c8KdsohTP
> >
> > This