Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Richard Biener
On Fri, Nov 24, 2017 at 2:00 PM, Martin Jambor wrote: > On Fri, Nov 24 2017, Richard Biener wrote: >> On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor wrote: >>> Hi Richi, >>> >>> On Fri, Nov 24 2017, Richard Biener wrote: On Fri, Nov 24, 2017 at 11:57 AM,

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Martin Jambor
On Fri, Nov 24 2017, Richard Biener wrote: > On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor wrote: >> Hi Richi, >> >> On Fri, Nov 24 2017, Richard Biener wrote: >>> On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener >>> wrote: On Fri, Nov 24, 2017

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Richard Biener
On Fri, Nov 24, 2017 at 12:53 PM, Martin Jambor wrote: > Hi Richi, > > On Fri, Nov 24 2017, Richard Biener wrote: >> On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener >> wrote: >>> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener > > .. > >> And

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Martin Jambor
Hi Richi, On Fri, Nov 24 2017, Richard Biener wrote: > On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener > wrote: >> On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener .. > And yes, I've been worried about SRA as well here... it _does_ > have some early outs

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Richard Biener
On Fri, Nov 24, 2017 at 11:57 AM, Richard Biener wrote: > On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener > wrote: >> On Thu, Nov 23, 2017 at 4:32 PM, Martin Jambor wrote: >>> Hi, >>> >>> On Mon, Nov 13 2017, Richard

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Richard Biener
On Fri, Nov 24, 2017 at 11:31 AM, Richard Biener wrote: > On Thu, Nov 23, 2017 at 4:32 PM, Martin Jambor wrote: >> Hi, >> >> On Mon, Nov 13 2017, Richard Biener wrote: >>> The main concern here is that GIMPLE is not very well defined for >>> aggregate

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-24 Thread Richard Biener
On Thu, Nov 23, 2017 at 4:32 PM, Martin Jambor wrote: > Hi, > > On Mon, Nov 13 2017, Richard Biener wrote: >> The main concern here is that GIMPLE is not very well defined for >> aggregate copies and that gimple-fold.c happily optimizes >> memcpy (&a, &b, sizeof (a)) into a = b; >>

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-23 Thread Jakub Jelinek
On Thu, Nov 23, 2017 at 04:32:43PM +0100, Martin Jambor wrote: > > struct A { short s; long i; long j; }; > > struct A a, b; > > void foo () > > { > > struct A c; > > __builtin_memcpy (, , sizeof (struct A)); > > __builtin_memcpy (, , sizeof (struct A)); > > } > > int main() > > { > >

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-23 Thread Martin Jambor
Hi, On Mon, Nov 13 2017, Richard Biener wrote: > The main concern here is that GIMPLE is not very well defined for > aggregate copies and that gimple-fold.c happily optimizes > memcpy (&a, &b, sizeof (a)) into a = b; > > struct A { short s; long i; long j; }; > struct A a, b; > void foo () > { >

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-14 Thread Martin Jambor
Hi, I thought I sent the following email last Friday but just found it in my drafts folder, so let me send it now so that anybody interested can see what the patch does on Haswell. I have only skimmed through new messages in the thread. I am looking into something else right now but

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-13 Thread Eric Botcazou
> The chance here is, of course (find the PR, it exists...), that SRA then > decomposes the char[] copy bytewise... > > That said, memcpy folding is easy to fix. The question is of course > what the semantic of VIEW_CONVERTs is (SRA _does_ contain > bail-outs on those). Like if you have > >

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-13 Thread Richard Biener
On November 13, 2017 3:20:16 PM GMT+01:00, Michael Matz wrote: >Hi, > >On Mon, 13 Nov 2017, Richard Biener wrote: > >> The chance here is, of course (find the PR, it exists...), that SRA >then >> decomposes the char[] copy bytewise... >> >> That said, memcpy folding is easy to

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-13 Thread Michael Matz
Hi, On Mon, 13 Nov 2017, Richard Biener wrote: > The chance here is, of course (find the PR, it exists...), that SRA then > decomposes the char[] copy bytewise... > > That said, memcpy folding is easy to fix. The question is of course > what the semantic of VIEW_CONVERTs is (SRA _does_

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-13 Thread Richard Biener
On Mon, Nov 13, 2017 at 2:46 PM, Michael Matz wrote: > Hi, > > On Mon, 13 Nov 2017, Richard Biener wrote: > >> The main concern here is that GIMPLE is not very well defined for >> aggregate copies and that gimple-fold.c happily optimizes >> memcpy (&a, &b, sizeof (a)) into a = b; > >

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-13 Thread Michael Matz
Hi, On Mon, 13 Nov 2017, Richard Biener wrote: > The main concern here is that GIMPLE is not very well defined for > aggregate copies and that gimple-fold.c happily optimizes > memcpy (&a, &b, sizeof (a)) into a = b; What you forgot to mention is that we then discussed rectifying this

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-13 Thread Richard Biener
On Fri, Nov 3, 2017 at 5:38 PM, Martin Jambor wrote: > Hi, > > On Thu, Oct 26, 2017 at 02:43:02PM +0200, Richard Biener wrote: >> On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor wrote: >> > >> > Nevertheless, I still intend to experiment with the limit, I sent

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-11-03 Thread Martin Jambor
Hi, On Thu, Oct 26, 2017 at 02:43:02PM +0200, Richard Biener wrote: > On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor wrote: > > > > Nevertheless, I still intend to experiment with the limit, I sent out > > this RFC exactly so that I don't spend a lot of time benchmarking > >

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-27 Thread Jan Hubicka
> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka wrote: > >> I think the limit should be on the number of generated copies and not > >> the overall size of the structure... If the struct were composed of > >> 32 individual chars we wouldn't want to emit 32 loads and 32 stores... >

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 4:38 PM, Richard Biener wrote: > On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka wrote: >>> I think the limit should be on the number of generated copies and not >>> the overall size of the structure... If the struct were composed

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka wrote: >> I think the limit should be on the number of generated copies and not >> the overall size of the structure... If the struct were composed of >> 32 individual chars we wouldn't want to emit 32 loads and 32 stores... >> >> I

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Michael Matz
Hi, On Thu, 26 Oct 2017, Martin Jambor wrote: > > 35 bytes seems to be much - what is the code-size impact? > > I will find out and report on that. I need at least 32 bytes (four > long ints) to fix imagemagick, where the problematic structure is: Surely the final heuristic should look at the

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Jan Hubicka
> I think the limit should be on the number of generated copies and not > the overall size of the structure... If the struct were composed of > 32 individual chars we wouldn't want to emit 32 loads and 32 stores... > > I wonder how rep; movb; interacts with store to load forwarding? Is > that

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Richard Biener
On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor wrote: > Hi, > > On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote: >> On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor wrote: >> > Hi, >> > >> > I'd like to request comments to the patch below which aims

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-26 Thread Martin Jambor
Hi, On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote: > On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor wrote: > > Hi, > > > > I'd like to request comments to the patch below which aims to fix PR > > 80689, which is an instance of a store-to-load forwarding stall on

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-17 Thread Richard Biener
On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor wrote: > Hi, > > I'd like to request comments to the patch below which aims to fix PR > 80689, which is an instance of a store-to-load forwarding stall on > x86_64 CPUs in the Image Magick benchmark, which is responsible for a > slow

[RFC, PR 80689] Copy small aggregates element-wise

2017-10-13 Thread Martin Jambor
Hi, I'd like to request comments to the patch below which aims to fix PR 80689, which is an instance of a store-to-load forwarding stall on x86_64 CPUs in the Image Magick benchmark, which is responsible for a slow down of up to 9% compared to gcc 6, depending on options and HW used. (Actually,