[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-15 Thread jamborm at gcc dot gnu dot org
--- Comment #18 from jamborm at gcc dot gnu dot org 2010-06-15 09:48 --- Subject: Bug 44423 Author: jamborm Date: Tue Jun 15 09:48:39 2010 New Revision: 160775 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=160775 Log: 2010-06-15 Martin Jambor mjam...@suse.cz PR

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-15 Thread jamborm at gcc dot gnu dot org
--- Comment #19 from jamborm at gcc dot gnu dot org 2010-06-15 10:04 --- This is now fixed on both the trunk and the 4.5 branch. -- jamborm at gcc dot gnu dot org changed: What|Removed |Added

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-14 Thread jamborm at gcc dot gnu dot org
--- Comment #15 from jamborm at gcc dot gnu dot org 2010-06-14 12:39 --- (In reply to comment #14) SSE performance is fine again, thanks a lot! One more question, if that's OK... Depending on ARRSZ the testcase uses wildly varying amounts of CPU time; it's about half a second for

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-14 Thread martin at mpa-garching dot mpg dot de
--- Comment #16 from martin at mpa-garching dot mpg dot de 2010-06-14 12:46 --- (In reply to comment #15) I have found the problem in the meantime ... it's my mistake, sorry about the noise :( The problem is that I did not explicitly zero the arrays in main(), so they apparently
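As a rough illustration of the fix described in comment #16 (explicitly zeroing the arrays before use), a minimal sketch follows. The testcase itself is not shown in this listing, so the helper name `zero_floats` and the use of plain `float` buffers are assumptions; only `ARRSZ` and the value 1024 come from the thread.

```c
#include <stddef.h>
#include <string.h>

/* ARRSZ is the size macro named in the thread; 1024 is one of the
   values mentioned there. */
#define ARRSZ 1024

/* Hypothetical helper: set n floats to 0.0f. On IEEE 754 targets an
   all-zero bit pattern is 0.0f, so memset is sufficient. */
static void zero_floats(float *p, size_t n)
{
    memset(p, 0, n * sizeof *p);
}
```

Calling `zero_floats(buf, ARRSZ)` on each working buffer at the start of `main()` gives the benchmark defined input data regardless of what the memory held before.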

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-14 Thread jamborm at gcc dot gnu dot org
--- Comment #17 from jamborm at gcc dot gnu dot org 2010-06-14 12:50 --- OK, I did not put much effort into my thinking about it :-) Yes, the testcase is fine as it is. I'm now testing the patch on the 4.5 branch and will commit it today if everything goes fine. --

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-09 Thread jamborm at gcc dot gnu dot org
--- Comment #11 from jamborm at gcc dot gnu dot org 2010-06-09 09:02 --- (In reply to comment #10) (In reply to comment #9) (In reply to comment #8) I don't think you need flow-sensitivity. Basically when you have only aggregate uses (as in this case) Vectors are

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-09 Thread jamborm at gcc dot gnu dot org
--- Comment #12 from jamborm at gcc dot gnu dot org 2010-06-09 09:05 --- (In reply to comment #11) D.2464.m[0] = D.2473_20; D.2464.m[1] = D.2472_19; D.2464.m[2] = D.2471_18; *b_1(D) = D.2464; D.2464 will be dead after scalarization. If D.2464 was larger than

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-09 Thread jamborm at gcc dot gnu dot org
--- Comment #13 from jamborm at gcc dot gnu dot org 2010-06-09 11:20 --- Subject: Bug 44423 Author: jamborm Date: Wed Jun 9 11:20:03 2010 New Revision: 160462 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=160462 Log: 2010-06-09 Martin Jambor mjam...@suse.cz PR

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-09 Thread martin at mpa-garching dot mpg dot de
--- Comment #14 from martin at mpa-garching dot mpg dot de 2010-06-09 12:06 --- SSE performance is fine again, thanks a lot! One more question, if that's OK... Depending on ARRSZ the testcase uses wildly varying amounts of CPU time; it's about half a second for ARRSZ=1024, but almost

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-08 Thread jamborm at gcc dot gnu dot org
--- Comment #4 from jamborm at gcc dot gnu dot org 2010-06-08 13:16 --- Mine -- jamborm at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-08 Thread martin at mpa-garching dot mpg dot de
--- Comment #5 from martin at mpa-garching dot mpg dot de 2010-06-08 13:54 --- (In reply to comment #2) We have (4.4): <bb 2>: va.f[0] = a->r; va.f[1] = a->g; va.f[2] = a->b; va.f[3] = 0.0; pretmp.40 = va.v; ivtmp.61 = 0; [...] Could you please tell me the compiler

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-08 Thread rguenth at gcc dot gnu dot org
--- Comment #6 from rguenth at gcc dot gnu dot org 2010-06-08 14:02 --- (In reply to comment #5) (In reply to comment #2) We have (4.4): <bb 2>: va.f[0] = a->r; va.f[1] = a->g; va.f[2] = a->b; va.f[3] = 0.0; pretmp.40 = va.v; ivtmp.61 = 0; [...] Could
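The GIMPLE quoted in this comment (four element stores to `va.f[0..3]` followed by one load of `va.v` ahead of the loop) is the shape one would expect from C source along the following lines. This is only a sketch: the original testcase is not included in this listing, so the type names `rgb`, `vec4`, and `v4sf` and the function `scale` are hypothetical, and GCC's `vector_size` extension stands in for whatever SSE type the testcase actually uses.

```c
/* Hypothetical 4-float vector type using GCC's vector extension. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical colour struct with the field names seen in the dump. */
typedef struct { float r, g, b; } rgb;

/* Union giving both element-wise and whole-vector access, matching
   the va.f[...] / va.v accesses in the dump. */
typedef union {
    float f[4];
    v4sf  v;
} vec4;

/* Multiply every vector in arr by the colour *a. */
static void scale(const rgb *a, vec4 *arr, int n)
{
    vec4 va;
    va.f[0] = a->r;          /* element stores, as in the dump */
    va.f[1] = a->g;
    va.f[2] = a->b;
    va.f[3] = 0.0f;
    v4sf k = va.v;           /* single loop-invariant vector load */
    for (int i = 0; i < n; ++i)
        arr[i].v = arr[i].v * k;
}
```

In the 4.4 dump the stores and the load sit in the block before the loop; the regression discussed in the thread is that the new SRA rematerialized those stores inside the loop.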

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-08 Thread jamborm at gcc dot gnu dot org
--- Comment #7 from jamborm at gcc dot gnu dot org 2010-06-08 14:29 --- I don't think I can fix this bug in its most general form without doing some flow-sensitive decisions (which can be difficult for aggregates) and without causing PR 43846 again. (Aggregate copy-propagation and

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-08 Thread rguenth at gcc dot gnu dot org
--- Comment #8 from rguenth at gcc dot gnu dot org 2010-06-08 14:50 --- I don't think you need flow-sensitivity. Basically when you have only aggregate uses (as in this case) then you only want to scalarize if the destination of the use is scalarized as well (to be able to copyprop out

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-08 Thread jamborm at gcc dot gnu dot org
--- Comment #9 from jamborm at gcc dot gnu dot org 2010-06-08 15:00 --- (In reply to comment #8) I don't think you need flow-sensitivity. Basically when you have only aggregate uses (as in this case) Vectors are considered scalars in GCC. That is why the solutions described above

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-08 Thread rguenth at gcc dot gnu dot org
--- Comment #10 from rguenth at gcc dot gnu dot org 2010-06-08 15:11 --- (In reply to comment #9) (In reply to comment #8) I don't think you need flow-sensitivity. Basically when you have only aggregate uses (as in this case) Vectors are considered scalars in GCC. That is

[Bug tree-optimization/44423] [4.5/4.6 Regression] Massive performance regression in SSE code due to SRA

2010-06-05 Thread rguenth at gcc dot gnu dot org
--- Comment #3 from rguenth at gcc dot gnu dot org 2010-06-05 10:56 --- Ok. Fact is that no pass can move invariant store/load pairs. But that's pre-existing - the main issue is that the new SRA implementation ends up rematerializing the stores inside the loop! Diff of pre-esra vs.