[Bug fortran/116128] missed optimisation: fortran sum instrinsic performed in order

2024-08-15 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128 --- Comment #5 from mjr19 at cam dot ac.uk --- I think in general using partial sums improves accuracy. If one assumes that all of the data have the same sign and similar magnitude, then by the time the sum is nearly complete one is adding a sin
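A minimal sketch of the accuracy effect described above. The names `seq_sum` and `sum4` are hypothetical, and the data are deliberately extreme rather than "same sign, similar magnitude": one element of 2**24 followed by 3999 ones. In single precision each 1.0 is then exactly half an ulp of the running total and is rounded away (ties-to-even), so an in-order sum loses every one of them. Four independent partial sums lose only the ones routed into the accumulator that holds the large element.

```fortran
! Sketch: strictly in-order summation vs four partial sums, single precision.
module partial_sums
  implicit none
  integer, parameter :: sp = kind(1.0)
contains
  ! In-order summation, as a compiler must do it without reassociation.
  function seq_sum(a) result(s)
    real(sp), intent(in) :: a(:)
    real(sp) :: s
    integer :: i
    s = 0.0_sp
    do i = 1, size(a)
      s = s + a(i)
    end do
  end function seq_sum

  ! Four independent accumulators, combined pairwise at the end.
  function sum4(a) result(s)
    real(sp), intent(in) :: a(:)
    real(sp) :: s, p(4)
    integer :: i, n
    n = size(a)
    p = 0.0_sp
    do i = 1, n - mod(n, 4), 4
      p(1) = p(1) + a(i)
      p(2) = p(2) + a(i+1)
      p(3) = p(3) + a(i+2)
      p(4) = p(4) + a(i+3)
    end do
    ! Leftover elements (n not a multiple of 4) go into the first accumulator.
    do i = n - mod(n, 4) + 1, n
      p(1) = p(1) + a(i)
    end do
    s = (p(1) + p(2)) + (p(3) + p(4))
  end function sum4
end module partial_sums
```

Compiled without -ffast-math, `seq_sum` returns 16777216 (error 3999) and `sum4` returns 16780216 (error 999).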

[Bug fortran/116128] missed optimisation: fortran sum instrinsic performed in order

2024-08-06 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128 --- Comment #3 from mjr19 at cam dot ac.uk --- It seems that most of these are in-line expanded by gfortran-14.1, at least in some cases.
    function foo(a,n)
      real(kind(1d0))::a(*),foo
      integer::n
      foo=sum(a(1:n))
    end function foo
and funct

[Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-08-01 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767 --- Comment #8 from mjr19 at cam dot ac.uk --- If it is tricky to teach gfortran that it can flip the signs of alternate elements in a vector trivially with an xor, would a possible step to an improvement be to teach it that the cost of vpermpd (
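A minimal sketch of the swap-plus-xor idea. For illustration only, it assumes complex(kind(1d0)) data viewed as interleaved (re, im) pairs in a real array. Multiplying by (0d0,1d0) maps (x, y) to (-y, x): a swap within each pair (the permute) followed by a sign flip of every other element, done here with an integer xor on the sign bit. The module and routine names are hypothetical, and this is not gfortran's actual code generation.

```fortran
! Sketch: multiply interleaved complex data by (0,1) using a swap and an xor.
module times_i_mod
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: dp = real64
contains
  ! Flip only the sign bit of x via an integer xor; all other bits unchanged.
  elemental function flip_sign(x) result(y)
    real(dp), intent(in) :: x
    real(dp) :: y
    integer(int64) :: bits
    bits = transfer(x, bits)
    bits = ieor(bits, ibset(0_int64, 63))
    y = transfer(bits, y)
  end function flip_sign

  ! r holds complex numbers as r(1)=re1, r(2)=im1, r(3)=re2, ...
  subroutine times_i_interleaved(r)
    real(dp), intent(inout) :: r(:)
    real(dp) :: tmp(size(r))
    tmp(1::2) = r(2::2)              ! new real parts: old imaginary parts
    tmp(2::2) = r(1::2)              ! new imaginary parts: old real parts
    r(1::2) = flip_sign(tmp(1::2))   ! negate alternate elements only
    r(2::2) = tmp(2::2)
  end subroutine times_i_interleaved
end module times_i_mod
```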

[Bug fortran/116128] missed optimisation: fortran sum instrinsic performed in order

2024-07-31 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128 --- Comment #1 from mjr19 at cam dot ac.uk --- The same comment applies to maxval and minval, which vectorise with -Ofast only for -mavx2, although the answer will be independent of the ordering of the scalar min/max operations. In contrast, ial
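A short sketch of why the ordering cannot matter for maxval. Max only selects one of its inputs and never rounds, so splitting the reduction over several independent lanes gives the same result as a single in-order pass. This assumes no NaNs are present. The routine name `max4` is hypothetical.

```fortran
! Sketch: a max reduction over four independent lanes, combined at the end.
module lane_max
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: dp = real64
contains
  function max4(a) result(m)
    real(dp), intent(in) :: a(:)   ! assumed non-empty and NaN-free
    real(dp) :: m, lane(4)
    integer :: i, n
    n = size(a)
    lane = a(1)                    ! any element is a valid starting value
    do i = 1, n - mod(n, 4), 4
      lane = max(lane, a(i:i+3))   ! four independent comparison chains
    end do
    do i = n - mod(n, 4) + 1, n
      lane(1) = max(lane(1), a(i))
    end do
    m = maxval(lane)
  end function max4
end module lane_max
```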

[Bug tree-optimization/116109] Missed optimisation: unnecessary register dependency on reduction

2024-07-30 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116109 --- Comment #3 from mjr19 at cam dot ac.uk --- It might be helpful if GCC considered this optimisation separately from unrolling. Traditional unrolling attempts to reduce the overhead of the (integer) loop control instructions, but with floating
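A sketch of the distinction as we read it. When a floating-point reduction is limited by add latency, the useful transformation is splitting the single dependency chain into several independent ones, whatever happens to the loop-control overhead. Here a dot product keeps `nacc` accumulators. The names and the choice of `nacc = 4` are assumptions for illustration. Note that the result can differ in rounding from an in-order sum, which is why a compiler may only do this when reassociation is permitted.

```fortran
! Sketch: a dot product with nacc independent accumulator chains.
module multi_acc
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: dp = real64
  integer, parameter :: nacc = 4   ! assumption: number of independent chains
contains
  function dot_multi(x, y) result(s)
    real(dp), intent(in) :: x(:), y(:)   ! assumed to be the same size
    real(dp) :: s, acc(nacc)
    integer :: i, k, n, nmain
    n = size(x)
    nmain = n - mod(n, nacc)
    acc = 0.0_dp
    do i = 1, nmain, nacc
      do k = 1, nacc
        ! acc(k) depends only on its own previous value, not on acc(k-1)
        acc(k) = acc(k) + x(i+k-1) * y(i+k-1)
      end do
    end do
    do i = nmain + 1, n
      acc(1) = acc(1) + x(i) * y(i)
    end do
    s = sum(acc)
  end function dot_multi
end module multi_acc
```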

[Bug fortran/116128] New: missed optimisation: fortran sum instrinsic performed in order

2024-07-29 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128 Bug ID: 116128 Summary: missed optimisation: fortran sum instrinsic performed in order Product: gcc Version: 14.1.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/116109] New: Missed optimisation: unnecessary register dependency on reduction

2024-07-26 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116109 Bug ID: 116109 Summary: Missed optimisation: unnecessary register dependency on reduction Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/115709] missed optimisation: vperms not reordered to eliminate

2024-07-02 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115709 --- Comment #3 from mjr19 at cam dot ac.uk --- Created attachment 58558 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58558&action=edit Demo of effect of vperm rearrangement I still believe that my code is correct. To make what I propose

[Bug fortran/115711] New: Fortran: extra malloc and copy with transfer

2024-06-29 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115711 Bug ID: 115711 Summary: Fortran: extra malloc and copy with transfer Product: gcc Version: 14.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: f

[Bug fortran/115710] New: fortran complex abs does not vectorise

2024-06-29 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115710 Bug ID: 115710 Summary: fortran complex abs does not vectorise Product: gcc Version: 14.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran

[Bug tree-optimization/115709] New: missed optimisation: vperms not reordered to eliminate

2024-06-29 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115709 Bug ID: 115709 Summary: missed optimisation: vperms not reordered to eliminate Product: gcc Version: 14.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Co

[Bug tree-optimization/114324] [13/14/15 Regression] AVX2 vectorisation performance regression with gfortran 13/14

2024-06-25 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324 --- Comment #8 from mjr19 at cam dot ac.uk --- Ooops -- timings not ns/iteration as claimed, nor even comparable between the m3spf and m4spf examples, but they are consistent within each example.

[Bug tree-optimization/114324] [13/14/15 Regression] AVX2 vectorisation performance regression with gfortran 13/14

2024-06-25 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324 --- Comment #7 from mjr19 at cam dot ac.uk --- The patch to GCC 15 in commit r15-1508-g59221dc587f369695d9b0c2f73aedf8458931f0f from pr 68855 has made a significant improvement to the optimisation of these examples at -O3, causing the -Ofast ver

[Bug fortran/115563] Unnecessary brackets prevent fortran vectorisation

2024-06-24 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115563 --- Comment #6 from mjr19 at cam dot ac.uk --- A further comment to aid others reading this report. It is not just unnecessary brackets which used to prevent vectorisation, but also necessary ones.
    subroutine foo(a,b,c,n)
      complex (kind(1d0)) :

[Bug fortran/115563] Unnecessary brackets prevent fortran vectorisation

2024-06-21 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115563 --- Comment #5 from mjr19 at cam dot ac.uk --- I'm glad this was useful, and thanks for the impressively rapid fix. I stumbled across this by chance whilst trying to construct a minimal example for a rather different missed vectorisation case.

[Bug fortran/115563] New: Unnecessary brackets prevent fortran vectorisation

2024-06-20 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115563 Bug ID: 115563 Summary: Unnecessary brackets prevent fortran vectorisation Product: gcc Version: 14.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Compon

[Bug fortran/107294] Missed optimization: multiplying real with complex number in Fortran (only)

2024-06-17 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107294 mjr19 at cam dot ac.uk changed: CC (added: mjr19 at cam dot ac.uk) --- Comm

[Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-05-14 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767 --- Comment #7 from mjr19 at cam dot ac.uk --- Another manifestation of this issue in GCC 13.1 and 14.1 is that the loop
    do i=1,n
      c(i)=a(i)*c(i)*(0d0,1d0)
    enddo
takes about twice as long to run as
    do i=1,n
      c(i)=a(i)*(0d0,1d0)*c
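For reference, this is how a general complex multiply by (0d0,1d0) expands. With the constant known, no multiplications are needed, only a swap of the two parts and one negation. The identity holds for finite x and y, up to the sign of a zero result:

```latex
(x + iy)\,(0 + 1i) = (x \cdot 0 - y \cdot 1) + i\,(x \cdot 1 + y \cdot 0) = -y + ix
```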

[Bug tree-optimization/114324] [13/14/15 Regression] AVX2 vectorisation performance regression with gfortran 13/14

2024-05-01 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324 --- Comment #5 from mjr19 at cam dot ac.uk --- Note that bug 114767 also turns out to be a case in which the inability to alternate neg and nop along a vector leads to poor performance with some operations on the complex type. That optimisation i

[Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-04-19 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767 --- Comment #6 from mjr19 at cam dot ac.uk --- I was starting to wonder whether this issue might be related to that in bug 114324, which is a slightly more complicated example in which multiplication by a purely imaginary number destroys vectoris

[Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-04-18 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767 --- Comment #4 from mjr19 at cam dot ac.uk --- An issue which I suspect is related is shown by
    subroutine zradd(c,n)
      integer :: i,n
      complex(kind(1d0)) :: c(*)
      do i=1,n
        c(i)=c(i)+1d0
      enddo
    end subroutine
If compiled with gfortran-1
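A minimal sketch of what `c(i)=c(i)+1d0` does to the data. For illustration, it assumes the complex array is viewed as interleaved (re, im) pairs in a real array. Only the real parts, every other element, change. The module and routine names are hypothetical. The test avoids negative-zero imaginary parts, since Fortran's mixed real/complex addition adds a zero imaginary part, and adding +0 turns -0 into +0.

```fortran
! Sketch: adding a real scalar to complex data touches only the real parts.
module zradd_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: dp = real64
contains
  subroutine radd_interleaved(r, x)
    real(dp), intent(inout) :: r(:)   ! r(1)=re1, r(2)=im1, r(3)=re2, ...
    real(dp), intent(in) :: x
    r(1::2) = r(1::2) + x             ! imaginary parts r(2::2) are left alone
  end subroutine radd_interleaved
end module zradd_sketch
```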

[Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-04-18 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767 --- Comment #2 from mjr19 at cam dot ac.uk --- Ah, I see. An inability to alternate negation with noop also means that conjugation is treated suboptimally.
    do i=1,n
      c(i)=conjg(c(i))
    enddo
Here gfortran-13 and -14 are differently subopt
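A minimal sketch of conjugation as "alternate negation with noop". It assumes, for illustration, that the complex data are viewed as interleaved (re, im) pairs in a real array. The real parts are untouched and every imaginary part, every second element, is negated. The names are hypothetical.

```fortran
! Sketch: conjg on interleaved storage is a no-op on real parts and a
! negation on imaginary parts.
module conj_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: dp = real64
contains
  subroutine conj_interleaved(r)
    real(dp), intent(inout) :: r(:)   ! r(1)=re1, r(2)=im1, r(3)=re2, ...
    r(2::2) = -r(2::2)                ! negate imaginary parts only
  end subroutine conj_interleaved
end module conj_sketch
```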

[Bug fortran/114767] New: gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-04-18 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767 Bug ID: 114767 Summary: gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/114324] [13/14 Regression] AVX2 vectorisation performance regression with gfortran 13/14

2024-03-15 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324 --- Comment #4 from mjr19 at cam dot ac.uk --- Created attachment 57713 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57713&action=edit Second testcase, very similar to first Thank you for looking into this. The real code in question has

[Bug fortran/114324] New: AVX2 vectorisation performance regression with gfortran 13/14

2024-03-13 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324 Bug ID: 114324 Summary: AVX2 vectorisation performance regression with gfortran 13/14 Product: gcc Version: 13.1.0 Status: UNCONFIRMED Severity: normal