https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128
--- Comment #5 from mjr19 at cam dot ac.uk ---
I think in general using partial sums improves accuracy.
If one assumes that all of the data have the same sign and similar magnitude,
then by the time the sum is nearly complete one is adding a sin
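The accuracy argument in this comment can be tried with a small experiment. The following is a minimal sketch, not code from the bug report: the program name, the array size, the value 0.1 and the choice of four partial sums are all illustrative assumptions. It sums same-sign, same-magnitude data in single precision, once strictly in order and once with four partial sums, and compares both against a double-precision reference.

```fortran
program partial_sum_demo
  implicit none
  integer, parameter :: n = 10000000      ! assumed size, a multiple of 4
  real, allocatable :: a(:)
  real :: s_seq, p(4)
  real(kind(1d0)) :: ref
  integer :: i

  allocate(a(n))
  a = 0.1                                 ! same sign, same magnitude

  ! Strict in-order sum, plus a double-precision reference sum.
  s_seq = 0.0
  ref = 0d0
  do i = 1, n
     s_seq = s_seq + a(i)
     ref = ref + real(a(i), kind(1d0))
  end do

  ! Four partial sums: each accumulator only grows to about a quarter
  ! of the total, so each addition combines less disparate magnitudes.
  p = 0.0
  do i = 1, n, 4
     p = p + a(i:i+3)
  end do

  print *, 'in-order error:     ', abs(s_seq - ref)
  print *, 'partial-sums error: ', abs(sum(p) - ref)
end program partial_sum_demo
```

Compile without -ffast-math or -Ofast so that the compiler keeps the order of additions written in each loop. The size of the difference will depend on the data and on the number of partial sums.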
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128
--- Comment #3 from mjr19 at cam dot ac.uk ---
It seems that most of these are in-line expanded by gfortran-14.1, at least in
some cases.
function foo(a,n)
real(kind(1d0))::a(*),foo
integer::n
foo=sum(a(1:n))
end function foo
and
funct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767
--- Comment #8 from mjr19 at cam dot ac.uk ---
If it is tricky to teach gfortran that it can flip the signs of alternate
elements in a vector trivially with an xor, would a possible step to an
improvement be to teach it that the cost of vpermpd (
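The xor trick mentioned in this comment can be illustrated at source level. This is only a sketch under stated assumptions: it assumes IEEE 754 binary64 storage for real64, the program and variable names are hypothetical, and it shows the bit manipulation itself rather than anything gfortran currently generates. Treating four reals as two (re,im) pairs, an xor with a mask that has the sign bit set in alternate elements negates the imaginary parts only.

```fortran
program xor_sign_flip
  use iso_fortran_env, only: int64, real64
  implicit none
  ! Bit pattern with only the top (sign) bit set.
  integer(int64), parameter :: sign_bit = not(huge(0_int64))
  real(real64) :: v(4), flipped(4)
  integer(int64) :: bits(4), mask(4)

  v = [1d0, 2d0, 3d0, 4d0]                ! two complex values as (re,im) pairs
  mask = [0_int64, sign_bit, 0_int64, sign_bit]

  bits = transfer(v, bits)                ! reinterpret the reals as integers
  flipped = transfer(ieor(bits, mask), flipped)

  print *, flipped                        ! imaginary parts are negated
end program xor_sign_flip
```

On AVX2 hardware the same operation on a 256-bit vector is a single xor with a constant, which is why it is described as trivial.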
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128
--- Comment #1 from mjr19 at cam dot ac.uk ---
The same comment applies to maxval and minval, which vectorise with -Ofast only
for -mavx2, although the answer will be independent of the ordering of the
scalar min/max operations.
In contrast, ial
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116109
--- Comment #3 from mjr19 at cam dot ac.uk ---
It might be helpful if GCC considered this optimisation separately from
unrolling.
Traditional unrolling attempts to reduce the overhead of the (integer) loop
control instructions, but with floating
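The distinction drawn here between loop unrolling and breaking the dependency on a single accumulator can be sketched by hand. The following is an assumption-laden illustration, not code from the bug report: the module and function names are hypothetical and two accumulators are chosen arbitrarily. In sum_chain every addition must wait for the previous one; in sum_split the two accumulators are independent, so their additions can overlap.

```fortran
module reduction_sketch
  implicit none
contains
  ! Single accumulator: each add depends on the result of the previous add.
  function sum_chain(a, n) result(s)
    integer, intent(in) :: n
    real(kind(1d0)), intent(in) :: a(n)
    real(kind(1d0)) :: s
    integer :: i
    s = 0d0
    do i = 1, n
       s = s + a(i)
    end do
  end function sum_chain

  ! Two independent accumulators, combined once at the end.
  function sum_split(a, n) result(s)
    integer, intent(in) :: n
    real(kind(1d0)), intent(in) :: a(n)
    real(kind(1d0)) :: s, s0, s1
    integer :: i, m
    s0 = 0d0
    s1 = 0d0
    m = n - mod(n, 2)                     ! largest even count <= n
    do i = 1, m, 2
       s0 = s0 + a(i)
       s1 = s1 + a(i+1)
    end do
    if (m < n) s0 = s0 + a(n)             ! leftover element when n is odd
    s = s0 + s1
  end function sum_split
end module reduction_sketch

program reduction_demo
  use reduction_sketch
  implicit none
  real(kind(1d0)) :: a(7)
  integer :: i
  a = [(real(i, kind(1d0)), i = 1, 7)]
  ! Both functions return 28 for this input.
  print *, sum_chain(a, 7), sum_split(a, 7)
end program reduction_demo
```

Because sum_split changes the order of the floating-point additions, a compiler may only make this transformation itself when reassociation is permitted, for example under -Ofast.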
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128
Bug ID: 116128
Summary: missed optimisation: fortran sum intrinsic performed
in order
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116109
Bug ID: 116109
Summary: Missed optimisation: unnecessary register dependency
on reduction
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115709
--- Comment #3 from mjr19 at cam dot ac.uk ---
Created attachment 58558
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58558&action=edit
Demo of effect of vperm rearrangement
I still believe that my code is correct. To make what I propose
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115711
Bug ID: 115711
Summary: Fortran: extra malloc and copy with transfer
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115710
Bug ID: 115710
Summary: fortran complex abs does not vectorise
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115709
Bug ID: 115709
Summary: missed optimisation: vperms not reordered to eliminate
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Co
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324
--- Comment #8 from mjr19 at cam dot ac.uk ---
Oops -- timings not ns/iteration as claimed, nor even comparable between the
m3spf and m4spf examples, but they are consistent within each example.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324
--- Comment #7 from mjr19 at cam dot ac.uk ---
The patch to GCC 15 in commit
r15-1508-g59221dc587f369695d9b0c2f73aedf8458931f0f from pr 68855 has made a
significant improvement to the optimisation of these examples at -O3, causing
the -Ofast ver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115563
--- Comment #6 from mjr19 at cam dot ac.uk ---
A further comment to aid others reading this report. It is not just unnecessary
brackets which used to prevent vectorisation, but also necessary ones.
subroutine foo(a,b,c,n)
complex (kind(1d0)) :
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115563
--- Comment #5 from mjr19 at cam dot ac.uk ---
I'm glad this was useful, and thanks for the impressively rapid fix. I stumbled
across this by chance whilst trying to construct a minimal example for a rather
different missed vectorisation case.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115563
Bug ID: 115563
Summary: Unnecessary brackets prevent fortran vectorisation
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Compon
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107294
mjr19 at cam dot ac.uk changed:
What|Removed |Added
CC||mjr19 at cam dot ac.uk
--- Comm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767
--- Comment #7 from mjr19 at cam dot ac.uk ---
Another manifestation of this issue in GCC 13.1 and 14.1 is that the loop
do i=1,n
c(i)=a(i)*c(i)*(0d0,1d0)
enddo
takes about twice as long to run as
do i=1,n
c(i)=a(i)*(0d0,1d0)*c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324
--- Comment #5 from mjr19 at cam dot ac.uk ---
Note that bug 114767 also turns out to be a case in which the inability to
alternate neg and nop along a vector leads to poor performance with some
operations on the complex type. That optimisation i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767
--- Comment #6 from mjr19 at cam dot ac.uk ---
I was starting to wonder whether this issue might be related to that in bug
114324, which is a slightly more complicated example in which multiplication by
a purely imaginary number destroys vectoris
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767
--- Comment #4 from mjr19 at cam dot ac.uk ---
An issue which I suspect is related is shown by
subroutine zradd(c,n)
integer :: i,n
complex(kind(1d0)) :: c(*)
do i=1,n
c(i)=c(i)+1d0
enddo
end subroutine
If compiled with gfortran-1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767
--- Comment #2 from mjr19 at cam dot ac.uk ---
Ah, I see. An inability to alternate negation with noop also means that
conjugation is treated suboptimally.
do i=1,n
c(i)=conjg(c(i))
enddo
Here gfortran-13 and -14 are differently subopt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767
Bug ID: 114767
Summary: gfortran AVX2 complex multiplication by (0d0,1d0)
suboptimal
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324
--- Comment #4 from mjr19 at cam dot ac.uk ---
Created attachment 57713
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57713&action=edit
Second testcase, very similar to first
Thank you for looking into this. The real code in question has
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324
Bug ID: 114324
Summary: AVX2 vectorisation performance regression with
gfortran 13/14
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
25 matches