https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #60 from CVS Commits ---
The master branch has been updated by H.J. Lu :
https://gcc.gnu.org/g:737355072af4cd0c24a4a8967e1485c1f3a80bfe
commit r11-2200-g737355072af4cd0c24a4a8967e1485c1f3a80bfe
Author: H.J. Lu
Date: Mon Jul 13 09
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #59 from CVS Commits ---
The master branch has been updated by H.J. Lu :
https://gcc.gnu.org/g:fab263ab0fc10ea08409b80afa7e8569438b8d28
commit r11-1970-gfab263ab0fc10ea08409b80afa7e8569438b8d28
Author: H.J. Lu
Date: Wed Jan 23 06
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #58 from H.J. Lu ---
(In reply to Thomas Koenig from comment #57)
> (In reply to H.J. Lu from comment #56)
> > (In reply to Thomas Koenig from comment #55)
> > > (In reply to H.J. Lu from comment #45)
> > > > Created attachment 45510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #57 from Thomas Koenig ---
(In reply to H.J. Lu from comment #56)
> (In reply to Thomas Koenig from comment #55)
> > (In reply to H.J. Lu from comment #45)
> > > Created attachment 45510 [details]
> > > An updated patch
> >
> > HJ, d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #56 from H.J. Lu ---
(In reply to Thomas Koenig from comment #55)
> (In reply to H.J. Lu from comment #45)
> > Created attachment 45510 [details]
> > An updated patch
>
> HJ, do you plan on committing these?
We are collecting perfor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #55 from Thomas Koenig ---
(In reply to H.J. Lu from comment #45)
> Created attachment 45510 [details]
> An updated patch
HJ, do you plan on committing these?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #54 from Chris Elrod ---
I commented elsewhere, but I built trunk a few days ago with H.J. Lu's patches
(attached here) and Thomas Koenig's inlining patches.
With these patches, g++ and all versions of the Fortran code produced excelle
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #53 from rguenther at suse dot de ---
On Thu, 24 Jan 2019, glisse at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #52 from Marc Glisse ---
> (In reply to Thomas Koenig from comment #49
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #52 from Marc Glisse ---
(In reply to Thomas Koenig from comment #49)
> Argh. Sacrificing performance for the sake of bugware...
But note that in this PR (specifically for avx512 vectors on this cpu), the OP
says that the recip vers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #51 from rguenther at suse dot de ---
On Thu, 24 Jan 2019, tkoenig at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #49 from Thomas Koenig ---
> (In reply to Uroš Bizjak from comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #50 from Uroš Bizjak ---
(In reply to Thomas Koenig from comment #49)
> (In reply to Uroš Bizjak from comment #48)
> > (In reply to rguent...@suse.de from comment #47)
> > > >But why don't we generate sqrtps for vector sqrtf?
> > >
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #49 from Thomas Koenig ---
(In reply to Uroš Bizjak from comment #48)
> (In reply to rguent...@suse.de from comment #47)
> > >But why don't we generate sqrtps for vector sqrtf?
> >
> > That's the default for -mrecip; back in time we
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #48 from Uroš Bizjak ---
(In reply to rguent...@suse.de from comment #47)
> >But why don't we generate sqrtps for vector sqrtf?
>
> That's the default for -mrecip; back in time we benchmarked it and scalar
> recip miscompares sth.
I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #47 from rguenther at suse dot de ---
On January 23, 2019 5:13:12 PM GMT+01:00, "hjl.tools at gmail dot com"
wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
>--- Comment #46 from H.J. Lu ---
>We generate sqrtps for scal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #46 from H.J. Lu ---
We generate sqrtps for scalar sqrtf:
[hjl@gnu-skx-1 pr88713]$ cat s.i
extern float sqrtf(float x);
float
rsqrt(float r)
{
return sqrtf (r);
}
[hjl@gnu-skx-1 pr88713]$ gcc -Ofast -S s.i
[hjl@gnu-skx-1 pr88713]$
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
H.J. Lu changed:
What | Removed | Added
Attachment #45509 is obsolete | 0 | 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
H.J. Lu changed:
What | Removed | Added
Attachment #45508 is obsolete | 0 | 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
H.J. Lu changed:
What | Removed | Added
Attachment #45507 is obsolete | 0 | 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #42 from H.J. Lu ---
Created attachment 45507
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45507&action=edit
A patch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #41 from Uroš Bizjak ---
(In reply to H.J. Lu from comment #40)
> (In reply to rguent...@suse.de from comment #39)
> > > >
> > > > Yes. The lack of an expander for the rsqrt operation is probably
> > > > more severe though (causing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #40 from H.J. Lu ---
(In reply to rguent...@suse.de from comment #39)
> > >
> > > Yes. The lack of an expander for the rsqrt operation is probably
> > > more severe though (causing sqrt + approx recip to appear)
> > >
> >
> > Can
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #39 from rguenther at suse dot de ---
On Wed, 23 Jan 2019, hjl.tools at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #38 from H.J. Lu ---
> (In reply to rguent...@suse.de from comment #3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #38 from H.J. Lu ---
(In reply to rguent...@suse.de from comment #37)
> On Wed, 23 Jan 2019, hjl.tools at gmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
> >
> > --- Comment #36 from H.J. Lu ---
> > (I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #37 from rguenther at suse dot de ---
On Wed, 23 Jan 2019, hjl.tools at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #36 from H.J. Lu ---
> (In reply to Richard Biener from comment #34)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #36 from H.J. Lu ---
(In reply to Richard Biener from comment #34)
> GCC definitely fails to see the FMA use as opportunity in
> ix86_emit_swsqrtsf, the a == 0 checking is because of the missing
> expander w/o avx512er where we could
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #35 from Chris Elrod ---
> rsqrt:
> .LFB12:
> .cfi_startproc
> vrsqrt28ps (%rsi), %zmm0
> vmovups %zmm0, (%rdi)
> vzeroupper
> ret
>
> (huh? isn't there a NR step missing?)
>
I assume
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
Richard Biener changed:
What | Removed | Added
CC | | hjl.tools at gmail dot com
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #33 from Marc Glisse ---
(In reply to Chris Elrod from comment #32)
> (In reply to Marc Glisse from comment #31)
> > What we need to understand is why gcc doesn't try to generate rsqrt
Without -mavx512er, we do not have an expander f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #32 from Chris Elrod ---
(In reply to Marc Glisse from comment #31)
> (In reply to Chris Elrod from comment #30)
> > gcc calculates the rsqrt directly
>
> No, vrsqrt14ps is just the first step in calculating sqrt here (slightly
> dif
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #31 from Marc Glisse ---
(In reply to Chris Elrod from comment #30)
> gcc calculates the rsqrt directly
No, vrsqrt14ps is just the first step in calculating sqrt here (slightly
different formula than rsqrt). vrcp14ps shows that it is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #30 from Chris Elrod ---
gcc still (In reply to Marc Glisse from comment #29)
> The main difference I can see is that clang computes rsqrt directly, while
> gcc first computes sqrt and then computes the inverse. Also gcc seems afraid
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #29 from Marc Glisse ---
The main difference I can see is that clang computes rsqrt directly, while gcc
first computes sqrt and then computes the inverse. Also gcc seems afraid of
getting NaN for sqrt(0) so it masks out this value. ix
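Comments #29 and #31 describe sqrt being computed from an rsqrt estimate (vrsqrt14ps) plus a refinement, with the a == 0 lanes masked out. The reason is plain arithmetic: the estimate of 1/sqrt(0) is +inf, and sqrt(a) is formed as a product with a, so 0 * inf gives NaN. The scalar sketch below reproduces that. rsqrt_estimate is a hypothetical software stand-in for the hardware estimate (the well-known 0x5f3759df bit trick, accurate to a few percent), and the refinement is one Newton-Raphson step, not necessarily the exact sequence GCC emits.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware rsqrt estimate such as
   vrsqrt14ps: the classic bit-trick approximation (a few percent
   error). Like the hardware instruction, it returns +inf for 0. */
static float rsqrt_estimate(float a)
{
    if (a == 0.0f)
        return INFINITY;
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* sqrt(a) from the estimate plus one Newton-Raphson step:
   sqrt(a) ~= -0.5 * a*x * (a*x*x - 3).
   Without the mask, a == 0 gives a*x = 0 * inf = NaN. */
static float sqrt_via_rsqrt(float a, int mask_zero)
{
    float x = rsqrt_estimate(a);
    if (mask_zero && a == 0.0f)
        x = 0.0f;                 /* the effect of masking out a == 0 */
    float ax = a * x;
    return -0.5f * ax * (ax * x - 3.0f);
}
```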
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #28 from Chris Elrod ---
Created attachment 45501
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45501&action=edit
Minimum working example of the rsqrt problem. Can be compiled with: gcc -Ofast
-S -march=skylake-avx512 -mprefer-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #27 from Chris Elrod ---
g++ -mrecip=all -O3 -fno-signed-zeros -fassociative-math -freciprocal-math
-fno-math-errno -ffinite-math-only -fno-trapping-math -fdump-tree-optimized -S
-march=native -shared -fPIC -mprefer-vector-width=512
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #26 from Chris Elrod ---
> You can try enabling -mrecip to see RSQRT in .optimized - there's
> probably late 1/sqrt optimization on RTL.
No luck. The full commands I used:
gfortran -Ofast -mrecip -S -fdump-tree-optimized -march=nati
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #25 from rguenther at suse dot de ---
On Tue, 22 Jan 2019, elrodc at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #24 from Chris Elrod ---
> The dump looks like this:
>
> vect__67.78_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #24 from Chris Elrod ---
The dump looks like this:
vect__67.78_217 = SQRT (vect__213.77_225);
vect_ui33_68.79_248 = { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0,
1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #23 from rguenther at suse dot de ---
On Tue, 22 Jan 2019, elrodc at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #22 from Chris Elrod ---
> Okay. I did that, and the time went from abou
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #22 from Chris Elrod ---
Okay. I did that, and the time went from about 4.25 microseconds down to 4.0
microseconds. So that is an improvement, but accounts for only a small part of
the difference from the LLVM-based compilers.
-O3 -fno-mat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #21 from rguenther at suse dot de ---
On Tue, 22 Jan 2019, elrodc at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #19 from Chris Elrod ---
> To add a little more:
> I used inline asm for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #20 from Chris Elrod ---
To add a little more:
I used inline asm for direct access to the rsqrt instruction "vrsqrt14ps" in
Julia. Without adding a Newton step, the answers are wrong beyond the first couple of
significant digits.
With the N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #19 from Chris Elrod ---
To add a little more:
I used inline asm for direct access to the rsqrt instruction "vrsqrt14ps" in
Julia. Without adding a Newton step, the answers are wrong beyond the first couple of
significant digits.
With the N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #18 from Chris Elrod ---
I can confirm that the inlined packing does allow gfortran to vectorize the
loop. So allowing packing to inline does seem (to me) like an optimization well
worth making.
However, performance seems to be ab
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #17 from Thomas Koenig ---
What an inline packing would (approximately) produce is this:
subroutine processBPP(X, BPP, N)
integer,intent(in) :: N
real, dimension(N,3), intent(out)
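As Comments #17 and #18 read, inline packing means the copy of a possibly non-contiguous array section into a contiguous temporary is emitted as a loop the compiler can see, rather than hidden in a runtime call, so the work on the packed data runs at unit stride and can be vectorized. The C sketch below shows that pack, compute, unpack pattern by analogy only; kernel and call_with_packing are hypothetical names, and this is not what gfortran generates.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical kernel that expects contiguous (unit-stride) data,
   which is what lets the vectorizer use packed loads and stores. */
static void kernel(float *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] = 2.0f * x[i];
}

/* Pack a strided section into a contiguous temporary, run the kernel,
   then unpack the results into the original storage.
   Returns 0 on success, -1 if the temporary cannot be allocated. */
static int call_with_packing(float *base, size_t n, size_t stride)
{
    if (stride == 1) {            /* already contiguous: no copy needed */
        kernel(base, n);
        return 0;
    }
    float *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        tmp[i] = base[i * stride];   /* pack */
    kernel(tmp, n);
    for (size_t i = 0; i < n; i++)
        base[i * stride] = tmp[i];   /* unpack */
    free(tmp);
    return 0;
}
```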
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
Thomas Koenig changed:
What | Removed | Added
CC | | koenigni at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
Richard Biener changed:
What | Removed | Added
CC | | rguenth at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #14 from Chris Elrod ---
It's not really reproducible across runs:
$ time ./gfortvectests
Transpose benchmark completed in 22.7010765
SIMD benchmark completed in 1.37529969
All are equal: F
All are approximately equa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
Jerry DeLisle changed:
What | Removed | Added
CC | | jvdelisle at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #12 from Chris Elrod ---
Created attachment 45363
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45363&action=edit
Fortran program for running benchmarks.
Okay, thank you.
I attached a Fortran program you can run to benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
Thomas Koenig changed:
What | Removed | Added
Status | RESOLVED | REOPENED
Last reconfirmed |