https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
rsandifo at gcc dot gnu.org changed:
What|Removed |Added
CC||rsandifo at gcc dot
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #17 from Alexander Monakov ---
To me this suggests that in fact it's okay to carry the combined form in RTL up
to register allocation, but RA should decompose it to load+fma instead of
inserting a register copy that preserves the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #16 from Alexander Monakov ---
Mostly because prior to register allocation the compiler does not naturally see
that x = *mem + a*b will need an extra mov when both 'a' and 'b' are live (as
in that case registers allocated for them
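
A minimal sketch of the constraint described in this comment, assuming the usual x86 FMA3 rule that the destination register is also one of the three source operands. The function name `fma_sequence`, the register names `x`, `a`, `b`, and the mnemonic strings are illustrative only; this is a toy model, not GCC code.

```python
# Toy model (hypothetical, not GCC internals): instruction sequences for
#     x = *mem + a*b
# on x86 FMA3, where the destination register must also be a source operand.

def fma_sequence(a_live_after, b_live_after, combined):
    """Return an illustrative instruction list for x = *mem + a*b."""
    if not combined:
        # Separate load + FMA: load the addend into x, then accumulate
        # into it (231 form: dst = src2*src3 + dst). a and b are untouched.
        return ["vmovapd x, [mem]", "vfmadd231pd x, a, b"]

    # Combined form: the addend comes from memory, so the destination has
    # to overwrite one of the multiplicands (213 form: dst = src2*dst + mem).
    if not a_live_after:
        return ["vfmadd213pd a, b, [mem]"]
    if not b_live_after:
        return ["vfmadd213pd b, a, [mem]"]

    # Both multiplicands are still needed afterwards: copy one of them
    # first and let the FMA clobber the copy.
    return ["vmovapd x, a", "vfmadd213pd x, b, [mem]"]


if __name__ == "__main__":
    for live in [(False, False), (True, False), (True, True)]:
        print(live, fma_sequence(*live, combined=True))
```

In this model the combined form only saves an instruction when at least one of `a` and `b` is dead after the FMA. When both are live it is the same length as load + FMA, with a register copy taking the place of the load.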
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #15 from Michael_S ---
(In reply to Hongtao.liu from comment #14)
> > Still I don't understand why the compiler does not compare the cost of the
> > full loop body after combining to the cost before combining and does not come to
> >
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #14 from Hongtao.liu ---
> Still I don't understand why the compiler does not compare the cost of the
> full loop body after combining to the cost before combining and does not come
> to the conclusion that combining increased the cost.
As
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #13 from Michael_S ---
(In reply to Hongtao.liu from comment #11)
> (In reply to Michael_S from comment #10)
> > (In reply to Hongtao.liu from comment #9)
> > > (In reply to Michael_S from comment #8)
> > > > What are values of gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #12 from Hongtao.liu ---
Correct AVX256 load cost outside of register allocation and vectorizer
> they are
> 1. AVX256 Load --- 16
> 2. FMA3 ymm,ymm,ymm --- 16
> 3. AVX256 Regmove --- 2
> 4. FMA3 mem,ymm,ymm --- 32
That's why
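
As a side note on the figures quoted in this comment, here is a small worked comparison. The four cost values are the ones listed above; the names `COST`, `split_cost`, and `combined_cost`, and the idea of simply summing per-instruction costs, are assumptions made for illustration and do not describe how GCC actually computes costs.

```python
# Cost figures as quoted in the comment above; the summing model is an
# assumption for illustration, not GCC's actual cost computation.
COST = {
    "load": 16,      # AVX256 Load
    "fma_reg": 16,   # FMA3 ymm,ymm,ymm
    "regmove": 2,    # AVX256 Regmove
    "fma_mem": 32,   # FMA3 mem,ymm,ymm
}

def split_cost():
    """Separate load followed by a register-only FMA."""
    return COST["load"] + COST["fma_reg"]

def combined_cost(needs_copy):
    """FMA with a memory operand, plus a register copy if one is required."""
    return COST["fma_mem"] + (COST["regmove"] if needs_copy else 0)

if __name__ == "__main__":
    print("load + fma:          ", split_cost())
    print("fma mem (no copy):   ", combined_cost(needs_copy=False))
    print("fma mem + regmove:   ", combined_cost(needs_copy=True))
```

Under this simple model the two forms tie when no copy is needed, and the combined form only becomes more expensive once a register move is added.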
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #11 from Hongtao.liu ---
(In reply to Michael_S from comment #10)
> (In reply to Hongtao.liu from comment #9)
> > (In reply to Michael_S from comment #8)
> > > What are values of gcc "loop" cost of the relevant instructions now?
> >
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #10 from Michael_S ---
(In reply to Hongtao.liu from comment #9)
> (In reply to Michael_S from comment #8)
> > What are values of gcc "loop" cost of the relevant instructions now?
> > 1. AVX256 Load
> > 2. FMA3 ymm,ymm,ymm
> > 3.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #9 from Hongtao.liu ---
(In reply to Michael_S from comment #8)
> What are values of gcc "loop" cost of the relevant instructions now?
> 1. AVX256 Load
> 2. FMA3 ymm,ymm,ymm
> 3. AVX256 Regmove
> 4. FMA3 mem,ymm,ymm
For skylake,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #8 from Michael_S ---
What are values of gcc "loop" cost of the relevant instructions now?
1. AVX256 Load
2. FMA3 ymm,ymm,ymm
3. AVX256 Regmove
4. FMA3 mem,ymm,ymm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #7 from Hongtao.liu ---
(In reply to Michael_S from comment #6)
> Why do you see it as addition of peephole pattern?
> I see it as removal. Like, "do what's written in the source and don't try to
> be tricky".
> Probably, I am too
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #6 from Michael_S ---
Why do you see it as addition of peephole pattern?
I see it as removal. Like, "do what's written in the source and don't try to be
tricky".
Probably, I am too removed from how compilers work :(
Or, maybe,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #5 from Hongtao.liu ---
(In reply to Michael_S from comment #3)
> (In reply to Alexander Monakov from comment #2)
> > Richard, though register moves are resolved by renaming, they still occupy a
> > uop in all stages except
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #4 from Alexander Monakov ---
> More so, gcc variant occupies 2 reservation station entries (2 fused uOps) vs
> 4 entries by de-transformed sequence.
I don't think this is true for the test at hand? With base+offset memory
operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #3 from Michael_S ---
(In reply to Alexander Monakov from comment #2)
> Richard, though register moves are resolved by renaming, they still occupy a
> uop in all stages except execution, and since renaming is one of the
> narrowest
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
Richard Biener changed:
What|Removed |Added
Target|i386,x86-64 |x86_64-*-* i?86-*-*
18 matches