https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
Li Pan changed:
What|Removed |Added
CC||pan2.li at intel dot com
--- Comment #20 from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #19 from Richard Biener ---
(In reply to Robin Dapp from comment #18)
[...]
> Regarding the mentioned element-wise costing how should we proceed here?
> I'm going to remove the hunk in question, run SPEC2017 on x86 and post a
> patc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #18 from Robin Dapp ---
A bit of a follow-up: I'm working on a patch for reassociation that can handle
the mentioned cases and some more but it will still require a bit of time to
get everything regression free and correct. What it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #17 from rguenther at suse dot de ---
On Wed, 7 Feb 2024, juzhe.zhong at rivai dot ai wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
>
> --- Comment #16 from JuzheZhong ---
> The FMA is generated in widening_mul PASS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #14 from rguenther at suse dot de ---
On Wed, 7 Feb 2024, juzhe.zhong at rivai dot ai wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
>
> --- Comment #13 from JuzheZhong ---
> Ok. I found the optimized tree:
>
>
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #16 from JuzheZhong ---
The FMA is generated in widening_mul PASS:
Before widening_mul (fab1):
_5 = 3.33314829616256247390992939472198486328125e-1 - _4;
_6 = _5 * 1.22998223643160599749535322189331054687
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #15 from JuzheZhong ---
(In reply to rguent...@suse.de from comment #14)
> On Wed, 7 Feb 2024, juzhe.zhong at rivai dot ai wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
> >
> > --- Comment #13 from JuzheZhong --
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #13 from JuzheZhong ---
Ok. I found the optimized tree:
_5 = 3.33314829616256247390992939472198486328125e-1 - _4;
_8 = .FMA (_5, 1.229982236431605997495353221893310546875e-1, _4);
Let CST0 = 3.3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #12 from JuzheZhong ---
Ok. I found it even without vectorization:
GCC is worse than Clang:
https://godbolt.org/z/addr54Gc6
GCC (14 instructions inside the loop):
fld fa3,0(a0)
fld fa5,8(a0)
fld
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #11 from JuzheZhong ---
Hi, I think this RVV compiler codegen is the optimal codegen we want for RVV:
https://repo.hca.bsc.es/epic/z/P6QXCc
.LBB0_5:# %vector.body
sub a4, t0, a3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #10 from rguenther at suse dot de ---
On Fri, 26 Jan 2024, rdapp at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
>
> --- Comment #9 from Robin Dapp ---
> (In reply to rguent...@suse.de from comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #9 from Robin Dapp ---
(In reply to rguent...@suse.de from comment #6)
> t.c:47:21: missed: the size of the group of accesses is not a power of 2
> or not equal to 3
> t.c:47:21: missed: not falling back to elementwise accesses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #8 from Richard Biener ---
(In reply to JuzheZhong from comment #7)
>
> But I wonder if we see it is beneficial on some boards, could you teach us
> how we can enable vectorization for such case according to uarchs ?
If you figure h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #7 from JuzheZhong ---
(In reply to rguent...@suse.de from comment #6)
> On Thu, 25 Jan 2024, juzhe.zhong at rivai dot ai wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
> >
> > --- Comment #5 from JuzheZhong ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #6 from rguenther at suse dot de ---
On Thu, 25 Jan 2024, juzhe.zhong at rivai dot ai wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
>
> --- Comment #5 from JuzheZhong ---
> Both ICC and Clang X86 can vectorize SPEC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
Andrew Pinski changed:
What|Removed |Added
Severity|normal |enhancement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #5 from JuzheZhong ---
Both ICC and Clang X86 can vectorize SPEC 2017 lbm:
https://godbolt.org/z/MjbTbYf1G
But I am not sure whether X86 ICC or X86 Clang is better.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #4 from JuzheZhong ---
OK. Confirmed: on X86, GCC fails to vectorize it, whereas X86 Clang can
vectorize it.
https://godbolt.org/z/EaTjGbPGW
The X86 Clang and RISC-V Clang IR are the same:
%12 = tail call <8 x double> @llvm.masked.gather.v8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #3 from JuzheZhong ---
Ok I see.
If we change NN into 8, then we can vectorize it with load_lanes/store_lanes
with group size = 8:
https://godbolt.org/z/doe9c3hfo
We will use vlseg8e64 which is RVVM1DF[8] == RVVM1x8DFmode.
Here t
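For illustration only, a minimal sketch of a loop whose loads form a group of 8 adjacent doubles per iteration, which is the general shape the comment describes when NN is 8. The function name cell_sums and the data layout are hypothetical and are not taken from the bug report's test case; whether a given GCC version actually selects load_lanes (vlseg8e64) for this exact loop is an assumption to check with the vectorizer dump.

```c
#include <assert.h>
#include <stddef.h>

#define NN 8 /* group size; assumed to correspond to the NN in the comment */

/* Hypothetical example: src holds n cells of NN interleaved doubles.
   Each iteration reads NN adjacent elements (one access group of size 8)
   and writes a single result. */
void cell_sums(const double *src, double *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        const double *c = src + i * NN;
        dst[i] = c[0] + c[1] + c[2] + c[3] + c[4] + c[5] + c[6] + c[7];
    }
}
```

Building a loop like this with -O3 and -fopt-info-vec-missed is one way to see whether the group size is accepted or rejected by the vectorizer.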
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #2 from Robin Dapp ---
> It's interesting, for Clang only RISC-V can vectorize it.
The full loop can be vectorized on clang x86 as well when I remove the first
conditional (which is not in the snippet I posted above). So that's lik
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #1 from JuzheZhong ---
It's interesting, for Clang only RISC-V can vectorize it.
I think there are 2 topics:
1. Support vectorization of this code in the loop vectorizer.
2. Transform gather/scatter into strided load/store for RISC
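As a hedged sketch of the second topic: a gather whose index vector is an affine sequence (idx[i] == i * stride) reads exactly the same elements as a strided load, which is why such a gather can in principle be rewritten into one. The function names gather and strided_load below are hypothetical, scalar stand-ins for the vector operations and are not GCC internals.

```c
#include <assert.h>
#include <stddef.h>

/* Gather form: element i is read through an explicit index array. */
void gather(const double *base, const size_t *idx, double *dst, size_t n)
{
    assert(base != NULL && idx != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = base[idx[i]];
}

/* Strided form: equivalent to the gather above only when
   idx[i] == i * stride holds for every i. */
void strided_load(const double *base, size_t stride, double *dst, size_t n)
{
    assert(base != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = base[i * stride];
}
```

The same reasoning applies to scatter versus strided store, with the reads and writes swapped.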
21 matches